`
`Office of Business Enterprises
`Duplication Services Section
`
`THIS IS TO CERTIFY that the collections of the Library of Congress contain a bound
`volume entitled REAL-TIME VIDEO COMPRESSION Techniques and Algorithms, call
`number QA76.575.W47 1997, Copy 2, The attached — Cover Page, Title Page, Copyright Page,
`Table of Contents Pages, Preface page, Chapters 1,2,3 and Chapter 7 - are a true and complete
`representation from that work.
`
`THIS IS TO CERTIFY FURTHER, that work is marked with a Library of Congress
`Copyright Office stamp dated November 25, 1996.
`
`IN WITNESS WHEREOF, the seal of the Library of Congress is affixed hereto on
`May 18, 2018.
`
`1
`
`60:14e.40
`Deirdre Scott
`Business Enterprises Officer
`Office of Business Enterprises
`Library of Congress
`
`101 Independence Avenue, SE Washington, DC 20540-4917 Tel 202.707.5650 www.loc.gov; duplicationservices@loc.gov
`
`Page 1
`
`NETFLIX, INC
`Exhibit 1010
`IPR2018-01630
`
`
`
`QA 76
`.575
`.W47
`1997
`Copy 2
`
`FT MEADE
`GenCol 1
`
`REAL-TIME
`VIDEO
`OMPRESSION
`Techniques and Algorithms
`
`Raymond Westwater
`Borko Furht
`
`Forward
`3D-OCT
`
`QuarMaer
`
`Compressed
`video sequence
`
`Video
`cube
`
`Quantization
`tables
`
`Huffman!
`table
`
`Decompressed
`video cube
`
`flog uant
`
`Page 2
`
`
`
`REAL-TIME VIDEO COMPRESSION
`Techniques and Algorithms
`
`by
`
`Raymond Westwater
`Borko Furht
`Florida Atlantic University
`
`KLUWER ACADEMIC PUBLISHERS
`Boston /Dordrecht / London
`
`Page 3
`
`
`
`Distributors for North America:
`Kluwer Academic Publishers
`101 Philip Drive
`Assinippi Park
`Norwell, Massachusetts 02061 USA
`
`Distributors for all other countries:
`Kluwer Academic Publishers Group
`Distribution Centre
`Post Office Box 322
`3300 All Dordrecht, THE NETI IERLANDS
`
`(t\,,
`
`N()I2 519965
`
`N4'
`
`Library of Congress Cataloging-in-Publication Data
`
`A C.I.P. Catalogue record for this book is available
`from the Library of Congress.
`
`Copyright © 1997 by Kluwer Academic Publishers
`
`All rights reserved. No part of this publication may be reproduced, stored in a
`retrieval system or transmitted in any form or by any means, mechanical, photo-
`copying, recording, or otherwise, without the prior written permission of the
`publisher, Kluwer Academic Publishers, 101 Philip Drive, Assinippi Park, Norwell,
`Massachusetts 02061
`
`Printed on acid-free paper.
`
`Printed in the United States of America
`
`Page 4
`
`
`
`CONTENTS
`
`PREFACE
`
`1. THE PROBLEM OF VIDEO COMPRESSION
`1.1 Overview of Video Compression Techniques
`1.2 Applications of Compressed Video
`1.3 Image and Video Formats
`1.4 Overview of the Book
`
`2. THE MPEG VIDEO COMPRESSION STANDARD
`2. 1 MPEG Encoder and Decoder
`2.2 MPEG Data Stream
`
`3. THE H.261/H.263 COMPRESSION STANDARD FOR
`VIDEO TELECOMMUNICATIONS
`31 Picture Formats for H.261/H.263 Video Codecs
`3.211.261/H.263 Video Encoder
`3.3 H.261/H.263 Video Decoder
`
`4. THE XYZ VIDEO COMPRESSION ALGORITHM
`4.1 XYZ Compression Algorithm
`4.2 XYZ Decompression Algorithm
`
`5. THE DISCRETE COSINE TRANSFORM
`5.1 Behavior of the DCT
`5.2 Fast One-Dimensional DCT Algorithms
`5.3 Two-Dimensional DCT Algorithms
`5.4 Inverse DCT Algorithms
`
`vii
`
`1
`
`6
`8
`12
`
`15
`15
`18
`
`23
`24
`25
`28
`
`29
`29
`32
`
`37
`37
`40
`47
`50
`
`Page 5
`
`
`
`vi
`
`Techniques for Real-Time Compression
`
`5.5 Three-Dimensional DCT Algoriltuns
`
`6. QUANTIZATION
`6.1 Defining an Invariant Measure of Error
`6.2 Calculation of Transform Variances
`6.3 Generating Quantizer Factors
`6.4 Adding Human Visual Factors
`
`7. ENTROPY CODING
`7.1 Huffman Coding
`7.2 Use of Entropy Coding in JPEG and MPEG
`7.3 Adaptive Huffman Coding
`
`51
`
`57
`58
`62
`65
`67
`
`73
`73
`76
`78
`
`8. VLSI ARCHITECTURES OF THE XYZ VIDEO CODEC
`8.1 Complexity of the Video Compression Algorithms
`8.2 From Algorithms to VLSI Architectures
`8.3 Classification of Video Codec VLSI Architectures
`8.4 Implementation of the XYZ Video Compression Algorithm
`8.5 Adaptive XYZ Codec Using Mesh Architecture
`8.6 XYZ Codec Based on Fast 3D DCT Coprocessor
`
`83
`83
`86
`87
`90
`103
`111
`
`9. EXPERIMENTAL RESULTS USING
`XYZ COMPRESSION
`9.1 PC Implementation
`9.2 MasPar Implementation
`9.3 Non-Adaptive XYZ Compression
`
`10. CONCLUSION
`
`BIBLIOGRAPHY
`
`INDEX
`
`123
`124
`138
`144
`
`151
`
`155
`
`163
`
`Page 6
`
`
`
`vii
`
`PREFACE
`
`This book is on real-time video compression. Specifically, the book introduces the
`XYZ video compression technique, that operates in three dimensions, eliminating
`the overhead of motion estimation. First, video compression standards, MPEG and
`H.261/H.263, are described. They both use asymmetric compression algorithms,
`based on motion estimation. Their encoders are much more complex than decoders.
`The XYZ technique uses a symmetric algorithm, based on the Three-Dimensional
`Discrete Cosine Transform (3D-DCT). 3D-DCT was originally suggested for
`compression about twenty years ago, however at that time the computational
`complexity of the algorithm was to high, it required large buffer memory, and was
`not as effective as motion estimation. We have resurrected the 3D-DCT based video
`to the original
`compression algorithm by developing several enhancements
`algorithm. These enhancements made the algorithm feasible for real-time video
`compression in applications such as video-on-demand, interactive multimedia, and
`videoconfewricing. The demonstrated results, presented in the book, suggest that
`the XYZ video compression technique is not only a fast algorithm, but also provides
`superior compression ratios and high quality of the video compared to existing
`standard techniques, such as MPEG and H.261/H.263. The elegance of the XYZ
`technique is in its simplicity, which leads to inexpensive VLSI implementation of a
`XYZ codec.
`
`We would like to thank Jim Prince for conducting experiments in developing
`visually weighted quantizers for the XYZ algorithm, as well as a number of students
`from Florida Atlantic University, who participated in these experiments. We also
`want to thank Drs. Roy Levow, K. Genesan, and Matthew Evett, professors from
`Florida Atlantic University, Dr. Steve Rosenbaum from Cylex Systems, and Joshua
`Greenberg for constructive discussions during this project.
`
`Raymond Westwater and Borko Furht
`Boca Raton, July 1996.
`
`Page 7
`
`
`
`1
`
`THE PROBLEM OF
`VIDEO COMPRESSION
`
`The problem of real-time video compression is a difficult and important one, and
`has inspired a great deal of research activity, This body of knowledge has been, to a
`substantial degree, embodied into the MPEG and H.261/I4263 motion video
`standards. However, some important questions remain unexplored. This book
`describes one possible alternative to these standards that has superior compression
`full
`its
`for
`less computational power
`characteristics while requiring far
`implementation.
`
`Since about 1989, moving digital video images have been integrated with programs.
`The difficulty in implementing moving digital video is the tremendous bandwidth
`required for the encoding of video data, For example, a quarter screen image (320 x
`240 pixels) playing on an RGB video screen at full speed of 30 frames per second
`(fps) requires storage and transmission of 6.9 million bytes per second. This data
`rate is simply prohibitive, and so means of compressing digital video suitable for
`real-time playback arc a necessary step for the widespread introduction of digital
`motion video applications.
`Many digital video compression algorithms have been developed and implemented.
`The compression ratios of these algorithms varies according to the subjective
`acceptable level of error, the definition of the word compression, and who is making
`the claim. Table 1.1 summarizes video compression algorithms, their typical
`compression ratios reported in the literature, and their characteristics.
`
`1
`
`Page 8
`
`
`
`2
`
`Chapter 1
`
`Compression
`Algorithm
`
`Intel RTV/Indeo
`
`Typical
`Compression
`Ratio
`3:1
`
`Intel PLV
`
`IBM Photomotion
`
`Motion JPEG
`
`Fractals
`
`Wavelets
`
`12:1
`
`3:1
`
`10:1
`
`10:1
`
`20:1
`
`H.261/H263
`
`50:1
`
`MPEG
`
`30:1
`
`Characteristics
`
`_
`
`A 128X240 data stream is interpolated to
`256X240. Color is subsampled 4:1. A
`simple 16 bit codebook is used without
`error correction. Frame differencing is
`used.
`A native 256X240 stream is encoded using
`vector quantization and motion
`compensation. Compression requires
`specialized equipment.
`An optimal 8-bit color palette is determined,
`and run-length encoding and frame
`differencing are used.
`Uses 2-D DCT to encode individual
`frames. Gives good real-time results with
`inexpensive but special-purpose
`equipment. This technique supports
`random-access since no frame
`differencing is used.
`Fractals compress natural scenes well, but
`require tremendous computing power.
`2-D and 3-D wavelets have been used in
`the compression of motion video. Wavelet
`compression is low enough in complexity to
`compress entire images, and therefore
`does not suffer from the boundary artifacts
`seen in DCT-based techniques.
`Real-time compression and decompression
`algorithm for video telecommunications. It
`is based on
`2-D DCT with simple motion estimation
`between frames.
`Uses 2-D DCT with motion estimation and
`interpolation between frames. The MPEG
`standard is difficult and expensive to
`compress, but plays back in real-time with
`inexpensive equipment.
`
`Table 1.1 Overview of video compression algorithms.
`
`An ideal video compression technique should have the following characteristics:
`
`• Will produce levels of compression rivaling MPEG without objectionable
`artifacts.
`• Can be played back in real time with inexpensive hardware support.
`
`Page 9
`
`
`
`The Problem of Video Compression
`
`3
`
`• Can degrade easily under network overload or on a slow platform.
`• Can be compressed in real time with inexpensive hardware support.
`
`1.1 OVERVIEW OF VIDEO COMPRESSION TECHNIQUES
`
`The JPEG still picture compression standard has been extremely successful, having
`been implemented on virtually all platforms. This standard is fairly simple to
`implement, is not computationally complex, and gels 10:1 to 15:1 compression
`ratios without significant visual artifacts. This standard is based upon entropy
`encoding of quantized coefficients of the discrete cosine transformation of 8x8
`blocks of pixel data,
`
`the JPEG compression and
`the block diagram of both
`Figure 1.1 shows
`decompression algorithms. A single frame is subdivided into 8x8 blocks, each of
`which is independently processed. Each block is transformed into DCT space,
`resulting in an 8x8 block of DCT coefficients. These coefficients are then quantized
`by integer division by constants. The quantizing constant for each DCT coefficient
`is chosen to produce minimal visual artifacts, while maximally reducing the
`representational entropy of the coefficients. The quantized coefficients are then
`entropy coded into a compressed data stream. The reduced entropy of the quantized
`coefficients is reflected in the higher compression ratio of the data.
`The Motion JPEG (M-JPEG) uses the JPEG compression for each frame. It provides
`random access to individual frames. however the compression ratios are too low
`(same as in JPEG), because the technique does not take advantage of the similarities
`between adjacent frames.
`The MPEG moving compression standard is an attempt to extend DCT-based
`compression into moving pictures. MPEG encodes frames by estimating the motion
`difference between the frames, and encoding the differences into roughly JPEG
`format. Unfortunately, motion estimation is computationally complex, requires
`to encode, and adds considerable complexity to the
`specialized equipment
`algorithm. Figure 1.2 illustrates the MPEG compression algorithm for predictive
`frames.
`
`Page 10
`
`
`
`4
`
`Chapter 1
`
`Source
`brings data
`IMM1111.11•
`
`■■■■■■■■
`
`■■■■■■■■
`
`Ste niacin
`
`Compressed
`Image Dale
`
`JPEG Encoder
`
`Forward
`Discrete
`Coups
`
`guantrzer
`
`Compressed
`Image Data
`
`JPEG Decoder
`
`Reconstructed
`Image Dela
`
`Entropy
`Decoder
`
`Dequentizer
`
`'worse
`OCT
`
`Tette
`Sped A cation
`
`Tette
`Specie cabon
`
`Figure 1.1 JPEG compression and decompression algorithms.
`One of the most promising new technologies is wavelet-based compression [1,1K95].
`Figure 1.3 illustrates a simple wavelet transform: subband decomposition. The
`image as a whole is subdivided into frequency subbands, which are then
`individually quantized. One of the most attractive features of this system is that it is
`applied to the image as a whole, thereby avoiding the edge artifacts associated with
`the block-based DCT compression schemes.
`
`The wavelet transform can be applied to the time dimension as well. Experience has
`shown that this decomposition does not give as good compression results as motion
`compensation, As there are no other compression algorithms capable of such high
`compression ratios, MPEG is considered the existing "state-of-the-art".
`The XYZ algorithm is a natural extension of the research that has been done in
`video compression. Much work has been done in the development of transform-
`based motion video compression algorithms, and in the development of quantizing
`factors based on the sensitivity of the human eye to various artifacts of compression.
`
`Page 11
`
`
`
`The Problem of Video Compression
`
`5
`
`Compressed
`Video sequence
`
`Entropy
`Encoder
`
`Reference
`Fran*
`
`Figure 1.2 MPEG compression algorithm for predictive frames.
`MPEG adds motion estimation to the JPEG model.
`
`Figure 1.3 Octave-band or wavelet decomposition of a still image into unequal subbands.
`
`XYZ compression is an alternative extension of DCT encoding to moving pictures.
`Sequences of eight frames are collected into a three-dimensional block to which a
`three-dimensional DCT will be applied. The transformed data is then quantized.
`
`Page 12
`
`
`
`6
`
`Chapter I
`
`These quantizing constants are demonstrated to cause artifacts which are minimally
`visible. The resulting data stream is then entropy coded. This process strongly
`resembles the JPEG encoding process, as illustrated in Figure 1.4.
`
`Quantization Ar.
`
`Entropy
`Encoder
`
` Compressed Video
`Sequence
`"—OP'
`
`Table
`Specification
`
`8 Quantization
`Tablas
`
`Figure 1.4 XYZ compression algorithm.
`
`This algorithm is built upon a considerable body of published work. The three-
`dimensional DCT has been used to encode errors after motion estimation has been
`performed [RP77], and true three-dimensional DCT-based compression algorithms
`have been developed where the quantizers were based upon minimization of
`introduced mean square error [NA77]. -These algorithms have fallen into disfavor
`because they were considered to require excessive computation, required too much
`buffer memory, and were not as effective as motion estimation. This book refutes
`these arguments.
`
`Work in visibility of artifacts produced by quantization has also been done [CR90].
`Visibility of two-dimensional quantization artifacts has been thoroughly explored
`for the DCT transforms space. The XYZ algorithm extends this work to
`quantization of three-dimensional DCT coefficients.
`
`1.2 APPLICATIONS OF COMPRESSED VIDEO
`
`Video compression techniques made feasible a number of applications. Four distinct
`applications of the compressed video can be summarized as: (a) consumer broadcast
`television, (b) consumer playback, (c) desktop video, and (d) videoconferencing.
`
`Page 13
`
`
`
`The Problem of Video Compression
`
`7
`
`Consumer broadcast television, which includes digital video delivery to homes,
`typically requires a small number of high-quality compressors and a large number
`of low-cost decompressors. Expected compression ratio is about 50:1.
`
`Consumer playback applications, such as CD-ROM libraries and interactive games,
`also require a small number of compressors and a large number of low-cost
`decompressors. The required compression ratio is about 100:1.
`
`includes systems for authoring and editing video
`Desktop video, which
`presentations, is a symmetrical application requiring the same number of encoders
`and decoders. The expected compression ratio is in the range from 5:11 to 50:1.
`
`Videconferencing applications also require the same number of encoders and
`decoders, and the expected compression ratio is about 100:1.
`
`Application
`
`Bandwidth
`
`Standard
`
`5-10 Kbps
`
`26-64 Kbps
`
`none
`
`H.263
`
`64-128 Kbps
`
`H.261
`
`>= 384 Kbps
`
`H.261
`
`1-2 Mbps
`
`3-10 Mbps
`
`MPEG-1
`
`MPEG-2
`
`Analog
`Videophone
`Low Bitrate
`Video
`Conferencing
`Basic Video
`Telephony
`
`Video
`Conferencing
`Interactive
`Multimedia
`Digital TV -
`NTSC
`High Dennititm
`Television
`
`Size
`
`170x128
`
`128x96
`176x144
`176x144
`352x288
`352x288
`
`352x240
`
`720x480
`
`Frame Rate
`[frames/sec]
`2-5
`
`15-30
`
`10-20
`
`15-30
`
`15-30
`
`30
`
`30-60
`
`15-80 Mbps
`
`MPEG-2
`
`1200x800
`
`Table 1.2 Applications of the compressed video and current video compression standards.
`
`Table 1.2 summarizes applications of the compressed video, by specifying current
`standards used in various applications, the required bandwidth, and typical frame
`sizes and frame rates.
`
`Page 14
`
`
`
`8
`
`Chapter I
`
`1.3
`
`IMAGE AND VIDEO FORMATS
`
`A digital image represents a two-dimensional array of samples, where each sample
`intensity can be
`is called a pixel. Precision determines how many levels of
`represented, and is expressed as the number of bits/sample. According to precision,
`images can be classified into: (a) binary images, represented by 1 bit/sample, (b)
`computer graphics, represented by 4 bits/sample, (c) grayscale images, represented
`by 8 bits/sample, and color images, represented with 16, 24 or more bits/sample.
`
`According to the trichmmatic theory, the sensation of color is produced by
`selectively exciting three classes of receptors in the eye. In a RGB color
`representation system, shown in Figure 1.5, a color is produced by adding three
`primary colors: red, green, and blue (ROB). The straight line, where R=G=B,
`specifies the gray values ranging from black to white.
`
`R
`
`Orsyscele vale*
`R-G-B
`
`Figure 1.5 The RGB representation of color images.
`Another representation of color images, YUV representation, describes luminance
`and chrominance components of an image. The luminance component provides a
`grayscale version of the image, while two chrominance components give additional
`information that converts the grayscale image to a color image. The YUV
`representation is more natural for image and video compression. The exact
`transformation from ROB to YUV representation, specified by the CCIR 601
`standard, is given by the following equations:
`
`Page 15
`
`
`
`The Problem of Video Compression
`
`9
`
`Y = 0.299R + 0.587G+ 0.114B
`
`U = 0.564(B — Y)
`
`V = 0.713(B— Y)
`
`(1.1)
`
`(1.2)
`
`(1.3)
`
`where Y is the luminance component, and U and V are two chrominance
`components.
`
`An approximate RGB to YUV transformation is given as:
`
`Y = 0.3R + 0.6G + 0.1B
`
`U = B — Y
`
`V = R —Y
`
`(1.4)
`
`(1.5)
`
`(1.6)
`
`This transformation has a nice feature that, when R+G+B, then Y=R=G=B, and
`U=V=O. In this case, the image is a grayscale image.
`
`Color conversion from RGB to YUV requires seVeral"multiplications, which can be-
`computationally expensive. An approximation, proposed in [W+941, can be
`calculated by performing bit shifts and adds instead multiplication operations. This
`approximation is defines as:
`
`R G
`+ —
`Y -= —
`2
`4
`
`B
`+ —
`2
`
`B —Y
`2
`
`R
`
`—Y
`2
`
`V -=
`
`(1.7)
`
`(1.8)
`
`(1.9)
`
`Page 16
`
`
`
`10
`
`Chapter 1
`
`This approximation also gives a simplified YUV to RGB transformation, expressed
`by:
`
`R=Y+2V
`
`G=Y—(U+V)
`
`B=Y+2U
`
`(1.10)
`
`(1.11)
`
`(1.12)
`
`Another color format, referred to as YCbCr format, is intensively used for image
`compression. In YCbCr format, Y is the same as in a YUV system, however U and
`V components are scaled and zero-shifted to produce Cb and Cr, respectively, as
`follows:
`
`Cb =
`
` + 0.5
`2
`
`V
`Cr=— + 0.5
`1.6
`
`(1.13)
`
`(1.14)
`
`In this way, chrorninance components Ch and Cr are always in the range [0,1].
`
`Computer Video Formats
`Resolutions of an image system refers to its capability to reproduce fine detail.
`Higher resolution requires more complex imaging systems to represent these images
`in real time. In computer systems, resolution is characterized with number of pixels.
`Table 1.3 summarizes popular computer video formats, and related storage
`requirements.
`
`Television Formats
`In television systems, resolution refers to the number of line pairs resolved on the
`face of the display screen, expressed in cycles per picture height, or cycles per
`picture width. For example, the NTSC broadcast system' in North America and
`Japan, denoted as 525/59.94, bas about 483 picture lines.
`The HDTV system will approximately double the number of lines of current
`broadcast television at approximately the same field rate. For example, a 1050x960
`HDTV system will have 960 total lines. Spatial and temporal characteristics of
`conventional television systems (such as NTSC, SECAM, and PAL), and high-
`
`Page 17
`
`
`
`The Problem of Video Compression
`
`11
`
`definition TV systems (HDTV) are presented in Tables 1.4 and 1.5, respectively
`[BF91].
`
`Computer Video
`Format
`
`Resolution
`(pixels)
`
`Colors (bits)
`
`CGA - Color
`Graphics Adapter
`
`EGA - Enhanced
`Graphics Adapter
`
`VGA- Video
`Graphics Adapter
`
`88514/A Display
`Adapter Mode
`
`XGA - Extended
`Graphics Array (a)
`XGA - Extended
`Graphics Array (b)
`
`SVGA -Super VGA
`
`320x200
`
`4 (2 bits)
`
`640x350
`
`16 (4 bits)
`
`640x480
`
`256 (8 bits)
`
`1024x768
`
`256 (8 bits)
`
`640x480
`
`65,000 (24 bits)
`
`1024x768
`
`256 (8 bits)
`
`1024x768
`
`65,000 (24 bits)
`
`Storage
`Capacity
`Per Image
`128,000 bits =
`16 KB
`896,000 bits=
`112 KB
`2,457,600 bits=
`307,2 KB
`6,291,456 bits=
`786.432 KB
`6,291,456 bits=
`786.432 KB
`6,291,456 bits=
`786.432 KB
`2.36 MB
`
`Table 1.3 Characteristics of various computer video formats,
`
`System
`
`Total
`Lines
`
`Active
`Lines
`
`Vertical
`Resolution
`
`HDTV
`USA
`HDTV
`Europe
`NTSC
`
`1050
`
`960
`
`675
`
`1250
`
`1000
`
`700
`
`525
`
`484
`
`242
`
`PAL
`
`625
`
`575
`
`290
`
`SECAM
`
`625
`
`575
`
`290
`
`Optimal
`Viewing
`Distance
`[in]
`2.5
`
`2.4
`
`7.0
`
`6.0
`
`6.0
`
`Aspect
`Ratio
`
`Horizontal
`Resolution
`
`Total
`Picture
`Elements
`
`16/9
`
`600
`
`720,000
`
`16/9
`
`700
`
`870,000
`
`4/3
`
`4/3
`
`4/3
`
`330
`
`425
`
`465
`
`106,000
`
`165,000
`
`180,000
`
`Table 1.4 Spatial characteristics of television systems [BF91].
`
`Page 18
`
`
`
`12
`
`Chapier I
`
`System
`
`HDTV
`USA
`HDTV
`Europe
`NTSC
`
`PAL
`
`SECAM
`
`Total
`Channel
`Width
`[MHz'
`
`Video
`Basehand
`Y [MHz]
`
`Video
`Bascband
`R-Y
`[MHz]
`
`Video
`noseband
`B•Y
`[MHz]
`
`Sunning
`Rate
`Camera
`[Hz]
`
`9.0
`
`12.0
`
`6.0
`
`8.0
`
`8.0
`
`10.0
`
`14.0
`
`4.2
`
`5.5
`
`6.0
`
`5.0
`
`7.0
`
`1.0
`
`1.8
`
`2.0
`
`5.0
`
`7.0
`
`0.6
`
`1.8
`
`2.0
`
`59.94
`
`50
`
`59.94
`
`50
`
`50
`
`Scanning
`Rate
`HDTV
`Display
`1Hz]
`59.94
`
`100
`
`NA
`
`NA
`
`NA
`
`Scanning
`Rate
`Convent,
`Display
`libl
`59.94
`
`50
`
`.
`
`59.94
`
`50
`
`50
`
`Table 1.5 Temporal characteristics of television systems [BF91].
`
`1.4 OVERVIEW OF THE BOOK
`
`This book is divided into ten chapters:
`
`I. Video compression. This current chapter
`introduces
`the problem of
`compressing motion video, illustrates the motivation for the 3-D solution chosen in
`the book, and briefly describes the proposed solution. Image and video formats are
`introduced as well.
`2. MPEG. This chapter describes the MPEG compression standard. Important
`contributions in the field and related work are emphasized.
`3. H.261/H.263. This chapter describes the compression standard for video
`telecommunications.
`4. XYZ compression. The XYZ video compression algorithm is described in
`detail in this chapter. Both encoder and decoder are presented, as well as an
`example of compressing 8x8x8 video block.
`5. 3-1) DCT. The theory of the Discrete Cosine Transform is developed and
`extended to three dimensions. A fast 3-D algorithm is developed.
`6. Quantization. Discussion is presented on the issues of determining optimal
`quantizers using various error criteria. A model of Human Visual System is used to
`develop factors that weigh the DCT coefficients according to their relative visibility.
`7. Entropy coding. A method for encoding the quantized coefficients is
`developed based on the stochastic behavior of the pixel data
`
`Page 19
`
`
`
`The Problem of Video Compression
`
`13
`
`8. VLSI architectures
`for XYZ codec.
`Issues concerning
`real-time
`implementation of the XYZ compression algorithm are analyzed including the
`complexity of the algorithm and mapping the algorithm into various VLSI
`architectures.
`9. Results. Obtained results of an implementation of the XYZ compression
`algorithm arc presented.
`10. Conclusion. Summary of contributions are outlined, emphasizing the real-time
`features of the compression algorithm, visual quality, and compression ratio.
`Directions for future research are given as well.
`
`Page 20
`
`
`
`2
`
`THE MPEG VIDEO
`COMPRESSION STANDARD
`
`The Motion Picture Experts' Group was assembled by the International Standards
`Organization (ISO) to establish standards for the compression, encoding, and
`decompression of motion video. MPEG-1 lIS92b] is a standard supporting
`compression of image resolutions of approximately 352x288 at 30 fps into a data
`stream of 1.5 Mbps. This data rate is suitable for pressing onto CD-ROM. The
`MPEG-2 standard [IS93b] supports compression of broadcast television (704x576 at
`30 fps) and HDTV (1920x1152 at 60 fps) of up to 60 Mpixels/sec (appx. 700 Mb) at
`compression ratios of roughly three times those expected of moving IPEG [IS92a]
`(playback rates of up to 80 Mbps).
`
`The MPEG standard specifies the functional organization of a decoder. The data
`stream is cached in a buffer to reduce the effect of jitter in delivery and decode, and
`is &multiplexed into a video stream, an audio stream, and additional user-defined
`streams. The video stream is decoded into a "video sequence" composed of the
`sequence header and groups of pictures.
`
`2.1 MPEG ENCODER AND DECODER
`
`The specification of the MPEG encoder defines many compression options. While
`all of these options must be supported by the decoder, the selection of which options
`to support in compression is left to the discretion of the implementer. An MPEG
`encoder may choose compression options balancing the need for high compression
`15
`
`Page 21
`
`
`
`16
`
`Chanter 2
`
`ratios against the complexity of motion compensation or adaptive quantization
`calculations. Decisions will be affected by such factors as:
`
`• A need for real-time compression. MPEG algorithms are complex, and there
`may not he sufficient time to implement exotic options on a particular platform.
`• A need for high compression ratios. For highest possible compression ratios at
`highest possible quality, every available option must he exercised.
`• A need for insensitivity to transmission error. MPEG-2 supports recovery from
`transmission errors. Some error recovery mechanisms are implemented by the
`encoder,
`• Fast algorithms. Development of fast algorithms may make compressio❑
`options available that would otherwise be impractical.
`Availability of specialized hardware. Dedicated hardware may increase the
`performance of the encoder to the point that additional compression options can
`he considered.
`
`In the MPEG standard, frames in a sequence are coded using three different
`algorithms, as illustrated in Figure 2.1.
`
`Forward predlonon F.4 7)
`
`1
`3
`
`B
`
`4
`
`S
`
`6
`
`B
`
`P
`
`B
`
`7
`
`a
`
`9
`
`B
`
`B
`
`Bidirectional prediction
`Baal.)
`
`Bldlrecflonal prediction
`8=1(I,P)
`
`Figure 2.1 Types of frames in the MPEG standard.
`
`Page 22
`
`
`
`The MPEG Video Compression Standard
`
`17
`
`I frames (infra frames) are self-contained and coded using a DCT-based technique
`similar to JPEG. I frames are used as random access points in MPEG streams, and
`they give the lowest compression ratios within MPEG.
`
`P frames (predicted frames) are coded using forward predictive coding, where the
`actual frame is coded with reference to a pervious frame (I or P). This process is
`similar to H.261/H.263 predictive coding, except the previous frame is not always
`the closest previous frames, as in H.261/H.263 coding. The compression ratio of P
`frames is significantly higher than of I frames.
`
`B frames (bidirectional or interpolated frames) are coded using two reference
`frames, a past and a future frame (which can be I or P frames). Bidirectional, or
`interpolated coding provides the highest amount of compression [Fur95b].
`
`I, P, and B frames are described in more detail in Section 8.2. Note that in Figure
`2.1, the first three B frames (2,3, and 4) are bidirectionally coded using the past
`frame I (frame 1), and the future frame P (frame 5). Therefore, the decoding order
`will differ from the encoding order. The P frame 5 must be decoded before B frames
`2,3, and 4, and I frame 9 before B frames 6,7, and 8. If the MPEG sequence is
`transmitted over
`the network,
`the actual
`transmission order should be
`{1,5,2„3,4,8,6,7,8}.
`
`The MPEG application determines a sequence. of 1, P, and B, frames. If there is a
`need for fast random access, the best resolution would be achieved by coding the
`whole sequence as I frames (MPEG becomes identical to Motion JPEG). However,
`the highest compression ratio can be achieved by incorporating a large number of B
`frames.
`
`The block diagram of the MPEG encoder is given in Figure 2.2, while the MPEG
`decoder is shown in Figure 2.3.
`
`I frames are created similarly to JPEG encoded pictures, while P and B frames are
`encoded in terms of previous and future frames. The motion vector is estimated, and
`the difference between the predicted and actual blocks (error terms) are calculated.
`The error terms are then DCT encoded and the entropy encoder is used to produce
`the compact code.
`
`Page 23
`
`
`
`18
`
`Chapter 2
`
`
`
`'Decide Picture Type (I/P/B)
`
`Picture
`
`B
`
`'DCT, determine
`quantize factor
`
`'Quantize
`
`Dequantize
`
`IDCT
`
`[Estimate Motion
`
`Construct Motion
`Compensated
`
`Last UP
`Frame
`
`Next I/P
`Frame
`
`DCT, determine
`quantize factor
`
`Quantize
`
`Dequantize
`
`Pressed
`frame
`
`•
`VLC, Huffman
`encode
`
`Motion
`Vectors
`
`Error
`Term
`
`IDCT
`
`VLC, Huffman
`encode
`
`Cache B frames
`until next I/P
`frame
`
`Estimate Motion
`
`Construct Motion
`Compensated
`Frame
`
`DCT, determine
`quantize factor
`
`Quantize
`
`Mot
`Vectors
`
`Error
`Term
`
`VLC, Huffman
`encode
`
`encoded bitstream
`
`Figure 2.2 The block diagram of the MPEG encoder.
`
`2.2 MPEG DATA STREAM
`
`The MPEG specification defines a "video sequence" composed of a video sequence
`header and many Group-Of-Pictures (GOP), as illustrated in Figure 2.4. The video
`sequence header defines the video format, picture dimensions, aspect ratio, frame
`rate, and delivered data rate. Supported video formats include CCIR601,
`HDTV(16:9), and VGA. Supported chroma formats include "4:2:0" (YUV) and
`
`Page 24
`
`
`
`The MPEG Video Compression Standard
`
`19
`
`"4:4:4" (RGB). A suggested buffer size for the video sequence is also specified, a
`number intended to buffer jitter caused by differences in decode time,
`
`Audio Stream 1,
`
`MPEG Stream MPEG
`Biistrearn
`Buffer
`
`System
`Stream
`Decoder
`
`Video Stream
`
`User Stream
`
`Picture Bitstream
`
`Decoder
`
`Video
`Stream
`
`Sequence
`Header
`
` Picture
`Bitstream
`Buffer
`
` iij
`
`GOP
`Header
`
`Reorde
`Pictures
`
`Ordered
`Pictures
`
`Reference
`Picture
`
`Reference
`Picture
`
`IIP
`Pictures
`Figure 2.3 The block diagram of the MPEG decoder.
`
`A GOP contains pictures that may be encoded into one of three supported
`compression formats. The GOP header contains a starting time for the group, and
`can therefore be used as a point of random access. Each frame within the GOP is
`numbered, and its number coupled with the GOP start time and the playback frame
`rate determines its playback time. Each picture is subdivided into "slices" and then
`into "macroblocks". A macroblock is composed of four 8x8 blocks of luminance
`data, and typically two 8x8 blocks of chrominance data, one Cr and one Cb.
`
`Set
`Quantrzers,
`Buffer Size
`
`Set G cup
`Start Time
`
`Clock
`Tick
`
`Playback
`Synchonize
`
`Picture
`Buffer
`
`DCT data
`
`Inverse
`DCT
`
`Dequanlize
`
`Buffered
`Picture
`Bitstream
`
`Huffman,
` VLC
`Decode
`
`*Compensation
`
`1 Motion vector Motion
`/
`
`Page 25
`
`
`
`20
`
`Chapter 2
`
`Sequence
`Fleecier
`
`GOP
`
`GOP
`
`GOP
`
`GOP
`
`GOP
`
`MPEG Sequence
`
`GOP
`Header
`
`Picture
`
`Picture
`
`Slice
`Header
`
`MSC.*
`block
`
`MRS,.
`block
`
`Ware
`bleak
`header
`
`Picture
`
`Picture
`
`Group of Pictures
`
`Ma CPO
`block
`
`Macro-
`block
`
`Slice
`
`Cr
`
`Macroblock
`
`Figure 2.4 MPEG data stream.
`
`I Picture Format
`The I (Intraframe) picture format substantially corresponds to the JPEG format.
`These pictures are encoded by t