Handbook of Image and Video Processing
`
`Academic Press Series in
`Communications, Networking, and Multimedia
`
`EDITOR-IN-CHIEF
`
`Jerry D. Gibson
`Southern Methodist University
`
`This series has been established to bring together a variety of publications that represent the latest in cutting-edge research,
`theory, and applications of modern communication systems. All traditional and modern aspects of communications as
`well as all methods of computer communications are to be included. The series will include professional handbooks,
books on communication methods and standards, and research books for engineers and managers in the worldwide
`communications industry.
`
`
`
`
`
`
`
`
This book is printed on acid-free paper.
`
`Copyright © 2000 by Academic Press
`
`All rights reserved.
`
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including
photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher.
Requests for permission to make copies of any part of the work should be mailed to the following address: Permissions Department,
Harcourt, Inc., 6277 Sea Harbor Drive, Orlando, Florida, 32887-6777.
`Explicit Permission from Academic Press is not required to reproduce a maximum of two figures or tables from an Academic Press
`article in another scientific or research publication provided that the material has not been credited to another source and that full
`credit to the Academic Press article is given.
`
`ACADEMIC PRESS
`
`A Harcourt Science and Technology Company
`525 B Street, Suite 1900, San Diego, CA 92101-4495, USA
`http://www.academicpress.com
`
`Academic Press
`Harcourt Place, 32 Jamestown Road, London, NW1 7BY, UK
http://www.hbuk.co.uk/ap/
`
`Library of Congress Catalog Number: 99-69120
`
ISBN: 0-12-119790-5
`
`Printed in Canada
`
00 01 02 03 04 05 FR 9 8 7 6 5 4 3 2 1
`
`
`
`
`
`
6.1

Basic Concepts and
Techniques of Video Coding
and the H.261 Standard
`
`
Barry Barnett
The University of Texas at Austin
`
1 Introduction
2 Introduction to Video Compression
3 Video Compression Application Requirements
4 Digital Video Signals and Formats
  4.1 Sampling of Analog Video Signals • 4.2 Digital Video Formats
5 Video Compression Techniques
  5.1 Entropy and Predictive Coding • 5.2 Block Transform Coding: The Discrete Cosine
  Transform • 5.3 Quantization • 5.4 Motion Compensation and Estimation
6 Video Encoding Standards and H.261
  6.1 The H.261 Video Encoder
7 Closing Remarks
References

1 Introduction

`
The subject of video coding is of fundamental importance to
many fields in engineering and the sciences. Video engineering
is quickly becoming a largely digital discipline. The digital
transmission of television signals via satellites is commonplace,
and widespread HDTV terrestrial transmission is slated
to begin in 1999. Video compression is an absolute requirement
for the growth and success of the low-bandwidth transmission
of digital video signals. Video encoding is being used
wherever digital video communications, storage, processing,
acquisition, and reproduction occur. The transmission of
high-quality multimedia information over high-speed computer
networks is a central problem in the design of Quality of Services
(QoS) for digital transmission providers. The Moving Picture
Experts Group (MPEG) has already finalized two video coding
standards, MPEG-1 and MPEG-2, that define methods for the
transmission of digital video information for multimedia and
television formats. MPEG-4 is currently addressing the
transmission of very low bitrate video. MPEG-7 is addressing the
standardization of video storage and retrieval services (other
chapters of this handbook discuss video storage and retrieval).
A central aspect of each of the MPEG standards is the video
encoding and decoding algorithms that make digital video
applications practical. The MPEG standards are discussed in
Chapters 6.4 and 6.5.
`
`Video compression not only reduces the storage requirements
`or transmission bandwidth of digital video applications, but it
`also affects many system performance tradeoffs. The design and
`selection of a video encoder therefore is not only based on its
`ability to compress information. Issues such as bitrate versus
`distortion criteria, algorithm complexity, transmission channel
`characteristics, algorithm symmetry versus asymmetry, video
`source statistics, fixed versus variable rate coding, and standards
compatibility should be considered in order to make good
encoder design decisions.
`The growth of digital video applications and technology in
`the past few years has been explosive, and video compression
`is playing a central role in this success. Yet, the video coding
`discipline is relatively young and certainly will evolve and change
`significantly over the next few years. Research in video coding has
great vitality and the body of work is significant. It is apparent
that this relevant and important topic will have an immense
effect on the future of digital video technologies.
`
`
`2 Introduction to Video Compression
`
Video or visual communications require significant amounts
of information transmission. Video compression, as considered
here, involves the bitrate reduction of a digital video signal
carrying visual information. Traditional video-based compression,
like other information compression techniques, focuses
on eliminating the redundant elements of the signal. The degree
to which the encoder reduces the bitrate is called its coding
efficiency; equivalently, its inverse is termed the compression
ratio:

\[
\text{coding efficiency} = (\text{compression ratio})^{-1}
  = \frac{\text{encoded bitrate}}{\text{decoded bitrate}}. \tag{1}
\]
`
Compression can be a lossless or lossy operation. Because of
the immense volume of video information, lossy operations are
mainly used for video compression. The loss of information or
distortion measure is usually evaluated with the mean square
error (MSE), mean absolute error (MAE) criteria, or peak
signal-to-noise ratio (PSNR):

\[
\begin{aligned}
\mathrm{MSE} &= \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \left[ I(i,j) - \hat{I}(i,j) \right]^2, \\
\mathrm{MAE} &= \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \left| I(i,j) - \hat{I}(i,j) \right|, \\
\mathrm{PSNR} &= 20 \log_{10} \frac{2^n}{\sqrt{\mathrm{MSE}}},
\end{aligned} \tag{2}
\]

for an image $I$ and its reconstructed image $\hat{I}$, with pixel indices
$1 \le i \le M$ and $1 \le j \le N$, image size $N \times M$ pixels, and
$n$ bits per pixel. The MSE, MAE, and PSNR as described here
are global measures and do not necessarily give a good indication
of the reconstructed image quality. In the final analysis,
the human observer determines the quality of the reconstructed
image and video. The tradeoff between distortion and coding
efficiency is one of the most fundamental in the technical
evaluation of video encoders. The topic of perceptual
quality assessment of compressed images and video is discussed
in Section 8.2.
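
The distortion measures of Eq. (2) are straightforward to compute. The following is a minimal Python sketch, not part of the original chapter; the function names and the tiny 2 × 2 test image are illustrative assumptions. It keeps the chapter's peak value of $2^n$ (many other references use $2^n - 1$ instead).

```python
import math

def mse(original, reconstructed):
    """Mean square error between two equally sized images (lists of rows)."""
    m, n = len(original), len(original[0])
    total = sum((original[i][j] - reconstructed[i][j]) ** 2
                for i in range(m) for j in range(n))
    return total / (m * n)

def mae(original, reconstructed):
    """Mean absolute error between two equally sized images (lists of rows)."""
    m, n = len(original), len(original[0])
    total = sum(abs(original[i][j] - reconstructed[i][j])
                for i in range(m) for j in range(n))
    return total / (m * n)

def psnr(original, reconstructed, bits_per_pixel=8):
    """PSNR in dB with the 2**n peak of Eq. (2); infinite for identical images."""
    err = mse(original, reconstructed)
    if err == 0:
        return math.inf
    return 20 * math.log10(2 ** bits_per_pixel / math.sqrt(err))

# Hypothetical 2 x 2 image and a slightly distorted reconstruction.
image = [[52, 55], [61, 59]]
recon = [[50, 55], [61, 63]]
print(mse(image, recon), mae(image, recon), round(psnr(image, recon), 2))
```

All three values are averages over the whole frame, which is why they are called global measures: a small localized artifact and a mild frame-wide error can produce the same score.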

Video signals contain information in three dimensions. These
dimensions are modeled as spatial and temporal domains for
video encoding. Digital video compression methods seek to
minimize information redundancy independently in each domain.
The major international video compression standards (MPEG-1,
MPEG-2, H.261) use this approach. Figure 1 schematically
depicts a generalized video compression system that implements
the spatial and temporal encoding of a digital image sequence.
Each image $I_k$ in the sequence is defined as in the preceding
equations. The spatial encoder operates on image blocks,
typically on the order of 8 × 8 pixels each. The temporal encoder
generally operates on 16 × 16 pixel image blocks. The system is
designed for two modes of operation, the intraframe mode and the
interframe mode.

The single-layer feedback structure of this generalized model
is representative of the encoders that are recommended by
the International Organization for Standardization (ISO) and the
International Telecommunication Union (ITU) video coding standards
MPEG-1, MPEG-2/H.262, and H.261 [1-3]. The feedback loop
is used in the interframe mode of operation and generates a
prediction error between the blocks of the current frame and
the current prediction frame. The prediction is generated by the
motion compensator. The motion estimation unit creates motion
vectors for each 16 × 16 block. The motion vectors and the
previously reconstructed frame are fed to the motion compensator
to create the prediction.
`
FIGURE 1 Generalized video compression system. [Block diagram: spatial operator T, quantizer Q, and variable length coder (VLC) in the forward path; inverse quantizer and inverse spatial operator, motion estimation, and motion compensation in the feedback loop; a switch selects intraframe (open loop) or interframe (closed loop) operation. Signals: prediction P_k, prediction error E_k, motion vectors MV_k.]
`
`
`
`
`
`
`
The intraframe mode spatially encodes an entire current frame
on a periodic basis, e.g., every 15 frames, to ensure that systematic
errors do not continuously propagate. The intraframe mode
will also be used to spatially encode a block whenever the
interframe encoding mode cannot meet its performance threshold.
The intraframe versus interframe mode selection algorithm is
not included in this diagram. It is responsible for controlling the
selection of the encoding functions, data flows, and output data
streams for each mode.

The intraframe encoding mode does not receive any input
from the feedback loop. $I_k$ is spatially encoded, and losslessly
encoded by the variable length coder (VLC), forming $I_{ke}$, which
is transmitted to the decoder. The receiver decodes $I_{ke}$, producing
the reconstructed image subblock $\hat{I}_k$. During the interframe
coding mode, the current frame prediction $P_k$ is subtracted from
the current frame input $I_k$ to form the current prediction error
$E_k$. The prediction error is then spatially and VLC encoded to
form $E_{ke}$, and it is transmitted along with the VLC encoded
motion vectors $MV_k$. The decoder can reconstruct the current frame
$\hat{I}_k$ by using the previously reconstructed frame $\hat{I}_{k-1}$ (stored in
the decoder), the current frame motion vectors, and the prediction
error. The motion vectors $MV_k$ operate on $\hat{I}_{k-1}$ to generate
the current prediction frame $P_k$. The encoded prediction error
$E_{ke}$ is decoded to produce the reconstructed prediction error $\hat{E}_k$.
The prediction error is added to the prediction to form the current
frame $\hat{I}_k$. The functional elements of the generalized model
are described here in detail.
`
`
`
1. Spatial operator: this element is generally a unitary
two-dimensional linear transform, but in principle it can be
any unitary operator that can distribute most of the signal
energy into a small number of coefficients, i.e., decorrelate
the signal data. Spatial transformations are successively
applied to small image blocks in order to take advantage of
the high degree of data correlation in adjacent image pixels.
The most widely used spatial operator for image and
video coding is the discrete cosine transform (DCT). It is
applied to 8 × 8 pixel image blocks and is well suited for
image transformations because it uses real computations
with fast implementations, provides excellent decorrelation
of signal components, and avoids generation of spurious
components between the edges of adjacent image
blocks.

2. Quantizer: the spatial or transform operator is applied to
the input in order to arrange the signal into a more suitable
format for subsequent lossy and lossless coding operations.
The quantizer operates on the transform generated
coefficients. This is a lossy operation that can result in a
significant reduction in the bitrate. The quantization method
used in this kind of video encoder is usually scalar and
nonuniform. The scalar quantizer simplifies the complexity of
the operation as compared to vector quantization (VQ).
The nonuniform quantization interval is sized according
to the distribution of the transform coefficients in order
to minimize the bitrate and the distortion created by the
quantization process. Alternatively, the quantization
interval size can be adjusted based on the performance of
the human visual system (HVS). The Joint Photographic Experts
Group (JPEG) standard includes two (luminance and color
difference) HVS sensitivity weighted quantization matrices
in its "Examples and Guidelines" annex. JPEG coding
is discussed in Sections 5.5 and 5.6.

3. Variable length coding: The lossless VLC is used to exploit
the "symbolic" redundancy contained in each block
of transform coefficients. This step is termed "entropy coding"
to designate that the encoder is designed to minimize
the source entropy. The VLC is applied to a serial bit stream
that is generated by scanning the transform coefficient
block. The scanning pattern should be chosen with the
objective of maximizing the performance of the VLC. The
MPEG encoder, for instance, describes a zigzag scanning
pattern that is intended to maximize transform zero coefficient
run lengths. The H.261 VLC is designed to encode
these run lengths by using a variable length Huffman code.
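
The three forward-path elements above can be illustrated end to end on a single block. The sketch below is an assumption-laden illustration rather than any standard's procedure: it uses a direct (slow) orthonormal 8 × 8 DCT, a uniform scalar quantizer with one hypothetical step size (the text notes that practical quantizers are usually nonuniform), a JPEG-style zigzag scan, and a simple (zero run, level) pairing in place of an actual Huffman table.

```python
import math

N = 8  # block size used by the DCT-based coders discussed above

def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block (direct O(N^4) form, for clarity)."""
    def scale(u):
        return math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * s
    return out

def quantize(coeffs, step):
    """Uniform scalar quantizer: a simplification of the nonuniform case."""
    return [[int(round(c / step)) for c in row] for row in coeffs]

def zigzag_order(n=N):
    """(row, col) pairs along anti-diagonals, alternating direction."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

def run_length(levels):
    """(zero_run, level) pairs for each nonzero level, ending with an 'EOB' marker."""
    pairs, run = [], 0
    for level in levels:
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    pairs.append("EOB")
    return pairs

# Hypothetical flat block: all energy lands in the DC coefficient.
flat = [[100] * N for _ in range(N)]
levels = quantize(dct_2d(flat), step=16)
symbols = run_length([levels[r][c] for r, c in zigzag_order()])
print(symbols)
```

For a smooth block almost every quantized coefficient is zero, so the zigzag scan yields one long zero run that the entropy coder can represent with very few symbols.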
`
The feedback loop sequentially reconstructs the encoded spatial
and prediction error frames and stores the results in order
to create a current prediction. The elements required to do this
are the inverse quantizer, inverse spatial operator, delayed frame
memory, motion estimator, and motion compensator.

1. Inverse operators: The inverse operators $Q^{-1}$ and $T^{-1}$ are
applied to the encoded current frame $I_{ke}$ or the current
prediction error $E_{ke}$ in order to reconstruct and store the
frame for the motion estimator and motion compensator
to generate the next prediction frame.
2. Delayed frame memory: Both current and previous frames
must be available to the motion estimator and motion
compensator to generate a prediction frame. The number
of previous frames stored in memory can vary based upon
the requirements of the encoding algorithm. MPEG-1 defines
a B frame that is a bidirectional encoding that requires
that motion prediction be performed in both the forward
and backward directions. This necessitates storage of multiple
frames in memory.
3. Motion estimation: The temporal encoding aspect of this
system relies on the assumption that rigid body motion is
responsible for the differences between two or more successive
frames. The objective of the motion estimator is to
estimate the rigid body motion between two frames. The
motion estimator operates on all current frame 16 × 16
image blocks and generates the pixel displacement or motion
vector for each block. The technique used to generate
motion vectors is called block-matching motion estimation
and is discussed further in Section 5.4. The method uses
the current frame $I_k$ and the previous reconstructed frame
$\hat{I}_{k-1}$ as input. Each block in the previous frame is assumed
to have a displacement that can be found by searching for
it in the current frame. The search is usually constrained
to be within a reasonable neighborhood so as to minimize
the complexity of the operation. Search matching is usually
based on a minimum MSE or MAE criterion. When a
match is found, the pixel displacement is used to encode
the particular block. If a search does not meet a minimum
MSE or MAE threshold criterion, the motion compensator
will indicate that the current block is to be spatially
encoded by using the intraframe mode.
4. Motion compensation: The motion compensator makes
use of the current frame motion estimates $MV_k$ and the
previously reconstructed frame $\hat{I}_{k-1}$ to generate the current
frame prediction $P_k$. The current frame prediction is
constructed by placing the previous frame blocks into the
current frame according to the motion estimate pixel
displacement. The motion compensator then decides which
blocks will be encoded as prediction error blocks using
motion vectors and which blocks will only be spatially
encoded.
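
The block-matching search described in item 3 can be sketched as an exhaustive search over a small window. The Python below is an illustrative assumption, not the procedure of any particular standard: it uses the MAE criterion, takes each block from the current frame and looks for its best match in the reference (previously reconstructed) frame, and reports the displacement as (dx, dy). The frame contents, block size, and search range are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a current block and a displaced reference block."""
    return sum(abs(cur[by + r][bx + c] - ref[by + dy + r][bx + dx + c])
               for r in range(size) for c in range(size))

def block_match(cur, ref, bx, by, size=16, search=7):
    """Full-search block matching; returns ((dx, dy), mae) for the best candidate."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x, y = bx + dx, by + dy
            # Skip candidates that fall outside the reference frame.
            if x < 0 or y < 0 or x + size > width or y + size > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size) / (size * size)
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best

# Hypothetical 12 x 12 frames: the current frame is the reference frame
# shifted right by 2 pixels and down by 1 pixel.
ref = [[3 * x * x + 5 * y * y + x * y for x in range(12)] for y in range(12)]
cur = [[ref[y - 1][x - 2] if y >= 1 and x >= 2 else 0 for x in range(12)]
       for y in range(12)]
vector, cost = block_match(cur, ref, bx=4, by=4, size=4, search=3)
print(vector, cost)
```

Here the search recovers the displacement (-2, -1) with zero error. In an encoder, a best cost above some threshold would instead signal that the block should be coded in intraframe mode, as the text describes.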
`
The generalized model does not address some video compression
system details such as the bit-stream syntax (which supports
different application requirements), or the specifics of the
encoding algorithms. These issues are dependent upon the video
compression system design.
Alternative video encoding models have also been researched.
Three-dimensional (3-D) video information can be compressed
directly using VQ or 3-D wavelet encoding models. VQ encodes
a 3-D block of pixels as a codebook index that denotes its "closest
or nearest neighbor" in the minimum squared or absolute error
sense. However, the VQ codebook size grows on the same order as
the number of possible inputs. Searching the codebook space for the
nearest neighbor is generally very computationally complex, but
structured search techniques can provide good bitrates, quality,
and computational performance. Tree-structured VQ (TSVQ)
[13] reduces the search complexity from codebook size N to log
N, with a corresponding loss in average distortion. The simplicity
of the VQ decoder (it only requires a table lookup for the
transmitted codebook index) and its bitrate-distortion performance
make it an attractive alternative for specialized applications. The
complexity of the codebook search generally limits the use of
VQ in real-time applications. Vector quantizers have also been
proposed for interframe, variable bitrate, and subband video
compression methods [4].
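
The asymmetry between the VQ encoder and decoder can be seen in a few lines. This is a minimal sketch with a hypothetical four-entry codebook of 4-pixel vectors; real codebooks are trained on image data and are far larger, and the full search shown here is exactly the cost that TSVQ is meant to reduce.

```python
def vq_encode(vector, codebook):
    """Full-search encoder: index of the codeword with minimum squared error."""
    def squared_error(codeword):
        return sum((a - b) ** 2 for a, b in zip(vector, codeword))
    return min(range(len(codebook)), key=lambda i: squared_error(codebook[i]))

def vq_decode(index, codebook):
    """Decoder: a single table lookup."""
    return codebook[index]

# Hypothetical codebook; each codeword stands for a block of four pixels.
codebook = [
    (0, 0, 0, 0),
    (255, 255, 255, 255),
    (0, 255, 0, 255),
    (128, 128, 128, 128),
]
index = vq_encode((120, 130, 125, 140), codebook)
print(index, vq_decode(index, codebook))
```

Only the index is transmitted, so the bitrate is set by the codebook size, while the encoder's work grows with the number of codewords it must compare against.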
Three-dimensional wavelet encoding is a topic of recent interest.
This video encoding method is based on the discrete wavelet
transform methods discussed in Section 5.4. The wavelet transform
is a relatively new transform that decomposes a signal into
a multiresolution representation. The multiresolution decomposition
makes the wavelet transform an excellent signal analysis
tool because signal characteristics can be viewed in a variety of
time-frequency scales. The wavelet transform is implemented in
practice by the use of multiresolution or subband filterbanks.
The wavelet filterbank is well suited for video encoding because
of its ability to adapt to the multiresolution characteristics of
video signals. Wavelet transform encodings are naturally
hierarchical in their time-frequency representation and easily
adaptable for progressive transmission [6]. They have also been
shown to possess excellent bitrate-distortion characteristics.
Direct three-dimensional video compression systems suffer
from a major drawback for real-time encoding and transmission.
In order to encode a sequence of images in one operation,
the sequence must be buffered. This introduces a buffering and
computational delay that can be very noticeable in the case of
interactive video communications.
Video compression techniques treating visual information in
accordance with HVS models have recently been introduced.
These methods are termed "second-generation or object-based"
methods, and attempt to achieve very large compression ratios
by imitating the operations of the HVS. The HVS model can
also be incorporated into more traditional video compression
techniques by reflecting visual perception into various aspects
of the coding algorithm. HVS weightings have been designed for
the DCT AC coefficients quantizer used in the MPEG encoder.
A discussion of these techniques can be found elsewhere in this
handbook.
Digital video compression is currently enjoying tremendous
growth, partially because of the great advances in VLSI
and microcomputer technology in the past decade. The real-time
nature of video communications necessitates the use of general
purpose and specialized high-performance hardware devices. In
the near future, advances in design and manufacturing technologies
will create hardware devices that will allow greater adaptability,
interactivity, and interoperability of video applications.
These advances will challenge future video compression technology
to support format-free implementations.
`
`3 Video Compression Application
`Requirements
`
`
`
A wide variety of digital video applications currently exist. They
range from simple low-resolution and low-bandwidth applications
(multimedia, PicturePhone) to very high-resolution and
high-bandwidth (HDTV) demands. This section will present
requirements of current and future digital video applications and
the demands they place on the video compression system.
As a way to demonstrate the importance of video compression,
the transmission of digital video television signals is presented.
The bandwidth required by a digital television signal is
approximately one-half the number of picture elements (pixels)
displayed per second. The analog pixel size in the vertical
dimension is the distance between scanning lines, and in the
horizontal dimension it is the distance the scanning spot moves
during one half-cycle of the highest video signal frequency.
The bandwidth is given by

`
`
`
\[
\begin{aligned}
B_W &= (\text{cycles/frame})(F_R) \\
    &= (\text{cycles/line})(N_L)(F_R) \\
    &= \frac{(0.5)(\text{aspect ratio})(F_R)(N_L)(R_H)}{0.84} \\
    &= (0.8)(F_R)(N_L)(R_H),
\end{aligned} \tag{3}
\]

where
$B_W$ = system bandwidth,
$F_R$ = number of frames transmitted per second (fps),
$N_L$ = number of scanning lines per frame,
$R_H$ = horizontal resolution (lines), proportional to pixel resolution.

The National Television Systems Committee (NTSC) aspect ratio
is 4/3, the constant 0.5 is the ratio of the number of cycles to
the number of lines, and the factor 0.84 is the fraction of the
horizontal scanning interval that is devoted to signal transmission.
The NTSC transmission standard used for television broadcasts
in the United States has the following parameter values:
$F_R$ = 29.97 fps, $N_L$ = 525 lines, and $R_H$ = 340 lines. This
yields a video system bandwidth $B_W$ of 4.2 MHz for the NTSC
broadcast system. In order to transmit a color digital video signal,
each pixel is represented by a luminance component and two
color-difference components, each requiring 8 bits. The NTSC
picture frame has 720 × 480 × 2 total luminance and color pixels.
In order to transmit this information for an NTSC broadcast
system at 29.97 frames/s, the following bandwidth is required:

\[
\begin{aligned}
\text{Digital } B_W &\approx \tfrac{1}{2}\,\text{bitrate}
  = \tfrac{1}{2}\,(29.97\ \text{fps}) \times (24\ \text{bits/pixel})
    \times (720 \times 480 \times 2\ \text{pixels/frame}) \\
  &= 249\ \text{MHz}.
\end{aligned}
\]
`
`
This represents an increase of ~59 times the available system
bandwidth, and ~41 times the full transmission channel bandwidth
(6 MHz) for current NTSC signals. HDTV picture resolution
requires up to three times more raw bandwidth than
this example! (Two transmission channels totaling 12 MHz are
allocated for terrestrial HDTV transmissions.) It is clear from
this example that terrestrial television broadcast systems will
have to use digital transmission and digital video compression
to achieve the overall bitrate reduction and image quality
required for HDTV signals.
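
The arithmetic of this example is easy to check. The short Python sketch below simply re-evaluates Eq. (3) and the digital bandwidth estimate with the parameter values quoted in the text; the function names are illustrative assumptions.

```python
def analog_bandwidth_hz(frame_rate, lines, horiz_res, aspect=4 / 3, active=0.84):
    """Eq. (3): 0.5 * aspect ratio * F_R * N_L * R_H / 0.84."""
    return 0.5 * aspect * frame_rate * lines * horiz_res / active

def digital_bandwidth_hz(frame_rate, bits_per_pixel, pixels_per_frame):
    """Bandwidth taken as one-half the bitrate, as in the text."""
    return 0.5 * frame_rate * bits_per_pixel * pixels_per_frame

ntsc = analog_bandwidth_hz(29.97, 525, 340)
digital = digital_bandwidth_hz(29.97, 24, 720 * 480 * 2)

print(f"analog NTSC bandwidth : {ntsc / 1e6:.2f} MHz")
print(f"digital bandwidth     : {digital / 1e6:.1f} MHz")
print(f"ratio to 4.2 MHz      : {digital / 4.2e6:.1f}")
print(f"ratio to 6 MHz channel: {digital / 6e6:.1f}")
```

The unrounded form of Eq. (3) gives about 4.25 MHz, consistent with the 4.2 MHz figure, and the two ratios come out near 59 and 41 as stated.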
The example not only points out the significant bandwidth
requirements for digital video information, but also indirectly
brings up the issue of digital video quality requirements. The
tradeoff between bitrate and quality or distortion is a fundamental
issue facing the design of video compression systems.
To this end, it is important to fully characterize an application's
video communications requirements before designing or
selecting an appropriate video compression system. Factors that
should be considered in the design and selection of a video
compression system include the following items.
`
1. Video characteristics: video parameters such as the dynamic
range, source statistics, pixel resolution, and noise
content can affect the performance of the compression
system.
2. Transmission requirements: transmission bitrate requirements
determine the power of the compression system.
Very high transmission bandwidth, storage capacity, or
quality requirements may necessitate lossless compression.
Conversely, extremely low bitrate requirements may dictate
compression systems that trade off image quality for
a large compression ratio. Progressive transmission is a key
issue for selection of the compression system. It is generally
used when the transmission bandwidth exceeds the
compressed video bandwidth. Progressive coding refers to
a multiresolution, hierarchical, or subband encoding of
the video information. It allows for transmission and
reconstruction of each resolution independently from low to
high resolution. In addition, channel errors affect system
performance and the quality of the reconstructed video.
Channel errors can affect the bit stream randomly or in
burst fashion. The channel error characteristics can have
different effects on different encoders, and they can range
from local to global anomalies. In general, transmission
error correction codes (ECC) are used to mitigate the effect
of channel errors, but awareness and knowledge of this
issue is important.
3. Compression system characteristics and performance: the
nature of video applications makes many demands on the
video compression system. Interactive video applications
such as videoconferencing demand that the video compression
systems have symmetric capabilities. That is, each
participant in the interactive video session must have the
same video encoding and decoding capabilities, and the
system performance requirements must be met by both
the encoder and decoder. In contrast, television broadcast
video has significantly greater performance requirements
at the transmitter because it has the responsibility
of providing real-time high-quality compressed video that
meets the transmission channel capacity. Digital video system
implementation requirements can vary significantly.
Desktop televideo conferencing can be implemented by
using software encoding and decoding, or it may require
specialized hardware and transmission capabilities to provide
a high-quality performance. The characteristics of the
application will dictate the suitability of the video compression
algorithm for particular system implementations.
The importance of the encoder and system implementation
decision cannot be overstated; system architectures
and performance capabilities are changing at a rapid pace
and the choice of the best solution requires careful analysis
of all the possible system and encoder alternatives.
4. Rate-distortion requirements: the rate-distortion requirement
is a basic consideration in the selection of the video
encoder. The video encoder must be able to provide the
bitrate(s) and video fidelity (or range of video fidelity)
required by the application. Otherwise, any aspect of the
system may not meet specifications. For example, if the
bitrate specification is exceeded in order to support a lower
MSE, a larger than expected transmission error rate may
cause a catastrophic system failure.
5. Standards requirements: video encoder compatibility with
existing and future standards is an important consideration
if the digital video system is required to interoperate
with existing or future systems. A good example is that of a
desktop videoconferencing application supporting a number
of legacy video compression standards. This results in
requiring support of the older video encoding standards on
new equipment designed for a newer incompatible standard.
Videoconferencing equipment not supporting the
old standards would be unable, or less able, to work
in environments supporting older standards.
`
These factors are displayed in Table 1 to demonstrate video
compression system requirements for some common video
communications applications. The video compression system
designer at a minimum should consider these factors in making
a determination about the choice of video encoding algorithms
and technology to implement.
`
`
4 Digital Video Signals and Formats

Video compression techniques make use of signal models in
order to be able to utilize the body of digital signal
analysis/processing theory and techniques that have been developed
over the past fifty or so years. The design of a video compression
system, as represented by the generalized model introduced
in Section 2, requires a knowledge of the signal characteristics
and the digital processes that are used to create the digital video
signal. It is also highly desirable to understand video display
systems, and the behavior of the HVS.
`
`
`
`4.1 Sampling of Analog Video Signals
`
Digital video information is generated by sampling the intensity
of the original continuous analog video signal
in three dimensions. The spatial component of the video signal
is sampled in the horizontal and vertical dimensions,
and the temporal component is sampled in the time dimension
(t). This generates a series of digital images or image sequence
I(i, j, k). Video signals that contain color information are
decomposed into color components
whose intensities are likewise sampled in three dimensions. The
sampling process inherently quantizes the video signal because of
the digital word precision used to represent the intensity values.
Nevertheless, a digital
video representation can be reproduced with arbitrary closeness
to the original analog video signal. The topic of video sampling
and interpolation is discussed in Chapter 7.2.

A fundamental result in this area is the Nyquist sampling
theorem. This theorem defines the conditions under which
sampled analog signals can be reconstructed.
`
`
TABLE 1 Digital video application requirements

Application                      Bitrate Req.       Distortion Req.            Transmission Req.         Computational Req.             Standards Req.
Network video on demand          1.5 Mbps; 10 Mbps  High; medium               Internet; 100 Mbps LAN    MPEG-1; MPEG-2                 MPEG-1; MPEG-2; MPEG-7
Video phone                      64 Kbps            High distortion            ISDN p x 64               H.261 encoder; H.261 decoder   H.261
Desktop multimedia video CDROM   1.5 Mbps           High distortion to medium  PC channel                MPEG-1 decoder                 MPEG-1; MPEG-2; MPEG-7
Desktop LAN videoconference      10 Mbps            Medium distortion          Fast ethernet 100 Mbps    Hardware decoders              MPEG-2; H.261
Desktop WAN videoconference      1.5 Mbps           High distortion            Ethernet                  Hardware decoders              MPEG-1; MPEG-4; H.263
Desktop dial-up videoconference  64 Kbps            Very high distortion       POTS and internet         Software decoder               MPEG-4; H.263
Digital satellite television     10 Mbps            Low distortion             Fixed service satellites  MPEG-2 decoder                 MPEG-2
HDTV                             20 Mbps            Low distortion             12-MHz terrestrial link   MPEG-2 decoder                 MPEG-2
DVD                              20 Mbps            Low distortion             PC channel                MPEG-2 decoder                 MPEG-2
`
`
`
FIGURE 2 Nyquist sampling theorem, with magnitudes of Fourier spectra for (a) input I; (b) sampled
input I_s, with f_s > 2f_B; (c) sampled input I_s, with f_s < 2f_B. [Three spectrum plots against frequency f, with the axis marked at multiples of f_s/2.]
`
`
`
When the Nyquist conditions are met, sampled analog signals can
be "perfectly" reconstructed. If these conditions are not met, the
resulting digital signal will contain aliased components which
introduce artifacts into the reconstruction. The Nyquist
conditions are depicted graphically for the one-dimensional case
in Fig. 2.

The one-dimensional signal $I$ is sampled at rate $f_s$. It is
bandlimited (as are all real-world signals) in the frequency domain
with an upper frequency bound of $f_B$. According to the Nyquist
sampling theorem, if a bandlimited signal is sampled, the resulting
Fourier spectrum is made up of the original signal spectrum
$|L|$ plus replicates of the original spectrum spaced at integer
multiples of the sampling frequency $f_s$. Diagram (a) in Fig. 2
depicts the magnitude $|L|$ of the Fourier spectrum for $I$. The
magnitude of the Fourier spectrum $|L_s|$ for the sampled signal
$I_s$ is shown for two cases. Diagram (b) presents the case
where the original signal $I$ can be reconstructed by recovering
the central spectral island. Diagram (c) displays the case where
the Nyquist sampling criterion has not been met and spectral
overlap occurs. The spectral overlap is termed aliasing and occurs
when $f_s < 2 f_B$. When $f_s > 2 f_B$, the original signal can be
reconstructed by using a low-pass digital filter whose passband
is designed to recover $|L|$. These relationships provide a basic
framework for the analysis and design of digital signal processing
systems.
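
Aliasing can be demonstrated numerically with a single sinusoid. The following Python sketch is an illustration under assumed values (a 10 Hz sampling rate and 3 Hz and 7 Hz cosines), not an example from the chapter: a component above $f_s/2$ produces exactly the same samples as a lower-frequency component, so the two cannot be told apart after sampling.

```python
import math

def alias_frequency(f, fs):
    """Apparent frequency of a real sinusoid of frequency f sampled at rate fs."""
    f_mod = f % fs
    return min(f_mod, fs - f_mod)

def sample_cosine(f, fs, count):
    """First `count` samples of cos(2*pi*f*t) taken at rate fs."""
    return [math.cos(2 * math.pi * f * k / fs) for k in range(count)]

fs = 10.0                            # hypothetical sampling rate (Hz)
low = sample_cosine(3.0, fs, 20)     # 3 Hz < fs/2: sampled without aliasing
high = sample_cosine(7.0, fs, 20)    # 7 Hz > fs/2: aliases onto 3 Hz

print(alias_frequency(7.0, fs))
print(max(abs(a - b) for a, b in zip(low, high)))
```

The two sample sequences agree to within floating-point rounding, which is the time-domain counterpart of the spectral overlap shown in diagram (c) of Fig. 2.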
Two-dimensional or spatial sampling is a simple extension of
the one-dimensional case. The Nyquist criterion has to be obeyed
in both dimensions; i.e., the sampling rate in the horizontal
direction must be more than twice the upper frequency bound
in the horizontal direction, and the sampling rate in the vertical
direction must be more than twice the upper frequency
bound in the vertical direction. In practice, spatial sampling grids
are square so that an equal number of samples per unit length in
each direction are collected. Charge coupled devices (CCDs) are
typically used to spatially sample analog imagery and video. The
sampling grid spacing of these devices is more than sufficient
to meet the Nyquist criterion for most resolution and application
requirements. The electrical characteristics of CCDs have a
greater effect on the image or video quality than their sampling
grid size.
`
Temporal sampling of video signals is accomplished by capturing
a spatial or image frame in the time dimension. The temporal
samples are captured at a uniform rate of ~60 fields/s for NTSC
television and 24 fps for a motion film recording. These sampling
rates are significantly smaller than the spatial sampling rate. The
maximum temporal frequency that can be reconstructed according
to the Nyquist frequency criterion is 30 Hz in the case of televi