`Academic Press Series in
`Communications, Networking, and Multimedia
`
`EDITOR-IN-CHIEF
`
`Jerry D. Gibson
`Southern Methodist University
`
This series has been established to bring together a variety of publications that represent the latest in cutting-edge research, theory, and applications of modern communication systems. All traditional and modern aspects of communications as well as all methods of computer communications are to be included. The series will include professional handbooks, books on communication methods and standards, and research books for engineers and managers in the worldwide communications industry.
`
`
`
`
`
`
`
`
This book is printed on acid-free paper.
`
`Copyright © 2000 by Academic Press
`
`All rights reserved.
`
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher.
Requests for permission to make copies of any part of the work should be mailed to the following address: Permissions Department, Harcourt, Inc., 6277 Sea Harbor Drive, Orlando, Florida, 32887-6777.
Explicit permission from Academic Press is not required to reproduce a maximum of two figures or tables from an Academic Press article in another scientific or research publication provided that the material has not been credited to another source and that full credit to the Academic Press article is given.
`
`ACADEMIC PRESS
`A Harcourt Science and Technology Company
525 B Street, Suite 1900, San Diego, CA 92101-4495, USA
`http://www.academicpress.com
`
`
`
`Academic Press
`Harcourt Place, 32 Jamestown Road, London, NW1 7BY, UK
`http://www.hbuk.co.uk/ap/
`
`Library of Congress Catalog Number: 99-69120
`
`ISBN: 0-12-119790-5
`
`Printed in Canada
`
00 01 02 03 04 05  FR  9 8 7 6 5 4 3 2 1
`
`
`
`
6.1  Basic Concepts and Techniques of Video Coding and the H.261 Standard

1  Introduction ... 555
2  Introduction to Video Compression ... 556
3  Video Compression Application Requirements ... 558
4  Digital Video Signals and Formats ... 560
   4.1 Sampling of Analog Video Signals • 4.2 Digital Video Formats
5  Video Compression Techniques ... 563
   5.1 Entropy and Predictive Coding • 5.2 Block Transform Coding: The Discrete Cosine Transform • 5.3 Quantization • 5.4 Motion Compensation and Estimation
6  Video Encoding Standards and H.261 ... 569
   6.1 The H.261 Video Encoder
7  Closing Remarks ... 573
References ... 573

University of Texas
`
`
The subject of video coding is of fundamental importance to many fields in engineering and the sciences. Video engineering is quickly becoming a largely digital discipline. The digital transmission of television signals via satellites is commonplace, and widespread HDTV terrestrial transmission is slated to begin in 1999. Video compression is an absolute requirement for the growth and success of the low-bandwidth transmission of digital video signals. Video encoding is being used wherever digital video communications, storage, processing, acquisition, and reproduction occur. The transmission of high-quality multimedia information over high-speed computer networks is a central problem in the design of Quality of Service (QoS) for digital transmission providers. The Motion Pictures Expert Group (MPEG) has already finalized two video coding standards, MPEG-1 and MPEG-2, that define methods for the transmission of digital video information for multimedia and television formats. MPEG-4 is currently addressing the transmission of very low bitrate video. MPEG-7 is addressing the standardization of video storage and retrieval services (Chapters 9.1 and 9.2 discuss video storage and retrieval). A central
`
`
`
aspect of each of the MPEG standards is the video encoding and decoding algorithms that make digital video applications practical. The MPEG standards are discussed in Chapters 6.4 and 6.5.

Video compression not only reduces the storage requirements or transmission bandwidth of digital video applications, but it also affects many system performance tradeoffs. The design and selection of a video encoder therefore is not based only on its ability to compress information. Issues such as bitrate versus distortion criteria, algorithm complexity, transmission channel characteristics, algorithm symmetry versus asymmetry, video source statistics, fixed versus variable rate coding, and standards compatibility should be considered in order to make good encoder design decisions.

The growth of digital video applications and technology in the past few years has been explosive, and video compression is playing a central role in this success. Yet the video coding discipline is relatively young and certainly will evolve and change significantly over the next few years. Research in video coding has great vitality, and the body of work is significant. It is apparent that this relevant and important topic will have an immense effect on the future of digital video technologies.
`
`
`
`
`
`
Handbook of Image and Video Processing

2 Introduction to Video Compression

Video or visual communications require significant amounts of information transmission. Video compression, as considered here, involves the bitrate reduction of a digital video signal carrying visual information. Traditional video-based compression, like other information compression techniques, focuses on eliminating the redundant elements of the signal. The degree to which the encoder reduces the bitrate is called its coding efficiency; equivalently, its inverse is termed the compression ratio:

    coding efficiency = (compression ratio)^(-1)
                      = encoded bitrate / decoded bitrate.        (1)

Compression can be a lossless or lossy operation. Because of the immense volume of video information, lossy operations are mainly used for video compression. The loss of information or distortion measure is usually evaluated with the mean square error (MSE), mean absolute error (MAE) criteria, or peak signal-to-noise ratio (PSNR):

    MSE  = (1/MN) Σ_{i=1}^{M} Σ_{j=1}^{N} [I(i, j) − Î(i, j)]^2,

    MAE  = (1/MN) Σ_{i=1}^{M} Σ_{j=1}^{N} |I(i, j) − Î(i, j)|,    (2)

    PSNR = 20 log_10 [(2^n − 1) / √MSE],

for an image I and its reconstructed image Î, with pixel indices 1 ≤ i ≤ M and 1 ≤ j ≤ N, image size N × M pixels, and n bits per pixel. The MSE, MAE, and PSNR as described here are global measures and do not necessarily give a good indication of the reconstructed image quality. In the end, the human observer determines the quality of the reconstructed image and video quality. The concept of distortion versus coding efficiency is one of the most fundamental tradeoffs in the technical evaluation of video encoders. The topic of quality assessment of compressed images and video is discussed in Section 8.2.

Video signals contain information in three dimensions; these dimensions are modeled as spatial and temporal domains for video encoding. Digital video compression methods seek to minimize information redundancy independently in each dimension. The major international video compression standards (MPEG-1, MPEG-2, H.261) use this approach. Figure 1 schematically depicts a generalized video compression system that implements the spatial and temporal encoding of a digital image sequence. Each image in the sequence I is defined as in Eq. (2). The spatial encoder operates on image blocks, typically on the order of 8 × 8 pixels each. The temporal encoder generally operates on 16 × 16 pixel image blocks. The system is designed to have two modes of operation, the intraframe mode and the interframe mode.

The single-layer feedback structure of this generalized system is representative of the encoders that are recommended by the International Standards Organization (ISO) and the International Telecommunications Union (ITU) video coding standards MPEG-1, MPEG-2/H.262, and H.261 [1-3]. The feedback loop is used in the interframe mode of operation and generates a prediction error between the blocks of the current frame and the current prediction frame. The prediction is generated by the motion compensator. The motion estimation unit creates motion vectors for each 16 × 16 block. The motion vectors and the previously reconstructed frame are fed to the motion compensator to create the prediction.

[Figure 1 block diagram: spatial operator T, quantizer Q, and variable length coder (VLC) on the forward path; inverse quantizer Q^(-1), inverse spatial operator T^(-1), delayed frame memory, motion estimation, and motion compensation in the feedback loop; the intraframe mode opens the loop, and the interframe mode closes it.]

FIGURE 1  Generalized video compression system.
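The distortion measures of Eq. (2) are straightforward to compute. The following sketch (a hypothetical helper, not code from the text) evaluates all three for images stored as 2-D lists of pixel intensities:

```python
import math

def distortion_measures(image, recon, n_bits=8):
    """MSE, MAE, and PSNR of Eq. (2) for an M x N image and its
    reconstruction, both given as 2-D lists of pixel intensities."""
    M, N = len(image), len(image[0])
    diffs = [image[i][j] - recon[i][j] for i in range(M) for j in range(N)]
    mse = sum(d * d for d in diffs) / (M * N)
    mae = sum(abs(d) for d in diffs) / (M * N)
    peak = (1 << n_bits) - 1  # 2**n - 1, the largest representable intensity
    psnr = math.inf if mse == 0 else 20 * math.log10(peak / math.sqrt(mse))
    return mse, mae, psnr
```

For identical images the MSE is zero and the PSNR is unbounded, which is why PSNR is normally quoted only for lossy reconstructions.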
`
`
`
`
`
`
`
The intraframe mode spatially encodes an entire current frame on a periodic basis, e.g., every 15 frames, to ensure that systematic errors do not continuously propagate. The intraframe mode can also be used to spatially encode a block whenever the interframe encoding mode cannot meet its performance threshold. The intraframe versus interframe mode selection algorithm is not included in this diagram. It is responsible for controlling the selection of the encoding functions, data flows, and output data streams for each mode.

The intraframe encoding mode does not receive any input from the feedback loop. I_k is spatially encoded, and losslessly coded by the variable length coder (VLC), forming Î_k, which is transmitted to the decoder. The receiver decodes Î_k, producing the reconstructed image subblock Ĩ_k. During the interframe coding mode, the current frame prediction P_k is subtracted from the current frame input I_k to form the current prediction error E_k. The prediction error is then spatially and VLC encoded to form Ê_k, and it is transmitted along with the VLC-encoded motion vectors MV_k. The decoder can reconstruct the current frame Ĩ_k by using the previously reconstructed frame Ĩ_{k-1} (stored in the decoder), the current frame motion vectors, and the prediction error. The motion vectors MV_k operate on Ĩ_{k-1} to generate the current prediction frame P_k. The encoded prediction error Ê_k is decoded to produce the reconstructed prediction error Ẽ_k. The prediction error is added to the prediction to form the current frame Ĩ_k. The functional elements of the generalized model are described here in detail.

1. Spatial operator: this element is generally a unitary two-dimensional linear transform, but in principle it can be any unitary operator that can distribute most of the signal energy into a small number of coefficients, i.e., decorrelate the signal data. Spatial transformations are successively applied to small image blocks in order to take advantage of the high degree of data correlation in adjacent image pixels. The most widely used spatial operator for image and video coding is the discrete cosine transform (DCT). It is applied to 8 × 8 pixel image blocks and is well suited for image transformations because it uses real computations with fast implementations, provides excellent decorrelation of signal components, and avoids generation of spurious components between the edges of adjacent image blocks.

2. Quantizer: the spatial or transform operator is applied to the image in order to arrange the signal into a format more suitable for subsequent lossy and lossless coding operations. The quantizer operates on the transform-generated coefficients and is a lossy operation that can result in a significant reduction in the bitrate. The quantization method used in this type of video encoder is usually scalar and nonuniform. The scalar quantizer simplifies the complexity of the operation as compared to vector quantization (VQ). The nonuniform quantization interval is sized according to the distribution of the transform coefficients in order to minimize the bitrate and the distortion created by the quantization process. Alternatively, the quantization interval size can be adjusted based on the performance of the human visual system (HVS). The Joint Pictures Expert Group (JPEG) standard includes two (luminance and color difference) HVS sensitivity weighted quantization matrices in its "Examples and Guidelines" annex. JPEG coding is discussed in Sections 5.5 and 5.6.

3. Variable length coding: the lossless VLC is used to exploit the "symbolic" redundancy contained in each block of transform coefficients. This step is termed "entropy coding" to designate that the encoder is designed to minimize the source entropy. The VLC is applied to a serial bit stream that is generated by scanning the transform coefficient block. The scanning pattern should be chosen with the objective of maximizing the performance of the VLC. The MPEG encoder, for instance, describes a zigzag scanning pattern that is intended to maximize transform zero coefficient run lengths. The H.261 VLC is designed to encode these run lengths by using a variable length Huffman code.

The feedback loop sequentially reconstructs the encoded spatial and prediction error frames and stores the results in order to create a current prediction. The elements required to do this are the inverse quantizer, inverse spatial operator, delayed frame memory, motion estimator, and motion compensator.
`
`
`
1. Inverse operators: the inverse operators Q^(-1) and T^(-1) are applied to the encoded current frame Î_k or the current prediction error Ê_k in order to reconstruct and store the frame for the motion estimator and motion compensator to generate the next prediction frame.

2. Delayed frame memory: both current and previous frames must be available to the motion estimator and motion compensator to generate a prediction frame. The number of previous frames stored in memory can vary based upon the requirements of the encoding algorithm. MPEG-1 defines a B frame that is a bidirectional encoding that requires that motion prediction be performed in both the forward and backward directions. This necessitates storage of multiple frames in memory.

3. Motion estimation: the temporal encoding aspect of this system relies on the assumption that rigid body motion is responsible for the differences between two or more successive frames. The objective of the motion estimator is to estimate the rigid body motion between two frames. The motion estimator operates on all current frame 16 × 16 image blocks and generates the pixel displacement or motion vector for each block. The technique used to generate motion vectors is called block-matching motion estimation and is discussed further in Section 5.4. The method uses the current frame I_k and the previous reconstructed frame
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
Ĩ_{k-1} as input. Each block in the previous frame is assumed to have a displacement that can be found by searching for it in the current frame. The search is usually constrained to be within a reasonable neighborhood so as to minimize the complexity of the operation. Search matching is usually based on a minimum MSE or MAE criterion. When a match is found, the pixel displacement is used to encode the particular block. If a search does not meet a minimum MSE or MAE threshold criterion, the motion compensator will indicate that the current block is to be spatially encoded by using the intraframe mode.

4. Motion compensation: the motion compensator makes use of the current frame motion estimates MV_k and the previously reconstructed frame Ĩ_{k-1} to generate the current frame prediction P_k. The current frame prediction is constructed by placing the previous frame blocks into the current frame according to the motion estimate pixel displacement. The motion compensator then decides which blocks will be encoded as prediction error blocks using motion vectors and which blocks will only be spatially encoded.
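The block-matching search described here can be sketched as follows; the function name, the MAE criterion, and the small block and search sizes are illustrative choices, not values fixed by any standard (Section 5.4 gives the full treatment):

```python
def block_match(cur, prev, bx, by, bsize=4, search=2):
    """Full-search block matching: find the displacement (dx, dy) that
    minimizes the MAE between the current-frame block at (bx, by) and a
    candidate block in the previous reconstructed frame."""
    rows, cols = len(prev), len(prev[0])

    def mae(dx, dy):
        total = 0
        for i in range(bsize):
            for j in range(bsize):
                total += abs(cur[bx + i][by + j] - prev[bx + dx + i][by + dy + j])
        return total / (bsize * bsize)

    best_mv, best_err = (0, 0), mae(0, 0)
    for dx in range(-search, search + 1):
        for dy in range(-search, search + 1):
            # constrain the search so candidate blocks stay inside the frame
            if 0 <= bx + dx and bx + dx + bsize <= rows and \
               0 <= by + dy and by + dy + bsize <= cols:
                err = mae(dx, dy)
                if err < best_err:
                    best_mv, best_err = (dx, dy), err
    return best_mv, best_err
```

A real encoder would fall back to intraframe coding whenever the best error stays above a threshold, as described above.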
`
The generalized model does not address some video compression system details, such as the bit-stream syntax (which supports different application requirements) or the specifics of the encoding algorithms. These issues are dependent upon the video compression system design.

Alternative video encoding models have also been researched. Three-dimensional (3-D) video information can be compressed directly using VQ or 3-D wavelet encoding models. VQ encodes a 3-D block of pixels as a codebook index that denotes its "closest or nearest neighbor" in the minimum squared or absolute error sense. However, the VQ codebook size grows on the order of the number of possible inputs. Searching the codebook space for the nearest neighbor is generally very computationally complex, but structured search techniques can provide good bitrates, quality, and computational performance. Tree-structured VQ (TSVQ) [13] reduces the search complexity from codebook size N to log N, with a corresponding loss in average distortion. The simplicity of the VQ decoder (it only requires a table lookup for the transmitted codebook index) and its bitrate-distortion performance make it an attractive alternative for specialized applications. The complexity of the codebook search generally limits the use of VQ in real-time applications. Vector quantizers have also been proposed for interframe, variable bitrate, and subband video compression methods [4].
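The exhaustive nearest-neighbor codebook search that makes VQ encoding expensive (and VQ decoding trivial) can be sketched as follows; the function name and flat-tuple block representation are illustrative:

```python
def vq_encode(block, codebook):
    """Return the index of the codebook vector closest to `block` in the
    minimum-squared-error sense; the decoder is just `codebook[index]`."""
    def sq_err(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    # linear scan over the whole codebook: O(N) distance evaluations
    return min(range(len(codebook)), key=lambda k: sq_err(block, codebook[k]))
```

The decoder's table lookup is the simplicity noted above; TSVQ replaces this linear scan with a log-depth tree descent.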
Three-dimensional wavelet encoding is a topic of recent interest. This video encoding method is based on the discrete wavelet transform methods discussed in Section 5.4. The wavelet transform is a relatively new transform that decomposes a signal into a multiresolution representation. The multiresolution decomposition makes the wavelet transform an excellent signal analysis tool because signal characteristics can be viewed in a variety of time-frequency scales. The wavelet transform is implemented in practice by the use of multiresolution or subband filterbanks. The wavelet filterbank is well suited for video encoding because of its ability to adapt to the multiresolution characteristics of video signals. Wavelet transform encodings are naturally hierarchical in their time-frequency representation and easily adaptable for progressive transmission [6]. They have also been shown to possess excellent bitrate-distortion characteristics. Direct three-dimensional video compression systems suffer from a major drawback for real-time encoding and transmission: in order to encode a sequence of images in one pass, the sequence must be buffered. This introduces a buffering and computational delay that can be very noticeable in the case of interactive video communications.
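The buffering delay penalty is easy to quantify; a sketch with hypothetical numbers (the block length and frame rate below are illustrative, not from the text):

```python
def block_coding_delay(frames_per_block, frame_rate_fps):
    """Minimum buffering delay (seconds) of a direct 3-D coder: the whole
    group of frames must arrive before encoding of the block can begin."""
    return frames_per_block / frame_rate_fps
```

Buffering even a 16-frame block at 30 frames/s costs over half a second before any computation starts, which is why interactive applications avoid direct 3-D coding.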
`
Video compression techniques treating visual information in accordance with HVS models have recently been introduced. These methods are termed "second-generation or object-based" methods, and attempt to achieve very large compression ratios by imitating the operations of the HVS. The HVS model can also be incorporated into more traditional video compression techniques by reflecting visual perception into various aspects of the coding algorithm. HVS weightings have been designed for the DCT AC coefficients quantizer used in the MPEG encoder. A discussion of these techniques can be found elsewhere in this Handbook.

Digital video compression is currently enjoying tremendous growth, partially because of the great advances in VLSI and microcomputer technology in the past decade. The real-time nature of video communications necessitates the use of general-purpose and specialized high-performance hardware devices. In the near future, advances in design and manufacturing technologies will create hardware devices that will allow greater adaptability, interactivity, and interoperability of video applications. These advances will challenge future video compression technology to support format-free implementations.
`
`
`
3 Video Compression Application Requirements

A wide variety of digital video applications currently exist. They range from simple low-resolution and low-bandwidth applications (multimedia, PicturePhone) to very high-resolution and high-bandwidth (HDTV) demands. This section presents the requirements of current and future digital video applications and the demands they place on the video compression system.

As a way to demonstrate the importance of video compression, the transmission of digital video television signals is presented. The bandwidth required by a digital television signal is approximately one-half the number of picture elements (pixels) displayed per second. The analog pixel size in the vertical dimension is the distance between scanning lines, and in the horizontal dimension it is the distance the scanning spot moves during one-half cycle of the highest video signal frequency.
`
`
`
`
`
`
`
`
`
    B_W = (cycles/frame)(F_R)
        = (cycles/line)(N_L)(F_R)
        = (0.5)(aspect ratio)(F_R)(N_L)(R_H)/(0.84),        (3)

where

    B_W = system bandwidth,
    F_R = number of frames transmitted per second (fps),
    N_L = number of scanning lines per frame,
    R_H = horizontal resolution (lines), proportional to pixel resolution.

The National Television Systems Committee (NTSC) aspect ratio is 4/3, the constant 0.5 is the ratio of the number of cycles to the number of lines, and the factor 0.84 is the fraction of the horizontal scanning interval that is devoted to signal transmission.

The NTSC transmission standard used for television broadcasts in the United States has the following parameter values: F_R = 29.97 fps, N_L = 525 lines, and R_H = 340 lines. This yields a video system bandwidth B_W of 4.2 MHz for the NTSC broadcast system. In order to transmit a color digital video signal, the digital pixel format must be defined. The digital color pixel is made of three components: one luminance (Y) component occupying 8 bits, and two color difference components (U and V), each requiring 8 bits. The NTSC picture frame has 720 × 480 × 2 luminance and color pixels. In order to transmit this information for an NTSC broadcast system at 29.97 frames/s, the following bandwidth is required:

    digital B_W ≈ (1/2) bitrate = (1/2)(29.97 fps) × (24 bits/pixel)
                                  × (720 × 480 × 2 pixels/frame)
                                = 249 MHz.

This represents an increase of ~59 times the available system bandwidth, and ~41 times the full transmission channel bandwidth (6 MHz) for current NTSC signals. HDTV picture resolution requires up to three times more raw bandwidth than this example! (Two transmission channels totaling 12 MHz are allocated for terrestrial HDTV transmissions.) It is clear from this example that terrestrial television broadcast systems will have to use digital transmission and digital video compression to achieve the overall bitrate reduction and image quality requirements for HDTV signals.

This example not only points out the significant bandwidth requirements for digital video information, but also indirectly brings up the issue of digital video quality requirements. The tradeoff between bitrate and quality or distortion is a fundamental issue facing the design of video compression systems. To this end, it is important to fully characterize an application's video communications requirements before designing or selecting an appropriate video compression system. Factors that should be considered in the design and selection of a video compression system include the following items.

1. Video characteristics: video parameters such as the dynamic range, source statistics, pixel resolution, and noise content can affect the performance of the compression system.

2. Transmission requirements: transmission bitrate requirements determine the power of the compression system. Very high transmission bandwidth, storage capacity, or quality requirements may necessitate lossless compression. Conversely, extremely low bitrate requirements may dictate compression systems that trade off image quality for a large compression ratio. Progressive transmission is a key issue for selection of the compression system. It is generally used when the transmission bandwidth exceeds the compressed video bandwidth. Progressive coding refers to a multiresolution, hierarchical, or subband encoding of the video information. It allows for transmission and reconstruction of each resolution independently from low to high resolution. In addition, channel errors affect system performance and the quality of the reconstructed video. Channel errors can affect the bit stream randomly or in burst fashion. The channel error characteristics can have different effects on different encoders, and they can range from local to global anomalies. In general, transmission error correction codes (ECC) are used to mitigate the effect of channel errors, but awareness and knowledge of this issue is important.

3. Compression system characteristics and performance: the nature of video applications makes many demands on the video compression system. Interactive video applications such as videoconferencing demand that the video compression systems have symmetric capabilities. That is, each participant in the interactive video session must have the same video encoding and decoding capabilities, and the system performance requirements must be met by both the encoder and decoder. In contrast, television broadcast video has significantly greater performance requirements at the transmitter because it has the responsibility of providing real-time high quality compressed video that meets the transmission channel capacity. Digital video system implementation requirements can vary significantly. Desktop televideo conferencing can be implemented by using software encoding and decoding, or it may require specialized hardware and transmission capabilities to provide a high-quality performance. The characteristics of the application will dictate the suitability of the video compression algorithm for particular system implementations.
`
`
`
`
`
`
`
The importance of the encoder and system implementation decision cannot be overstated; system architectures and performance capabilities are changing at a rapid pace, and the choice of the best solution requires careful analysis of all possible system and encoder alternatives.

4. Rate-distortion requirements: the rate-distortion requirement is a basic consideration in the selection of the video encoder. The video encoder must be able to provide the bitrate(s) and video fidelity (or range of video fidelity) required by the application. Otherwise, any aspect of the system may not meet specifications. For example, if the bitrate specification is exceeded in order to support a lower MSE, a larger than expected transmission error rate may cause a catastrophic system failure.

5. Standards requirements: video encoder compatibility with existing and future standards is an important consideration if the digital video system is required to interoperate with existing or future systems. A good example is that of a desktop videoconferencing application supporting a number of legacy video compression standards. This results in requiring support of the older video encoding standards on new equipment designed for a newer incompatible standard. Videoconferencing equipment not supporting the old standards would not be capable, or as capable, of working in environments supporting older standards.

These factors are displayed in Table 1 to demonstrate video compression system requirements for some common video communications applications. The video compression system designer at a minimum should consider these factors in making a determination about the choice of video encoding algorithms and technology to implement.
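The NTSC arithmetic of Eq. (3), and the digital bitrate estimate that follows it, can be checked with a short script (function names are illustrative, not from the text):

```python
def analog_bandwidth_hz(frame_rate, lines, h_resolution, aspect=4 / 3,
                        cycles_per_line_ratio=0.5, active_fraction=0.84):
    """Eq. (3): analog system bandwidth of a scanned video signal, with
    0.84 the active fraction of the horizontal scanning interval."""
    return (cycles_per_line_ratio * aspect * frame_rate * lines
            * h_resolution / active_fraction)

def digital_bandwidth_hz(frame_rate, pixels_per_frame, bits_per_pixel=24):
    """Approximate bandwidth of the raw digital signal: one-half the
    number of bits transmitted per second."""
    return 0.5 * frame_rate * pixels_per_frame * bits_per_pixel

analog = analog_bandwidth_hz(29.97, 525, 340)       # about 4.2 MHz
digital = digital_bandwidth_hz(29.97, 720 * 480 * 2)  # about 249 MHz
```

The ratio of the two figures reproduces the roughly 59-fold increase quoted in the text.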
`
`
`
4 Digital Video Signals and Formats

Video compression techniques make use of signal models in order to be able to utilize the body of digital signal analysis/processing theory and techniques that have been developed over the past fifty or so years. The design of a video compression system, as represented by the generalized model introduced in Section 2, requires a knowledge of the signal characteristics and the digital processes that are used to create the digital video signal. It is also highly desirable to understand video display systems and the behavior of the HVS.
`
`4.1 Sampling of Analog Video Signals
`
Digital video information is generated by sampling the intensity of the original continuous analog video signal in three dimensions. The spatial component of the video signal is sampled in the horizontal and vertical dimensions, and the temporal component is sampled in the time dimension. This generates a series of digital images or image sequence I(i, j, k). Video signals that contain color information are represented by component signals whose intensities are likewise sampled in three dimensions. The sampling process inherently quantizes the video signal due to the digital word precision used to represent the intensity values; nevertheless, the digital video representation can be reproduced with arbitrary closeness to the original analog video signal. The topic of video sampling and interpolation is discussed in Chapter 7.2.

The conditions under which an analog video signal can be sampled and then "perfectly" reconstructed are given by the Nyquist sampling theorem.
`
TABLE 1  Digital video application requirements

Application                Bitrate Req.   Distortion Req.       Transmission Req.     Computational Req.   Standards Req.
Network video on demand    1.5 Mbps,      High to medium        Internet,             MPEG-1, MPEG-2       MPEG-1, MPEG-2, MPEG-7
                           10 Mbps                              100 Mbps LAN
Video phone                64 Kbps        High distortion       ISDN p x 64           H.261 encoder,       H.261
                                                                                      H.261 decoder
Desktop multimedia         1.5 Mbps       High to medium        PC channel            MPEG-1 decoder       MPEG-1, MPEG-2, MPEG-7
  video CDROM
Desktop LAN                10 Mbps        Medium distortion     Fast Ethernet,        Hardware decoders    MPEG-2, H.261
  videoconference                                               100 Mbps
Desktop WAN                1.5 Mbps       High distortion       Ethernet              Hardware decoders    MPEG-1, MPEG-4, H.263
  videoconference
Desktop dial-up            64 Kbps        Very high distortion  POTS and Internet     Software decoder     MPEG-4, H.263
  videoconference
Digital satellite          10 Mbps        Low distortion        Fixed service         MPEG-2 decoder       MPEG-2
  television                                                    satellites
HDTV                       20 Mbps        Low distortion        12-MHz terrestrial    MPEG-2 decoder       MPEG-2
                                                                link
DVD                        20 Mbps        Low distortion        PC channel            MPEG-2 decoder       MPEG-2
`
`
`
`
`
`
`
FIGURE 2  Nyquist sampling theorem, with magnitudes of Fourier spectra for (a) input l; (b) sampled input l_s, with f_s > 2 f_B; (c) sampled input l_s, with f_s < 2 f_B.
`
`
`
`
`
`
If these conditions are not met, the resulting digital signal will contain aliased components which introduce artifacts into the reconstruction. The Nyquist conditions are depicted graphically for the one-dimensional case in Fig. 2.

The one-dimensional signal l is sampled at rate f_s. It is bandlimited (as are all real-world signals) in the frequency domain, with an upper frequency bound of f_B. According to the Nyquist sampling theorem, if a bandlimited signal is sampled, the resulting Fourier spectrum is made up of the original signal spectrum plus replicates of the original spectrum spaced at integer multiples of the sampling frequency f_s. Diagram (a) in Fig. 2 depicts the magnitude |L| of the Fourier spectrum for l. The magnitude of the Fourier spectrum |L_s| for the sampled signal l_s is shown for two cases. Diagram (b) presents the case where the original signal l can be reconstructed by recovering the central spectral island. Diagram (c) displays the case where the sampling criteria have not been met and spectral overlap occurs. The spectral overlap is termed aliasing and occurs when f_s < 2 f_B. When f_s > 2 f_B, the original signal can be reconstructed by using a low-pass digital filter whose passband is designed to recover |L|. These relationships provide a basic framework for the analysis and design of digital signal processing systems.
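The folding behavior shown in Fig. 2 can be illustrated numerically with a small sketch (a hypothetical helper, not from the text):

```python
def alias_frequency(f_signal, f_sample):
    """Apparent baseband frequency of a pure tone after sampling at
    f_sample: frequencies fold about multiples of f_sample / 2."""
    f = f_signal % f_sample       # shift into [0, f_sample)
    return min(f, f_sample - f)   # fold into [0, f_sample / 2]
```

A 30 Hz tone sampled at 50 Hz shows up at 20 Hz: the spectral replicas overlap, and the tone is reflected about f_s/2, exactly the spectral overlap termed aliasing above. Tones below f_s/2 pass through unchanged.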
Two-dimensional or spatial sampling is a simple extension of the one-dimensional case. The Nyquist criteria have to be obeyed in both dimensions; i.e., the sampling rate in the horizontal direction must be two times greater than the upper frequency bound in the horizontal direction, and the sampling rate in the vertical direction must be two times greater than the upper frequency bound in the vertical direction. In practice, spatial sampling grids are square so that an equal number of samples per unit length in each direction are collected. Charge coupled devices (CCDs) are typically used to spatially sample analog imagery and video. The sampling grid spacing of these devices is more than sufficient to meet the Nyquist criteria for most resolution and application requirements. The electrical characteristics of CCDs have a greater effect on the image or video quality than their sampling grid size.

Temporal sampling of video signals is accomplished by capturing a spatial or image frame in the time dimension. The temporal samples are captured at a uniform rate of ~60 fields/s for NTSC
`
`
television and 24 fps for a motion film recording. These sampling rates are significantly smaller than the spatial sampling rate. The maximum temporal frequency that can be reconstructed according to the Nyquist frequency criteria is 30 Hz in the case of television broadcast. Therefore any rapid intensity change (caused, for instance, by a moving edge) between two successive frames will cause aliasing, because the harmonic frequency content of such a steplike function exceeds the Nyquist frequency. Temporal aliasing of this kind can be greatly mitigated in CCDs by the use of low-pass temporal filtering to remove the high-frequency content. Photoconductor storage tubes are used