DIGITAL VIDEO: AN INTRODUCTION TO MPEG-2
`
`Barry G. Haskell
`Head, Image Processing Research Department, AT&T Labs
`Atul Puri
Principal Member of Technical Staff,
Image Processing Research Department, AT&T Labs
`Arun N. Netravali
`Vice President of Research, Bell Labs,
`Lucent Technologies
`
`VIMEO/IAC EXHIBIT 1025
`VIMEO ETAL., v. BT, IPR2019-00833
`
`
`
`
`Cover design: Curtis Tow Graphics
`
`Copyright© 1997 by Chapman & Hall
`
`Printed in the United States of America
`
`Chapman & Hall
`115 Fifth Avenue
`New York, NY 10003
`
`Thomas Nelson Australia
`102 Dodds Street
`South Melbourne, 3205
`Victoria, Australia
`
`Chapman & Hall
`2-6 Boundary Row
`London SE1 8HN
`England
`
`Chapman & Hall GmbH
`Postfach 100 263
`D-69442 Weinheim
`Germany
`
`International Thomson Editores
`Campos Eliseos 385, Piso 7
`Col. Polanco
`11560 Mexico D.F
`Mexico
`
`International Thomson Publishing-Japan
Hirakawacho-cho Kyowa Building, 3F
`1-2-1 Hirakawacho-cho
`Chiyoda-ku, 102 Tokyo
`Japan
`
`International Thomson Publishing Asia
`221 Henderson Road #05-10
`Henderson Building
`Singapore 0315
`
`All rights reserved. No part of this book covered by the copyright hereon may be reproduced or
`used in any form or by any means-graphic, electronic, or mechanical, including photocopying,
`recording, taping, or information storage and retrieval systems-without the written permission of
`the publisher.
`
`1 2 3 4 5 6 7 8 9 10 XXX 01 00 99 98 97
`
`Library of Congress Cataloging-in-Publication Data
`
`Haskell, Barry G.
Digital video : an introduction to MPEG-2 / Barry G. Haskell, Atul
Puri, and Arun N. Netravali.
p. cm.
Includes bibliographical references and index.
ISBN 0-412-08411-2
1. Digital video. 2. Video compression -- Standards. 3. Coding theory.
I. Puri, Atul. II. Netravali, Arun N. III. Title.
TK6680.5.H37 1996
621.388'33--dc20
96-14018
CIP
`
`British Library Cataloguing in Publication Data available
`
`"Digital Video: An Introduction to MPEG-2" is intended to present technically accurate and
`authoritative information from highly regarded sources. The publisher, editors, authors, advisors,
`and contributors have made every reasonable effort to ensure the accuracy of the information, but
`cannot assume responsibility for the accuracy of all information, or for the consequences of its use.
`
`To order this or any other Chapman & Hall book, please contact International Thomson
`Publishing, 7625 Empire Drive, Florence, KY 41042. Phone: (606) 525-6600 or 1-800-842-3636.
`Fax: (606) 525-7778. e-mail: order@chaphall.com.
`
`For a complete listing of Chapman & Hall titles, send your request to Chapman & Hall, Dept. BC,
115 Fifth Avenue, New York, NY 10003.
`
`
`
8
MPEG-2 Video Coding and Compression
`
The MPEG group was chartered by the International Organization for Standardization (ISO) to standardize coded representations of video and audio suitable for digital storage and transmission media. Digital storage media include magnetic computer disks, optical compact disk read-only memory (CD-ROM), digital audio tape (DAT), and so forth. Transmission media include telecommunications networks, home coaxial cable TV (CATV), and over-the-air broadcast. The group's goal has been to develop a generic coding standard that can be used in many digital video implementations. Thus, some applications will typically require further specification and refinement. As of this writing, MPEG has produced two standards, known colloquially as MPEG-1 [1] and MPEG-2 [2]. Work also continues on methods and applications for future coding standards and is known colloquially as MPEG-4.
MPEG-1 is an International Standard [1] for the coded representation of digital video and associated audio at bitrates up to about 1.5 Mbits/s. Its official name is ISO/IEC 11172. If video is coded at about 1.1 Mbits/s and stereo audio is coded at 128 kbits/s per channel, then the total audio/video digital signal will fit onto the CD-ROM bitrate of approximately 1.4 Mbits/s as well as the North American ISDN Primary Rate (23 B-channels) of 1.47 Mbits/s. The specified bitrate of 1.5 Mbits/s is not a hard upper limit. In fact, MPEG-1 allows rates as high as 100 Mbits/s. However, during the course of MPEG-1 algorithm development, coded image quality was optimized at a rate of 1.1 Mbits/s using progressive* scanned color images. Two Source Input Formats (SIFs) were used for optimization [6, 8, 10] of MPEG-1. One, corresponding to NTSC, was 352 pels, 240 lines, 29.97 frames/s. The other, corresponding to PAL, was 352 pels, 288 lines, 25 frames/s. SIF uses 2:1 chrominance subsampling, both horizontally and vertically, in the same 4:2:0 format as H.261.
Originally, MPEG-2 video [2] (ISO/IEC 13818-2) was meant to code interlaced CCIR 601 video for a large number of consumer applications. One of the main differences between MPEG-1 and MPEG-2 is that MPEG-2 handles interlace efficiently [5, 9, 11, 12]. Since the picture resolution of CCIR 601 is about four times the SIF of MPEG-1, the bitrate chosen for MPEG-2 optimization was 4 Mbits/s. However, MPEG-2 allows rates as high as 429 Gbits/s.
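The bitrate figures quoted above can be checked with a few lines of arithmetic. The sketch below is only illustrative: the 64 kbits/s B-channel rate and the 720 × 480 CCIR 601 picture size are assumptions not stated in the text, and all variable names are hypothetical.

```python
# Figures quoted in the text, in bits per second.
VIDEO_BPS = 1_100_000             # MPEG-1 video at about 1.1 Mbits/s
AUDIO_BPS_PER_CHANNEL = 128_000   # stereo audio, 128 kbits/s per channel
ISDN_B_CHANNEL_BPS = 64_000       # assumption: one ISDN B-channel carries 64 kbits/s

# Total audio/video rate for video plus two audio channels.
total_av_bps = VIDEO_BPS + 2 * AUDIO_BPS_PER_CHANNEL

# North American ISDN Primary Rate: 23 B-channels.
isdn_primary_bps = 23 * ISDN_B_CHANNEL_BPS

# Ratio of CCIR 601 picture area to NTSC SIF picture area
# (assumption: 720 x 480 active pels for 525-line CCIR 601).
resolution_ratio = (720 * 480) / (352 * 240)

print(total_av_bps, isdn_primary_bps, round(resolution_ratio, 2))
```

Under these assumptions the combined stream stays below both the CD-ROM rate and the ISDN Primary Rate, and the resolution ratio of roughly four is consistent with the 4 Mbits/s target chosen for MPEG-2.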
`
*The term Noninterlaced is also used for progressively scanned pictures.

8.1 MPEG-2 CHROMINANCE SAMPLING

A bitrate of 4 Mbits/s was deemed too low to enable high-quality transmission of every CCIR 601 chrominance sample. Thus, an MPEG-2 4:2:0 format was defined to allow for 2:1 vertical subsampling of the chrominance, in addition to the normal 2:1 horizontal chrominance subsampling of CCIR 601.
For interlace, the temporal integrity of the 4:2:0 chrominance samples must be maintained. Thus, MPEG-2 normally defines the first, third, etc. rows of 4:2:0 chrominance CbCr samples to be from the same field as the first, third, etc. rows of luminance Y samples. The second, fourth, etc. rows of chrominance CbCr samples are from the same field as the second, fourth, etc. rows of luminance Y samples. However, an override capability is also available to indicate that the 4:2:0 chrominance samples are all temporally the same as the temporally first field of the frame. The MPEG-2 4:2:0 chrominance sampling is shown in Fig. 8.1.
At higher bitrates the full 4:2:2 chrominance format [7] of CCIR 601 may be used, in which the chrominance is subsampled 2:1 horizontally only. In 4:2:2 video, the first luminance and chrominance samples of each line are geometrically cosited. MPEG-2 also allows for a 4:4:4 chrominance format [7], in which the luminance and chrominance samplings are identical.
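To make the three chrominance formats concrete, the following sketch computes the size of each chrominance plane and the total number of samples per frame. It is a minimal illustration; the function names and the 720 × 576 example picture size are assumptions chosen for the example.

```python
# Horizontal and vertical chrominance subsampling factors for each format.
SUBSAMPLING = {
    "4:2:0": (2, 2),  # 2:1 horizontally and 2:1 vertically
    "4:2:2": (2, 1),  # 2:1 horizontally only
    "4:4:4": (1, 1),  # chrominance sampled identically to luminance
}

def chroma_plane_size(luma_width, luma_height, chroma_format):
    """Return (width, height) of one chrominance plane (Cb or Cr)."""
    horizontal, vertical = SUBSAMPLING[chroma_format]
    return luma_width // horizontal, luma_height // vertical

def samples_per_frame(luma_width, luma_height, chroma_format):
    """Return the total number of Y, Cb, and Cr samples in one frame."""
    chroma_width, chroma_height = chroma_plane_size(luma_width, luma_height, chroma_format)
    return luma_width * luma_height + 2 * chroma_width * chroma_height

for fmt in SUBSAMPLING:
    print(fmt, chroma_plane_size(720, 576, fmt), samples_per_frame(720, 576, fmt))
```

For the same luminance resolution, 4:2:0 carries half as many samples per frame as 4:4:4, which is why it suits the lower bitrates discussed above.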
`
8.2 REQUIREMENTS OF THE MPEG-2 VIDEO STANDARD

The main requirement of MPEG-2 video is that it should achieve the highest possible quality of the decoded video during normal play. In addition to obtaining excellent picture quality, we need to randomly display any single frame in the video stream. Also, the capability of performing fast searches directly on the video stream, both forward and backward, is extremely desirable if the storage medium has seek capabilities. It is also useful to be able to edit compressed video streams directly while maintaining decodability. Also, multipoint network communications may require the ability to communicate simultaneously with SIF and CCIR 601 decoders. Communication over packet networks may require prioritization so that the network can drop low-priority packets in case of congestion. Broadcasters may wish to send a progressive scanned HDTV program to CCIR 601 interlace receivers as well as to progressive HDTV receivers.

[Figure 8.1: sampling grid in which X represents luminance pels and O represents chrominance pels; alternate rows belong to the top field and the bottom field.]
Fig. 8.1 MPEG-2 4:2:0 chrominance format for interlaced video. Chrominance is subsampled 2:1 both horizontally and vertically. Alternate lines of chrominance are temporally aligned with alternate fields.
To satisfy all these requirements MPEG-2 has defined a large number of capabilities. However, not all applications will require all the features of MPEG-2. Thus, to promote interoperability among applications, MPEG-2 has designated several sets of constrained parameters using a two-dimensional rank ordering. One of the dimensions, called Profile, specifies the coding features supported. The other dimension, called Level, specifies the picture resolutions, bitrates, and so forth that can be handled. A number of Profile-Level combinations have been defined (see Chapter 11), the most important of which is called Main Profile at Main Level, or MP@ML for short. Parameter constraints for MP@ML are shown in Table 8.1.
`
8.3 MAIN PROFILE ALGORITHM OVERVIEW

Uncompressed digital video requires an extremely high transmission bandwidth. Digitized NTSC resolution video, for example, has a bitrate of approximately 100 Mbits/s. With digital video, compression is necessary to reduce the bitrate to suit most applications. The required degree of compression is achieved by exploiting the spatial and temporal redundancy present in a video signal. However, the compression process is inherently lossy, and the signal reconstructed from the compressed video stream is not identical to the input video signal. Compression sometimes causes some visible artifacts in the decoded pictures.

Table 8.1 Parameter Bounds for MPEG-2 Main Profile at Main Level (MP@ML) Video Streams.

Parameter            Bound
Samples/line         720
Lines/frame          576
Frames/second        30
Samples/second       10 368 000
Bitrate              15 Mbits/s
Buffer size          1 835 008 bits
Chroma format        4:2:0
Image aspect ratio   4:3, 16:9 and square pels
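The bounds of Table 8.1 can be expressed as a simple conformance check. The sketch below is an illustration only: the function name, the dictionary keys, and the assumption that Samples/second counts luminance samples (width × height × frame rate) are not taken from the standard.

```python
# Upper bounds from Table 8.1 for Main Profile at Main Level (MP@ML).
MP_ML_BOUNDS = {
    "samples_per_line": 720,
    "lines_per_frame": 576,
    "frames_per_second": 30,
    "samples_per_second": 10_368_000,
    "bitrate_bps": 15_000_000,
    "vbv_buffer_bits": 1_835_008,
}

def mp_ml_violations(width, height, frame_rate, bitrate_bps,
                     vbv_buffer_bits, chroma_format="4:2:0"):
    """Return the names of the Table 8.1 bounds that a stream would exceed."""
    observed = {
        "samples_per_line": width,
        "lines_per_frame": height,
        "frames_per_second": frame_rate,
        # Assumption: luminance samples per second.
        "samples_per_second": width * height * frame_rate,
        "bitrate_bps": bitrate_bps,
        "vbv_buffer_bits": vbv_buffer_bits,
    }
    violations = [name for name, bound in MP_ML_BOUNDS.items()
                  if observed[name] > bound]
    if chroma_format != "4:2:0":
        violations.append("chroma_format")
    return violations

# A 720 x 576 picture at 25 frames/s sits exactly on the samples/second bound.
print(mp_ml_violations(720, 576, 25, 6_000_000, 1_835_008))
# The same picture size at 30 frames/s exceeds it.
print(mp_ml_violations(720, 576, 30, 6_000_000, 1_835_008))
```

The example shows why the individual bounds cannot all be reached at once: the maximum picture size and the maximum frame rate together would exceed the Samples/second bound.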
For progressive scanned video there is very little difference between MPEG-1 and MPEG-2 compression capabilities. However, interlace presents complications in removing both types of redundancy, and many features have been added to deal specifically with it.
As we saw in Chapter 7, MPEG-2 specifies a choice of two picture structures. Field-pictures consist of fields that are coded independently. In Frame-pictures, on the other hand, field pairs are merged into frames before coding. MPEG-2 requires interlace to be displayed as alternate top and bottom fields.* However, either field can be displayed first within a frame. The Main Profile allows only 4:2:0 chrominance sampling.
Both spatial and temporal redundancy reductions are needed for the high-compression requirements of MPEG-2. Many techniques used by MPEG-2 have been described in previous chapters.
8.3.1 Exploiting Spatial Redundancy
As in JPEG and H.261 [3], the MPEG-1 and MPEG-2 video-coding algorithms employ a Block-based two-dimensional Discrete Cosine Transform (DCT). A picture is first divided into 8 × 8 Blocks of pels. The two-dimensional DCT is then applied independently on each Block. This operation results in an 8 × 8 Block of DCT coefficients in which most of the energy in the original (pel) Block is typically concentrated in a few low-frequency coefficients. The coefficients of each Block may be scanned and transmitted in the same zigzag order as JPEG and H.261 (see Fig. 8.2a).
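The transform and scan described above can be sketched in a few lines. The code below is a direct, unoptimized 8 × 8 DCT with the conventional orthonormal scaling (an assumption; the normative arithmetic is in the standard), followed by a generator for the zigzag scan order. Function names are hypothetical.

```python
import math

N = 8  # Block size in pels

def dct_2d(block):
    """Direct 2-D DCT of an N x N block given as a list of rows."""
    def scale(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    coefficients = [[0.0] * N for _ in range(N)]
    for u in range(N):          # vertical frequency index
        for v in range(N):      # horizontal frequency index
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            coefficients[u][v] = 0.25 * scale(u) * scale(v) * total
    return coefficients

def zigzag_order():
    """Return the (row, column) positions of a block in zigzag scan order."""
    order = []
    for diagonal in range(2 * N - 1):
        rows = range(max(0, diagonal - N + 1), min(diagonal, N - 1) + 1)
        # Even diagonals run from bottom-left to top-right, odd ones the other way.
        if diagonal % 2 == 0:
            rows = reversed(rows)
        order.extend((row, diagonal - row) for row in rows)
    return order

def scan(coefficients):
    """Serialize a coefficient block in zigzag order."""
    return [coefficients[row][col] for row, col in zigzag_order()]

# A flat block has all of its energy in the single DC coefficient.
flat = [[255] * N for _ in range(N)]
print(round(dct_2d(flat)[0][0]), zigzag_order()[:6])
```

With this scaling, a flat block of value 255 yields a DC coefficient of 8 × 255, and the zigzag scan visits the low-frequency coefficients first so that the trailing high-frequency values, which are usually small, end up together at the end of the scan.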
The main effect of interlace in Frame-pictures is that since adjacent scan lines come from different fields, vertical correlation is reduced when there is motion in the scene. MPEG-2 provides two features for dealing with this.
`
`*The top field contains the top line of the frame. The bottom field contains the second (and bottom) line of the frame.
`
`
`
[Figure 8.2: (a) Zigzag Scan; (b) Alternate Scan]
Fig. 8.2 Two methods for scanning DCT coefficients are available in MPEG-2. The zigzag order (a) is used in JPEG, H.261, and MPEG-1. The alternate scan (b) often gives better compression for interlaced video when there is significant motion.
`
First, with reduced vertical correlation, the zigzag scanning order for DCT coefficients shown in Fig. 8.2a may not be optimum. Thus, MPEG-2 has an Alternate_Scan, shown in Fig. 8.2b, that may be specified by the encoder on a picture-by-picture basis to allow the significant bottom-left frequencies to be sent earlier.
Second, a capability for field_DCT coding within a Frame-picture Macroblock (MB) is provided. That is, just prior to performing the DCT, the encoder may reorder the luminance lines within a MB so that the first eight lines come from the top field, and the last eight lines come from the bottom field. This reordering is undone after the IDCT in the encoder and the decoder. The effect of this reordering is to increase the vertical correlation within the luminance blocks and thus increase the energy compaction in the DCT domain. Again, a comparison of the sum of absolute vertical line differences is often sufficient for deciding when to use field_DCT coding and when to use frame_DCT coding in a MB. Chrominance MBs are not reordered in Main Profile field_DCT coding.
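The line reordering and the decision rule mentioned above can be sketched as follows. This is one plausible encoder heuristic, not a procedure defined by the standard; the function names are hypothetical, and the cost is measured within each 8-line half because each luminance Block covers eight lines.

```python
def to_field_order(mb_luma):
    """Reorder 16 luminance lines: top-field lines first, then bottom-field lines."""
    return mb_luma[0::2] + mb_luma[1::2]

def vertical_activity(rows):
    """Sum of absolute differences between vertically adjacent lines."""
    return sum(abs(a - b)
               for upper, lower in zip(rows, rows[1:])
               for a, b in zip(upper, lower))

def choose_dct_type(mb_luma):
    """Pick the line ordering with the smaller vertical line differences."""
    frame_cost = vertical_activity(mb_luma[:8]) + vertical_activity(mb_luma[8:])
    field_lines = to_field_order(mb_luma)
    field_cost = vertical_activity(field_lines[:8]) + vertical_activity(field_lines[8:])
    return "field_DCT" if field_cost < frame_cost else "frame_DCT"

# Strong interfield motion: the two fields differ, so adjacent frame lines alternate.
moving = [[0] * 16 if line % 2 == 0 else [100] * 16 for line in range(16)]
# Static vertical gradient: adjacent frame lines are highly correlated.
static = [[10 * line] * 16 for line in range(16)]
print(choose_dct_type(moving), choose_dct_type(static))
```

In the moving example each field is internally smooth, so separating the fields removes the line-to-line alternation; in the static example separating the fields doubles the step between adjacent lines, so frame ordering is kept.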
After scanning, a quantizer is applied to the DCT coefficients, which results in many of them being set to zero. As with JPEG, a different quantization step size may be applied to each DCT coefficient. This is specified by a Quantizer Matrix that is sent in the video stream. This quantization is responsible for the lossy nature of the compression algorithms in JPEG, H.261, and MPEG video. Compression is achieved by transmitting only the nonzero quantized coefficients and by entropy-coding their locations and amplitudes.
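The effect of per-coefficient step sizes can be illustrated with a deliberately simplified quantizer. The sketch below assumes a step size of matrix weight × quantizer scale / 16 with truncation toward zero; this is an illustrative simplification, not the normative MPEG-2 quantization arithmetic, and all names are hypothetical. The flat matrix of 16s is the default Nonintra matrix.

```python
# Default Nonintra quantizer matrix: every weight is 16.
NONINTRA_DEFAULT = [[16] * 8 for _ in range(8)]

def quantize(coefficients, matrix, quantizer_scale):
    """Quantize an 8 x 8 block with a per-coefficient step size.

    Assumption: step = weight * quantizer_scale / 16, truncating toward zero.
    """
    quantized = []
    for coefficient_row, weight_row in zip(coefficients, matrix):
        row = []
        for coefficient, weight in zip(coefficient_row, weight_row):
            step = weight * quantizer_scale / 16
            row.append(int(coefficient / step))  # int() truncates toward zero
        quantized.append(row)
    return quantized

def count_nonzero(block):
    """Number of coefficients that would actually be transmitted."""
    return sum(1 for row in block for value in row if value != 0)

# A block with two significant coefficients and small residual values elsewhere.
block = [[3] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0] = 100, -30, 7
levels = quantize(block, NONINTRA_DEFAULT, quantizer_scale=8)
print(levels[0][:2], count_nonzero(levels))
```

Even in this crude form, the quantizer turns 64 nonzero inputs into a handful of nonzero levels, which is where most of the bitrate saving (and all of the loss) comes from.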
`
8.3.2 Exploiting Temporal Redundancy
Temporal redundancy results from similarity between adjacent pictures. MPEG-2 exploits this redundancy by computing and transmitting an interframe difference signal called the prediction error. In computing the prediction error, the technique of Macroblock (MB) motion compensation is employed to correct for motion, as described in Section 7.1.2. Pictures coded using Forward Prediction are called P-pictures. Pictures coded using Bidirectional Prediction are called B-pictures.
Pictures that are Bidirectionally predicted are never themselves used as Reference pictures, that is, Reference pictures for B-pictures must be either P-pictures or I-pictures. Similarly, Reference pictures for P-pictures must also be either P-pictures or I-pictures.
The positions of the best-matching Prediction Macroblocks are indicated by motion vectors that describe the displacement between them and the Target Macroblocks. The motion vector information is also encoded and transmitted along with the prediction error.
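One common way to find a best-matching Prediction Macroblock is exhaustive block matching with a sum-of-absolute-differences (SAD) cost. The sketch below is an assumption about how an encoder might do this; motion estimation is not specified by the standard, and the function names, the integer-pel search, and the small search range are all illustrative.

```python
def sad(target, reference, tx, ty, rx, ry, size):
    """Sum of absolute differences between a target block and a reference block."""
    return sum(abs(target[ty + j][tx + i] - reference[ry + j][rx + i])
               for j in range(size) for i in range(size))

def best_motion_vector(target, reference, tx, ty, size=16, search=4):
    """Full search over integer displacements within +/- search pels.

    Returns (dx, dy, cost), where (dx, dy) is the displacement from the
    Target Macroblock position to the best-matching block in the reference.
    """
    height, width = len(reference), len(reference[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = tx + dx, ty + dy
            # Skip candidates that fall outside the reference picture.
            if 0 <= rx <= width - size and 0 <= ry <= height - size:
                cost = sad(target, reference, tx, ty, rx, ry, size)
                if best is None or cost < best[2]:
                    best = (dx, dy, cost)
    return best

# Synthetic example: the target picture is the reference shifted by (2, 1) pels.
reference = [[100 * y + x for x in range(12)] for y in range(12)]
target = [[100 * (y + 1) + (x + 2) for x in range(12)] for y in range(12)]
print(best_motion_vector(target, reference, tx=4, ty=4, size=4, search=4))
```

Practical encoders usually replace the full search with faster hierarchical or heuristic searches and refine the result to half-pel accuracy, but the cost function and the meaning of the resulting vector are the same.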
The prediction error itself is transmitted using the DCT-based intraframe encoding technique summarized previously. In MPEG-2 video, as in H.261 and MPEG-1, the MB size is chosen to be 16 × 16 pels, representing a reasonable tradeoff between the compression provided by motion compensation and the cost of transmitting the motion vectors.

[Figure 8.3: compressed video data from the channel enters a Buffer that feeds the decoder. At each picture time, the data for one picture is removed for decoding.]
Fig. 8.3 Video Buffer Verifier (VBV). The buffer must never overflow, and underflow is allowed only under restricted circumstances. The buffer size is specified in the video stream. For MP@ML the maximum allowable VBV buffer size is 1 835 008 bits.
Many encoder procedures are not specified by the standard, so that different algorithms may be employed at the encoder as long as the resulting video stream is consistent with the specified syntax. For example, the details of the motion estimation procedure are not part of the standard. Neither are the decision methods used to determine the various coding modes and structures. It is the responsibility of the encoder to decide which prediction mode is best. Needless to say, this can be an extremely complex process if done with full optimality. Thus, in practice numerous shortcuts are usually employed for economical implementation (see Section 7.3).
`
`8.3.3 Video Buffer Verifier
`
As with H.261 and MPEG-1, MPEG-2 has one important encoder restriction, namely a limitation on the variation in bits/picture, especially in the case of constant bitrate operation. This limitation is enforced through a Video Buffer Verifier (VBV), which corresponds to the Hypothetical Reference Decoder of H.261.
The contents of the transmitted video stream must meet the requirements of the VBV, which is shown in Fig. 8.3. Data enter the VBV at a piecewise constant rate for each picture. At each picture decode time the data for one picture are removed instantaneously from the VBV buffer and decoded. At no time must the VBV buffer overflow, and underflow is allowed only under restricted circumstances as described later.
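The buffer behavior described above can be simulated directly. The sketch below models only the constant-bitrate case and uses hypothetical names; it assumes the buffer is filled for a fixed initial delay before the first picture is removed, and it simply reports the pictures at which overflow or underflow would occur.

```python
def simulate_vbv(picture_bits, bitrate_bps, picture_rate,
                 buffer_size_bits, initial_delay_s):
    """Return a list of (picture index, "overflow" or "underflow") events.

    Bits arrive at a constant rate; at each picture decode time all bits
    of one picture are removed from the buffer instantaneously.
    """
    bits_per_interval = bitrate_bps / picture_rate
    fullness = bitrate_bps * initial_delay_s  # bits received before first decode
    events = []
    for index, bits in enumerate(picture_bits):
        if fullness > buffer_size_bits:
            events.append((index, "overflow"))
        if bits > fullness:
            # The picture has not fully arrived by its decode time.
            events.append((index, "underflow"))
        fullness = max(fullness - bits, 0.0) + bits_per_interval
    return events

MP_ML_BUFFER_BITS = 1_835_008  # maximum VBV buffer size for MP@ML

# Pictures exactly at the average size keep the buffer fullness constant.
steady = simulate_vbv([160_000] * 10, 4_000_000, 25, MP_ML_BUFFER_BITS, 0.2)
# Pictures far below the average size let the buffer fill until it overflows.
small = simulate_vbv([10_000] * 10, 4_000_000, 25, MP_ML_BUFFER_BITS, 0.2)
print(steady, small[:1])
```

An encoder's rate control effectively runs a model like this while coding, raising or lowering the quantizer step size so that the simulated fullness stays within bounds.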
If the VBV input data rate is the same for each picture, then the video is said to be coded at Constant Bitrate (CBR). Otherwise, the video is said to be coded at Variable Bitrate (VBR). The VBV video bitrate is not specified per se. What is specified instead is the amount of time, called vbv_delay,* that the picture header† resides in the VBV before being extracted for decoding. Thus, the picture header for picture n arrives at Systems Time

$$t(n) = \mathrm{DTS}(n) - \mathrm{vbv\_delay}(n) \qquad (8.1)$$

where DTS(n) is the Systems Decoding Time Stamp for picture n. The VBV input bitrate for picture n is then given by

$$R(n) = \frac{\mathrm{Nbits}(n)}{t(n+1) - t(n)} \qquad (8.2)$$

where Nbits(n) is the number of bits‡ in picture n. Of course, the actual instantaneous transmission bitrate after Systems packetization and multiplexing may be very bursty and far from this value.
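Equations (8.1) and (8.2) translate directly into code. In the sketch below all times are expressed in seconds for readability, which is an assumption made for the example (the coded values use clock ticks), and the function names are hypothetical.

```python
def arrival_time(dts_s, vbv_delay_s):
    """Eq. (8.1): time at which the picture header enters the VBV."""
    return dts_s - vbv_delay_s

def vbv_input_bitrate(nbits, t_current, t_next):
    """Eq. (8.2): VBV input bitrate while picture n is arriving."""
    return nbits / (t_next - t_current)

# Two consecutive pictures at 25 pictures/s, each with a vbv_delay of 0.25 s.
t_n = arrival_time(dts_s=1.00, vbv_delay_s=0.25)
t_next = arrival_time(dts_s=1.04, vbv_delay_s=0.25)

# If picture n occupies 160 000 bits, the implied input rate is about 4 Mbits/s.
rate = vbv_input_bitrate(160_000, t_n, t_next)
print(t_n, t_next, round(rate))
```

When vbv_delay is the same for consecutive pictures, the arrival interval equals the picture interval and the rate reduces to picture size times picture rate; a changing vbv_delay stretches or shrinks the arrival interval and with it the implied rate.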
In normal operation, the interval between picture decoding times is determined by the specified picture rate and structure. However, during low_delay operation, a departure from this is allowed in case the encoder requires a very large number of bits to code a particular picture. Such a picture is called a Big Picture. In this case the VBV decoder must wait longer than one picture interval, perhaps several picture intervals, for all the bits of the Big Picture to enter the VBV buffer before instantaneous decoding can take place.

*See Section 10.2.8 for the specification of vbv_delay.
†Actually the last byte of the Picture Start Code.
‡More precisely, the number of bits up to the next Picture Start Code.
For Big Pictures, the values of vbv_delay and DTS may not be correct, since most encoders are well into the coding of a picture before they realize it is a Big Picture. However, Eqs. (8.1) and (8.2) are still valid for most encodings of Big Pictures.
Note that this definition of VBV allows for varying the delay during the video coding. If the encoder requires a relatively large number of bits for a Big Picture, the decoder display must repeat the previously decoded picture a few times while waiting for the arrival of those bits, thus increasing the delay between camera input and display output.
If the encoder then resumes normal operation with the frame following the Big Picture, the total delay will remain at the new higher value. However, in many applications the users would like the delay to be reduced to its lower nominal value as soon as possible. Thus, to reduce the delay, the encoder normally skips a few frames following the Big Picture and does not send them.
`
8.4 OVERVIEW OF THE MPEG-2 VIDEO STREAM SYNTAX

The MPEG-2 video standard specifies the syntax and semantics of the compressed video stream produced by the video encoder. The standard also specifies how this video stream is to be parsed and decoded to produce a decompressed video signal. Most of MPEG-2 [2] consists of additions to MPEG-1 [1]. However, unlike MPEG-1, MPEG-2 allows Big Pictures, as does H.261.
The video stream syntax is flexible to support the variety of applications envisaged for the MPEG-2 video standard. To this end, the overall syntax is constructed in a hierarchy of several Headers, each performing a different logical function. The different Headers in the syntax and their use are illustrated in Table 8.2.

Table 8.2 Six Headers of MPEG-2 Video Stream Syntax

Syntax Header        Functionality
Sequence             Definition of entire video sequence
Group of Pictures    Enables random access in video stream
Picture              Primary coding unit
Slice                Resynchronization, refresh, and error recovery
Macroblock           Motion compensation unit
Block                Transform and compression unit
`
8.4.1 Video Sequence Header
The Video Sequence Header and its extensions contain basic parameters such as the size of the coded video pictures, size of the displayed video pictures if different, Image Aspect Ratio (IAR), picture rate, maximum bitrate (Rmax), VBV buffer size, low_delay indication, Profile and Level identification, Interlace or Progressive sequence indication, private user data, plus certain other global parameters.
This Header also allows for the optional transmission of JPEG style Quantizer Matrices, one for Intra-coded MBs and one for Nonintra coded MBs.* Unlike JPEG, if one or both quantizer matrices are not sent, default values are defined. These are shown in Fig. 8.4.
Private user data can also be sent in the Sequence Header extension as long as they do not contain a Start Code Prefix (Psc), which MPEG-2 defines as a string of 23 or more binary zeros followed by a binary one.
`
*With 4:2:2 video, two additional quantizer matrices are sent for Intra chrominance and Nonintra chrominance.

Intra:
 8 16 19 22 26 27 29 34
16 16 22 24 27 29 34 37
19 22 26 27 29 34 34 38
22 22 26 27 29 34 37 40
22 26 27 29 32 35 40 48
26 27 29 32 35 40 48 58
26 27 29 34 38 46 56 69
27 29 35 38 46 56 69 83

Inter:
16 16 16 16 16 16 16 16
16 16 16 16 16 16 16 16
16 16 16 16 16 16 16 16
16 16 16 16 16 16 16 16
16 16 16 16 16 16 16 16
16 16 16 16 16 16 16 16
16 16 16 16 16 16 16 16
16 16 16 16 16 16 16 16

Fig. 8.4 Default Quantizer Matrices for Intra and Nonintra Pictures.

8.4.2 Group of Pictures (GOP) Header
Below the Video Sequence Header is the Group of Pictures (GOP) Header, which provides support for random access, fast search, and editing. The GOP Header contains a time code (hours, minutes, seconds, frames) used by certain recording devices. It also contains editing flags to indicate whether the B-pictures following the first I-picture of the GOP can be decoded following a random access.
In MPEG, a sequence of transmitted video pictures is typically divided into a series of GOPs, where each GOP begins with an Intra-coded picture (I-picture) followed by an arrangement of Forward Predictive-coded pictures (P-pictures) and Bidirectionally Predicted pictures (B-pictures).* Figure 8.5 shows examples of MPEG GOPs.
The top GOP comprises pictures 0 to 14, and since there are no B-pictures the encoding/transmission order is the same as the camera/display order.
The bottom GOP contains pictures 1 to 12, consisting of one I-picture, three P-pictures, and eight B-pictures. The encoding/transmission order of the pictures in this GOP is shown at the bottom of Fig. 8.5. B-pictures 1 and 2 are encoded after I-picture 3, using P-picture 0 and I-picture 3 as reference. Note that B-pictures 13 and 14 are part of the next GOP because they are encoded after I-picture 15.
Pictures are displayed in their camera order 0, 1, 2, 3, 4 ... If B-pictures are to appear in the sequence, then a Reordering Picture Delay must be used for all I- and P-pictures to produce the correct display order, as shown in Fig. 8.6.
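The relationship between display order and encoding/transmission order can be sketched as a small reordering routine. The function name and the string encoding of picture types are assumptions made for the example; the picture types used below follow the bottom GOP described for Fig. 8.5, with picture 0 a P-picture of the preceding GOP and picture 15 the next I-picture.

```python
def coding_order(display_types):
    """Map pictures given in display order to encoding/transmission order.

    display_types is a sequence of "I", "P", or "B", one per picture in
    display order. B-pictures are held back until the next reference
    picture (I or P) has been emitted, since they are predicted from it.
    """
    order = []
    pending_b = []
    for index, picture_type in enumerate(display_types):
        if picture_type == "B":
            pending_b.append(index)
        else:
            order.append(index)        # reference picture goes first
            order.extend(pending_b)    # then the B-pictures that precede it in display
            pending_b = []
    order.extend(pending_b)  # trailing B-pictures with no later reference in this list
    return order

# Pictures 0 to 15 in display order.
types = "PBBIBBPBBPBBPBBI"
print(coding_order(types))
```

The decoder performs the inverse operation: B-pictures are displayed as soon as they are decoded, while each I- or P-picture is held until the B-pictures that precede it in display order have been shown.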
Random access and fast search are enabled by the availability of the I-pictures, which can be decoded independently and serve as starting points for further decoding. The MPEG-2 video standard allows GOPs to be of arbitrary structure and length. The GOP Header may be used as the basic unit for editing an MPEG-2 video stream.
`
`8.4.3 Picture Header
Below the GOP is the Picture Header, which contains the type of picture that is present, for example, I, P, or B, as well as a Temporal Reference indicating the position of the picture in camera/display order within the GOP.† It also contains the parameter vbv_delay that indicates how long to wait after a random access before starting to decode. Without this information, a decoder buffer could underflow or overflow following a random access.
Within the Picture Header several picture coding extensions are allowed. For example, the quantization accuracy of the Intra DC coefficients may be increased from the 8 bits of MPEG-1 to as much as 10 bits for MP@ML. A 3:2 pulldown flag, called repeat_first_field, indicates, for Frame-pictures, that the first field of the picture should be displayed one more time following the display of the second field. An alternative scan to the DCT zigzag scan may be specified. Also, the presence of error concealment motion vectors in I-pictures may be indicated. Other information includes Picture Structure (field or frame), field temporal order (for Frame-pictures), progressive frame indicator, and information for reconstruction of a composite NTSC or PAL analog waveform.

*A GOP usually contains only one I-picture. However, more than one are allowed.
†The Temporal Reference is reset to zero in the first picture to be displayed of each GOP.

[Figure 8.5: Group of Pictures with an I-Picture every N Pictures and a P-Picture every M Pictures. Top: Display and Encoding Order (GOP = Pictures 0 to 14). Bottom: Encoding Order (GOP = Pictures 1 to 12).]
Fig. 8.5 Examples of MPEG Group of Pictures.

[Figure 8.6: block diagram. Coded video from the channel enters the Video Decoder; decoded I- and P-pictures pass through the Reordering Picture Delay, decoded B-pictures bypass it, and a pair of switches selects which path feeds the display.]
Fig. 8.6 Decoder for MPEG video stream containing I-, P-, and B-pictures. B-pictures are displayed immediately through the bottom path, whereas each decoded I- or P-picture passes first via the top path to a Reordering Picture Delay, to await display after the ensuing B-pictures. The switches are either both up or both down.
Within the Picture Header a picture display extension allows for the position of a display rectangle to be defined for each picture. This feature is useful, for example, when coded pictures having IAR 16:9 are to be also received by conventional TVs having IAR 4:3. This capability is also known as Pan and Scan.
`
`8.4.4 Slice Header
A Slice is a string of consecutive MBs of arbitrary length running from left to right across the picture, for example, as shown in Fig. 8.7. In I-pictures, all MBs are transmitted. In P-pictures and B-pictures, typically some MBs of a slice are transmitted and some are not, that is, they are skipped. However, the first and last MBs of a Slice must always be transmitted. A Slice is not allowed to extend beyond the right edge of the picture, and Slices must not overlap.
The Slice Header is intended to be used for resynchronization in the event of transmission bit errors. It is the responsibility of the encoder to choose the length of each Slice depending on the expected bit error conditions. Prediction registers used in the differential encoding of motion vectors and DC Intra coefficients are reset at the start of a Slice.
The Slice Header contains the vertical position of the Slice within the picture, as well as a quantizer_scale_code parameter used to define the quantizer step size until such time as a new step size is optionally sent at the MB level. The Slice Header may also contain an indicator for Slices that contain only Intra MBs. These may be used in certain fast forward and fast reverse display applications.
All Profiles defined so far have the restricted Slice structure, for which all MBs in the picture must belong to a Slice, that is, the Slices cover the entire picture with no gaps in between, as shown in Fig. 8.7.

[Figure 8.7: picture divided into Slices of varying length.]
Fig. 8.7 Possible arrangement of Slices in which slice lengths vary throughout the picture. In MPEG-2 the left edge of the picture always starts a new slice.
`
`
`
`8.4.5 Macroblock Header
The MacroBlock (MB) is the 16 × 16 motion compensation unit, and each MB begins with a MacroBlock Header. For the first MB of each Slice, the horizontal position with respect to the left edge of the picture (in MBs) is coded using the macroblock_address_increment VLC shown in Table 8.3. The positions of additional transmitted MBs are then coded differentially with respect to the most recently transmitted MB, also using the macroblock_address_increment VLC.
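Decoding this variable length code amounts to reading bits until a codeword of Table 8.3 is recognized. The sketch below works on a string of "0" and "1" characters for clarity; the function name is hypothetical, and the handling of the escape codeword (each escape adds 33 to the increment that follows) is stated here as an assumption about the escape semantics rather than something spelled out in the text above.

```python
# Codewords of Table 8.3 for increments 1 through 33, in order.
CODEWORDS = [
    "1", "011", "010", "0011", "0010", "00011", "00010",
    "0000111", "0000110", "00001011", "00001010", "00001001", "00001000",
    "00000111", "00000110", "0000010111", "0000010110", "0000010101",
    "0000010100", "0000010011", "0000010010", "00000100011", "00000100010",
    "00000100001", "00000100000", "00000011111", "00000011110", "00000011101",
    "00000011100", "00000011011", "00000011010", "00000011001", "00000011000",
]
VLC_TABLE = {code: value for value, code in enumerate(CODEWORDS, start=1)}
ESCAPE = "00000001000"

def decode_increment(bits, position=0):
    """Decode one address increment; return (increment, next bit position)."""
    increment = 0
    code = ""
    while position < len(bits):
        code += bits[position]
        position += 1
        if code == ESCAPE:
            increment += 33   # assumption: each escape adds 33
            code = ""
        elif code in VLC_TABLE:
            return increment + VLC_TABLE[code], position
    raise ValueError("bit string ended inside a codeword")

print(decode_increment("011"))           # increment of 2
print(decode_increment(ESCAPE + "1"))    # escape followed by increment of 1
```

Because short codewords are assigned to small increments, runs of consecutive transmitted MBs cost only one bit each for addressing.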
In P-pictures, skipped MBs are assumed Nonintra with zero DCT coefficients and zero motion vectors. In B-pictures, skipped MBs are assumed Nonintra with zero DCT coefficients and motion vectors the same as the previous MB, which cannot be Intra.
Also included in the MacroBlock Header are Macroblock_type (Intra, Nonintra, etc.), Motion Vector Type, DCT_type (field_DCT or frame_DCT), quantizer_scale_code, motion vectors, and a coded block pattern indicating which blocks in the MB are coded. As with other Headers, many parameters may or may not be present, depending on Macroblock_type, as shown in Fig. 8.8. The VLC for coded block pattern, which is present if macroblock_pattern = 1, is given for Main Profile in Fig. 8.9.
MPEG-2 has many more MB Types than MPEG-1, owing to the additional features provided as well as to the complexities of coding interlaced video. Some of these are discussed later.
`
Table 8.3 macroblock_address_increment Variable Length Code (VLC). The escape value is used for addresses larger than 33. N*0 means N zeros.

VLC Codeword    macroblock_address_increment
1               1
011             2
010             3
0011            4
0010            5
0001 1          6
0001 0          7
4*0 111         8
4*0 110         9
4*0 1011        10
4*0 1010        11
4*0 1001        12
4*0 1000        13
5*0 111         14
5*0 110         15
5*0 101 11      16
5*0 101 10      17
5*0 101 01      18
5*0 101 00      19
5*0 100 11      20
5*0 100 10      21
5*0 100 011     22
5*0 100 010     23
5*0 100 001     24
5*0 100 000     25
6*0 11 111      26
6*0 11 110      27
6*0 11 101      28
6*0 11 100      29
6*0 11 011      30
6*0 11 010      31
6*0 11 001      32
6*0 11 000      33
7*0 1 000       escape_word

8.4.6 Block
A Block consists of the data for the quantized DCT coefficients of an 8 × 8 Block in the MB. It is VLC coded as described in the next sections. MP@ML has six blocks per MB. For noncoded Blocks, the DCT coefficients are assumed to be zero.

*A step size 1 is also available in the High Profile.

8.4.6.1 Quantization of DCT Coefficients

As with JPEG, Intra-coded blocks have their DC coefficients coded differentially with respect to the previous block of the same YCbCr type, unless the previous block is Nonintra, belongs to a skipped MB, or belongs to another Slice. In any of these cases the prediction value is reset to the midrange value of 1024. The range of unquantized Intra DC coefficients is 0 to 8 × 255, which mean