`
`MPEG-1 AND MPEG-2 Video Standards
`
`Supavadee Aramvith and Ming-Ting Sun
`
`Information Processing Laboratory, Department of Electrical Engineering, Box 352500
`
`University of Washington, Seattle, Washington 98195-2500
`
{supava,sun}@ee.washington.edu
`
`1. MPEG-1 Video Coding Standard
`
`1.1 Introduction
`
`1.1.1 Background and structure of MPEG-1 standards activities
`
The development of digital video technology in the 1980s has made it possible to use digital video compression in
`
`various kinds of applications. The effort to develop standards for coded representation of moving pictures, audio, and
`
`their combination is carried out in the Moving Picture Experts Group (MPEG). MPEG is a group formed under the
`
`auspices of the International Organization for Standardization (ISO) and the International Electrotechnical Commission
`
`(IEC).
`
It operates in the framework of the Joint ISO/IEC Technical Committee 1 (JTC 1) on Information Technology, and is formally Working Group 11 (WG11) of Sub-Committee 29 (SC29). The premise is to set the standard for
`
`coding moving pictures and the associated audio for digital storage media at about 1.5 Mbit/s so that a movie can be
`
`compressed and stored in a CD-ROM (Compact Disc — Read Only Memory). The resultant standard is the
`
`international standard for moving picture compression, ISO/IEC 11172 or MPEG-1 (Moving Picture Experts Group -
`
Phase 1). The MPEG-1 standard consists of 5 parts: Systems (11172-1), Video (11172-2), Audio (11172-3),
`
`Conformance Testing (11172-4), and Software Simulation (11172-5).
`
In this chapter, we will focus only on the video part.
`
The activity of the MPEG committee started in 1988 based on the work of ISO JPEG (Joint Photographic Experts

Group) [1] and CCITT Recommendation H.261: “Video Codec for Audiovisual Services at p×64 kbit/s” [2]. Thus, the
`
`MPEG-1 standard has much in common with the JPEG and H.261 standards. The MPEG development methodology
`
`was similar to that of H.261 and was divided into three phases: Requirements, Competition, and Convergence [3]. The
`
purpose of the Requirements phase is to precisely set the focus of the effort and determine the rules for the competition

phase. The documents of this phase are a “Proposal Package Description” [4] and a test methodology [5]. The next step
`
is the competition phase, in which the goal is to obtain state-of-the-art technology from the best of academic and

industrial research. The criteria are based on the technical merits, the trade-off between video quality and the cost

of implementation of the ideas, and the subjective test [5]. After the competition phase, various ideas and techniques
`
DISH 1034
Sling TV v. Realtime
IPR2018-01342
`
`
`
`Copyright 1999 Academic Press. This material will be published in the Image and Video Processing Handbook.
`
`are integrated into one solution in the convergence phase. The solution results in a document called the simulation
`
`model. The simulation model implements, in some sort of programming language, the operation of a reference encoder
`
`and a decoder. The simulation model is used to carry out simulations to optimize the performance of the coding scheme
`
[6]. A series of fully documented experiments called core experiments are then carried out. The MPEG committee
`
`reached the Committee Draft (CD) status in September 1990 and the Committee Draft (CD 11172) was approved in
`
`December 1991. International Standard (IS) 11172 for the first three parts was established in November 1992. The IS
`
`for the last two parts was finalized in November 1994.
`
`1.1.2 MPEG-1 target applications and requirements
`
`The MPEG standard is a generic standard, which means that it is not limited to a particular application. A variety of
`
`digital storage media applications of MPEG-1 have been proposed based on the assumptions that the acceptable video
`
`and audio quality can be obtained for a total bandwidth of about 1.5 Mbits/s. Typical storage media for these
`
`applications include CD-ROM, DAT (Digital Audio Tape), Winchester-type computer disks, and writable optical disks.
`
`The target applications are asymmetric applications where the compression process is performed once and the
`
decompression process is required often. Examples of the asymmetric applications include video CD, video on
`
`demand, and video games.
`
`In these asymmetric applications, the encoding delay is not a concern. The encoders are
`
`needed only in small quantities while the decoders are needed in large volumes. Thus, the encoder complexity is not a
`
`concern while the decoder complexity needs to be low in order to result in low-cost decoders.
`
`The requirements for compressed video in digital storage media mandate several important features of the MPEG-1
`
`compression algorithm. The important features include normal playback, frame-based random access and editing of
`
`video, reverse playback, fast forward / reverse play, encoding high-resolution still frames, robustness to uncorrectable
`
errors, etc. The applications also require MPEG-1 to support flexible picture-sizes and frame-rates. Another
`
requirement is that the encoding process can be performed at reasonable speed using existing hardware technologies,

and that the decoder can be implemented using a small number of chips at low cost.
`
Since the MPEG-1 video coding algorithm is based heavily on H.261, in the following sections we will focus only on
those features which are different from H.261.
`
`1.2 MPEG-1 Video Coding vs. H.261
`
1.2.1 Bi-directional motion compensated prediction
`
`
`In H.261, only the previous video frame is used as the reference frame for the motion compensated prediction (forward
`
prediction). MPEG-1 allows the future frame to be used as the reference frame for the motion compensated prediction
`
`(backward prediction), which can provide better prediction. For example, as shown in figure 1, if there are moving
`
`objects, and if only the forward prediction is used, there will be uncovered areas (such as the block behind the car in
`
Frame N) for which we may not be able to find a good matching block from the previous reference picture (Frame N-1).

On the other hand, the backward prediction can properly predict these uncovered areas since they are available in

the future reference picture, i.e., Frame N+1 in this example. Also shown in the figure, if there are objects moving into
`
`the picture (the airplane in the figure), these new objects cannot be predicted from the previous picture, but can be
`
`predicted from the future picture.
`
`
`
[Figure: three consecutive frames, Frame N-1, Frame N, and Frame N+1, of a scene with a moving car and an airplane entering the picture.]
`Figure 1: A video sequence showing the benefits of bi-directional prediction.
`
`1.2.2 Motion compensated prediction with half-pixel accuracy
`
The motion estimation in H.261 is restricted to only integer-pixel accuracy. However, a moving object often moves to
`
`a position which is not on the pixel-grid but between the pixels. MPEG-1 allows half-pixel-accuracy motion vectors.
`
`By estimating the displacement at a finer resolution, we can expect improved prediction and, thus, better performance
`
than motion estimation with integer-pixel accuracy. As shown in Figure 2, since there is no pixel-value at the half-
`
`pixel locations, interpolation is required to produce the pixel-values at the half-pixel positions. Bi-linear interpolation
`
`is used in MPEG-1 for its simplicity. As in H.261, the motion estimation is performed only on luminance blocks. The
`
resulting motion vector is divided by 2 and applied to the chrominance blocks. This reduces the computation but may
`
`not necessarily be optimal. Motion vectors are differentially encoded with respect to the motion vector in the preceding
`
adjacent macroblock. The reason is that the motion vectors of adjacent regions are highly correlated, as it is quite

common to have relatively uniform motion over areas of the picture.
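By way of illustration, the half-pixel interpolation can be sketched as follows (a minimal sketch: the function name, array layout, and integer rounding convention are ours, not an extract from the standard):

```python
import numpy as np

def half_pel_value(frame, y2, x2):
    """Sample `frame` at coordinates (y2, x2) given in half-pixel units.
    Positions that fall between pixels are produced by bilinear
    interpolation, i.e. the rounded average of the 2 or 4 surrounding
    integer-pel values."""
    y0, x0 = y2 // 2, x2 // 2
    y1 = min(y0 + (y2 % 2), frame.shape[0] - 1)
    x1 = min(x0 + (x2 % 2), frame.shape[1] - 1)
    s = (int(frame[y0, x0]) + int(frame[y0, x1]) +
         int(frame[y1, x0]) + int(frame[y1, x1]))
    return (s + 2) // 4

frame = np.array([[100, 200],
                  [100, 200]], dtype=np.uint8)
print(half_pel_value(frame, 0, 0))  # integer position: 100
print(half_pel_value(frame, 0, 1))  # halfway between 100 and 200: 150
```

The division by powers of two is why bilinear interpolation is attractive here: it needs only additions and shifts.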
`
`1.3 MPEG-1 video structure
`
`1.3.1 Source Input Format (SIF)
`
`
`
`
`
`
`
[Figure: an integer-pixel grid overlaid with a half-pixel grid. × marks pixel values on the integer-pixel grid; ● marks interpolated pixel values on the half-pixel grid, obtained by bilinear interpolation from the integer-pixel values.]

Figure 2: Half-pixel motion estimation.
`
`The typical MPEG-1 input format is the Source Input Format (SIF). SIF was derived from CCIR601, a worldwide
`
`standard for digital TV studio. CCIR601 specifies the Y Cb Cr color coordinate where Y is the luminance component
`
`(black and white information), and Cb and Cr are two color difference signals (chrominance components). A
`
luminance sampling frequency of 13.5 MHz was adopted. There are several Y Cb Cr sampling formats, such as 4:4:4,

4:2:2, 4:1:1, and 4:2:0. In 4:4:4, the sampling rates for Y, Cb, and Cr are the same. In 4:2:2, the sampling rates of Cb

and Cr are half of that of Y. In 4:1:1 and 4:2:0, the sampling rates of Cb and Cr are one quarter of that of Y. The

positions of the Cb Cr samples for 4:4:4, 4:2:2, 4:1:1, and 4:2:0 are shown in Figure 3.
`
[Figure: sampling grids with × marking luminance samples and ○ marking chrominance samples.]

Figure 3: Luminance and chrominance samples in (a) 4:4:4 format, (b) 4:2:2 format, (c) 4:1:1 format, (d) 4:2:0 format.
`
`Converting analog TV signal to digital video with the 13.5 MHz sampling rate of CCIR601 results in 720 active pixels
`
`per line (576 active lines for PAL and 480 active lines for NTSC). This results in a 720x480 resolution for NTSC and a
`
720x576 resolution for PAL. With 4:2:2, the uncompressed bit-rate for transmitting CCIR601 at 30 frames/s is then
`
`about 166 Mbits/s. Since it is difficult to compress a CCIR601 video to 1.5 Mb/s with good video quality, in MPEG-1,
`
`typically the source video resolution is decimated to a quarter of the CCIR601 resolution by filtering and sub-sampling.
`
`The resultant format is called Source Input Format (SIF) which has a 360x240 resolution for NTSC and a 360x288
`
`resolution for PAL. Since in the video coding algorithm, the block-size of 16x16 is used for motion compensated
`
`prediction, the number of pixels in both the horizontal and the vertical dimensions should be multiples of 16. Thus, the
`
`four left-most and right-most pixels are discarded to give a 352x240 resolution for NTSC systems (30 frames/s) and a
`
`352x288 resolution for PAL systems (25 frames/s). The chrominance signals have half of the above resolutions in both
`
`
the horizontal and vertical dimensions (4:2:0, 176x120 for NTSC and 176x144 for PAL). The uncompressed bit-rate
`
`for SIF (NTSC) at 30 frames/s is about 30.4 Mbits/s.
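The bit-rate figures quoted above can be reproduced with a few lines of arithmetic (a sketch; the function name and parameterization are ours):

```python
def uncompressed_mbps(luma_w, luma_h, chroma_per_luma, fps, bits=8):
    """Raw bit-rate in Mbits/s. `chroma_per_luma` is the number of
    chrominance samples (Cb + Cr together) per luminance sample:
    1.0 for 4:2:2 and 0.5 for 4:2:0."""
    return luma_w * luma_h * (1 + chroma_per_luma) * bits * fps / 1e6

print(round(uncompressed_mbps(720, 480, 1.0, 30), 1))  # CCIR601 4:2:2 NTSC: 165.9
print(round(uncompressed_mbps(352, 240, 0.5, 30), 1))  # SIF 4:2:0 NTSC: 30.4
```

Even the decimated SIF source is thus about 20 times the 1.5 Mbit/s target, which sets the compression ratio MPEG-1 must deliver.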
`
1.3.2 Group Of Pictures (GOP) and I-B-P Pictures
`
`In MPEG, each video sequence is divided into one or more groups of pictures (GOPs). There are four types of pictures
`
defined in MPEG-1: I-, P-, B-, and D-pictures, of which the first three are shown in figure 4. Each GOP is composed of
`
one or more pictures; one of these pictures must be an I-picture. Usually, the spacing between two anchor frames (I- or

P-pictures) is referred to as M, and the spacing between two successive I-pictures is referred to as N. In Figure 4, M=3
and N=9.
`
[Figure: a sequence of I-, B-, and P-pictures with prediction arrows, partitioned into groups of pictures.]
`Figure 4: MPEG Group Of Pictures.
`
I-pictures (Intra-coded pictures) are coded independently with no reference to other pictures. I-pictures provide random

access points in the compressed video data, since the I-pictures can be decoded independently without referencing

other pictures. With I-pictures, an MPEG bit-stream is more editable. Also, error propagation due to transmission

errors in previous pictures will be terminated by an I-picture, since the I-picture does not reference the

previous pictures. Since I-pictures use only transform coding without motion compensated predictive coding, they

provide only moderate compression.
`
P-pictures (Predictive-coded pictures) are coded using forward motion-compensated prediction, similar to that in

H.261, from the preceding I- or P-picture. P-pictures provide more compression than the I-pictures by virtue of motion-

compensated prediction. They also serve as references for B-pictures and future P-pictures. Transmission errors in the

I-pictures and P-pictures can propagate to the succeeding pictures, since the I-pictures and P-pictures are used to predict

the succeeding pictures.
`
`B-pictures (Bi-directional-coded pictures) allow macroblocks to be coded using bi-directional motion-compensated
`
prediction from both the past and future reference I- or P-pictures. In the B-pictures, each bi-directional motion-

compensated macroblock can have two motion vectors: a forward motion vector which references a best matching
`
`
block in the previous I- or P-pictures, and a backward motion vector which references a best matching block in the

next I- or P-pictures, as shown in figure 5. The motion compensated prediction can be formed by the average of the two

referenced motion compensated blocks. By averaging between the past and the future reference blocks, the effect of

noise can be decreased. B-pictures provide the best compression compared to I- and P-pictures. I- and P-pictures are

used as reference pictures for predicting B-pictures. To keep the structure simple, and since there is no apparent

advantage to using B-pictures for predicting other B-pictures, the B-pictures are not used as reference pictures. Hence, B-
pictures do not propagate errors.
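The averaged (interpolative) prediction can be sketched as follows (the rounding convention shown is illustrative, not a normative extract):

```python
import numpy as np

def bidirectional_prediction(fwd_block, bwd_block):
    """Interpolative prediction for a B-picture macroblock: the mean of the
    forward and backward motion-compensated reference blocks. Averaging two
    independently noisy references tends to reduce the prediction noise."""
    avg = (fwd_block.astype(np.int32) + bwd_block.astype(np.int32) + 1) // 2
    return avg.astype(np.uint8)

fwd = np.full((16, 16), 100, dtype=np.uint8)  # forward-referenced block
bwd = np.full((16, 16), 104, dtype=np.uint8)  # backward-referenced block
print(bidirectional_prediction(fwd, bwd)[0, 0])  # 102
```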
`
[Figure: a macroblock in the current B-picture with a forward motion vector pointing to a best matching macroblock in the past reference picture, and a backward motion vector pointing to a best matching macroblock in the future reference picture.]

Figure 5: Bi-directional motion estimation.
`
`D-pictures (DC-pictures) are low-resolution pictures obtained by decoding only the DC coefficient of the Discrete
`
`Cosine Transform coefficients of each macroblock. They are not used in combination with I-, P-, or B-pictures. D-
`
`pictures are rarely used, but are defined to allow fast searches on sequential digital storage media.
`
The trade-off of having frequent B-pictures is that it decreases the correlation between the previous I- or P-picture and

the next reference P- or I-picture.
`
It also causes coding delay and increases the encoder complexity. With the example

shown in Figure 4 and Figure 6, at the encoder, if the order of the incoming pictures is 1, 2, 3, 4, 5, 6, 7, ..., the order of

coding the pictures at the encoder will be 1, 4, 2, 3, 7, 5, 6, .... At the decoder, the order of the decoded pictures will

be 1, 4, 2, 3, 7, 5, 6, .... However, the display order after the decoder should be 1, 2, 3, 4, 5, 6, 7, .... Thus, frame-

memories have to be used to put the pictures in the correct order. This picture re-ordering causes delay. The

computation of bi-directional motion vectors and the picture re-ordering frame-memories increase the encoder

complexity.
`
In Figure 6, two types of GOPs are shown. GOP1 can be decoded without referencing other GOPs. It is called a

Closed-GOP. In GOP2, to decode the 8th and 9th B-pictures, the 7th P-picture in GOP1 is needed. GOP2 is called an

Open-GOP, which means the decoding of this GOP needs to reference other GOPs.
`
`
Encoder Input (display order):

1I 2B 3B 4P 5B 6B 7P 8B 9B | 10I 11B 12B 13P 14B 15B 16P
       GOP1 (CLOSED)       |        GOP2 (OPEN)

Decoder Input (coding order):

1I 4P 2B 3B 7P 5B 6B | 10I 8B 9B 13P 11B 12B 16P 14B 15B
        GOP1         |             GOP2
`Figure 6: Frame reordering.
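The display-order to coding-order mapping illustrated in Figure 6 can be sketched as follows (a simplified model; the function name is ours). Each B-picture is held back until the anchor picture that follows it in display order has been coded, because that anchor is its backward reference:

```python
def coding_order(picture_types):
    """Map display order to coding order: each anchor (I- or P-) picture is
    coded before the B-pictures that precede it in display order."""
    order, pending_b = [], []
    for num, ptype in enumerate(picture_types, start=1):
        if ptype == 'B':
            pending_b.append(num)  # hold until the next anchor is coded
        else:                      # 'I' or 'P': an anchor picture
            order.append(num)
            order += pending_b
            pending_b = []
    return order + pending_b

print(coding_order(list("IBBPBBP")))  # [1, 4, 2, 3, 7, 5, 6]
```

The inverse re-ordering at the decoder is what requires the frame-memories mentioned above.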
`
1.3.3 Slice, Macroblock, and Block structures
`
`An MPEG picture consists of slices. A slice consists of a contiguous sequence of macroblocks in a raster scan order
`
(from left to right and from top to bottom). In an MPEG coded bit-stream, each slice starts with a slice-header which is

a clear-codeword (a clear-codeword is a unique bit-pattern which can be identified without decoding the variable-length

codes in the bit-stream). Due to the clear-codeword slice-header, slices are the lowest level of units which can be

accessed in an MPEG coded bit-stream without decoding the variable-length codes.
`
`Slices are important in the
`
`handling of channel errors.
`
`If a bit-stream contains a bit-error, the error may cause error propagation due to the
`
`variable-length coding. The decoder can regain synchronization at the start of the next slice. Having more slices in a
`
`bit-stream allows better error-termination, but the overhead will increase.
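The resynchronization mechanism can be sketched as a byte-aligned scan for start-code prefixes (a simplified sketch; the slice start-code value range is from the MPEG-1 syntax, but the function and test data are ours):

```python
def find_start_codes(bitstream: bytes):
    """Return (offset, code) pairs for each byte-aligned start-code prefix
    (00 00 01 xx). Because no sequence of variable-length codes can produce
    this prefix, a decoder that loses synchronization on a bit error can scan
    forward to the next slice start code (xx in 0x01..0xAF) and resume there."""
    hits, i = [], 0
    while (i := bitstream.find(b"\x00\x00\x01", i)) != -1:
        if i + 3 < len(bitstream):
            hits.append((i, bitstream[i + 3]))
        i += 3
    return hits

data = b"\x00\x00\x01\x01" + b"\xff" * 5 + b"\x00\x00\x01\x02"
print(find_start_codes(data))  # [(0, 1), (9, 2)]
```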
`
A macroblock consists of a 16x16 block of luminance samples and two 8x8 blocks of corresponding chrominance

samples, as shown in figure 7. A macroblock thus consists of four 8x8 Y-blocks, one 8x8 Cb block, and one 8x8 Cr

block. Each coded macroblock contains motion-compensated prediction information (coded motion vectors and the

prediction errors). There are four types of macroblocks: intra, forward-predicted, backward-predicted, and averaged

macroblocks. The motion information consists of one motion vector for forward- and backward-predicted macroblocks
`
`and two motion vectors for bi-directionally-predicted (or averaged) macroblocks. P-pictures can have intra- and
`
`forward-predicted macroblocks. B-pictures can have all four types of macroblocks. The first and last macroblocks in a
`
`slice must always be coded. A macroblock is designated as a skipped macroblock when its motion vector is zero and
`
all the quantized DCT coefficients are zero. Skipped macroblocks are not allowed in I-pictures. Non-intra coded
`
`macroblocks in P- and B-pictures can be skipped. For a skipped macroblock, the decoder just copies the macroblock
`
`from the previous picture.
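The skip decision can be sketched as follows (a simplified model: the exact conditions differ slightly between P- and B-pictures in the standard, and the function name is ours):

```python
def can_skip(picture_type, is_first_or_last_in_slice, motion_vector, quantized_coeffs):
    """A macroblock may be skipped (not transmitted) when it is a non-intra
    macroblock in a P- or B-picture, is not the first or last macroblock of
    its slice, its motion vector is zero, and all of its quantized DCT
    coefficients are zero. The decoder then simply copies the co-located
    macroblock from the previous picture."""
    return (picture_type in ("P", "B")
            and not is_first_or_last_in_slice
            and motion_vector == (0, 0)
            and all(c == 0 for c in quantized_coeffs))

print(can_skip("P", False, (0, 0), [0] * 64))  # True
print(can_skip("I", False, (0, 0), [0] * 64))  # False: no skips in I-pictures
```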
`
`
[Figure: slice partitioning of a picture. The Y luminance component is divided into SLICE 1 through SLICE 15; the Cb and Cr chrominance components are divided into corresponding slices at half resolution.]

Figure 7: Macroblock and slice structures.
`
`1.4 Summary of the major differences between MPEG-1 video and H.261
`
`As compared to H.261, MPEG-1 video differs in the following aspects:
`
• MPEG-1 uses bi-directional motion compensated predictive coding with half-pixel accuracy, while H.261 has no

bi-directional prediction (B-pictures) and the motion vectors are always in integer-pixel accuracy.

• MPEG-1 supports a maximum motion vector range of -512 to +511.5 pixels for half-pixel motion vectors and

-1024 to +1023 for integer-pixel motion vectors, while H.261 has a maximum range of only ±15 pixels.

• MPEG-1 uses visually weighted quantization based on the fact that the human eye is more sensitive to quantization

errors related to low spatial frequencies than to high spatial frequencies. MPEG-1 defines a default 64-element

quantization matrix, but also allows custom matrices appropriate for different applications. H.261 has only one

quantizer for the intra DC coefficient and 31 quantizers for all other coefficients.

• H.261 only specifies two source formats: CIF (Common Intermediate Format, 352x288 pixels) and QCIF (Quarter

CIF, 176x144 pixels). In MPEG-1, the typical source format is SIF (352x240 for NTSC, and 352x288 for PAL).

However, the users can specify other formats. The picture size can be as large as 4k x 4k pixels. There are certain

parameters in the bit-streams that are left flexible, such as the number of lines per picture (less than 4096), the

number of pels per line (less than 4096), picture rate (24, 25, and 30 frames/s), and fourteen choices of pel aspect
ratios.
`
`
• In MPEG-1, I-, P-, and B-pictures are organized as a flexible Group Of Pictures (GOP).

• MPEG-1 uses a flexible slice structure instead of the Group Of Blocks (GOB) defined in H.261.

• MPEG-1 has D-pictures to allow the fast-search option.

• In order to allow cost-effective implementation of user terminals, MPEG-1 defines a Constrained Parameter Set

which lays down specific constraints, as listed in Table 1.
`
`Table 1: MPEG-1 Constrained Parameter Set.
`
• Horizontal size <= 720 pels
• Vertical size <= 576 pels
• Total number of macroblocks/picture <= 396
• Total number of macroblocks/second <= 396x25 = 330x30
• Picture rate <= 30 frames/second
• Bit rate <= 1.86 Mbits/second
• Decoder buffer size <= 376832 bits
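A conformance check against these bounds can be sketched as follows (a simplified sketch; the function name and argument choices are ours):

```python
def in_constrained_parameter_set(width, height, frame_rate, bit_rate, vbv_bits):
    """Check a picture-parameter combination against the MPEG-1 Constrained
    Parameter Set of Table 1 (bit_rate in bits/s, vbv_bits = decoder buffer)."""
    mbs = ((width + 15) // 16) * ((height + 15) // 16)  # 16x16 macroblocks
    return (width <= 720 and height <= 576
            and mbs <= 396
            and mbs * frame_rate <= 396 * 25            # = 330 * 30 = 9900
            and frame_rate <= 30
            and bit_rate <= 1.86e6
            and vbv_bits <= 376832)

print(in_constrained_parameter_set(352, 240, 30, 1.5e6, 327680))  # True  (SIF NTSC)
print(in_constrained_parameter_set(352, 288, 30, 1.5e6, 327680))  # False (396 MBs at 30 Hz)
```

Note how the macroblock-rate bound admits both SIF NTSC (330 macroblocks at 30 Hz) and SIF PAL (396 macroblocks at 25 Hz), but not the combination of the larger picture with the higher rate.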
`
`1.5 Simulation Model
`
Similar to H.261, MPEG-1 specifies only the syntax and the decoder. Many detailed coding options, such as the rate-

control strategy, the quantization decision levels, the motion estimation schemes, and the coding modes for each

macroblock, are not specified. This allows future technology improvement and product differentiation. In order to have

a reference MPEG-1 video quality, Simulation Models were developed in MPEG-1. A simulation model contains a

specific reference implementation of the MPEG-1 encoder and decoder, including all the details which are not specified

in the standard. The final version of the MPEG-1 simulation model is “Simulation Model 3” (SM3) [7].
`
In SM3, the

motion estimation technique uses one forward and/or one backward motion vector per macroblock with half-pixel

accuracy. A two-step search scheme is used, which consists of a full search in the range of +/- 7 pixels with integer-pixel

precision, followed by a search of the 8 neighboring half-pixel positions. The decision of the coding mode for each

macroblock (whether or not it will use motion compensated prediction and intra/inter coding), the quantizer decision

levels, and the rate-control algorithm are all specified.
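The first stage of that two-step search can be sketched as follows (a simplified model of the SM3-style scheme, using a sum-of-absolute-differences cost; the half-pel refinement stage is only described in the comment):

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences between two blocks."""
    return int(np.abs(a.astype(np.int32) - b.astype(np.int32)).sum())

def integer_pel_search(ref, cur, by, bx, n=16, r=7):
    """Stage one: exhaustive (full) search over integer displacements in
    [-r, +r]. The second stage (not shown) would refine the winner by
    testing the 8 surrounding half-pel positions on the bilinearly
    interpolated reference."""
    block = cur[by:by + n, bx:bx + n]
    best_cost, best_mv = None, (0, 0)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            y, x = by + dy, bx + dx
            if 0 <= y <= ref.shape[0] - n and 0 <= x <= ref.shape[1] - n:
                cost = sad(ref[y:y + n, x:x + n], block)
                if best_cost is None or cost < best_cost:
                    best_cost, best_mv = cost, (dy, dx)
    return best_mv, best_cost

rng = np.random.default_rng(0)
cur = rng.integers(0, 256, (48, 48), dtype=np.uint8)
ref = np.roll(cur, (2, 3), axis=(0, 1))      # reference = current shifted by (2, 3)
print(integer_pel_search(ref, cur, 16, 16))  # recovers the shift: ((2, 3), 0)
```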
`
`1.6 MPEG-1 video bit-stream structures
`
`As shown in figure 8, there are 6 layers in the MPEG-1 video bit-stream: the video sequence, group of pictures, picture,
`
`slice, macroblock, and block layers.
`
`
• A video sequence layer consists of a sequence header, one or more groups of pictures, and an end-of-sequence

code. It contains the setting of the following parameters: the picture size (horizontal and vertical sizes), pel aspect

ratio, picture rate, bit-rate, the minimum decoder buffer size (video buffer verifier size), the constrained parameters flag

(this flag is set only when the picture size, picture rate, decoder buffer size, bit rate, and motion parameters satisfy

the constraints bound in Table 1), the control for the loading of 64 eight-bit values for the intra and non-intra

quantization tables, and the user data.
`
• The GOP layer consists of a set of pictures that are in a continuous display order. It contains the setting of the

following parameters: the time code which gives the hours-minutes-seconds time interval from the start of the

sequence, the closed GOP flag which indicates whether the decoding operation needs pictures from the previous

GOP for motion compensation, the broken link flag which indicates whether the previous GOP can be used to

decode the current GOP, and the user data.
`
• The picture layer acts as a primary coding unit. It contains the setting of the following parameters: the temporal

reference which is the picture number in the sequence and is used to determine the display order, the picture type

(I/P/B/D), the decoder buffer initial occupancy which gives the number of bits that must be in the compressed

video buffer before the idealized decoder model defined by MPEG decodes the picture (it is used to prevent

decoder buffer overflow and underflow), the forward motion vector resolution and range for P- and B-pictures, the

backward motion vector resolution and range for B-pictures, and the user data.
`
• The slice layer acts as a resynchronization unit. It contains the slice vertical position where the slice starts, and the

quantizer scale that is used in the coding of the current slice.
`
• The macroblock layer acts as a motion compensation unit. It contains the setting of the following parameters: the

optional stuffing bits, the macroblock address increment, the macroblock type, the quantizer scale, the motion vectors, and

the Coded Block Pattern which defines the coding patterns of the 6 blocks in the macroblock.
`
• The block layer is the lowest layer of the video sequence and consists of coded 8x8 DCT coefficients. When a

macroblock is encoded in the Intra-mode, the DC coefficient is encoded similarly to that in JPEG (the DC coefficient

of the current macroblock is predicted from the DC coefficient of the previous macroblock). At the beginning of

each slice, the predictions for the DC coefficients of the luminance and chrominance blocks are reset to 1024. The

differential DC values are categorized according to their absolute values, and the category information is encoded

using a VLC (Variable-Length Code). The category information indicates the number of additional bits following

the VLC to represent the prediction residual. The AC coefficients are encoded similarly to those in H.261, using a

VLC to represent the zero-run-length and the value of the non-zero coefficient. When a macroblock is encoded in

non-intra modes, both the DC and AC coefficients are encoded similarly to those in H.261.
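The differential DC prediction described above can be sketched as follows (a minimal sketch of the prediction step only; the categorization and VLC stages are omitted, and the function name is ours):

```python
def differential_dc(dc_values, reset_value=1024):
    """Differential coding of intra DC coefficients: each DC value is
    predicted from the previously coded one, and only the difference is
    transmitted (after categorization and VLC coding). The predictor is
    reset to 1024 at the start of each slice."""
    predictor, diffs = reset_value, []
    for dc in dc_values:
        diffs.append(dc - predictor)
        predictor = dc
    return diffs

print(differential_dc([1024, 1040, 1032, 1032]))  # [0, 16, -8, 0]
```

Because neighboring blocks tend to have similar average brightness, the differences cluster around zero, which is exactly what the variable-length code exploits.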
`
`
`Above the video sequence layer, there is a system layer in which the video sequence is packetized. The video and
`
`audio bit streams are then multiplexed into an integrated data stream. These are defined in the Systems part.
`
`1.7 Summary
`
MPEG-1 is mainly for storage media applications. Due to the use of B-pictures, it may result in long end-to-end delay.

The MPEG-1 encoder is much more expensive than the decoder due to the large search range, the half-pixel accuracy

in motion estimation, and the use of the bi-directional motion estimation. The MPEG-1 syntax can support a variety of

frame-rates and formats for various storage media applications. Similar to other video coding standards, MPEG-1 does

not specify every coding option (motion estimation, rate-control, coding modes, quantization, pre-processing, post-

processing, etc.). This allows continuing technology improvement and product differentiation.
`
[Figure: the six-layer syntax hierarchy. Sequence layer: a sequence header followed by GOPs. GOP layer: a GOP header followed by pictures. Picture layer: a picture header followed by slices. Slice layer: a slice header followed by macroblocks. Macroblock layer: a macroblock header followed by Block 0 through Block 5. Block layer: a differential DC coefficient, AC coefficients, and an End-of-Block code.]

Figure 8: MPEG-1 bit-stream syntax layers.
`
`2. MPEG-2 video coding standard
`
`2.1 Introduction
`
`2.1.1 Background and structure of MPEG-2 standards activities
`
`The MPEG-2 standard represents the continuing efforts of the MPEG committee to develop generic video and audio
`
`coding standards after their development of MPEG-1. The idea of this second phase of MPEG work came from the fact
`
`that MPEG-1 is optimized for applications at about 1.5 Mb/s with input source in SIF, which is a relatively low-
`
resolution progressive format. Many higher-quality, higher-bit-rate applications require a higher-resolution digital video
`
`source such as CCIR601, which is an interlaced format. New techniques can be developed to code the interlaced video
`better.
`
`
The MPEG-2 committee started working in late 1990, after the completion of the technical work of MPEG-1. The
`
`competitive tests of video algorithms were held in November 1991, followed by the collaborative phase. The
`
`Committee Draft (CD) for the video part was achieved in November 1993. The MPEG-2 standard (ISO/IEC 13818)
`
[8] currently consists of 9 parts. The first five parts are organized in the same fashion as MPEG-1: systems, video,

audio, conformance testing, and the simulation software technical report. The first three parts of MPEG-2 reached

International Standard (IS) status in November 1994. Parts 4 and 5 were approved in March 1996. Part 6 of the MPEG-

2 standard specifies a full set of Digital Storage Media Control Commands (DSM-CC). Part 7 is the specification of a

non-backward-compatible audio. Part 8 was originally planned to be the coding of 10-bit video but was discontinued.

Part 9 is the specification of a Real-time Interface (RTI) to Transport Stream decoders, which may be utilized for

adaptation to all appropriate networks carrying MPEG-2 Transport Streams. Parts 6 and 9 were

approved as International Standards in July 1996. Like the MPEG-1 standard, the MPEG-2 video coding standard

specifies only the bit stream syntax and the semantics of the decoding process. Many encoding options were left

unspecified to encourage continuing technology improvement and product differentiation.
`
`MPEG-3, which was originally intended for HDTV (High Definition digital Television) at higher bit-rates, was merged
`
with MPEG-2. Hence, there is no MPEG-3. The MPEG-2 video coding standard (ISO/IEC 13818-2) was also adopted by

ITU-T as ITU-T Recommendation H.262 [9].
`
`2.1.2 Target applications and requirements
`
MPEG-2 is primarily targeted at coding high-quality video at 4-15 Mb/s for Video On Demand (VOD), digital
`
`broadcast television, and Digital Storage Media such as DVD (Digital Versatile Disc). It is also used for coding HDTV
`
`(High-Definition TV), Cable/Satellite digital TV, video services over various networks, 2-way communications, and
`
`other high-quality digital video applications.
`
`The requirements from MPEG-2 applications mandate several
`
`important features of the compression algorithm.
`
`Regarding picture quality, MPEG-2 needs to be able to provide good NTSC quality video at a bit-rate of about 4-6
`
Mbits/s and transparent NTSC quality video at a bit-rate of about 8-10 Mbits/s.
`
`It also needs to provide the capability
`
`of random access and quick channel-switching by means of inserting I-pictures periodically. The MPEG-2 syntax also
`
`needs to support trick modes, e.g. fast forward and fast reverse play, as in MPEG-1. Low-delay mode is specified for
`
`delay-sensitive visual communications applications. MPEG-2 has scalable coding modes in order to support multiple
`
`grades of video quality, video formats, and frame-rate for various applications. Error resilience options include intra
`
`motion vector, data partitioning, and scalable coding. Compatibility between the existing and the new standard coders
`
`is another prominent feature provided by MPEG-2. For example, MPEG-2 decoders should be able to decode MPEG-1
`
`bit-streams.
`
If scalable coding is used, the base layer of MPEG-2 signals can be decoded by an MPEG-1 decoder.
`
Finally, it should allow reasonable-complexity encoders and low-cost decoders to be built with mature technology. Since
`
`
MPEG-2 video is based heavily on MPEG-1, in the following sections we will focus only on those features which are
different from MPEG-1 video.
`
2.2 MPEG-2 Profiles and Levels
`
The MPEG-2 standard is designed to cover a wide range of applications. However, features needed for some applications

may not be needed for other applications. If we put all the features into one single standard, it may result in an overly

expensive system for many applications. It is desirable for an application to implement only the necessary features to

lower the cost of the system. To meet this need, MPEG-2 classified the groups of features for important applications

into Profiles. A Profile is defined as a specific subset of the MPEG-2 bit stream syntax and functionality to support a

class of applications (e.g. low-delay video conferencing applications, or storage media applications). Within each

Profile, Levels are defined to support applications which have different quality requirements (e.g. diff