`A simple introduction
`
`Dr. S.R. Ely (BBC)
`
`1. Introduction
`
The Moving Picture Experts Group (MPEG) started in 1988 as Working Group 11, Sub-committee 29, of ISO/IEC JTC1 (see footnote 1), with the aim of defining standards for the digital compression of video and audio signals. It took as its basis the ITU-T (see footnote 2) standard for video-conferencing and video-telephony [1], together with that of the Joint Photographic Experts Group (JPEG), which had initially been developed for compressing still images such as electronic photography [2].
`
The first goal of MPEG was to define a video coding algorithm for digital storage media, in particular CD-ROM. The resulting standard was published in 1993 as ISO/IEC 11172 [3] and comprises three parts, covering the systems aspects (multiplexing and synchronization), video coding and audio coding. This standard has been applied in the CD-i system, to provide full-motion video playback from CD, and is widely used in PC applications, for which a range of hardware and software coders and decoders is available.
`
MPEG-1 is restricted to non-interlaced video formats and is primarily intended to support video coding at bit-rates up to about 1.5 Mbit/s.
`
`1.
`
`2.
`
`Joint Technical Committee No. 1 of the International
`Organisation for Standardisation and the International
`Electrotechnical Commission.
`International Telecommunication Union
`– Telecommunication Standardization Bureau.
`
The core element of all DVB systems is the MPEG-2 video coding standard, which is based upon a flexible toolkit of techniques for bit-rate reduction. The MPEG-2 specification defines only the bit-stream syntax and the decoding process. The coding process is not specified, which means that compatible improvements in picture quality will continue to be possible. In this article, the author provides a simple introduction to the technicalities of the MPEG-2 video coding standard.
`
In 1990, MPEG began work on a second standard which would be capable of coding interlaced pictures directly, originally to support high-quality applications at bit-rates in the range of about 5 to 10 Mbit/s. MPEG-2 now also supports high-definition formats at bit-rates in the range of about 15 to 30 Mbit/s. The MPEG-2 standard was first published in 1994 as ISO/IEC 13818, again comprising three parts – systems, video and audio. A second version of the standard was published in 1995 [4].
`
It is important to note that the MPEG standards specify only the syntax and semantics of the bit-streams and the decoding process. They do not specify the coding process: this is left mainly to the discretion of the coder designers, thus giving scope for improvement as coding techniques are refined and new techniques are developed.
`
`2. Video coding principles
`
`If we take a studio-quality 625-line component
`picture and digitize it according to ITU Recom-
`mendations BT.601 [5] and BT.656 [6] (i.e. if we
`use 4:2:2 sampling as shown in Fig. 1), a bit-
`stream of 216 Mbit/s is used to convey the lumi-
`nance and the two chrominance components. For
`bandwidth-restricted media – such as terrestrial or
`satellite channels – a method is required to reduce
`the bit-rate needed to represent the digitized
`picture.
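As a quick check of that figure, the arithmetic below reproduces the 216 Mbit/s total from the Rec. 601 sampling rates. This is only a back-of-envelope sketch; the variable names are mine:

# Rec. BT.601 4:2:2 sampling: luminance at 13.5 MHz, each chrominance
# component at 6.75 MHz, all quantized to 8 bits per sample.
f_y_mhz, f_cb_mhz, f_cr_mhz, bits_per_sample = 13.5, 6.75, 6.75, 8

# Msamples/s * bits/sample = Mbit/s
total_mbit_s = (f_y_mhz + f_cb_mhz + f_cr_mhz) * bits_per_sample
print(total_mbit_s)   # -> 216.0 Mbit/s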
`
`A video bit-rate reduction (compression) system
`operates by removing the redundant and less-
`important information from the signal prior to
`transmission, and by reconstructing an approxima-
`tion of the image from the remaining information
`at the decoder. In video signals, three distinct
`kinds of redundancy can be identified:
`
1) Spatial and temporal redundancy
`
`Here, use is made of the fact that the pixel values
`are not independent but are correlated with their
`neighbours, both within the same frame and
`across frames. So, to some extent, the value of
`a pixel is predictable, given the values of the
`neighbouring pixels.
`
2) Entropy redundancy

For any non-random digitized signal, some code values occur more frequently than others. This can be exploited by coding the more-frequently occurring values with shorter codes than the rarer ones. The same principle has long been exploited in Morse code, where the most common letters in English, “E” and “T”, are represented by one dot and one dash respectively, whereas rare letters such as “X”, “Y” and “Z” are each represented by a total of four dots and dashes. (A small coding sketch follows this list.)
`
3) Psycho-visual redundancy
`
`This form of redundancy results from the way
`the eye and the brain work. In audio, we are
`familiar with the limited frequency response of
`the ear: in video, we have to consider two limits:
`
`– the limit of spatial resolution (i.e. the fine
`detail which the eye can resolve);
`
`
`– the limit of temporal resolution (i.e. the abil-
`ity of the eye to track fast-moving images).
`Temporal resolution means, for example,
`that a change of picture (a shot-change)
`masks the fine detail on either side of the
`change.
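To illustrate the entropy-coding idea mentioned in item 2 above, the sketch below builds a Huffman code for a toy symbol distribution. The symbols and probabilities are invented for illustration and have nothing to do with any MPEG table:

import heapq

def huffman_code(freqs):
    """Build a prefix-free code: frequent symbols get shorter codewords."""
    # Heap items: (weight, tie_breaker, {symbol: codeword_so_far})
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)   # two least-probable subtrees
        w2, _, c2 = heapq.heappop(heap)
        # Prefix '0' onto one subtree's codewords and '1' onto the other's.
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

# Invented frequencies, loosely in the spirit of the Morse-code example.
print(huffman_code({"E": 0.40, "T": 0.30, "A": 0.20, "Z": 0.10}))
# The frequent symbols get the short codewords, e.g. E -> '0', T -> '10'.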
`
3. MPEG video compression toolkit
`
`Sample-rate reduction is a very effective method
`of reducing the bit-rate but, of course, it introduces
`irreversible loss of resolution. For very low bit-
`rate applications (e.g. in MPEG-1), alternate fields
`are discarded and the horizontal sampling-rate is
`reduced to around 360 pixels-per-line (giving
`about 3.3 MHz resolution). The sample rate for
`the chrominance is half that of the luminance, both
`horizontally and vertically. In this way, the bit-rate
`can be reduced to less than one fifth that of a con-
`ventional definition (4:2:2) sampled signal.
`
For “broadcast quality” at bit-rates in the range 3 to 10 Mbit/s, horizontal sample-rate reduction is not advisable for the luminance or chrominance signals, nor is temporal sub-sampling. However, for distribution and broadcast applications, sufficient chrominance resolution can be provided if the vertical sampling frequency of the chrominance is halved. Thus, for most MPEG-2 coding applications, 4:2:0 sampling is likely to be used
rather than 4:2:2. However, 4:2:2 and 4:4:4 sampling are also supported by MPEG-2. It may be of interest to note that a conventional delay-line PAL decoder effectively yields the same vertical sub-sampling of the chrominance signals as does 4:2:0 sampling.
`
Figure 1 – 4:2:2 sampling. R, G and B are matrixed to Y, CB and CR; Y (5.75 MHz bandwidth) is sampled at 13.5 MHz and each chrominance component (2.75 MHz bandwidth) at 6.75 MHz, all with 8-bit ADCs. The resulting bit-rates are: Y = 8 × 13.5 = 108 Mbit/s; CB = 8 × 6.75 = 54 Mbit/s; CR = 8 × 6.75 = 54 Mbit/s; total = 216 Mbit/s.
`
`13
`
`2
`
`
`
`
Apart from sample-rate reduction, the MPEG toolkit includes two different kinds of tools to exploit redundancy in images:

1) Discrete Cosine Transform (DCT)

The purpose of using this orthogonal transform – which is similar to the Discrete Fourier Transform (DFT) – is to assist the processing which removes spatial redundancy, by concentrating the signal energy into relatively few coefficients.

2) Motion-compensated interframe prediction

This tool is used to remove temporal redundancy. It is based on techniques similar to the well-known differential pulse-code modulation (DPCM) principle.
`
`3.1. Discrete cosine transform
`
`The luminance signal of a 4:2:0-sampled digitized
`625-line picture comprises about 704 pixels hori-
`zontally and about 576 lines vertically (see Fig. 2).
`In MPEG coding, spatial redundancy is removed
`by processing the digitized signals in 2-D blocks of
`8 pixels by 8 lines (taken from either one field or
`two, depending on the mode of operation).
`
As Fig. 3 illustrates, the DCT transform is a reversible process which maps between the normal 2-D presentation of the image and one which represents the same information in what may be thought of as the frequency domain. Each coefficient in the 8 x 8 DCT-domain block indicates the contribution of a different DCT “basis” function to the original image block. The lowest-frequency basis function (top-left in Fig. 3) is called the DC coefficient and may be thought of as representing the average brightness of the block.

The DCT does not directly reduce the number of bits required to represent the block. In fact, for an 8 x 8 image block of 8-bit pixels, the DCT produces an 8 x 8 block of at least 11-bit DCT coefficients, to allow for reversibility! The reduction in the number of bits follows from the fact that, for typical blocks of natural images, the distribution of coefficients is non-uniform: the transform tends to concentrate the energy into the low-frequency coefficients, and many of the other coefficients are near zero. The bit-rate reduction is achieved by not transmitting the near-zero coefficients, and by quantizing and coding the remaining coefficients as described below. The non-uniform distribution of the coefficients is a result of the spatial redundancy present in the original image block.

Many different forms of transformation have been investigated for bit-rate reduction. The best transforms are those which tend to concentrate the energy of a picture block into a few coefficients. The DCT is one of the best transforms in this respect, and has the advantage that the DCT and its inverse are easy to implement in digital processing. The choice of an 8 x 8 block-size is a trade-off between the need to use a large picture area for the transform, so that the energy compaction described above is most efficient, and the fact that the content and movement of the picture varies spatially, which would tend to support a smaller block-size. A large block-size would also emphasize variations from block to block in the decoded picture, as well as the effects of “windowing” by the block structure.
`
`8
`
`8
`
`88 blocks
`
`704 pixels
`
`576 lines
`
`72 blocks
`
`Figure 2
`Block-based DCT.
`
`14
`
`3
`
`
`
Figure 3 – DCT transform pairs. The DCT and IDCT map between an 8 x 8 block of pixels and an 8 x 8 block of coefficients; horizontal frequencies increase in cycles per picture width, vertical frequencies in cycles per picture height, with the d.c. coefficient at the top left.
`
3.2. Coefficient quantization

After a block has been transformed, the transform coefficients are quantized. Different quantization is applied to each coefficient, depending on the spatial frequency within the block that it represents. The objective is to minimize the number of bits which must be transmitted to the decoder, so that it can perform the inverse transform and reconstruct the image: reduced quantization accuracy reduces the number of bits which need to be transmitted to represent a given DCT coefficient, but increases the possible quantization error for that coefficient. Note that the quantization noise introduced by the coder is not reversible in the decoder, so the coding and decoding process is “lossy”.
`
`
`More quantization error can be tolerated in the
`high-frequency coefficients, because HF noise is
`less visible than LF quantization noise. Also,
`quantization noise is less visible in the chromi-
`nance components than in the luminance compo-
`nent. MPEG uses weighting matrices to define the
`relative accuracy of the quantization of the differ-
`ent coefficients. Different weighting matrices can
`be used for different frames, depending on the pre-
`diction mode used.
`
`The weighted coefficients are then passed through
`a fixed quantization law, which is usually a linear
`law. However, for some prediction modes there is
`an increased threshold level (i.e. a dead-zone)
`around zero. The effect of this threshold is to max-
`imize the number of coefficients which are quan-
`tized to zero: in practice, it is found that small devi-
`ations around zero are usually caused by noise in
`the signal, so suppressing these values actually
`“improves” the subjective picture quality.
`
`Quantization noise is more visible in some blocks
`than in others; for example, in blocks which con-
`tain a high-contrast edge between two plain areas.
`In such blocks, the quantization parameters can be
`modified to limit the maximum quantization error,
`particularly in the high-frequency coefficients.
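A minimal sketch of this weighting-plus-quantization step is given below. The weighting matrix and quantizer scale are invented for illustration – they are not the MPEG-2 default matrices – and the dead-zone is modelled simply by truncation towards zero:

import numpy as np

def quantize(coeffs, weights, scale, dead_zone=False):
    """Quantize DCT coefficients; coarser for high frequencies (big weights)."""
    step = weights * scale
    if dead_zone:
        # Truncation towards zero widens the zero bin, suppressing small
        # (usually noise-like) coefficients.
        return np.fix(coeffs / step).astype(int)
    return np.round(coeffs / step).astype(int)

def dequantize(levels, weights, scale):
    return levels * weights * scale   # decoder-side reconstruction

# Illustrative 8 x 8 weights, growing with horizontal + vertical frequency.
u = np.arange(8)
weights = 8 + 2 * (u[:, None] + u[None, :])
scale = 2   # in a real coder this is steered by buffer occupancy (Section 3.4.)

coeffs = np.zeros((8, 8)); coeffs[0, 0], coeffs[7, 7] = 1024.0, 30.0
levels = quantize(coeffs, weights, scale, dead_zone=True)
print(levels[0, 0], levels[7, 7])   # 64 0 - the small HF coefficient vanishes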
`
3.3. Zigzag coefficient scanning, run-length coding, and variable-length coding

After quantization, the 8 x 8 blocks of DCT coefficients are scanned in a zigzag pattern (see Fig. 4) to turn the 2-D array into a serial string of quantized coefficients. Two scan patterns are defined. The first is usually preferable for picture material which has strong vertical frequency components due to, perhaps, the interlaced picture structure; in this scan pattern, there is a bias to scan vertical coefficients first. In the second scan pattern, which is preferable for pictures without a strong vertical structure, there is no bias and the scan proceeds diagonally from top left to bottom right, as illustrated in Fig. 4. The coder indicates its choice of scan pattern to the decoder.

The strings of coefficients produced by the zigzag scanning are coded by counting the number of zero coefficients which precede a non-zero coefficient, i.e. run-length coded. The run-length value, and the value of the non-zero coefficient which the run of zeros precedes, are then combined and coded using a variable-length code (VLC). The VLC exploits the fact that short runs of zeros are more likely than long ones, and small coefficients are more likely than large ones.

Figure 4 – Scanning of DCT blocks and run-length coding with variable-length codes (entropy coding). Note 1: zigzag scanning. Note 2: run/amplitude coding: the run of zeros and the amplitude of the DCT coefficient are given one variable-length code (a Huffman-style code).
`
`
`
`
Figure 5 – Basic DCT coder. Y, CR or CB samples undergo line-scan to block-scan conversion, DCT, quantization, zigzag scanning, run-length coding and variable-length coding; the VLC output passes through a buffer whose occupancy controls the quantization.
`
The VLC allocates codes which have different lengths, depending upon the expected frequency of occurrence of each zero-run-length / non-zero coefficient value combination. Common combinations use short code words; less common combinations use long code words. All other combinations are coded by the combination of an escape code and two fixed-length codes: one 6-bit word to indicate the run length, and one 12-bit word to indicate the coefficient value.
`
`One VLC code table is used in most circumstances.
`However, a second VLC code table is used for
`some special pictures. The DC coefficient is
`treated differently in some modes. However, all
`the VLCs are designed such that no complete code-
`word is the prefix of any other codeword: they are
`similar to the well-known Huffman code. Thus the
`decoder can identify where one variable-length
`codeword ends and another starts, when operating
`within the correct codebook. No VLC or combina-
`tion of codes is allowed to produce a sequence of
`23 contiguous zeros – this particular sequence is
`used for synchronization purposes.
`
`DC coefficients in blocks contained within intra
`macroblocks (see Section 3.7.) are differentially
`encoded before variable-length coding.
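The sketch below generates the classic diagonal (“zigzag”) scan order for an 8 x 8 block and turns a quantized block into (run, level) pairs. It is a schematic of the principle only – the pairing with the actual MPEG-2 VLC tables, the escape codes and the end-of-block handling are omitted:

import numpy as np

def zigzag_order(n=8):
    """Visit (row, col) positions diagonal by diagonal, top-left to bottom-right."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],   # which anti-diagonal
                                  # alternate the direction of travel per diagonal
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

def run_level_pairs(block):
    """(number of zeros preceding a coefficient, coefficient value) pairs."""
    pairs, run = [], 0
    for r, c in zigzag_order(block.shape[0]):
        v = int(block[r, c])
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    return pairs   # a real coder would terminate with an end-of-block code

block = np.zeros((8, 8), dtype=int)
block[0, 0], block[0, 1], block[2, 0] = 64, -3, 2
print(run_level_pairs(block))   # [(0, 64), (0, -3), (1, 2)]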
`
`3.4. Buffering and feedback
`
Figure 6 – Basic DPCM coder. The difference between the input and a prediction, derived by a predictor from the locally-decoded output, is quantized and sent to the channel as the quantized prediction error.
`
The DCT coefficient quantization, the run-length coding and the variable-length coding processes produce a varying bit-rate, which depends upon the complexity of the picture information and the amount and type of motion in the picture. To produce the constant bit-rate needed for transmission over a fixed bit-rate system, a buffer is needed to smooth out the variations in bit-rate. To prevent overflow or underflow of this buffer, its occupancy is monitored and feedback is applied to the coding processes to control the input to the buffer. The DCT quantization process is often used to provide direct control of the input to the buffer: as the buffer becomes full, the quantizer is made coarser, to reduce the number of bits used to code each DCT coefficient; as the buffer empties, the DCT quantization is made finer. Other means of controlling the buffer occupancy may be used as well as, or instead of, the control of DCT coefficient quantization.
`
Fig. 5 shows a block diagram of a basic DCT coder with, in this example, the buffer occupancy controlled by feedback to the DCT coefficient quantization.
`
`It is important to note that the final bit-rate at the
`output of an MPEG video encoder can be freely
`varied: if the output bit-rate is reduced, the buffer
`will empty more slowly and the coder will auto-
`matically compensate by, for example, making the
`DCT coefficient quantization coarser. But, of
`course, reducing the output bit-rate reduces the
`quality of the decoded pictures. There is no need
`to lock input sampling rates to channel bit-rates, or
`vice-versa.
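The feedback idea can be captured in a few lines. Below is a toy rate-control loop; the linear mapping from buffer fullness to quantizer scale is an invented illustration, not a normative MPEG-2 algorithm (which, as noted above, the standard deliberately leaves to the coder designer):

def quantizer_scale(buffer_bits, buffer_size, q_min=1, q_max=31):
    """Coarser quantization as the buffer fills, finer as it empties."""
    fullness = buffer_bits / buffer_size          # 0.0 .. 1.0
    return round(q_min + fullness * (q_max - q_min))

def step(buffer_bits, coded_bits, channel_bits):
    """One coding tick: bits flow in from the VLC, out at the channel rate."""
    return max(0, buffer_bits + coded_bits - channel_bits)

buf, size = 0, 1_800_000                 # e.g. a 1.8 Mbit smoothing buffer
for coded in (90_000, 150_000, 40_000):  # bits produced per picture (invented)
    buf = step(buf, coded, 60_000)       # channel drains 60 kbit per tick
    print(quantizer_scale(buf, size))    # the scale tracks buffer occupancy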
`
3.5. Reduction of temporal redundancy: interframe prediction
`
`In order to exploit the fact that pictures often
`change little from one frame to the next, MPEG
`includes temporal prediction modes: that is, we
`attempt to predict one frame to be coded from a
`previous “reference” frame.
`
Fig. 6 illustrates a basic differential pulse-code modulation (DPCM) coder, in which we quantize and transmit only the differences between the input and a prediction based on the previous locally-decoded output. Note that the prediction cannot be based on previous source pictures, because the prediction has to be repeatable in the decoder (where the source pictures are not available). Consequently, the coder contains a local decoder which reconstructs pictures exactly as they would be in the actual decoder. The locally-decoded output then forms the input to the predictor. In interframe prediction, samples from a previously-decoded “reference” frame are used in the prediction of samples in other frames.
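A minimal DPCM loop is sketched below; note how the predictor works from the locally-decoded (reconstructed) value, never from the original source, so that coder and decoder stay in step. The coarse quantizer is invented for illustration:

def quantize(x, step=8):
    return round(x / step)      # coarse, lossy quantization

def dequantize(q, step=8):
    return q * step

def dpcm_encode(samples):
    prediction, channel = 0, []
    for x in samples:
        q = quantize(x - prediction)   # code only the difference
        channel.append(q)
        # Local decoder: rebuild exactly what the real decoder will have.
        prediction = prediction + dequantize(q)
    return channel

def dpcm_decode(channel):
    prediction, out = 0, []
    for q in channel:
        prediction = prediction + dequantize(q)
        out.append(prediction)
    return out

print(dpcm_decode(dpcm_encode([100, 104, 109, 200])))
# [96, 104, 112, 200] - close to the input, within the quantization error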
`
`
`
`
Figure 7 – DCT with interframe prediction coder. The prediction error (input minus prediction) is DCT-transformed and quantized; an inverse DCT and a frame delay provide the locally-decoded picture from which the prediction is formed.
`
`
In MPEG coding, we combine interframe prediction (which reduces temporal redundancy) with the DCT and variable-length coding tools described in Sections 3.1. to 3.3. (which reduce spatial redundancy), as shown in Fig. 7. The coder subtracts the prediction from the input to form a prediction-error picture. The prediction error is transformed with the DCT, the coefficients are quantized, and these quantized values are coded using a VLC.
`
`The simplest interframe prediction is to predict a
`block of samples from the co-sited (i.e. the same
`spatial position) block in the reference frame. In
`this case the “predictor” would comprise simply a
`delay of exactly one frame, as shown in Fig. 7.
`This makes a good prediction for stationary re-
`gions of the image but is poor in moving areas.
`
3.6. Motion-compensated interframe prediction
`
`A more sophisticated prediction method, known as
`motion-compensated interframe prediction, off-
`sets any translational motion which has occurred
`between the block being coded and the reference
`frame, and uses a shifted block from the reference
`frame as the prediction (see Fig. 8).
`
`EBU Technical Review Winter 1995
`Ely
`
`One method of determining the motion that has
`occurred between the block being coded and the
`reference frame is a “block-matching” search in
`which a large number of trial offsets are tested in
`the coder (see Fig. 9).
`
The “best” offset is selected on the basis of a measurement of the minimum error between the block being coded and the prediction. Since MPEG defines only the decoding process, not the coder, the choice of motion measurement algorithm is left to the designer of the coder, and this is an area where considerable differences in performance occur between different algorithms and implementations. A major requirement is to have a search area large enough to cover any motion present from frame to frame. However, increasing the size of the search area greatly increases the processing needed to find the best match: various techniques, such as hierarchical block matching, are used to try to overcome this dilemma.
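An exhaustive (“full-search”) block match over a small search window might look like the sketch below, using the sum of absolute differences (SAD) as the error measure. SAD is one common choice; as noted above, the actual measurement algorithm is left to the coder designer, and the frame data here is synthetic:

import numpy as np

def full_search(current, reference, top, left, size=16, radius=7):
    """Return the (dy, dx) offset into `reference` with the smallest SAD."""
    block = current[top:top + size, left:left + size]
    best, best_offset = None, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if not (0 <= y <= reference.shape[0] - size
                    and 0 <= x <= reference.shape[1] - size):
                continue   # candidate block falls outside the reference frame
            candidate = reference[y:y + size, x:x + size]
            sad = np.abs(block.astype(int) - candidate.astype(int)).sum()
            if best is None or sad < best:
                best, best_offset = sad, (dy, dx)
    return best_offset   # the motion vector for this macroblock

# A synthetic frame pair: the content moves 3 pixels right, 1 pixel down.
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, (64, 64), dtype=np.uint8)
cur = np.roll(np.roll(ref, 1, axis=0), 3, axis=1)
print(full_search(cur, ref, top=24, left=24))   # -> (-1, -3)

A hierarchical (coarse-to-fine) search would reduce the cost by first matching on down-sampled pictures and then refining the vector at full resolution.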
`
Figure 8 – Motion-compensated interframe prediction. A 16 x 16 macroblock in frame n (current) is predicted from a block in frame n–1 (previous), displaced from the co-sited position on the macroblock grid by the macroblock vector offset; without motion compensation, the co-sited block would be used instead.
`
`17
`
`6
`
`
`
Figure 9 – Principle of block-matching motion estimation. The search block (macroblock) from the current frame is the reference; it is moved around a larger search area in the previous frame to find the best match.
`
`
`Bi-directional prediction (see Fig. 10) consists of
`forming a prediction from both the previous frame
`and the following frame, by a linear combination
`of these, shifted according to suitable motion esti-
`mates.
`
`Bi-directional prediction is particularly useful
`where motion uncovers areas of detail. However,
`to enable backward prediction from a future frame,
`the coder re-orders the pictures so that they are
`transmitted in a different order from that in which
`they are displayed. This process, and the process
`of re-ordering to the correct display sequence in
`the decoder, introduces considerable end-to-end
`processing delay which may be a problem in some
`applications. To overcome this, MPEG defines a
`profile (see Section 4) which does not use bi-
`directional prediction.
`
Figure 10 – Motion-compensated bi-directional prediction. A macroblock in frame n (current) may be predicted forward from its offset position in frame n–1 (previous), backward from its offset position in frame n+1 (next), or bi-directionally from a combination of the two, each direction having its own macroblock vector offset; intra coding is a further option.
`
`18
`
`EBU Technical Review Winter 1995
`Ely
`
`7
`
`
`
Figure 11 – Motion-compensated interframe prediction DCT coder. The prediction error is DCT-coded and quantized as in Fig. 7; the prediction path now contains a fixed delay plus a variable delay forming a motion compensation unit, controlled by displacement vectors.
`
Whereas the basic coding unit for spatial redundancy reduction in MPEG is based on an 8 x 8 block, motion compensation is usually based on a 16-pixel by 16-line macroblock. The size of the macroblock is a trade-off between the need to minimize the bit-rate required to transmit the motion representation (known as motion vectors) to the decoder, which supports the case for a large macroblock size, and the need to adapt the prediction process locally to the picture content and movement, which supports the case for a small macroblock size.
`
`To minimize the bit-rate needed to transmit the
`motion vectors, they are differentially-encoded
`with reference to previous motion vectors. The
`motion vector value prediction error is then vari-
`able-length coded using another VLC table.
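Differential coding of the motion vectors can be sketched in a couple of lines; the VLC stage is represented here only by the prediction step, and the reset rules that MPEG applies (e.g. at the start of a slice) are omitted:

def mv_differences(vectors):
    """Encode each motion vector as its difference from the previous one."""
    prev, out = (0, 0), []
    for vx, vy in vectors:
        out.append((vx - prev[0], vy - prev[1]))   # small values -> short VLCs
        prev = (vx, vy)
    return out

# Neighbouring macroblocks in a pan tend to share similar vectors,
# so the differences cluster around zero (invented example values).
print(mv_differences([(5, 1), (5, 1), (6, 1), (5, 2)]))
# [(5, 1), (0, 0), (1, 0), (-1, 1)]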
`
Fig. 11 shows a conceptual motion-compensated interframe DCT coder in which, for simplicity, the process of motion-compensated prediction is represented as a “variable delay”. In practical implementations, of course, the motion-compensated prediction is implemented in other ways.
`
`3.7. Prediction modes
`
In an MPEG-2 coder, the motion-compensated predictor supports many methods for generating a prediction. For example, a macroblock may be “forward predicted” from a past picture, “backward predicted” from a future picture, or “interpolated” by averaging a forward and a backward prediction. Another option is to make a zero-value prediction, such that the source image block, rather than the prediction-error block, is DCT-coded. Such macroblocks are known as intra- or I-coded.
`
`Although no prediction information is needed for
`intra-macroblocks, they can carry motion vector
`information. In normal circumstances, the motion
`vector information for an I-coded macroblock is
`not used, but its function is to provide a means of
`concealing the decoding errors when data errors in
`the bit-stream make it impossible to decode the
`data for that macroblock.
`
`Fields of a frame may be predicted separately from
`their own motion vector (field prediction coding),
`or together using a common motion vector (frame
`prediction coding). Generally, in the case of image
`sequences where the motion is slow, frame predic-
`tion coding is more efficient. However, when mo-
`tion speed increases, field prediction coding be-
`comes more efficient.
`
`In addition to the two basic modes of field and
`frame prediction, two further modes have been de-
`fined:
`
`
`
`
`
1) 16 x 8 motion compensation

This mode uses at least two motion vectors for each macroblock: one vector is used for the upper 16 x 8 region and one for the lower half. (In the case of B-pictures (see Section 3.8.), a total of four motion vectors are used for each macroblock in this mode, since both the upper and the lower regions may each have motion vectors referring to past and future pictures.)

The 16 x 8 motion compensation mode is permitted only in field-structured pictures and is intended to allow the spatial area covered by each motion vector, in such cases, to be approximately equal to that of a 16 x 16 macroblock in a frame-structured picture.
`
2) Dual prime mode
`
`This mode may be used in both field- and frame-
`structured coding but is only permitted in P-
`
`pictures (see Section 3.8) when there have been
`no B-pictures between the P-picture and its ref-
`erence frame. In this case, a motion vector and
`a differential-offset motion vector are trans-
`mitted.
`
`For field pictures, two motion vectors are
`derived from this data and are used to form two
`predictions from two reference fields. These
`two predictions are combined to form the final
`prediction.
`
`For frame pictures, this process is repeated for
`each of the two fields: each field is predicted
`separately, giving rise to a total of four field
`predictions which are combined to form the fi-
`nal two predictions. Dual prime mode is used
`as an alternative to bi-directional prediction,
`where low delay is required: it avoids the frame
`re-ordering needed for bi-directional prediction
`but achieves similar coding efficiency.
`
Figure 12 – Decoding a “B” macroblock. The decoder input is buffered, VLC-decoded, inverse-quantized and inverse-DCT-transformed to yield the difference picture; a prediction – forward from the previous I- or P-picture store, backward from the future I- or P-picture store, interpolated between the two, or no prediction at all – is added to it (for uncoded macroblocks the prediction alone is used), and the result passes through a display buffer to the decoder output.

Figure 13 – MPEG picture types: a sequence such as I B B P B B P B ..., with forward prediction from each I- or P-picture to the next P-picture, and backward prediction from I- and P-pictures to the intervening B-pictures. Note 1: an intra-coded (I) picture is coded using information only from itself. Note 2: predictive-coded (P) pictures are coded with reference to a previous I- or P-picture. Note 3: bidirectionally-predictive (B) pictures are coded with reference to both the previous I- or P-picture and the next (future) I- or P-picture.
`
`20
`
`EBU Technical Review Winter 1995
`Ely
`
`9
`
`
`
`For each macroblock to be coded, the coder
`chooses between these prediction modes, trying to
`minimize the distortions on the decoded picture
`within the constraints of the available channel bit-
`rate. The choice of prediction mode is transmitted
`to the decoder, together with the prediction error,
`so that it can regenerate the correct prediction.
`
`Fig. 12 illustrates how a bi-directionally coded
`macroblock (a B-macroblock) is decoded. The
`switches illustrate the various prediction modes
`available for such a macroblock. Note that the
`coder has the option not to code some macro-
`blocks: no DCT coefficient information is trans-
`mitted for those blocks and the macroblock ad-
`dress counter skips to the next coded macroblock.
`The decoder output for the uncoded macroblocks
`simply comprises the predictor output.
`
3.8. Picture types
`
`In MPEG-2, three “picture types” are defined (see
`Fig. 13). The picture type defines which predic-
`tion modes may be used to code each macroblock:
`
1) Intra pictures (I-pictures)
`
`These are coded without reference to other pic-
`tures. Moderate compression is achieved by
`reducing spatial redundancy but not temporal
`redundancy. They are important as they pro-
`vide access points in the bit-stream where de-
`coding can begin without reference to previous
`pictures.
`
2) Predictive pictures (P-pictures)
`
`These are coded using motion-compensated
`prediction from a past I- or P-picture and may
`be used as a reference for further prediction. By
`reducing spatial and temporal redundancy, P-
`pictures offer increased compression compared
`to I-pictures.
`
3) Bi-directionally-predictive pictures (B-pictures)
`
`These use both past and future I- or P-pictures
`for motion compensation, and offer the highest
`degree of compression. As noted above, to en-
`able backward prediction from a future frame,
`the coder re-orders the pictures from the natural
`display order to a “transmission” (or “bit-
`stream”) order so that the B-picture is trans-
`mitted after the past and future pictures which
`it references (see Fig. 14). This introduces a
`delay which depends upon the number of con-
`secutive B-pictures.
`
`3.9. Group of pictures
`
`The different picture types typically occur in a re-
`peating sequence termed a Group of Pictures or
`GOP. A typical GOP is illustrated in display order
`in Fig. 14(a) and in transmission order in Fig.
`14(b).
`
`A regular GOP structure can be described with two
`parameters:
`
`– N (the number of pictures in the GOP);
`
`– M (the spacing of the P-pictures).
`
`The GOP illustrated in Fig. 14 is described as
`N = 9 and M = 3.
`
Figure 14 – Example Group of Pictures (GOP), shown in (a) display order and (b) transmission order.

For a given decoded picture quality, each picture type produces a different number of bits when coded. In a typical sequence, a coded I-picture needs three times more bits than a coded P-picture, which itself occupies 50% more bits than a coded B-picture.
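The display-to-transmission re-ordering described in Sections 3.8. and 3.9. can be sketched as below for a regular GOP. This is a simplified model (one GOP in isolation, with the convention that each B-picture is sent after the later reference picture it depends on):

def transmission_order(display):
    """Re-order pictures so references precede the B-pictures that use them."""
    out, held_b = [], []
    for picture in display:
        if picture.endswith("B"):
            held_b.append(picture)   # a B-picture waits for its future reference
        else:
            out.append(picture)      # an I- or P-picture is a reference
            out.extend(held_b)       # the held-back B-pictures follow it
            held_b = []
    # In a real sequence the trailing B-pictures would follow the next GOP's I-picture.
    return out + held_b

# N = 9, M = 3 GOP in display order; the digit is the display position.
gop = ["0I", "1B", "2B", "3P", "4B", "5B", "6P", "7B", "8B"]
print(transmission_order(gop))
# ['0I', '3P', '1B', '2B', '6P', '4B', '5B', '7B', '8B']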
`
`4. MPEG profiles and levels
`
MPEG-2 is intended to be generic, supporting a diverse range of applications. Different algorithmic elements or “tools”, developed for many applications, have been integrated into a single bit-stream syntax.
`
`Dr. Bob Ely is an R&D manager at BBC Research and Development Department, Kingswood Warren,
`Surrey, UK.
`
`Currently, he is working with the BBC’s Digital Broadcasting Project which aims to investigate the techni-
`cal and commercial feasibility of digital terrestrial broadcasting and to implement technical field-trials and
`demonstrations.
`
`After completing his PhD in computer communications systems at Daresbury Nuclear Physics Laboratory,
`Bob Ely joined BBC Research Department to work on RDS and related data transmission systems. He later
`led the BBC team which developed the Nicam digital stereo-sound-with-television system. For many years,
`he was Chairman of the EBU Specialist Group on RDS, a Vice-Chairman of Working Party R and has also
`been a member of EBU Groups on conditional access systems.
`
`
`
`
`
To implement the full syntax in all decoders is unnecessarily complex, so a small number of subsets, or profiles, of the full syntax have been defined. Also, within a given profile, a “level” is defined which describes a set of constraints, such as maximum sampling density, on parameters within the profile.
`
`The profiles defined to date fit together such that
`a higher profile is a superset of a lower one. A de-
`coder which supports a particular profile and level
`is only required to support the corresponding sub-
`set of the full syntax and a set of parameter
`constraints. To restrict the number of options
`which must be supported, only selected combina-
`tions of profile and level are defined as confor-
`mance points (see Table 1). These are:
`
1) Simple profile

This uses no B-frames and, hence, no backward or interpolated prediction. Consequently, no picture re-ordering is required, which makes this profile suitable for low-delay applications such as video conferencing.
`
2) Main profile

This adds support for B-pictures, which improves the picture quality for a given bit-rate but increases the delay. Currently, most MPEG-2 video decoder chip-sets support main profile.
`
3) SNR profile

This adds support for enhancement layers of DCT coefficient refinement, using signal-to-noise ratio (SNR) scalability.
`
4) Spatial profile

This adds support for enhancement layers carrying the image at different resolutions, using the spatial scalability tool.
`
5) High profile