MPEG video coding
A simple introduction

Dr. S.R. Ely (BBC)

Original language: English. Manuscript received 30/11/95.
1. Introduction

The Moving Pictures Expert Group (MPEG) started in 1988 as Working Group 11, Sub-committee 29, of ISO/IEC JTC1 (see Note 1) with the aim of defining the standards for digital compression of video and audio signals. It took as its basis the ITU-T (see Note 2) standard for video-conferencing and video-telephony [1], together with that of the Joint Photographic Experts Group (JPEG), which had initially been developed for compressing still images such as electronic photography [2].
The first goal of MPEG was to define a video coding algorithm for digital storage media, in particular CD-ROM. The resulting standard was published in 1993 as ISO/IEC 11172 [3] and comprises three parts, covering the systems aspects (multiplexing and synchronization), video coding and audio coding. This standard has been applied in the CD-i system to provide full-motion video playback from CD, and is widely used in PC applications, for which a range of hardware and software coders and decoders are available.
MPEG-1 is restricted to non-interlaced video formats and is primarily intended to support video coding at bit-rates up to about 1.5 Mbit/s.
Note 1: Joint Technical Committee No. 1 of the International Organisation for Standardisation and the International Electrotechnical Commission.

Note 2: International Telecommunication Union – Telecommunication Standardization Bureau.
[Summary] The core element of all DVB systems is the MPEG-2 vision coding standard, which is based upon a flexible toolkit of techniques for bit-rate reduction. The MPEG-2 specification only defines the bit-stream syntax and the decoding process. The coding process is not specified, which means that compatible improvements in the picture quality will continue to be possible. In this article, the author provides a simple introduction to the technicalities of the MPEG-2 video coding standard.
In 1990, MPEG began work on a second standard which would be capable of coding interlaced pictures directly, originally to support high-quality applications at bit-rates in the range of about 5 to 10 Mbit/s. MPEG-2 now also supports high-definition formats at bit-rates in the range of about 15 to 30 Mbit/s. The MPEG-2 standard was first published in 1994 as ISO/IEC 13818, again comprising three parts – systems, video and audio. A second version of the standard was published in 1995 [4].
It is important to note that the MPEG standards specify only the syntax and semantics of the bit-streams and the decoding process. They do not specify the coding process: this is left mainly to the discretion of the coder designers, thus giving scope for improvement as coding techniques are refined and new techniques are developed.
2. Video coding principles

If we take a studio-quality 625-line component picture and digitize it according to ITU Recommendations BT.601 [5] and BT.656 [6] (i.e. if we use 4:2:2 sampling, as shown in Fig. 1), a bit-stream of 216 Mbit/s is used to convey the luminance and the two chrominance components. For bandwidth-restricted media – such as terrestrial or satellite channels – a method is required to reduce the bit-rate needed to represent the digitized picture.
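The arithmetic behind the 216 Mbit/s figure is simple enough to check directly. The short Python sketch below is an illustration added for this purpose; it is not part of the original article:

    # Bit-rate of a 4:2:2-sampled 625-line picture: luminance at 13.5 MHz,
    # each chrominance component at 6.75 MHz, 8 bits per sample.
    BITS_PER_SAMPLE = 8
    SAMPLE_RATES_MHZ = {"Y": 13.5, "CB": 6.75, "CR": 6.75}

    for name, rate in SAMPLE_RATES_MHZ.items():
        print(name, BITS_PER_SAMPLE * rate, "Mbit/s")    # Y: 108, CB: 54, CR: 54
    print("Total", BITS_PER_SAMPLE * sum(SAMPLE_RATES_MHZ.values()), "Mbit/s")  # 216.0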
A video bit-rate reduction (compression) system operates by removing the redundant and less-important information from the signal prior to transmission, and by reconstructing an approximation of the image from the remaining information at the decoder. In video signals, three distinct kinds of redundancy can be identified:
1) Spatial and temporal redundancy

Here, use is made of the fact that pixel values are not independent, but are correlated with their neighbours, both within the same frame and across frames. So, to some extent, the value of a pixel is predictable, given the values of the neighbouring pixels.
2) Entropy redundancy

For any non-random digitized signal, some code values occur more frequently than others. This can be exploited by coding the more-frequently occurring values with shorter codes than the rarer ones. The same principle has long been exploited in Morse code, where the most common letters in English, “E” and “T”, are represented by one dot and one dash respectively, whereas the rarest letters, “X”, “Y” and “Z”, are each represented by a total of four dots and dashes.
3) Psycho-visual redundancy

This form of redundancy results from the way the eye and the brain work. In audio, we are familiar with the limited frequency response of the ear; in video, we have to consider two limits:

– the limit of spatial resolution (i.e. the fine detail which the eye can resolve);

– the limit of temporal resolution (i.e. the ability of the eye to track fast-moving images). Limited temporal resolution means, for example, that a change of picture (a shot-change) masks the fine detail on either side of the change.
3. MPEG video compression toolkit
Sample-rate reduction is a very effective method of reducing the bit-rate but, of course, it introduces an irreversible loss of resolution. For very low bit-rate applications (e.g. in MPEG-1), alternate fields are discarded and the horizontal sampling-rate is reduced to around 360 pixels-per-line (giving about 3.3 MHz resolution). The sample-rate for the chrominance is half that of the luminance, both horizontally and vertically. In this way, the bit-rate can be reduced to less than one fifth of that of a conventional-definition (4:2:2) sampled signal.
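The size of that saving can be estimated with a rough calculation. The sketch below assumes illustrative MPEG-1-style dimensions (about 360 pixels by 288 lines at 25 Hz, with chrominance halved in both directions); the exact figures vary between sources:

    # Approximate sampling load of a sub-sampled MPEG-1-style format
    # compared with the 216 Mbit/s of 4:2:2 sampling.
    def rate_mbit_s(pixels, lines, frame_rate_hz, bits=8):
        return pixels * lines * frame_rate_hz * bits / 1e6

    y = rate_mbit_s(360, 288, 25)          # luminance, alternate fields discarded
    c = 2 * rate_mbit_s(180, 144, 25)      # two chrominance components
    print(round(y + c, 1))                 # ~31.1 Mbit/s
    print(round((y + c) / 216, 2))         # ~0.14, i.e. less than one fifth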
For “broadcast quality” at bit-rates in the range 3 to 10 Mbit/s, horizontal sample-rate reduction is not advisable for the luminance or chrominance signals, nor is temporal sub-sampling. However, for distribution and broadcast applications, sufficient chrominance resolution can be provided if the vertical sampling frequency of the chrominance is halved. Thus, for most MPEG-2 coding applications, 4:2:0 sampling is likely to be used rather than 4:2:2. However, 4:2:2 and 4:4:4 sampling are also supported by MPEG-2. It may be of interest to note that a conventional delay-line PAL decoder effectively yields the same vertical sub-sampling of the chrominance signals as does 4:2:0 sampling.
[Figure 1 – 4:2:2 sampling. R, G and B are matrixed to Y, CB and CR; Y (5.75 MHz low-pass) is sampled at 13.5 MHz and CB, CR (2.75 MHz low-pass) at 6.75 MHz, each with an 8-bit ADC. Resulting bit-rates: Y = 8 × 13.5 = 108 Mbit/s; CB = 8 × 6.75 = 54 Mbit/s; CR = 8 × 6.75 = 54 Mbit/s; total = 216 Mbit/s.]
Apart from sample-rate reduction, the MPEG toolkit includes two different kinds of tools to exploit redundancy in images:

1) Discrete Cosine Transform (DCT)

The purpose of using this orthogonal transform – which is similar to the Discrete Fourier Transform (DFT) – is to assist the processing which removes spatial redundancy, by concentrating the signal energy into relatively few coefficients.

2) Motion-compensated interframe prediction

This tool is used to remove temporal redundancy. It is based on techniques similar to the well-known differential pulse-code modulation (DPCM) principle.
3.1. Discrete cosine transform

The luminance signal of a 4:2:0-sampled digitized 625-line picture comprises about 704 pixels horizontally and about 576 lines vertically (see Fig. 2). In MPEG coding, spatial redundancy is removed by processing the digitized signals in 2-D blocks of 8 pixels by 8 lines (taken from either one field or two, depending on the mode of operation).

[Figure 2 – Block-based DCT. The 704-pixel by 576-line luminance raster is divided into 88 × 72 blocks, each of 8 pixels by 8 lines.]

As Fig. 3 illustrates, the DCT is a reversible process which maps between the normal 2-D presentation of the image and one which represents the same information in what may be thought of as the frequency domain. Each coefficient in the 8 x 8 DCT-domain block indicates the contribution of a different DCT “basis” function to the original image block. The lowest-frequency basis function (top-left in Fig. 3) is called the DC coefficient and may be thought of as representing the average brightness of the block.

[Figure 3 – DCT transform pairs. The DCT and the IDCT map between an 8 x 8 block of pixels and an 8 x 8 block of coefficients; horizontal frequencies increase in cycles per picture width from left to right, vertical frequencies increase in cycles per picture height from top to bottom, and the d.c. coefficient sits at the top left.]

The DCT does not directly reduce the number of bits required to represent the block. In fact, for an 8 x 8 image block of 8-bit pixels, the DCT produces an 8 x 8 block of at least 11-bit DCT coefficients, to allow for reversibility! The reduction in the number of bits follows from the fact that, for typical blocks of natural images, the distribution of coefficients is non-uniform: the transform tends to concentrate the energy into the low-frequency coefficients, and many of the other coefficients are near zero. The bit-rate reduction is achieved by not transmitting the near-zero coefficients, and by quantizing and coding the remaining coefficients as described below. The non-uniform distribution of the coefficients is a result of the spatial redundancy present in the original image block.

Many different forms of transformation have been investigated for bit-rate reduction. The best transforms are those which tend to concentrate the energy of a picture block into a few coefficients. The DCT is one of the best transforms in this respect, and has the advantage that the DCT and its inverse are easy to implement in digital processing. The choice of an 8 x 8 block-size is a trade-off between the need to use a large picture area for the transform, so that the energy compaction described above is most efficient, and the fact that the content and movement of the picture varies spatially, which would tend to support a smaller block-size. A large block-size would also emphasize variations from block to block in the decoded picture, as well as the effects of “windowing” by the block structure.
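The energy compaction described above is easy to demonstrate numerically. The sketch below, which assumes NumPy and SciPy are available and is not drawn from the article, transforms a smooth 8 x 8 block and shows that almost all of the energy lands in a few low-frequency coefficients:

    # DCT energy compaction on a smooth 8 x 8 image block.
    import numpy as np
    from scipy.fft import dctn

    block = np.add.outer(np.arange(8.0), np.arange(8.0)) * 4   # smooth luminance ramp
    coeffs = dctn(block, norm="ortho")                         # 2-D DCT-II
    print(np.round(coeffs[:3, :3], 1))     # energy concentrated at the top-left corner
    print(int((np.abs(coeffs) > 1.0).sum()), "of 64 coefficients are significant")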
3.2. Coefficient quantization

After a block has been transformed, the transform coefficients are quantized. Different quantization is applied to each coefficient, depending on the spatial frequency within the block that it represents. The objective is to minimize the number of bits which must be transmitted to the decoder, so that it can perform the inverse transform and reconstruct the image: reduced quantization accuracy reduces the number of bits which need to be transmitted to represent a given DCT coefficient, but increases the possible quantization error for that coefficient. Note that the quantization noise introduced by the coder is not reversible in the decoder, so the coding and decoding process is “lossy”.
More quantization error can be tolerated in the high-frequency coefficients, because HF noise is less visible than LF quantization noise. Also, quantization noise is less visible in the chrominance components than in the luminance component. MPEG uses weighting matrices to define the relative accuracy of the quantization of the different coefficients. Different weighting matrices can be used for different frames, depending on the prediction mode used.

The weighted coefficients are then passed through a fixed quantization law, which is usually a linear law. However, for some prediction modes there is an increased threshold level (i.e. a dead-zone) around zero. The effect of this threshold is to maximize the number of coefficients which are quantized to zero: in practice, it is found that small deviations around zero are usually caused by noise in the signal, so suppressing these values actually “improves” the subjective picture quality.

Quantization noise is more visible in some blocks than in others; for example, in blocks which contain a high-contrast edge between two plain areas. In such blocks, the quantization parameters can be modified to limit the maximum quantization error, particularly in the high-frequency coefficients.
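As a concrete illustration of these ideas, the sketch below applies a frequency-dependent weighting, a quantizer scale and a dead-zone to a block of coefficients. The weighting matrix is invented for the example; it is not the MPEG-2 default matrix:

    # Weighted quantization of DCT coefficients, with a dead-zone around zero.
    import numpy as np

    def quantize(coeffs, weights, q_scale, dead_zone=0.0):
        scaled = coeffs / (weights * q_scale)     # coarser where the weights are larger
        scaled[np.abs(scaled) < dead_zone] = 0.0  # suppress small, noise-like values
        return np.rint(scaled).astype(int)

    weights = 1.0 + np.add.outer(np.arange(8), np.arange(8))  # heavier at high frequencies
    coeffs = np.random.default_rng(0).normal(0.0, 40.0, (8, 8))
    print(quantize(coeffs, weights, q_scale=2.0, dead_zone=1.0))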
3.3. Zigzag coefficient scanning, run-length coding, and variable-length coding

After quantization, the 8 x 8 blocks of DCT coefficients are scanned in a zigzag pattern (see Fig. 4) to turn the 2-D array into a serial string of quantized coefficients. Two scan patterns are defined. The first is usually preferable for picture material which has strong vertical frequency components due to, perhaps, the interlaced picture structure; in this scan pattern, there is a bias to scan vertical coefficients first. In the second scan pattern, which is preferable for pictures without a strong vertical structure, there is no bias and the scan proceeds diagonally from top left to bottom right, as illustrated in Fig. 4. The coder indicates its choice of scan pattern to the decoder.

[Figure 4 – Scanning of DCT blocks and run-length coding with variable-length codes (entropy coding). Note 1: Zigzag scanning, starting from the d.c. coefficient at the top left, with horizontal frequencies increasing to the right and vertical frequencies increasing downwards. Note 2: Run/amplitude coding: the run of zeros and the amplitude of the DCT coefficient which follows it are given one Variable Length Code (VLC) (Huffman code).]

The strings of coefficients produced by the zigzag scanning are coded by counting the number of zero coefficients preceding a non-zero coefficient, i.e. they are run-length coded. The run-length value, and the value of the non-zero coefficient which the run of zero coefficients precedes, are then combined and coded using a variable-length code (VLC). The VLC exploits the fact that short runs of zeros are more likely than long ones, and small coefficients are more likely than large ones. The VLC allocates codes which have different lengths, depending upon the expected frequency of occurrence of each zero-run-length / non-zero coefficient value combination. Common combinations use short code words; less common combinations use long code words. All other combinations are coded by the combination of an escape code and two fixed-length codes: one 6-bit word to indicate the run length, and one 12-bit word to indicate the coefficient value.
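A generic version of the zigzag scan and the run/amplitude pairing can be sketched as follows. The scan order here is derived by sorting the anti-diagonals; the actual MPEG scan tables and VLC codebooks are fixed by the standard and are not reproduced:

    # Zigzag scan of an 8 x 8 block, followed by (zero-run, amplitude) pairing.
    def zigzag_indices(n=8):
        return sorted(((r, c) for r in range(n) for c in range(n)),
                      key=lambda rc: (rc[0] + rc[1],
                                      rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

    def run_amplitude_pairs(block):
        pairs, run = [], 0
        for r, c in zigzag_indices(len(block)):
            if block[r][c] == 0:
                run += 1                          # zeros preceding the next non-zero value
            else:
                pairs.append((run, block[r][c]))  # each pair would receive one VLC
                run = 0
        return pairs                              # trailing zeros end with an EOB code

    quantized = [[0] * 8 for _ in range(8)]
    quantized[0][0], quantized[0][1], quantized[2][0] = 12, -3, 5
    print(run_amplitude_pairs(quantized))         # [(0, 12), (0, -3), (1, 5)]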

One VLC code table is used in most circumstances; however, a second VLC code table is used for some special pictures, and the DC coefficient is treated differently in some modes. All the VLCs are designed such that no complete codeword is the prefix of any other codeword: they are similar to the well-known Huffman code. Thus the decoder can identify where one variable-length codeword ends and another starts, when operating within the correct codebook. No VLC, or combination of codes, is allowed to produce a sequence of 23 contiguous zeros – this particular sequence is used for synchronization purposes.

DC coefficients in blocks contained within intra macroblocks (see Section 3.7.) are differentially encoded before variable-length coding.
3.4. Buffering and feedback

The DCT coefficient quantization, the run-length coding and the variable-length coding processes produce a varying bit-rate which depends upon the complexity of the picture information and the amount and type of motion in the picture. To produce the constant bit-rate needed for transmission over a fixed bit-rate system, a buffer is needed to smooth out the variations in bit-rate. To prevent overflow or underflow of this buffer, its occupancy is monitored and feedback is applied to the coding processes to control the input to the buffer. The DCT quantization process is often used to provide direct control of the input to the buffer: as the buffer becomes full, the quantizer is made coarser to reduce the number of bits used to code each DCT coefficient; as the buffer empties, the DCT quantization is made finer. Other means of controlling the buffer occupancy may be used as well as, or instead of, the control of DCT coefficient quantization.

Fig. 5 shows a block diagram of a basic DCT coder with, in this example, the buffer occupancy controlled by feedback to the DCT coefficient quantization.

[Figure 5 – Basic DCT coder. Y, CR or CB samples pass through line-scan to block-scan conversion, DCT, quantization, zigzag scan, run-length coding and VLC into the output buffer; buffer occupancy feeds back to control the quantization.]

It is important to note that the final bit-rate at the output of an MPEG video encoder can be freely varied: if the output bit-rate is reduced, the buffer will empty more slowly and the coder will automatically compensate by, for example, making the DCT coefficient quantization coarser. But, of course, reducing the output bit-rate reduces the quality of the decoded pictures. There is no need to lock input sampling rates to channel bit-rates, or vice-versa.
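One possible, much simplified, form of this feedback loop is sketched below. The mapping from buffer fullness to quantizer scale, the buffer size and the bit counts are all illustrative assumptions; practical rate-control algorithms are considerably more elaborate:

    # Toy rate-control loop: the fuller the buffer, the coarser the quantizer.
    def quantizer_scale(occupancy, capacity, q_min=1, q_max=31):
        fullness = occupancy / capacity           # 0.0 (empty) .. 1.0 (full)
        return max(q_min, min(q_max, round(q_min + fullness * (q_max - q_min))))

    capacity = 1_835_008                 # assumed buffer size, in bits
    drain_per_picture = 6_000_000 // 25  # channel bits removed per picture period
    occupancy = 0
    for produced in (400_000, 90_000, 150_000, 600_000):  # bits from successive pictures
        q = quantizer_scale(occupancy, capacity)
        occupancy = max(0, occupancy + produced - drain_per_picture)
        print(f"q_scale={q:2d}  buffer={occupancy} bits")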
3.5. Reduction of temporal redundancy: interframe prediction

In order to exploit the fact that pictures often change little from one frame to the next, MPEG includes temporal prediction modes: that is, we attempt to predict a frame to be coded from a previous “reference” frame.

Fig. 6 illustrates a basic differential pulse-code modulation (DPCM) coder, in which we quantize and transmit only the differences between the input and a prediction based on the previous locally-decoded output. Note that the prediction cannot be based on previous source pictures, because the prediction has to be repeatable in the decoder (where the source pictures are not available). Consequently, the coder contains a local decoder which reconstructs pictures exactly as they would be in the actual decoder; the locally-decoded output then forms the input to the predictor. In interframe prediction, samples from a previously-decoded “reference” frame are used in the prediction of samples in the frame being coded.

[Figure 6 – Basic DPCM coder. The prediction is subtracted from the input, and the quantized prediction error is sent to the channel; it is also added back to the prediction to form the locally-decoded output which feeds the predictor.]
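The principle is easiest to see in one dimension. The toy scalar sketch below, which is not MPEG syntax, keeps the predictor fed from the locally-decoded values, so that encoder and decoder stay in step despite the quantization:

    # Scalar DPCM: quantize and transmit only the prediction errors.
    def dpcm_encode(samples, step=4):
        recon, codes = 0, []               # predictor = previous reconstructed value
        for s in samples:
            e = round((s - recon) / step)  # quantized prediction error
            codes.append(e)
            recon += e * step              # local decode, exactly as in the receiver
        return codes

    def dpcm_decode(codes, step=4):
        recon, out = 0, []
        for e in codes:
            recon += e * step
            out.append(recon)
        return out

    print(dpcm_decode(dpcm_encode([10, 12, 15, 40, 41])))  # [8, 12, 16, 40, 40]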
In MPEG coding, we combine interframe prediction (which reduces temporal redundancy) with the DCT and variable-length coding tools that were described in Section 3.3. (which reduce spatial redundancy), as shown in Fig. 7. The coder subtracts the prediction from the input to form a prediction-error picture. The prediction error is transformed with the DCT, the coefficients are quantized, and these quantized values are coded using a VLC.

The simplest interframe prediction is to predict a block of samples from the co-sited (i.e. the same spatial position) block in the reference frame. In this case, the “predictor” would comprise simply a delay of exactly one frame, as shown in Fig. 7. This makes a good prediction for stationary regions of the image, but is poor in moving areas.

[Figure 7 – DCT with interframe prediction coder. The prediction error (input minus prediction) is DCT-transformed and quantized; an inverse DCT and a frame delay in the local decoding loop reconstruct the pictures from which the prediction is formed.]
3.6. Motion-compensated interframe prediction

A more sophisticated prediction method, known as motion-compensated interframe prediction, offsets any translational motion which has occurred between the block being coded and the reference frame, and uses a shifted block from the reference frame as the prediction (see Fig. 8).

[Figure 8 – Motion-compensated interframe prediction. A macroblock in frame n (current) is predicted from a block in frame n–1 (previous), displaced by the macroblock vector offset rather than taken from the co-sited (non-motion-compensated) position on the 16 × 16 macroblock grid.]

One method of determining the motion that has occurred between the block being coded and the reference frame is a “block-matching” search, in which a large number of trial offsets are tested in the coder (see Fig. 9).

[Figure 9 – Principle of block-matching motion estimation. The search block (macroblock) from the current frame is the reference; it is moved around a search area in the previous frame to find the best match.]

The “best” offset is selected on the basis of a measurement of the minimum error between the block being coded and the prediction. Since MPEG defines only the decoding process, not the coder, the choice of motion measurement algorithm is left to the designer of the coder, and is an area where considerable difference in performance occurs between different algorithms and different implementations. A major requirement is to have a search area large enough to cover any motion present from frame to frame. However, increasing the size of the search area greatly increases the processing needed to find the best match: various techniques, such as hierarchical block matching, are used to try to overcome this dilemma.
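A brute-force version of such a search is sketched below as an illustration of the principle. The 16 x 16 block size, the +/-7 pixel window and the sum-of-absolute-differences (SAD) criterion are common choices, not requirements of the standard:

    # Full-search block matching over a +/-7 pixel window, minimizing SAD.
    import numpy as np

    def best_offset(current, reference, top, left, size=16, search=7):
        block = current[top:top + size, left:left + size].astype(int)
        best = (0, 0, float("inf"))
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                y, x = top + dy, left + dx
                if 0 <= y <= reference.shape[0] - size and 0 <= x <= reference.shape[1] - size:
                    sad = np.abs(block - reference[y:y + size, x:x + size].astype(int)).sum()
                    if sad < best[2]:
                        best = (dy, dx, sad)
        return best                        # (vertical, horizontal, SAD at best match)

    rng = np.random.default_rng(1)
    ref = rng.integers(0, 256, (64, 64), dtype=np.uint8)
    cur = np.roll(ref, (3, -2), axis=(0, 1))  # whole scene moved down 3, left 2
    print(best_offset(cur, ref, 16, 16))      # (-3, 2, 0): an exact match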
Bi-directional prediction (see Fig. 10) consists of forming a prediction from both the previous frame and the following frame, by a linear combination of these, shifted according to suitable motion estimates.

Bi-directional prediction is particularly useful where motion uncovers areas of detail. However, to enable backward prediction from a future frame, the coder re-orders the pictures so that they are transmitted in a different order from that in which they are displayed. This process, and the process of re-ordering to the correct display sequence in the decoder, introduces considerable end-to-end processing delay, which may be a problem in some applications. To overcome this, MPEG defines a profile (see Section 4.) which does not use bi-directional prediction.
[Figure 10 – Motion-compensated bi-directional prediction. A macroblock in frame n (current) may be predicted from frame n–1 (previous) and/or from frame n+1 (next), each with its own macroblock vector offset; intra (no) prediction and bi-directional (averaged) prediction are also possible.]
Whereas the basic coding unit for spatial redundancy reduction in MPEG is based on an 8 x 8 block, motion compensation is usually based on a 16-pixel by 16-line macroblock. The size of the macroblock is a trade-off between the need to minimize the bit-rate required to transmit the motion representation (known as motion vectors) to the decoder, which supports the case for a large macroblock size, and the need to vary the prediction process locally within the picture content and movement, which supports the case for a small macroblock size.

To minimize the bit-rate needed to transmit the motion vectors, they are differentially encoded with reference to previous motion vectors. The motion-vector prediction error is then variable-length coded using another VLC table.
Fig. 11 shows a conceptual motion-compensated interframe DCT coder in which, for simplicity, we illustrate the process of motion-compensated prediction by suggesting a “variable delay”. In practical implementations, of course, the motion-compensated prediction is implemented in other ways.

[Figure 11 – Motion-compensated interframe prediction DCT coder. As in Fig. 7, the prediction error is DCT-transformed and quantized, but the prediction is now formed by a motion compensation unit (conceptually, a fixed delay plus a variable delay) driven by displacement vectors.]
3.7. Prediction modes

In an MPEG-2 coder, the motion-compensated predictor supports many methods for generating a prediction. For example, a macroblock may be “forward predicted” from a past picture, “backward predicted” from a future picture, or “interpolated” by averaging a forward and a backward prediction. Another option is to make a zero-value prediction, such that the source image block, rather than the prediction-error block, is DCT-coded. Such macroblocks are known as intra- or I-coded.
Although no prediction information is needed for intra-macroblocks, they can carry motion vector information. In normal circumstances, the motion vector information for an I-coded macroblock is not used, but its function is to provide a means of concealing decoding errors when data errors in the bit-stream make it impossible to decode the data for that macroblock.
The fields of a frame may be predicted separately, each from its own motion vector (field prediction coding), or together, using a common motion vector (frame prediction coding). Generally, in the case of image sequences where the motion is slow, frame prediction coding is more efficient; however, as the motion speed increases, field prediction coding becomes more efficient.
In addition to the two basic modes of field and frame prediction, two further modes have been defined:
1) 16 x 8 motion compensation

This mode uses at least two motion vectors for each macroblock: one vector is used for the upper 16 x 8 region and one for the lower half. (In the case of B-pictures (see Section 3.8.), a total of four motion vectors are used for each macroblock in this mode, since both the upper and the lower regions may each have motion vectors referring to past and future pictures.)

The 16 x 8 motion compensation mode is permitted only in field-structured pictures and is intended to ensure that, in such cases, the spatial area covered by each motion vector is approximately equal to that of a 16 x 16 macroblock in a frame-structured picture.
2) Dual prime mode

This mode may be used in both field- and frame-structured coding, but is only permitted in P-pictures (see Section 3.8.) when there have been no B-pictures between the P-picture and its reference frame. In this case, a motion vector and a differential-offset motion vector are transmitted.

For field pictures, two motion vectors are derived from this data and are used to form two predictions from two reference fields. These two predictions are combined to form the final prediction.

For frame pictures, this process is repeated for each of the two fields: each field is predicted separately, giving rise to a total of four field predictions, which are combined to form the final two predictions. Dual prime mode is used as an alternative to bi-directional prediction where low delay is required: it avoids the frame re-ordering needed for bi-directional prediction, but achieves similar coding efficiency.
For each macroblock to be coded, the coder chooses between these prediction modes, trying to minimize the distortions on the decoded picture within the constraints of the available channel bit-rate. The choice of prediction mode is transmitted to the decoder, together with the prediction error, so that it can regenerate the correct prediction.

Fig. 12 illustrates how a bi-directionally coded macroblock (a B-macroblock) is decoded. The switches illustrate the various prediction modes available for such a macroblock. Note that the coder has the option not to code some macroblocks: no DCT coefficient information is transmitted for those blocks, and the macroblock address counter skips to the next coded macroblock. The decoder output for the uncoded macroblocks simply comprises the predictor output.

[Figure 12 – Decoding a “B” macroblock. The input buffer feeds a VLC decoder, an inverse quantizer and an inverse DCT, which yield the difference picture; motion vectors drive motion compensation from the previous and future I- or P-picture stores to give a forward, backward or interpolated prediction (or no prediction), which is added to the difference picture before the display buffer.]
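The switching of Fig. 12 can be caricatured in a few lines. In the sketch below, arrays stand in for 16 x 16 macroblocks and the mode names are illustrative labels only, not MPEG bit-stream syntax:

    # Forming the prediction for one B-macroblock from its two reference stores.
    import numpy as np

    def predict(mode, forward_ref, backward_ref):
        if mode == "forward":
            return forward_ref
        if mode == "backward":
            return backward_ref
        if mode == "interpolated":         # average of forward and backward predictions
            return ((forward_ref.astype(np.uint16) + backward_ref + 1) // 2).astype(np.uint8)
        return np.zeros_like(forward_ref)  # no prediction: the macroblock is intra-coded

    fwd = np.full((16, 16), 100, dtype=np.uint8)
    bwd = np.full((16, 16), 110, dtype=np.uint8)
    print(predict("interpolated", fwd, bwd)[0, 0])  # 105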
3.8. Picture types

In MPEG-2, three “picture types” are defined (see Fig. 13). The picture type defines which prediction modes may be used to code each macroblock:

1) Intra pictures (I-pictures)

These are coded without reference to other pictures. Moderate compression is achieved by reducing spatial redundancy, but not temporal redundancy. They are important as they provide access points in the bit-stream where decoding can begin without reference to previous pictures.

2) Predictive pictures (P-pictures)

These are coded using motion-compensated prediction from a past I- or P-picture, and may themselves be used as a reference for further prediction. By reducing both spatial and temporal redundancy, P-pictures offer increased compression compared to I-pictures.

3) Bi-directionally-predictive pictures (B-pictures)

These use both past and future I- or P-pictures for motion compensation, and offer the highest degree of compression. As noted above, to enable backward prediction from a future frame, the coder re-orders the pictures from the natural display order to a “transmission” (or “bit-stream”) order, so that a B-picture is transmitted after the past and future pictures which it references (see Fig. 14). This introduces a delay which depends upon the number of consecutive B-pictures.

[Figure 13 – MPEG picture types. Arrows show forward prediction into P- and B-pictures and backward prediction into B-pictures. Note 1: An intra-coded (I) picture is coded using information only from itself. Note 2: Predictive-coded (P) pictures are coded with reference to a previous I- or P-picture. Note 3: Bidirectionally-predictive (B) pictures are coded with reference to both the previous I- or P-picture and the next (future) I- or P-picture.]
3.9. Group of pictures

The different picture types typically occur in a repeating sequence, termed a Group of Pictures or GOP. A typical GOP is illustrated in display order in Fig. 14(a) and in transmission order in Fig. 14(b).

[Figure 14 – Example Group of Pictures (GOP): (a) display order; (b) transmission order.]

A regular GOP structure can be described with two parameters:

– N (the number of pictures in the GOP);

– M (the spacing of the P-pictures).

The GOP illustrated in Fig. 14 is described as N = 9 and M = 3.

For a given decoded picture quality, coding with each picture type produces a different number of bits. In a typical sequence, a coded I-picture needs three times more bits than a coded P-picture, which itself occupies 50% more bits than a coded B-picture.
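Taking these ratios at face value, a rough per-picture bit budget for the N = 9, M = 3 GOP of Fig. 14 can be estimated as follows. The 6 Mbit/s channel rate and the 25 Hz picture rate are assumptions made for the sake of the example:

    # Approximate bit budget per picture for an N = 9, M = 3 GOP,
    # assuming I : P : B bit ratios of roughly 3 : 1.5 : 1.
    WEIGHTS = {"I": 3.0, "P": 1.5, "B": 1.0}
    gop = ["I", "B", "B", "P", "B", "B", "P", "B", "B"]  # display order

    bits_per_gop = 6_000_000 * len(gop) / 25             # nine picture periods at 25 Hz
    unit = bits_per_gop / sum(WEIGHTS[p] for p in gop)
    for picture_type in "IPB":
        print(picture_type, round(WEIGHTS[picture_type] * unit), "bits")
    # I 540000, P 270000, B 180000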
4. MPEG profiles and levels

MPEG-2 is intended to be generic, supporting a diverse range of applications. Different algorithmic elements or “tools”, developed for many applications, have been integrated into a single bit-stream syntax.
[About the author: Dr. Bob Ely is an R&D manager at BBC Research and Development Department, Kingswood Warren, Surrey, UK. Currently, he is working with the BBC’s Digital Broadcasting Project, which aims to investigate the technical and commercial feasibility of digital terrestrial broadcasting and to implement technical field-trials and demonstrations. After completing his PhD in computer communications systems at Daresbury Nuclear Physics Laboratory, Bob Ely joined BBC Research Department to work on RDS and related data transmission systems. He later led the BBC team which developed the Nicam digital stereo-sound-with-television system. For many years, he was Chairman of the EBU Specialist Group on RDS, a Vice-Chairman of Working Party R, and has also been a member of EBU Groups on conditional access systems.]
To implement the full syntax in all decoders is unnecessarily complex, so a small number of subsets, or profiles, of the full syntax have been defined. Also, within a given profile, a “level” is defined, which describes a set of constraints, such as maximum sampling density, on the parameters within the profile.
The profiles defined to date fit together such that a higher profile is a superset of a lower one. A decoder which supports a particular profile and level is only required to support the corresponding subset of the full syntax and set of parameter constraints. To restrict the number of options which must be supported, only selected combinations of profile and level are defined as conformance points (see Table 1). These are:

1) Simple profile

This uses no B-frames and, hence, no backward or interpolated prediction. Consequently, no picture re-ordering is required, which makes this profile suitable for low-delay applications such as video conferencing.
2) Main profile

This adds support for B-pictures, which improves the picture quality for a given bit-rate but increases the delay. Currently, most MPEG-2 video decoder chip-sets support Main profile.
3) SNR profile

This adds support for enhancement layers of DCT coefficient refinement, using signal-to-noise ratio (SNR) scalability.
4) Spatial profile

This adds support for enhancement layers carrying the image at different resolutions, using the spatial scalability tool.
5) High profile
