`NEULION 1025
`
`
`
`MPEG-2
`
`
`John Watkinson
`Focal Press
`
OXFORD AUCKLAND BOSTON JOHANNESBURG MELBOURNE NEW DELHI
`
`
`
`
`
`
`Focal Press
`
`An imprint of Butterworth-Heinemann
Linacre House, Jordan Hill, Oxford OX2 8DP
`225 Wildwood Avenue, Woburn, MA 01801-2041
`A division of Reed Educational and Professional Publishing Ltd

A member of the Reed Elsevier plc group
`
`First published 1999
`
`© John Watkinson 1999
`
`All rights reserved. No part of this publication may be reproduced in
`any material form (including photocopying or storing in any medium by
`electronic means and whether or not transiently or incidentally to some
`other use of this publication) without the written permission of the
copyright holder except in accordance with the provisions of the Copyright,
Designs and Patents Act 1988 or under the terms of a licence issued by the
Copyright Licensing Agency Ltd, 90 Tottenham Court Road, London,
England W1P 9HE. Applications for the copyright holder’s written
`permission to reproduce any part of this publication should be addressed
`to the publishers
`
`British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library
`
`Library of Congress Cataloguing in Publication Data
`A catalogue record for this book is available from the Library of Congress
`
`ISBN 0 240 51510 2
`
`Composition by Genesis Typesetting, Rochester, Kent
`Printed and bound in Great Britain
`
`
`
`
FOR EVERY TITLE THAT WE PUBLISH, BUTTERWORTH-HEINEMANN
WILL PAY FOR BTCV TO PLANT AND CARE FOR A TREE.
`
`
`
`
`
`
`
Contents

Preface

Acknowledgements

Chapter 1 Introduction to compression

1.1 What is MPEG-2?
1.2 Why compression is necessary
1.3 Some applications of MPEG-2
1.4 Lossless and perceptive coding
1.5 Compression principles
1.6 Audio compression
1.6.1 Sub-band coding
1.6.2 Transform coding
1.6.3 Predictive coding
1.7 Video compression
1.7.1 Intra-coded compression
1.7.2 Inter-coded compression
1.7.3 Introduction to motion compensation
1.7.4 Film-originated video compression
1.8 MPEG-2 profiles and levels
1.9 MPEG-2 bitstreams
1.10 Drawbacks of compression
1.11 Compression preprocessing
1.12 Some guidelines
References
`
`
`
Chapter 2 Fundamentals

2.1 What is an audio signal?
2.2 What is a video signal?
2.3 Types of video
2.4 What is a digital signal?
2.5 Introduction to conversion
2.6 Sampling and aliasing
2.7 Reconstruction
2.8 Filter design
2.9 Sampling clock jitter
2.10 Choice of audio sampling rate
2.11 Video sampling structures
2.12 The phase-locked loop
2.13 Quantizing
2.14 Quantizing error
2.15 Dither
2.16 Binary codes for audio
2.17 Binary codes for component video
2.18 Introduction to digital processes
2.19 Logic elements
2.20 Storage elements
2.21 Binary adding
2.22 Gain control by multiplication
2.23 Multiplexing principles
2.24 Packets
2.25 Statistical multiplexing
References

Chapter 3 Processing for compression

3.1 Filters
3.2 Downsampling filters
3.3 The quadrature mirror filter
3.4 Filtering for video noise reduction
3.5 Transforms
3.6 The Fourier transform
3.7 The discrete cosine transform (DCT)
3.8 The wavelet transform
3.9 Motion compensation
3.10 Motion-estimation techniques
3.10.1 Block matching
3.10.2 Gradient matching
3.10.3 Phase correlation
3.11 Compression and requantizing
References
`
Chapter 4 Audio compression

4.1 Introduction
4.2 The ear
4.3 The cochlea
4.4 Level and loudness
4.5 Frequency discrimination
4.6 Critical bands
4.7 Beats
4.8 Codec level calibration
4.9 Quality measurement
4.10 The limits
4.11 Compression applications
4.12 History of MPEG audio coding
4.13 MPEG audio compression tools
4.14 Transform coding
4.15 MPEG Layer I audio coding
4.16 MPEG Layer II audio coding
4.17 MPEG Layer III audio coding
4.18 Dolby AC-3
4.19 Compression in stereo
References

Chapter 5 MPEG-2 video compression

5.1 The eye
5.2 Dynamic resolution
5.3 Contrast
5.4 Colour vision
5.5 Colour difference signals
5.6 Progressive or interlaced scan?
5.7 Spatial and temporal redundancy in MPEG
5.8 I and P coding
5.9 Bidirectional coding
5.10 Coding applications
5.11 Spatial compression
5.12 Scanning and run-length/variable-length coding
5.13 A bidirectional coder
5.14 Slices
`
`
`
5.15 Handling interlaced pictures
5.16 An MPEG-2 coder
5.17 The Elementary Stream
5.18 An MPEG-2 decoder
5.19 Coding artifacts
5.20 Processing MPEG-2 and concatenation
References

Chapter 6 Program and transport streams

6.1 Introduction
6.2 Packets and time stamps
6.3 Transport streams
6.4 Clock references
6.5 Program Specific Information (PSI)
6.6 Multiplexing
6.7 Remultiplexing
Reference

Glossary

Index
`
`
`
`
Introduction to compression
`
`
`quantizing error which results is confined to the frequency limits of the
`band and so it can be arranged to be masked by the program material.
`The techniques used in Layers 1 and 2 of MPEG audio are based on sub-
`band coding as are those used in DCC (Digital Compact Cassette).
`
1.6.2
`
`Transform coding
`
In transform coding the time-domain audio waveform is converted into
a frequency domain representation such as a Fourier, discrete cosine or
wavelet transform (see Chapter 3). Transform coding takes advantage of
the fact that the amplitude or envelope of an audio signal changes
relatively slowly and so the coefficients of the transform can be
transmitted relatively infrequently. Clearly such an approach breaks
down in the presence of transients and adaptive systems are required in
practice. Transients cause the coefficients to be updated frequently
whereas in stationary parts of the signal such as sustained notes the
update rate can be reduced. Discrete cosine transform (DCT) coding is
used in Layer III of MPEG audio and in the compression system of the
Sony MiniDisc.
`
`1.6.3
`
`Predictive coding
`
In a predictive coder there are two identical predictors, one in the coder
and one in the decoder. Their job is to examine a run of previous sample
code values and to extrapolate forward to estimate or predict what the
next code value will be. This is subtracted from the actual next code value
at the encoder to produce a prediction error which is transmitted. The
decoder then adds the prediction error to its own prediction to obtain the
output code value again. Predictive coders work with a short encode and
decode delay and are useful in telephony where a long loop delay causes
problems.
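
The predictor loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the simple two-sample linear extrapolation in `predict` are hypothetical choices, not a predictor taken from any standard. What matters is that the coder and the decoder run the same predictor on the same history.

```python
def predict(history):
    """Hypothetical predictor: extrapolate a straight line through the
    last two code values (falls back to the last value, then to zero)."""
    if len(history) >= 2:
        return 2 * history[-1] - history[-2]
    if history:
        return history[-1]
    return 0


def encode(samples):
    """Transmit only the prediction error for each sample."""
    history, errors = [], []
    for sample in samples:
        errors.append(sample - predict(history))
        history.append(sample)  # the decoder will rebuild this same history
    return errors


def decode(errors):
    """Add each received error to the decoder's own prediction."""
    history = []
    for error in errors:
        history.append(predict(history) + error)
    return history


samples = [10, 12, 14, 16, 19, 22, 25]
errors = encode(samples)
print(errors)                      # mostly small values once the trend is learned
print(decode(errors) == samples)   # the decoder recovers the input
```

Because each output value needs only the samples already received, the sketch adds no delay beyond one sample, which is the property that makes this family of coders attractive for telephony.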
`
`1.7
`
`Video compression
`
Video signals exist in four dimensions: these are the attributes of the
sample, the horizontal and vertical spatial axes and the time axis.
Compression can be applied in any or all of those four dimensions.
MPEG-2 assumes 8-bit colour difference signals as the input, requiring
rounding if the source is 10-bit. The sampling rate of the colour signals is
less than that of the luminance. This is done by downsampling the colour
samples horizontally and generally vertically as well. Essentially an
`
`
`MPEG-2 system has three parallel simultaneous channels, one for
`luminance and two colour difference, which after coding are multiplexed
`into a single bitstream.
Figure 1.7(a) shows that when individual pictures are compressed
without reference to any other pictures, the time axis does not enter the
process which is therefore described as intra-coded (intra = within)
compression. The term spatial coding will also be found. It is an advantage
of intra-coded video that there is no restriction to the editing which can
be carried out on the picture sequence. As a result compressed VTRs such
as Digital Betacam, DVC and D-9 use spatial coding. Cut editing may
take place on the compressed data directly if necessary. As spatial coding
treats each picture independently, it can employ certain techniques
developed for the compression of still pictures. The ISO JPEG (Joint
Photographic Experts Group) compression standards [5,6] are in this
category. Where a succession of JPEG coded images are used for
television, the term ‘Motion JPEG’ will be found.
`
`
`
Figure 1.7 (a) Spatial or intra-coding works on individual images. (b) Temporal or
inter-coding works on successive images.
(Diagram labels: spatial or intra-coding explores redundancy within a picture;
temporal or inter-coding explores redundancy between pictures.)
`
Greater compression factors can be obtained by taking account of the
redundancy from one picture to the next. This involves the time axis, as
Figure 1.7(b) shows, and the process is known as inter-coded (inter =
between) or temporal compression.
Temporal coding allows a higher compression factor, but has the
disadvantage that an individual picture may exist only in terms of the
differences from a previous picture. Clearly editing must be undertaken
with caution and arbitrary cuts simply cannot be performed on the MPEG
bitstream. If a previous picture is removed by an edit, the difference data
will then be insufficient to recreate the current picture.
`
`
`
`
1.7.1

Intra-coded compression
`
Intra-coding works in three dimensions on the horizontal and vertical
spatial axes and on the sample values. Analysis of typical television
pictures reveals that while there is a high spatial frequency content due to
detailed areas of the picture, there is a relatively small amount of energy
at such frequencies. Often pictures contain sizeable areas in which the
same or similar pixel values exist. This gives rise to low spatial
frequencies. The average brightness of the picture results in a substantial
zero-frequency component. Simply omitting the high-frequency components
is unacceptable as this causes an obvious softening of the picture.
A coding gain can be obtained by taking advantage of the fact that the
amplitude of the spatial components falls with frequency. It is also
possible to take advantage of the eye’s reduced sensitivity to noise in high
spatial frequencies. If the spatial frequency spectrum is divided into
frequency bands the high-frequency bands can be described by fewer bits
not only because their amplitudes are smaller but also because more noise
can be tolerated. The wavelet transform and the discrete cosine transform
used in MPEG allow two-dimensional pictures to be described in the
frequency domain and these are discussed in Chapter 3.
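
The coding gain described above can be illustrated with a one-dimensional sketch in Python. Everything specific here is an assumption made for illustration: the eight pixel values, the `STEPS` table of quantizing steps that grow with frequency, and the use of a single row rather than the two-dimensional 8 × 8 transform. It shows only that a smooth area concentrates its energy in the lowest coefficients, so the coarsely quantized high-frequency terms cost almost nothing.

```python
import math

# Hypothetical quantizing steps: coarser as spatial frequency rises.
STEPS = [4, 8, 12, 16, 20, 24, 28, 32]


def dct(samples):
    """Orthonormal DCT-II of a short row of pixel values."""
    n = len(samples)
    coefficients = []
    for k in range(n):
        total = sum(s * math.cos(math.pi * (i + 0.5) * k / n)
                    for i, s in enumerate(samples))
        scale = math.sqrt((1 if k == 0 else 2) / n)
        coefficients.append(scale * total)
    return coefficients


def idct(coefficients):
    """Inverse of dct() above."""
    n = len(coefficients)
    samples = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coefficients):
            scale = math.sqrt((1 if k == 0 else 2) / n)
            total += scale * c * math.cos(math.pi * (i + 0.5) * k / n)
        samples.append(total)
    return samples


row = [100, 102, 104, 107, 110, 112, 113, 114]  # a smooth area of picture
coefficients = dct(row)

# Divide by the step and round: high frequencies get fewer distinct levels.
levels = [round(c / step) for c, step in zip(coefficients, STEPS)]

# The decoder multiplies back by the step and inverse transforms.
decoded = idct([q * step for q, step in zip(levels, STEPS)])
worst_error = max(abs(a - b) for a, b in zip(row, decoded))

print("levels sent:", levels)
print("worst pixel error:", round(worst_error, 2))
```

Only the non-zero levels need to be described in detail, and the residual error is the quantizing noise that the text argues is least visible at high spatial frequencies.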
`
`1.7.2
`
Inter-coded compression
`
`
`
Inter-coding takes further advantage of the similarities between
successive pictures in real material. Instead of sending information for each
picture separately, inter-coders will send the difference between the
previous picture and the current picture in a form of differential coding.
Figure 1.8 shows the principle. A picture store is required at the coder to
allow comparison to be made between successive pictures and a similar
store is required at the decoder to make the previous picture available.
The difference data may be treated as a picture itself and subjected to
some form of transform-based spatial compression.
The simple system of Figure 1.8(a) is of limited use as in the case of a
transmission error, every subsequent picture would be affected. Channel
switching in a television set would also be impossible. In practical
systems a modification is required. One approach is the so-called ‘leaky
predictor’ in which the next picture is predicted from a limited number of
previous pictures rather than from an indefinite number. As a result
errors cannot propagate indefinitely. The approach used in MPEG is that
periodically some absolute picture data are transmitted in place of
difference data.

Figure 1.8 An inter-coded system (a) uses a delay to calculate the pixel
differences between successive pictures. To prevent error propagation,
intra-coded pictures (b) may be used periodically.
(Diagram labels: I = intra-coded picture; D = differentially coded picture.)

Figure 1.8(b) shows that absolute picture data, known as I or intra
pictures, are interleaved with pictures which are created using difference
data, known as P or predicted pictures. The I pictures require a large
amount of data, whereas the P pictures require less data. As a result the
instantaneous data rate varies dramatically and buffering has to be used
to allow a constant transmission rate. The leaky predictor needs less
buffering as the compression factor does not change so much from picture
to picture.
The I picture and all the P pictures prior to the next I picture are called
a group of pictures (GOP). For a high compression factor, a large number
of P pictures should be present between I pictures, making a long GOP.
`
`
`
`
`However, a long GOP delays recovery from a transmission error. The
`compressed bitstream can only be edited at I pictures as shown.
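
The trade-off between GOP length and error recovery can be sketched as follows. This is a toy model under stated assumptions: pictures are short lists of integers, no spatial compression is applied, and the names `encode`, `decode` and `gop_length` are hypothetical. It only shows that a corrupted difference picture damages every following picture until the next I picture arrives.

```python
def encode(pictures, gop_length):
    """Send an absolute (I) picture every gop_length pictures and a
    difference (D) picture relative to the previous one otherwise."""
    stream, previous = [], None
    for n, picture in enumerate(pictures):
        if n % gop_length == 0:
            stream.append(("I", list(picture)))
        else:
            stream.append(("D", [c - p for c, p in zip(picture, previous)]))
        previous = picture
    return stream


def decode(stream):
    """Rebuild each picture from the picture store and the received data."""
    output, previous = [], None
    for kind, data in stream:
        if kind == "I":
            current = list(data)
        else:
            current = [p + d for p, d in zip(previous, data)]
        output.append(current)
        previous = current
    return output


pictures = [[n + x for x in range(4)] for n in range(8)]
stream = encode(pictures, gop_length=4)

# Simulate a transmission error in the second picture (a difference picture).
damaged = list(stream)
kind, data = damaged[1]
damaged[1] = (kind, [d + 50 for d in data])

decoded = decode(damaged)
wrong = [n for n, (got, sent) in enumerate(zip(decoded, pictures)) if got != sent]
print("pictures affected by the error:", wrong)
```

With `gop_length=4` the damage stops at the second I picture; a longer GOP in the same sketch would leave more pictures affected, which is the recovery delay the text refers to.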
`In the case of moving objects, although their appearance may not
`change greatly from picture to picture, the data representing them on a
`fixed sampling grid will change and so large differences will be generated
`between successive pictures. It is a great advantage if the effect of motion
`can be removed from difference data so that they only reflect the changes
`in appearance of a moving object since a much greater coding gain can
`then be obtained. This is the objective of motion compensation.
`
1.7.3

Introduction to motion compensation
`
`In real television program material objects move around before a fixed
`camera or the camera itself moves. Motion compensation is a process
`which effectively measures motion of objects from one picture to the next
`so that it can allow for that motion when looking for redundancy between
`pictures. Figure 1.9 shows that moving pictures can be expressed in a
`three-dimensional space which results from the screen area moving along
`the time axis. In the case of still objects, the only motion is along the time
`axis. However, when an object moves, it does so along the optic flow axis
`which is not parallel to the time axis. The optic flow axis joins the same
`point on a moving object as it takes on various screen positions.
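
A one-dimensional sketch may help to show why allowing for motion reduces the difference data. The assumptions are loud ones: a single scan line stands in for a picture, the object moves by a whole number of pixels, and the exhaustive search in `estimate_vector` is just one simple form of block matching, chosen for illustration rather than taken from the text.

```python
def shift(line, vector, fill=0):
    """Move the content of a line `vector` pixels to the right
    (negative values move it left); uncovered pixels take `fill`."""
    n = len(line)
    return [line[i - vector] if 0 <= i - vector < n else fill for i in range(n)]


def sad(a, b):
    """Sum of absolute differences between two lines."""
    return sum(abs(x - y) for x, y in zip(a, b))


def estimate_vector(previous, current, search=3):
    """Try every shift in the search range and keep the best match."""
    return min(range(-search, search + 1),
               key=lambda v: sad(shift(previous, v), current))


previous = [0, 0, 9, 7, 5, 0, 0, 0, 0, 0]
current = shift(previous, 2)          # the object has moved two pixels

plain_difference = [c - p for c, p in zip(current, previous)]
vector = estimate_vector(previous, current)
compensated = [c - p for c, p in zip(current, shift(previous, vector))]

print("difference without compensation:", plain_difference)
print("vector:", vector)
print("difference with compensation:", compensated)
```

Shifting the previous picture along the measured vector before subtracting is the same as comparing pictures along the optic flow axis instead of along the time axis.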
`
`
`
`
Figure 1.9 Objects travel in a three-dimensional space along the optic flow axis
which is only parallel to the time axis if there is no movement.
(Diagram axes: horizontal position, vertical position and time, with the optic
flow axis marked.)
`
`
`MPEG-2 video compression
`
`
how motion is to be measured; it simply defines how a decoder will
interpret the vectors. Encoder designers are free to use any motion-
estimation system provided that the right vector protocol is created.
Chapter 3 contrasted a number of motion-estimation techniques.
Figure 5.21(a) shows that a macroblock contains both luminance and
colour difference data at different resolutions. Most of the MPEG-2
Profiles use a 4:2:0 structure which means that the colour is downsampled
by a factor of two in both axes. Thus in a 16 × 16 pixel block,
there are only 8 × 8 colour difference sampling sites. MPEG-2 is based
upon the 8 × 8 DCT (see section 3.7) and so the 16 × 16 block is the screen
area which contains an 8 × 8 colour difference sampling block. Thus in
4:2:0 in each macroblock there are four luminance DCT blocks, one R - Y
DCT block and one B - Y DCT block, all steered by the same vector.
`
Figure 5.21 The structure of a macroblock. (A macroblock is the screen area
steered by one vector.) (a) In 4:2:0, there are two chroma DCT blocks per
macroblock whereas in 4:2:2 (b) there are four. 4:2:2 needs 33% more data than
4:2:0.
(Diagram labels: (a) 4:2:0 with 4 × Y, 1 × Cr, 1 × Cb; (b) 4:2:2 with 4 × Y,
2 × Cr, 2 × Cb.)

In the 4:2:2 Profile of MPEG-2, shown in Figure 5.21(b), the chroma is
not downsampled vertically, and so there is twice as much chroma data
in each macroblock which is otherwise substantially the same.
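
The block counts quoted above follow directly from the sampling structure, as this small sketch shows. The function name and the table of downsampling factors are illustrative assumptions; the 16 × 16 macroblock and 8 × 8 DCT block sizes come from the text.

```python
MACROBLOCK = 16   # luminance pixels per macroblock side
DCT_BLOCK = 8     # samples per DCT block side

# Horizontal and vertical chroma downsampling factors for each structure.
CHROMA_DOWNSAMPLING = {"4:2:0": (2, 2), "4:2:2": (2, 1)}


def blocks_per_macroblock(structure):
    """Return (luminance DCT blocks, chroma DCT blocks) in one macroblock."""
    luminance = (MACROBLOCK // DCT_BLOCK) ** 2
    h_factor, v_factor = CHROMA_DOWNSAMPLING[structure]
    chroma_width = MACROBLOCK // h_factor
    chroma_height = MACROBLOCK // v_factor
    per_component = (chroma_width // DCT_BLOCK) * (chroma_height // DCT_BLOCK)
    return luminance, 2 * per_component   # two colour difference signals


for structure in ("4:2:0", "4:2:2"):
    luminance, chroma = blocks_per_macroblock(structure)
    print(structure, "->", luminance, "Y blocks,", chroma, "chroma blocks")

total_420 = sum(blocks_per_macroblock("4:2:0"))
total_422 = sum(blocks_per_macroblock("4:2:2"))
print("extra data in 4:2:2: {:.0%}".format(total_422 / total_420 - 1))
```

Six blocks against eight is the source of the 33% figure in the caption of Figure 5.21, before any compression is taken into account.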
`
`5.8
`
`I and P coding
`
`Predictive (P) coding cannot be used indefinitely, as it is prone to error
`propagation. A further problem is that it becomes impossible to decode
`the transmission if reception begins part-way through. In real video
`
`
signals, cuts or edits can be present across which there is little
redundancy and which make motion estimators throw up their hands.
In the absence of redundancy over a cut, there is nothing to be done but to
send the new picture information in absolute form. This is called I coding
where I is an abbreviation of intra coding. As I coding needs no previous
picture for decoding, then decoding can begin at I coded information.
MPEG-2 is effectively a toolkit and there is no compulsion to use all the
tools available. Thus an encoder may choose whether to use I or P coding,
either once and for all or dynamically on a macroblock by macroblock
basis.
For practical reasons, an entire frame may be encoded as I macroblocks
periodically. This creates a place where the bitstream might be edited or
where decoding could begin.
Figure 5.22 shows a typical application of the Simple Profile of MPEG-2.
Periodically an I picture is created. Between I pictures are P pictures
which are based on the picture before. These P pictures predominantly
contain macroblocks having vectors and prediction errors. However, it is
perfectly legal for P pictures to contain I macroblocks. This might be
useful where, for example, a camera pan introduces new material at the
edge of the screen which cannot be created from an earlier picture.
`
Figure 5.22 A Simple Profile MPEG-2 signal may contain periodic I pictures with a
number of P pictures between.
(Diagram labels: I = intra-coded picture; P = predicted picture; the symbol
linking pictures = picture difference (vectors plus prediction error); GOP.)
`
`Note that although what is sent is called a P picture, it is not a picture
`at all. It is a set of instructions to convert the previous picture into the
`current picture. If the previous picture is lost, decoding is impossible. An
`I picture together with all of the pictures before the next I picture form a
`Group of Pictures (GOP).
`
`5.9
`
`Bidirectional coding
`
Motion-compensated predictive coding is a useful compression technique,
but it does have the drawback that it can only take data from a
previous picture. Where moving objects reveal a background this is
completely unknown in previous pictures and forward prediction fails.
`
`
`
`
Figure 5.23 In bidirectional coding the revealed background can be efficiently
coded by bringing data back from a future picture.
(Diagram labels: T = N, T = N + 1, T = N + 2; the revealed area is not in
picture N but is in picture N + 2.)
`
However, more of the background is visible in later pictures. Figure 5.23
shows the concept. In the centre of the diagram, a moving object has
revealed some background. The previous picture can contribute nothing,
whereas the next picture contains all that is required.
Bidirectional coding is shown in Figure 5.24. A bidirectional or B
macroblock can be created using a combination of motion compensation
and the addition of a prediction error. This can be done by forward
prediction from a previous picture or backward prediction from a
subsequent picture. It is also possible to use an average of both forward
and backward prediction. On noisy material this may result in some
reduction in bit rate. The technique is also a useful way of portraying a
dissolve.

Figure 5.24 In bidirectional coding, a number of B pictures can be inserted
between periodic forward predicted pictures. See text.
(Diagram labels: forward prediction; bidirectional prediction. I = intra- or
spatially coded ‘anchor’ picture. P = forward predicted: the coder sends the
difference between I and P and the decoder adds the difference to create P.
B = bidirectionally coded picture, which can be coded from a previous I or P
picture or a later I or P picture. B pictures are not coded from each other.)

The averaging process in MPEG-2 is a simple linear interpolation
which works well when only one B picture exists between the reference
pictures before and after. A larger number of B pictures would require
weighted interpolation but MPEG-2 does not support this.
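
The value of averaged prediction during a dissolve can be seen in a small sketch. The pixel values are invented, motion is ignored, and the rounding rule in `average` (round half up) is an assumption made for the illustration rather than a quotation from the standard.

```python
def prediction_error(actual, prediction):
    """Difference data that would have to be sent for this prediction."""
    return [a - p for a, p in zip(actual, prediction)]


def average(forward, backward):
    """Equal-weight interpolation of two predictions, rounding half up."""
    return [(f + b + 1) // 2 for f, b in zip(forward, backward)]


def cost(error):
    """Crude measure of how much difference data there is."""
    return sum(abs(e) for e in error)


previous_picture = [100, 100, 100, 100]   # outgoing scene
next_picture = [20, 60, 140, 200]         # incoming scene
b_picture = [60, 80, 120, 150]            # halfway through the dissolve

forward_cost = cost(prediction_error(b_picture, previous_picture))
backward_cost = cost(prediction_error(b_picture, next_picture))
averaged_cost = cost(prediction_error(b_picture,
                                      average(previous_picture, next_picture)))

print("forward only :", forward_cost)
print("backward only:", backward_cost)
print("averaged     :", averaged_cost)
```

The averaged prediction matches the halfway picture exactly because the weights are equal. A picture one third of the way through the dissolve would need unequal weights, which is the weighted interpolation the text says MPEG-2 does not support.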
Typically two B pictures are inserted between P pictures or between I
and P pictures. As can be seen, B pictures are never predicted from one
another, only from I or P pictures. A typical GOP for broadcasting
purposes might have the structure IBBPBBPBBPBB. Note that the last B
pictures in the GOP require the I picture in the next GOP for decoding
and so the GOPs are not truly independent. Independence can be
obtained by creating a closed GOP which may contain B pictures but
which ends with a P picture. It is also legal to have a B picture in which
every macroblock is forward predicted, needing no future picture for
decoding.
Bidirectional coding is very powerful. Figure 5.25 is a constant quality
curve showing how the bit rate changes with the type of coding. On the
left, only I or spatial coding is used, whereas on the right an IBBP
structure is used. This means that there are two bidirectionally coded
pictures in between a spatially coded picture (I) and a forward predicted
picture (P). Note how for the same quality the system which only uses
spatial coding needs two and a half times the bit rate that the
bidirectionally coded system needs.
`
Figure 5.25 Bidirectional coding is very powerful as it allows the same quality with
only 40% of the bit rate of intra-coding. However, the encoding and decoding
delays must increase. Coding over a longer time span is more efficient but editing
is more difficult.
(Vertical axis: bit rate in Mbit/s.)
`
`
`
`
Clearly information in the future has yet to be transmitted and so is not
normally available to the decoder. MPEG-2 gets around the problem by
sending pictures in the wrong order. Picture reordering requires delay in
the encoder and a delay in the decoder to put the order right again. Thus
the overall codec delay must rise when bidirectional coding is used. This
is quite consistent with Figure 1.5 which showed that as the compression
factor rises the latency must also rise.
Figure 5.26 shows that although the original picture sequence is
IBBPBBPBBIBB..., this is transmitted as IPBBPBBIBB... so that the
future picture is already in the decoder before bidirectional decoding
begins. Note that the I picture of the next GOP is actually sent before the
last B pictures of the current GOP.
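
The reordering rule can be sketched as follows. The picture labels (type letter plus display index) and the function name are hypothetical, introduced only for the illustration: each B picture is held back until the I or P picture that follows it in display order has been sent.

```python
def transmission_order(display_order):
    """Reorder pictures so that every B picture follows both of the
    I/P pictures it may be predicted from.

    Returns the pictures that can be sent and the B pictures still
    waiting for a later I or P picture."""
    sent, waiting = [], []
    for picture in display_order:
        if picture[0] == "B":
            waiting.append(picture)      # needs a future reference first
        else:
            sent.append(picture)         # I or P goes out immediately...
            sent.extend(waiting)         # ...followed by the held B pictures
            waiting = []
    return sent, waiting


display = [f"{kind}{n}" for n, kind in enumerate("IBBPBBPBBIBB")]
sent, waiting = transmission_order(display)

print("display order :", " ".join(display))
print("as transmitted:", " ".join(sent))
print("still waiting :", " ".join(waiting))

# The decoder puts the order right again using the display index.
restored = sorted(sent, key=lambda picture: int(picture[1:]))
print("restored      :", " ".join(restored))
```

The two B pictures left waiting at the end are the ones that need the I picture of the following GOP, which is why the GOPs in this structure are not independent.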
`
`
`
Figure 5.26 Comparison of pictures before and after compression showing
sequence change and varying amount of data needed by each picture type. I,
P, B pictures use unequal amounts of data.
`
Figure 5.26 also shows that the amount of data required by each picture
is dramatically different. I pictures have only spatial redundancy and so
need a lot of data to describe them. P pictures need less data because they
are created by shifting the I picture with vectors and then adding a
prediction error picture. B pictures need the least data of all because they
can be created from I or P.
With pictures requiring a variable length of time to transmit, arriving in
the wrong order, the decoder needs some help. This takes the form of
picture-type flags and time stamps which will be described in section 6.2.
`
`5.10
`
`Coding applications
`
Figure 5.27 shows a variety of GOP structures. The simplest is the III...
sequence in which every picture is intra-coded. Pictures can be fully
decoded without reference to any other pictures and so editing is
`
`
`magnitude of the prediction errors. The sub-carrier level may be low but
`it can be present over
`the whole screen and require an excess of
`coefficients to describe it.
`
`Composite video should not in general be used as a source for MPEG-2
`encoding, but where this is inevitable the standard of the decoder must be
`much higher than average, especially in the residual sub-carrier specifica-
`tion. Some MPEG preprocessors support high-grade composite decoding
`options.
Judder from conventional linear standards convertors degrades the
performance of MPEG-2. The optic flow axis is corrupted and linear
filtering causes multiple images which confuse motion estimators and
result in larger prediction errors. If standards conversion is necessary, the
MPEG-2 system must be used to encode the signal in its original format
and the standards convertor should be installed after the decoder. If a
standards convertor has to be used before the encoder, then it must be a
type which has effective motion compensation.
`Film weave causes movement of one picture with respect to the next and
`this results in more vector activity and larger prediction errors. Movement
`of the centre of the film frame along the optical axis causes magnification
`changes which also result in excess prediction error data. Film grain has the
`same effect as noise: it is random and so cannot be compressed.
`Perhaps because it is relatively uncommon, MPEG-2 cannot handle
`image rotation well because the motion—compensation system is only
`designed for translational motion. Where a rotating object is highly
`detailed, such as in certain fairground rides, the motion-compensation
`failure requires a significant amount of prediction error data and if a
`suitable bit rate is not available the level of artifacts will rise.
`
`Flash guns used by still photographers are a serious hazard to MPEG-2
`especially when long GOPs are used. At a press conference where a series
`of flashes may occur,
`the resultant video contains intermittent white
`frames which defeat prediction. A huge prediction error is required to
`turn the previous picture into a white picture, followed by another huge
`prediction error to return the white frame to the next picture. The output
`buffer fills and heavy requantizing is employed. After a few flashes the
`picture has generally gone to tiles.
`
`5.20
`
`Processing MPEG-2 and concatenation
`
Concatenation loss occurs when the losses introduced by one codec are
compounded by a second codec. All practical compressors, MPEG-2
included, are lossy because what comes out of the decoder is not
bit-identical to what went into the encoder. The bit differences are controlled
so that they have minimum visibility to a human viewer.
`
`
`
`
`MPEG—2 is a toolbox which allows a variety of manipulations to be
`performed in both the spatial and the temporal domain. There is a limit
`to the compression which can be used on a single frame, and if higher
`compression factors are needed, temporal coding will have to be used.
`The longer the run of pictures considered, the lower the bit rate needed,
`but the harder it becomes to edit.
`
The most editable form of MPEG-2 is to use I pictures only. As there is
no temporal coding, pure cut edits can be made between pictures. The
next best thing is to use a repeating IB structure which is locked to the
odd/even field structure. Cut edits cannot be made as the B pictures are
bidirectionally coded and need data from both adjacent I pictures for
decoding. The B picture has to be decoded prior to the edit and
re-encoded after the edit. This will cause a small concatenation loss.
`
Beyond the IB structure processing gets harder. If a long GOP is used
for the best compression factor, an IBBPBBP... structure results. Editing
this is very difficult because the pictures are sent out of order so that
bidirectional decoding can be used. MPEG allows closed GOPs where the
last B picture is coded wholly from the previous pictures and does not
need the I picture in the next GOP. The bitstream can be switched at this
point but only if the GOP structures in the two source video signals are
synchronized (makes colour framing seem easy). Consequently in
practice a long GOP bitstream will need to be decoded prior to any
production step. Afterwards it will need to be re-encoded.
This is known as naive concatenation and an enormous pitfall awaits.
Unless the GOP structure of the output is identical to and synchronized
with the input the results will be disappointing. The worst case is where
an I picture is encoded from a picture which was formerly a B picture. It
is easy enough to lock the GOP structure of a coder to a single input, but
if an edit is made between two inputs, the GOP timings could well be
different.
`
As there are so many structures allowed in MPEG, there will be a need
to convert between them. If this has to be done, it should only be in the
direction which increases the GOP length and reduces the bit rate. Going
the other way is inadvisable. The ideal way of converting from, say, the
IB structure of a news system to the IBBP structure of an emission system
is to use a recompressor. This is a kind of standards converter which will
give better results than a decode followed by an encode.
The DCT part of MPEG-2 itself is lossless. If all the coefficients are
preserved intact an inverse transform yields the same pixel data.
Unfortunately this does not yield enough compression for many
applications. In practice the coefficients are made less accurate by
removing bits, starting at the least significant end and working upwards.
This process is weighted, or made progressively more aggressive as
spatial frequency increases. Small-value coefficients may be truncated to
`
`
`
zero and large-value coefficients are most coarsely truncated at high
spatial frequencies where the effect is least visible.
Figure 5.43(a) shows what happens in the ideal case where two identical
coders are put in tandem and synchronized. The first coder quantizes the
coefficients to finite accuracy and causes a loss on decoding. However,
when the second coder performs the DCT calculation, the coefficients
obtained will be identical to the quantized coefficients in the first coder
and so if the second weighting and requantizing step is identical the same
truncated coefficient data will result and there will be no further loss of
quality [3].
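
The ideal tandem case can be sketched at the level of the coefficients alone. This is an idealization under stated assumptions: the coefficient values and both step tables are invented, and the sketch assumes, as the text does, that the second coder's DCT returns exactly the first coder's quantized coefficients, so the transform itself is left out.

```python
def requantize(coefficients, steps):
    """Round each coefficient to the nearest multiple of its step size."""
    return [step * round(c / step) for c, step in zip(coefficients, steps)]


def total_error(original, processed):
    """Sum of absolute coefficient errors relative to the original."""
    return sum(abs(o - p) for o, p in zip(original, processed))


coefficients = [305, -14, 9, -6, 4, 3, -2, 1]
steps_first = [4, 8, 12, 16, 20, 24, 28, 32]     # first coder's weighting
steps_other = [6, 10, 14, 18, 22, 26, 30, 34]    # a differently set-up coder

first = requantize(coefficients, steps_first)
second_same = requantize(first, steps_first)     # identical second coder
second_other = requantize(first, steps_other)    # mismatched second coder

print("after first coder     :", first)
print("identical second coder:", second_same)
print("different second coder:", second_other)
print("error, one generation :", total_error(coefficients, first))
print("error, mismatched pair:", total_error(coefficients, second_other))
```

Values that already sit on the first coder's quantizing grid pass through an identical second quantizer unchanged, whereas a different weighting moves them onto a new grid and adds to the loss.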
`
Figure 5.43 (a) (Diagram labels: in, Codec, out; ‘Coder makes decisions and
approximations’; ‘Reduced quality’; ‘Same quality’.)