`NEULION 1025
`
`

`

MPEG-2

John Watkinson

Focal Press
OXFORD AUCKLAND BOSTON JOHANNESBURG MELBOURNE NEW DELHI
`
`Focal Press
`
`An imprint of Butterworth-Heinemann
Linacre House, Jordan Hill, Oxford OX2 8DP
`225 Wildwood Avenue, Woburn, MA 01801-2041
`A division of Reed Educational and Professional Publishing Ltd
`
A member of the Reed Elsevier plc group
`
`First published 1999
`
`© John Watkinson 1999
`
`All rights reserved. No part of this publication may be reproduced in
`any material form (including photocopying or storing in any medium by
`electronic means and whether or not transiently or incidentally to some
`other use of this publication) without the written permission of the
copyright holder except in accordance with the provisions of the Copyright,
Designs and Patents Act 1988 or under the terms of a licence issued by the
Copyright Licensing Agency Ltd, 90 Tottenham Court Road, London,
England W1P 9HE. Applications for the copyright holder's written
permission to reproduce any part of this publication should be addressed
to the publishers
`
`British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library
`
`Library of Congress Cataloguing in Publication Data
`A catalogue record for this book is available from the Library of Congress
`
`ISBN 0 240 51510 2
`
`Composition by Genesis Typesetting, Rochester, Kent
`Printed and bound in Great Britain
`
FOR EVERY TITLE THAT WE PUBLISH, BUTTERWORTH-HEINEMANN
WILL PAY FOR BTCV TO PLANT AND CARE FOR A TREE.
`
Contents

Preface
Acknowledgements

Chapter 1 Introduction to compression
1.1 What is MPEG-2?
1.2 Why compression is necessary
1.3 Some applications of MPEG-2
1.4 Lossless and perceptive coding
1.5 Compression principles
1.6 Audio compression
1.6.1 Sub-band coding
1.6.2 Transform coding
1.6.3 Predictive coding
1.7 Video compression
1.7.1 Intra-coded compression
1.7.2 Inter-coded compression
1.7.3 Introduction to motion compensation
1.7.4 Film-originated video compression
1.8 MPEG-2 profiles and levels
1.9 MPEG-2 bitstreams
1.10 Drawbacks of compression
1.11 Compression preprocessing
1.12 Some guidelines
References

Chapter 2 Fundamentals
2.1 What is an audio signal?
2.2 What is a video signal?
2.3 Types of video
2.4 What is a digital signal?
2.5 Introduction to conversion
2.6 Sampling and aliasing
2.7 Reconstruction
2.8 Filter design
2.9 Sampling clock jitter
2.10 Choice of audio sampling rate
2.11 Video sampling structures
2.12 The phase-locked loop
2.13 Quantizing
2.14 Quantizing error
2.15 Dither
2.16 Binary codes for audio
2.17 Binary codes for component video
2.18 Introduction to digital processes
2.19 Logic elements
2.20 Storage elements
2.21 Binary adding
2.22 Gain control by multiplication
2.23 Multiplexing principles
2.24 Packets
2.25 Statistical multiplexing
References

Chapter 3 Processing for compression
3.1 Filters
3.2 Downsampling filters
3.3 The quadrature mirror filter
3.4 Filtering for video noise reduction
3.5 Transforms
3.6 The Fourier transform
3.7 The discrete cosine transform (DCT)
3.8 The wavelet transform
3.9 Motion compensation
3.10 Motion-estimation techniques
3.10.1 Block matching
3.10.2 Gradient matching
3.10.3 Phase correlation
3.11 Compression and requantizing
References

Chapter 4 Audio compression
4.1 Introduction
4.2 The ear
4.3 The cochlea
4.4 Level and loudness
4.5 Frequency discrimination
4.6 Critical bands
4.7 Beats
4.8 Codec level calibration
4.9 Quality measurement
4.10 The limits
4.11 Compression applications
4.12 History of MPEG audio coding
4.13 MPEG audio compression tools
4.14 Transform coding
4.15 MPEG Layer I audio coding
4.16 MPEG Layer II audio coding
4.17 MPEG Layer III audio coding
4.18 Dolby AC-3
4.19 Compression in stereo
References

Chapter 5 MPEG-2 video compression
5.1 The eye
5.2 Dynamic resolution
5.3 Contrast
5.4 Colour vision
5.5 Colour difference signals
5.6 Progressive or interlaced scan?
5.7 Spatial and temporal redundancy in MPEG
5.8 I and P coding
5.9 Bidirectional coding
5.10 Coding applications
5.11 Spatial compression
5.12 Scanning and run-length/variable-length coding
5.13 A bidirectional coder
5.14 Slices
5.15 Handling interlaced pictures
5.16 An MPEG-2 coder
5.17 The Elementary Stream
5.18 An MPEG-2 decoder
5.19 Coding artifacts
5.20 Processing MPEG-2 and concatenation
References

Chapter 6 Program and transport streams
6.1 Introduction
6.2 Packets and time stamps
6.3 Transport streams
6.4 Clock references
6.5 Program Specific Information (PSI)
6.6 Multiplexing
6.7 Remultiplexing
Reference

Glossary
Index

Introduction to compression

`quantizing error which results is confined to the frequency limits of the
`band and so it can be arranged to be masked by the program material.
`The techniques used in Layers 1 and 2 of MPEG audio are based on sub-
`band coding as are those used in DCC (Digital Compact Cassette).
`
1.6.2 Transform coding
`
In transform coding the time-domain audio waveform is converted into
a frequency domain representation such as a Fourier, discrete cosine or
wavelet transform (see Chapter 3). Transform coding takes advantage of
the fact that the amplitude or envelope of an audio signal changes
relatively slowly and so the coefficients of the transform can be
transmitted relatively infrequently. Clearly such an approach breaks
down in the presence of transients and adaptive systems are required in
practice. Transients cause the coefficients to be updated frequently
whereas in stationary parts of the signal such as sustained notes the
update rate can be reduced. Discrete cosine transform (DCT) coding is
used in Layer III of MPEG audio and in the compression system of the
Sony MiniDisc.
`
1.6.3 Predictive coding
`
`In a predictive coder there are two identical predictors, one in the coder
`and one in the decoder. Their job is to examine a run of previous sample
`code values and to extrapolate forward to estimate or predict what the
`next code value will be. This is subtracted from the actual next code value
`
`at the encoder to produce a prediction error which is transmitted. The
`decoder then adds the prediction error to its own prediction to obtain the
`output code value again. Predictive coders work with a short encode and
`decode delay and are useful in telephony where a long loop delay causes
`problems.
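
The predictor arrangement described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the choice of predictor (linear extrapolation from the last two code values) are hypothetical and are not taken from any particular codec, and the prediction error is sent without requantizing, so this sketch is lossless.

```python
def predict(history):
    """Hypothetical predictor: extrapolate a straight line through the last two values."""
    if len(history) >= 2:
        return 2 * history[-1] - history[-2]
    if history:
        return history[-1]
    return 0


def encode(samples):
    """Coder: subtract the prediction from each actual value and send the error."""
    history, errors = [], []
    for sample in samples:
        errors.append(sample - predict(history))
        history.append(sample)
    return errors


def decode(errors):
    """Decoder: run the identical predictor and add the received error to it."""
    history = []
    for error in errors:
        history.append(predict(history) + error)
    return history


samples = [10, 12, 14, 16, 17, 17, 16]
errors = encode(samples)
print(errors)          # prediction errors are mostly small values
print(decode(errors))  # the original code values are recovered
```

Because both ends run the same predictor on the same history, only the small prediction errors need to be sent, and each value is available after a delay of one sample.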
`
1.7 Video compression
`
Video signals exist in four dimensions: these are the attributes of the
sample, the horizontal and vertical spatial axes and the time axis.
Compression can be applied in any or all of those four dimensions.
MPEG-2 assumes 8-bit colour difference signals as the input, requiring
rounding if the source is 10-bit. The sampling rate of the colour signals is
less than that of the luminance. This is done by downsampling the colour
samples horizontally and generally vertically as well. Essentially an
MPEG-2 system has three parallel simultaneous channels, one for
luminance and two colour difference, which after coding are multiplexed
into a single bitstream.
Figure 1.7(a) shows that when individual pictures are compressed
without reference to any other pictures, the time axis does not enter the
process which is therefore described as intra-coded (intra = within)
compression. The term spatial coding will also be found. It is an advantage
of intra-coded video that there is no restriction to the editing which can
be carried out on the picture sequence. As a result compressed VTRs such
as Digital Betacam, DVC and D-9 use spatial coding. Cut editing may
take place on the compressed data directly if necessary. As spatial coding
treats each picture independently, it can employ certain techniques
developed for the compression of still pictures. The ISO JPEG (Joint
Photographic Experts Group) compression standards [5,6] are in this
category. Where a succession of JPEG coded images are used for
television, the term 'Motion JPEG' will be found.
`
`
`
Figure 1.7 (a) Spatial or intra-coding works on individual images. (b) Temporal or
inter-coding works on successive images.
[Figure labels: spatial or intra-coding explores redundancy within a picture;
temporal or inter-coding explores redundancy between pictures.]
`
Greater compression factors can be obtained by taking account of the
redundancy from one picture to the next. This involves the time axis, as
Figure 1.7(b) shows, and the process is known as inter-coded (inter =
between) or temporal compression.

Temporal coding allows a higher compression factor, but has the
disadvantage that an individual picture may exist only in terms of the
differences from a previous picture. Clearly editing must be undertaken
with caution and arbitrary cuts simply cannot be performed on the MPEG
bitstream. If a previous picture is removed by an edit, the difference data
will then be insufficient to recreate the current picture.
`
1.7.1 Intra-coded compression
`
Intra-coding works in three dimensions on the horizontal and vertical
spatial axes and on the sample values. Analysis of typical television
pictures reveals that while there is a high spatial frequency content due to
detailed areas of the picture, there is a relatively small amount of energy
at such frequencies. Often pictures contain sizeable areas in which the
same or similar pixel values exist. This gives rise to low spatial
frequencies. The average brightness of the picture results in a substantial
zero-frequency component. Simply omitting the high-frequency components
is unacceptable as this causes an obvious softening of the picture.

A coding gain can be obtained by taking advantage of the fact that the
amplitude of the spatial components falls with frequency. It is also
possible to take advantage of the eye's reduced sensitivity to noise in high
spatial frequencies. If the spatial frequency spectrum is divided into
frequency bands the high-frequency bands can be described by fewer bits
not only because their amplitudes are smaller but also because more noise
can be tolerated. The wavelet transform and the discrete cosine transform
used in MPEG allow two-dimensional pictures to be described in the
frequency domain and these are discussed in Chapter 3.
`
1.7.2 Inter-coded compression

`
Inter-coding takes further advantage of the similarities between successive
pictures in real material. Instead of sending information for each
picture separately, inter-coders will send the difference between the
previous picture and the current picture in a form of differential coding.
Figure 1.8 shows the principle. A picture store is required at the coder to
allow comparison to be made between successive pictures and a similar
store is required at the decoder to make the previous picture available.
The difference data may be treated as a picture itself and subjected to
some form of transform-based spatial compression.

The simple system of Figure 1.8(a) is of limited use as in the case of a
transmission error, every subsequent picture would be affected. Channel
switching in a television set would also be impossible. In practical
systems a modification is required. One approach is the so-called 'leaky
predictor' in which the next picture is predicted from a limited number of
previous pictures rather than from an indefinite number. As a result
errors cannot propagate indefinitely. The approach used in MPEG is that
periodically some absolute picture data are transmitted in place of
difference data.
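
The effect of periodically sending absolute picture data can be sketched as follows. This is a toy model under stated assumptions: pictures are short lists of pixel values, the differences are sent without any spatial compression, and the names `encode_sequence`, `decode_sequence` and `i_period` are hypothetical. The labels I and D follow the key of Figure 1.8.

```python
def encode_sequence(pictures, i_period):
    """Send an absolute picture every `i_period` pictures, differences otherwise."""
    coded, previous = [], None
    for n, picture in enumerate(pictures):
        if n % i_period == 0:
            coded.append(("I", list(picture)))
        else:
            coded.append(("D", [c - p for c, p in zip(picture, previous)]))
        previous = picture
    return coded


def decode_sequence(coded):
    """Rebuild each picture from the picture store plus the received data."""
    decoded, previous = [], None
    for kind, data in coded:
        if kind == "I":
            picture = list(data)
        else:
            picture = [p + d for p, d in zip(previous, data)]
        decoded.append(picture)
        previous = picture
    return decoded


pictures = [[5, 5, 5], [5, 6, 5], [5, 7, 5], [6, 7, 5], [6, 8, 5], [6, 8, 6]]
coded = encode_sequence(pictures, i_period=3)

# Simulate a transmission error in the first difference picture.
damaged = [(kind, list(data)) for kind, data in coded]
damaged[1][1][0] += 100
recovered = decode_sequence(damaged)
print([got == sent for got, sent in zip(recovered, pictures)])
```

In this sketch the error corrupts the damaged picture and the difference picture that follows it, but the next absolute picture restores correct decoding.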
Figure 1.8(b) shows that absolute picture data, known as I or intra
pictures, are interleaved with pictures which are created using difference
data, known as P or predicted pictures. The I pictures require a large
amount of data, whereas the P pictures require less data. As a result the
instantaneous data rate varies dramatically and buffering has to be used
to allow a constant transmission rate. The leaky predictor needs less
buffering as the compression factor does not change so much from picture
to picture.

Figure 1.8 An inter-coded system (a) uses a delay to calculate the pixel
differences between successive pictures. To prevent error propagation,
intra-coded pictures (b) may be used periodically.
[Figure key: I = intra-coded picture; D = differentially coded picture.]

The I picture and all the P pictures prior to the next I picture are called
a group of pictures (GOP). For a high compression factor, a large number
of P pictures should be present between I pictures, making a long GOP.
However, a long GOP delays recovery from a transmission error. The
compressed bitstream can only be edited at I pictures as shown.
`In the case of moving objects, although their appearance may not
`change greatly from picture to picture, the data representing them on a
`fixed sampling grid will change and so large differences will be generated
`between successive pictures. It is a great advantage if the effect of motion
`can be removed from difference data so that they only reflect the changes
`in appearance of a moving object since a much greater coding gain can
`then be obtained. This is the objective of motion compensation.
`
1.7.3 Introduction to motion compensation
`
`In real television program material objects move around before a fixed
`camera or the camera itself moves. Motion compensation is a process
`which effectively measures motion of objects from one picture to the next
`so that it can allow for that motion when looking for redundancy between
`pictures. Figure 1.9 shows that moving pictures can be expressed in a
`three-dimensional space which results from the screen area moving along
`the time axis. In the case of still objects, the only motion is along the time
`axis. However, when an object moves, it does so along the optic flow axis
`which is not parallel to the time axis. The optic flow axis joins the same
`point on a moving object as it takes on various screen positions.
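
A one-dimensional sketch can make the benefit concrete. The assumptions are loud ones: a picture is reduced to a single row of pixels, motion is found by an exhaustive search over whole-sample shifts, and the names `difference_energy` and `estimate_shift` are hypothetical. It is not a description of any particular motion estimator.

```python
def difference_energy(current, reference, shift=0):
    """Sum of squared differences after moving the reference by `shift` samples."""
    total = 0
    for x, value in enumerate(current):
        source = x - shift
        ref = reference[source] if 0 <= source < len(reference) else 0
        total += (value - ref) ** 2
    return total


def estimate_shift(current, reference, max_shift=3):
    """Try every shift in range and keep the one leaving the smallest difference."""
    return min(range(-max_shift, max_shift + 1),
               key=lambda s: difference_energy(current, reference, s))


previous = [0, 0, 9, 9, 9, 0, 0, 0, 0, 0]
current = [0, 0, 0, 0, 9, 9, 9, 0, 0, 0]  # the same object, two samples to the right

vector = estimate_shift(current, previous)
print(difference_energy(current, previous))          # large without compensation
print(vector)                                        # measured motion
print(difference_energy(current, previous, vector))  # small once motion is allowed for
```

The object itself has not changed, yet the uncompensated difference is large because the data on the fixed sampling grid have moved. Shifting the previous picture along the measured motion before taking the difference removes that effect.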
`
`
`
`
Figure 1.9 Objects travel in a three-dimensional space along the optic flow axis
which is only parallel to the time axis if there is no movement.
[Figure axes: horizontal position, vertical position and time, with the optic flow axis marked.]

MPEG-2 video compression

how motion is to be measured; it simply defines how a decoder will
interpret the vectors. Encoder designers are free to use any motion-
estimation system provided that the right vector protocol is created.
Chapter 3 contrasted a number of motion-estimation techniques.

Figure 5.21(a) shows that a macroblock contains both luminance and
colour difference data at different resolutions. Most of the MPEG-2
Profiles use a 4:2:0 structure which means that the colour is downsampled
by a factor of two in both axes. Thus in a 16 × 16 pixel block,
there are only 8 × 8 colour difference sampling sites. MPEG-2 is based
upon the 8 × 8 DCT (see section 3.7) and so the 16 × 16 block is the screen
area which contains an 8 × 8 colour difference sampling block. Thus in
4:2:0 in each macroblock there are four luminance DCT blocks, one R - Y
DCT block and one B - Y DCT block, all steered by the same vector.
`
Figure 5.21 The structure of a macroblock. (A macroblock is the screen area
steered by one vector.) (a) In 4:2:0, there are two chroma DCT blocks per
macroblock whereas in 4:2:2 (b) there are four. 4:2:2 needs 33% more data than
4:2:0.
[Figure labels: (a) 4:2:0: 16 × 16 luminance area giving 4 × Y, with 1 × Cr and
1 × Cb blocks of 8 × 8; (b) 4:2:2: 4 × Y, 2 × Cr and 2 × Cb.]

In the 4:2:2 Profile of MPEG-2, shown in Figure 5.21(b), the chroma is
not downsampled vertically, and so there is twice as much chroma data
in each macroblock which is otherwise substantially the same.
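
The block counts and the 33% figure follow from simple arithmetic, sketched here in Python. The function and table names are hypothetical; the only inputs are the 16 × 16 macroblock, the 8 × 8 DCT and the chroma downsampling factors given in the text.

```python
# (horizontal, vertical) chroma downsampling factors for each structure
SUBSAMPLING = {"4:2:0": (2, 2), "4:2:2": (2, 1)}


def dct_blocks(structure, macroblock=16, dct=8):
    """Return (luminance blocks, blocks per colour difference signal, total)."""
    horizontal, vertical = SUBSAMPLING[structure]
    luminance = (macroblock // dct) ** 2
    per_chroma = (macroblock // horizontal // dct) * (macroblock // vertical // dct)
    return luminance, per_chroma, luminance + 2 * per_chroma


for structure in SUBSAMPLING:
    print(structure, dct_blocks(structure))

total_420 = dct_blocks("4:2:0")[2]
total_422 = dct_blocks("4:2:2")[2]
extra = (total_422 - total_420) / total_420
print(f"4:2:2 carries {extra:.0%} more DCT blocks per macroblock")
```

Six blocks per macroblock become eight, which is where the one-third increase comes from.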
`
5.8 I and P coding
`
Predictive (P) coding cannot be used indefinitely, as it is prone to error
propagation. A further problem is that it becomes impossible to decode
the transmission if reception begins part-way through. In real video
signals, cuts or edits can be present across which there is little
redundancy and which make motion estimators throw up their hands.

In the absence of redundancy over a cut, there is nothing to be done but to
send the new picture information in absolute form. This is called I coding
where I is an abbreviation of intra coding. As I coding needs no previous
picture for decoding, then decoding can begin at I coded information.

MPEG-2 is effectively a toolkit and there is no compulsion to use all the
tools available. Thus an encoder may choose whether to use I or P coding,
either once and for all or dynamically on a macroblock by macroblock
basis.

For practical reasons, an entire frame may be encoded as I macroblocks
periodically. This creates a place where the bitstream might be edited or
where decoding could begin.

Figure 5.22 shows a typical application of the Simple Profile of MPEG-2.
Periodically an I picture is created. Between I pictures are P pictures
which are based on the picture before. These P pictures predominantly
contain macroblocks having vectors and prediction errors. However, it is
perfectly legal for P pictures to contain I macroblocks. This might be
useful where, for example, a camera pan introduces new material at the
edge of the screen which cannot be created from an earlier picture.
`
Figure 5.22 A Simple Profile MPEG-2 signal may contain periodic I pictures with a
number of P pictures between.
[Figure key: I = intra-coded picture; P = predicted picture; arrows = picture
difference (vectors plus prediction error); the GOP runs from one I picture up
to the next.]
`
`Note that although what is sent is called a P picture, it is not a picture
`at all. It is a set of instructions to convert the previous picture into the
`current picture. If the previous picture is lost, decoding is impossible. An
`I picture together with all of the pictures before the next I picture form a
`Group of Pictures (GOP).
`
5.9 Bidirectional coding
`
Motion-compensated predictive coding is a useful compression technique,
but it does have the drawback that it can only take data from a
previous picture. Where moving objects reveal a background this is
completely unknown in previous pictures and forward prediction fails.
However, more of the background is visible in later pictures. Figure 5.23
shows the concept. In the centre of the diagram, a moving object has
revealed some background. The previous picture can contribute nothing,
whereas the next picture contains all that is required.

Figure 5.23 In bidirectional coding the revealed background can be efficiently
coded by bringing data back from a future picture.
[Figure labels: pictures at T = N, T = N + 1 and T = N + 2; the revealed area is
not in picture N but is in picture N + 2.]

Bidirectional coding is shown in Figure 5.24. A bidirectional or B
macroblock can be created using a combination of motion compensation
and the addition of a prediction error. This can be done by forward
prediction from a previous picture or backward prediction from a
subsequent picture. It is also possible to use an average of both forward
and backward prediction. On noisy material this may result in some
reduction in bit rate. The technique is also a useful way of portraying a
dissolve.

Figure 5.24 In bidirectional coding, a number of B pictures can be inserted
between periodic forward predicted pictures. See text.
[Figure key: I = intra- or spatially coded 'anchor' picture. P = forward
predicted; the coder sends the difference between I and P and the decoder adds
the difference to create P. B = bidirectionally coded picture, which can be
coded from a previous I or P picture or a later I or P picture. B pictures are
not coded from each other.]

`The averaging process in MPEG—2 is a simple linear interpolation
`which works well when only one B picture exists between the reference
`pictures before and after. A larger number of B pictures would require
`weighted interpolation but MPEG—2 does not support this.
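
The three prediction modes, and the limitation of equal-weight averaging, can be sketched as follows. The assumptions should be read carefully: motion compensation is taken to have been applied already, so the references are plain lists of co-sited pixel values; the rounding rule in the average is an assumption of this sketch; and the names `predict_b` and `prediction_error` are hypothetical.

```python
def predict_b(previous_ref, next_ref, mode):
    """Form a prediction for a B macroblock from the two reference pictures."""
    if mode == "forward":
        return list(previous_ref)
    if mode == "backward":
        return list(next_ref)
    if mode == "average":
        # Equal-weight interpolation; the rounding used here is an assumption.
        return [(a + b + 1) // 2 for a, b in zip(previous_ref, next_ref)]
    raise ValueError(f"unknown mode: {mode}")


def prediction_error(actual, prediction):
    """What must still be sent after the prediction has been made."""
    return [a - p for a, p in zip(actual, prediction)]


# A dissolve from level 100 to level 200 between the two reference pictures.
previous_ref = [100, 100, 100]
next_ref = [200, 200, 200]
midway = [150, 150, 150]      # a single B picture halfway through
one_third = [133, 133, 133]   # the first of two B pictures

average = predict_b(previous_ref, next_ref, "average")
print(prediction_error(midway, predict_b(previous_ref, next_ref, "forward")))
print(prediction_error(midway, average))
print(prediction_error(one_third, average))
```

With one B picture the average predicts the dissolve exactly. With two B pictures neither lies halfway between the references, so a residual error remains that weighted interpolation would have removed.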
Typically two B pictures are inserted between P pictures or between I
and P pictures. As can be seen, B pictures are never predicted from one
another, only from I or P pictures. A typical GOP for broadcasting
purposes might have the structure IBBPBBPBBPBB. Note that the last B
pictures in the GOP require the I picture in the next GOP for decoding
and so the GOPs are not truly independent. Independence can be
obtained by creating a closed GOP which may contain B pictures but
which ends with a P picture. It is also legal to have a B picture in which
every macroblock is forward predicted, needing no future picture for
decoding.
Bidirectional coding is very powerful. Figure 5.25 is a constant quality
curve showing how the bit rate changes with the type of coding. On the
left, only I or spatial coding is used, whereas on the right an IBBP
structure is used. This means that there are two bidirectionally coded
pictures in between a spatially coded picture (I) and a forward predicted
picture (P). Note how for the same quality the system which only uses
spatial coding needs two and a half times the bit rate that the
bidirectionally coded system needs.
`
Figure 5.25 Bidirectional coding is very powerful as it allows the same quality with
only 40% of the bit rate of intra-coding. However, the encoding and decoding
delays must increase. Coding over a longer time span is more efficient but editing
is more difficult.
[Figure axis: bit rate (Mbit/s).]

Clearly information in the future has yet to be transmitted and so is not
normally available to the decoder. MPEG-2 gets around the problem by
sending pictures in the wrong order. Picture reordering requires delay in
the encoder and a delay in the decoder to put the order right again. Thus
the overall codec delay must rise when bidirectional coding is used. This
is quite consistent with Figure 1.5 which showed that as the compression
factor rises the latency must also rise.

Figure 5.26 shows that although the original picture sequence is
IBBPBBPBBIBB..., this is transmitted as IPBBPBBIBB... so that the
future picture is already in the decoder before bidirectional decoding
begins. Note that the I picture of the next GOP is actually sent before the
last B pictures of the current GOP.
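
The reordering rule can be sketched in a few lines of Python. This is an illustration only: picture types are represented as characters in a string, and the name `to_transmission_order` is hypothetical. Each B picture is held back until the I or P picture that follows it in display order has been sent.

```python
def to_transmission_order(display_order):
    """Return (pictures sent so far, B pictures still waiting for their anchor)."""
    transmitted, pending_b = [], []
    for picture_type in display_order:
        if picture_type == "B":
            # Cannot be sent yet: the future reference has not gone out.
            pending_b.append(picture_type)
        else:
            # Send the I or P picture first, then the B pictures that need it.
            transmitted.append(picture_type)
            transmitted.extend(pending_b)
            pending_b.clear()
    return "".join(transmitted), "".join(pending_b)


sent, waiting = to_transmission_order("IBBPBBPBBIBB")
print(sent)     # transmission order for the sequence quoted in the text
print(waiting)  # trailing B pictures, held until the next I or P picture arrives
```

The held-back B pictures are the source of the encoder delay mentioned above; the decoder needs a matching delay to restore display order.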
`
`
`
Figure 5.26 Comparison of pictures before and after compression showing
sequence change and varying amount of data needed by each picture type. I,
P, B pictures use unequal amounts of data.
`
Figure 5.26 also shows that the amount of data required by each picture
is dramatically different. I pictures have only spatial redundancy and so
need a lot of data to describe them. P pictures need less data because they
are created by shifting the I picture with vectors and then adding a
prediction error picture. B pictures need the least data of all because they
can be created from I or P.

With pictures requiring a variable length of time to transmit, arriving in
the wrong order, the decoder needs some help. This takes the form of
picture-type flags and time stamps which will be described in section 6.2.
`
5.10 Coding applications
`
Figure 5.27 shows a variety of GOP structures. The simplest is the III...
sequence in which every picture is intra-coded. Pictures can be fully
decoded without reference to any other pictures and so editing is
`
`magnitude of the prediction errors. The sub-carrier level may be low but
`it can be present over
`the whole screen and require an excess of
`coefficients to describe it.
`
`Composite video should not in general be used as a source for MPEG-2
`encoding, but where this is inevitable the standard of the decoder must be
`much higher than average, especially in the residual sub-carrier specifica-
`tion. Some MPEG preprocessors support high-grade composite decoding
`options.
Judder from conventional linear standards convertors degrades the
performance of MPEG-2. The optic flow axis is corrupted and linear
filtering causes multiple images which confuse motion estimators and
result in larger prediction errors. If standards conversion is necessary, the
MPEG-2 system must be used to encode the signal in its original format
and the standards convertor should be installed after the decoder. If a
standards convertor has to be used before the encoder, then it must be a
type which has effective motion compensation.
`Film weave causes movement of one picture with respect to the next and
`this results in more vector activity and larger prediction errors. Movement
`of the centre of the film frame along the optical axis causes magnification
`changes which also result in excess prediction error data. Film grain has the
`same effect as noise: it is random and so cannot be compressed.
`Perhaps because it is relatively uncommon, MPEG-2 cannot handle
`image rotation well because the motion—compensation system is only
`designed for translational motion. Where a rotating object is highly
`detailed, such as in certain fairground rides, the motion-compensation
`failure requires a significant amount of prediction error data and if a
`suitable bit rate is not available the level of artifacts will rise.
`
`Flash guns used by still photographers are a serious hazard to MPEG-2
`especially when long GOPs are used. At a press conference where a series
`of flashes may occur,
`the resultant video contains intermittent white
`frames which defeat prediction. A huge prediction error is required to
`turn the previous picture into a white picture, followed by another huge
`prediction error to return the white frame to the next picture. The output
`buffer fills and heavy requantizing is employed. After a few flashes the
`picture has generally gone to tiles.
`
5.20 Processing MPEG-2 and concatenation
`
Concatenation loss occurs when the losses introduced by one codec are
compounded by a second codec. All practical compressors, MPEG-2
included, are lossy because what comes out of the decoder is not
bit-identical to what went into the encoder. The bit differences are controlled
so that they have minimum visibility to a human viewer.
`
`MPEG—2 is a toolbox which allows a variety of manipulations to be
`performed in both the spatial and the temporal domain. There is a limit
`to the compression which can be used on a single frame, and if higher
`compression factors are needed, temporal coding will have to be used.
`The longer the run of pictures considered, the lower the bit rate needed,
`but the harder it becomes to edit.
`
The most editable form of MPEG-2 is to use I pictures only. As there is
no temporal coding, pure cut edits can be made between pictures. The
next best thing is to use a repeating IB structure which is locked to the
odd/even field structure. Cut edits cannot be made as the B pictures are
bidirectionally coded and need data from both adjacent I pictures for
decoding. The B picture has to be decoded prior to the edit and
re-encoded after the edit. This will cause a small concatenation loss.
`
`Beyond the IB structure processing gets harder. If a long GOP is used
`for the best compression factor, an IBBPBBP... structure results. Editing
`this is very difficult because the pictures are sent out of order so that
`bidirectional decoding can be used. MPEG allows closed GOPs where the
`last B picture is coded wholly from the previous pictures and does not
`need the I picture in the next GOP. The bitstream can be switched at this
`point but only if the GOP structures in the two source video signals are
`synchronized (makes colour framing seem easy). Consequently, in
`practice a long GOP bitstream will need to be decoded prior to any
`production step. Afterwards it will need to be re-encoded.
`This is known as naive concatenation and an enormous pitfall awaits.
`Unless the GOP structure of the output is identical to and synchronized
`with the input the results will be disappointing. The worst case is where
`an I picture is encoded from a picture which was formerly a B picture. It
`is easy enough to lock the GOP structure of a coder to a single input, but
`if an edit is made between two inputs, the GOP timings could well be
`different.
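
Two of the points above can be sketched in a few lines of Python. The first function shows one way the out-of-order transmission arises: each B picture is held back until the later anchor picture (I or P) it depends on has been sent. The second lists the frames at which a re-encoder whose GOP is out of step with the original would make an I picture from a former B picture. The 12-picture GOP, the picture labels and the `phase` parameter are assumptions made purely for illustration.

```python
GOP = "IBBPBBPBBPBB"  # hypothetical 12-picture GOP, display order


def transmission_order(display):
    """Reorder pictures so each anchor (I or P) precedes the B pictures
    that are displayed before it but predicted from it."""
    sent, held_b = [], []
    for pic in display:
        if pic.startswith("B"):
            held_b.append(pic)      # wait for the following anchor
        else:
            sent.append(pic)        # send the anchor first...
            sent.extend(held_b)     # ...then the B pictures it supports
            held_b = []
    return sent + held_b            # trailing B pictures, if any


def picture_type(index, phase):
    """Type of frame `index` for a coder whose GOP starts `phase` frames late."""
    return GOP[(index - phase) % len(GOP)]


def i_from_b(n_frames, in_phase, out_phase):
    """Frames where the second coder's I picture was a B picture in the first."""
    return [i for i in range(n_frames)
            if picture_type(i, out_phase) == "I"
            and picture_type(i, in_phase) == "B"]


print(transmission_order(["I0", "B1", "B2", "P3", "B4", "B5", "P6"]))
print(i_from_b(24, in_phase=0, out_phase=0))  # synchronized coders
print(i_from_b(24, in_phase=0, out_phase=1))  # one frame out of step
```

With the coders synchronized the second list is empty; with a one-frame offset every I picture of the second coder falls on a former B picture, which is the worst case described above.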
`
`As there are so many structures allowed in MPEG, there will be a need
`to convert between them. If this has to be done, it should only be in the
`direction which increases the GOP length and reduces the bit rate. Going
`the other way is inadvisable. The ideal way of converting from, say, the
`IB structure of a news system to the IBBP structure of an emission system
`is to use a recompressor. This is a kind of standards converter which will
`give better results than a decode followed by an encode.
`The DCT part of MPEG-2 itself is lossless. If all the coefficients are
`preserved intact an inverse transform yields the same pixel data.
`Unfortunately this does not yield enough compression for many
`applications. In practice the coefficients are made less accurate by
`removing bits, starting at the least significant end and working upwards.
`This process is weighted, or made progressively more aggressive as
`spatial frequency increases. Small-value coefficients may be truncated to
`zero and large-value coefficients are most coarsely truncated at high
`spatial frequencies where the effect is least visible.
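
The weighting can be pictured with a small Python sketch in which more low-order bits are dropped as the coefficient index (standing in here for spatial frequency) rises. The coefficient values and the bit allocation are invented for illustration, and an actual MPEG-2 coder expresses this step through quantizing matrices and a scale factor rather than literal bit shifts.

```python
def truncate(coeff, bits):
    """Drop the `bits` least significant bits of the magnitude, keeping the sign."""
    magnitude = (abs(coeff) >> bits) << bits
    return magnitude if coeff >= 0 else -magnitude


def weighted_truncate(coeffs, bits_per_index):
    """Apply a progressively coarser truncation to each coefficient."""
    return [truncate(c, b) for c, b in zip(coeffs, bits_per_index)]


coeffs = [620, -45, 30, -12, 9, 5, -3, 2]  # hypothetical, low to high frequency
bits = [0, 1, 2, 3, 3, 4, 4, 5]            # hypothetical weighting

print(weighted_truncate(coeffs, bits))
# → [620, -44, 28, -8, 8, 0, 0, 0]
```

The small high-frequency values fall to zero, while the large low-frequency values keep most of their accuracy.
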
`Figure 5.43(a) shows what happens in the ideal case where two identical
`coders are put in tandem and synchronized. The first coder quantizes the
`coefficients to finite accuracy and causes a loss on decoding. However,
`when the second coder performs the DCT calculation, the coefficients
`obtained will be identical to the quantized coefficients in the first coder
`and so if the second weighting and requantizing step is identical the same
`truncated coefficient data will result and there will be no further loss of
`quality.3
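
The ideal tandem case can be checked numerically with a minimal sketch. It assumes a one-dimensional, eight-sample orthonormal DCT in place of the 8 x 8 transform, decoded samples kept at full precision rather than rounded to 8-bit pixels, and hypothetical sample values and step sizes; `codec`, `steps` and the other names are illustrative and not taken from the standard.

```python
import math

N = 8  # one row of an 8 x 8 block, for brevity


def _scale(k):
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def dct(x):
    """Orthonormal 1-D DCT-II."""
    return [_scale(k) * sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                            for n in range(N))
            for k in range(N)]


def idct(c):
    """Inverse of dct()."""
    return [sum(_scale(k) * c[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for k in range(N))
            for n in range(N)]


def codec(pixels, steps):
    """Encode then decode: transform, quantize to multiples of each step, invert."""
    levels = [round(c / q) for c, q in zip(dct(pixels), steps)]
    return idct([level * q for level, q in zip(levels, steps)])


def max_diff(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


pixels = [52, 55, 61, 66, 70, 61, 64, 73]   # hypothetical luminance samples
steps = [8, 8, 10, 12, 16, 20, 24, 32]      # hypothetical weighted step sizes

gen1 = codec(pixels, steps)
gen2 = codec(gen1, steps)        # identical second codec
other = codec(gen1, [7] * N)     # second codec with different quantizing

print(max_diff(pixels, gen1))    # first-generation loss
print(max_diff(gen1, gen2))      # effectively zero
print(max_diff(gen1, other))     # the picture is altered again
```

Under these assumptions the second identical codec reproduces the first-generation output, whereas a second codec with different step sizes changes the data once more.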
`
`[Figure 5.43(a): codecs in tandem, in to out. Labels: "Coder makes decisions and approximations"; "Reduced quality"; "Same quality".]
