`
`MPEG-2
`
`John Watkinson
`
`Focal Press
`OXFORD AUCKLAND BOSTON JOHANNESBURG MELBOURNE NEW DELHI
`
`
Focal Press

An imprint of Butterworth-Heinemann
Linacre House, Jordan Hill, Oxford OX2 8DP
225 Wildwood Avenue, Woburn, MA 01801-2041
A division of Reed Educational and Professional Publishing Ltd

A member of the Reed Elsevier plc group
`
`First published 1999
`
`© John Watkinson 1999
`
`All rights reserved. No part of this publication may be reproduced in
`any material form (including photocopying or storing in any medium by
`electronic means and whether or not transiently or incidentally to some
`other use of this publication) without the written permission of the
`copyright holder except in accordance with the provisions of the Copyright,
`Designs and Patents Act 1988 or under the terms of a licence issued by the
`Copyright Licensing Agency Ltd, 90 Tottenham Court Road, London,
`England W1P 9HE. Applications for the copyright holder’s written
`permission to reproduce any part of this publication should be addressed
`to the publishers
`
`British Library Cataloguing in Publication Data
`A catalogue record for this book is available from the British Library
`
`Library of Congress Cataloguing in Publication Data
`A catalogue record for this book is available from the Library of Congress
`
`ISBN 0 240 51510 2
`
`Composition by Genesis Typesetting, Rochester, Kent
`Printed and bound in Great Britain
`
British Trust for Conservation Volunteers

FOR EVERY TITLE THAT WE PUBLISH, BUTTERWORTH-HEINEMANN
WILL PAY FOR BTCV TO PLANT AND CARE FOR A TREE.
`
`
`
`
`
`
`
`
`
`
`
Contents

Preface
Acknowledgements

Chapter 1 Introduction to compression
1.1 What is MPEG-2?
1.2 Why compression is necessary
1.3 Some applications of MPEG-2
1.4 Lossless and perceptive coding
1.5 Compression principles
1.6 Audio compression
1.6.1 Sub-band coding
1.6.2 Transform coding
1.6.3 Predictive coding
1.7 Video compression
1.7.1 Intra-coded compression
1.7.2 Inter-coded compression
1.7.3 Introduction to motion compensation
1.7.4 Film-originated video compression
1.8 MPEG-2 profiles and levels
1.9 MPEG-2 bitstreams
1.10 Drawbacks of compression
1.11 Compression preprocessing
1.12 Some guidelines
References

Chapter 2 Fundamentals
2.1 What is an audio signal?
2.2 What is a video signal?
2.3 Types of video
2.4 What is a digital signal?
2.5 Introduction to conversion
2.6 Sampling and aliasing
2.7 Reconstruction
2.8 Filter design
2.9 Sampling clock jitter
2.10 Choice of audio sampling rate
2.11 Video sampling structures
2.12 The phase-locked loop
2.13 Quantizing
2.14 Quantizing error
2.15 Dither
2.16 Binary codes for audio
2.17 Binary codes for component video
2.18 Introduction to digital processes
2.19 Logic elements
2.20 Storage elements
2.21 Binary adding
2.22 Gain control by multiplication
2.23 Multiplexing principles
2.24 Packets
2.25 Statistical multiplexing
References

Chapter 3 Processing for compression
3.1 Filters
3.2 Downsampling filters
3.3 The quadrature mirror filter
3.4 Filtering for video noise reduction
3.5 Transforms
3.6 The Fourier transform
3.7 The discrete cosine transform (DCT)
3.8 The wavelet transform
3.9 Motion compensation
3.10 Motion-estimation techniques
3.10.1 Block matching
3.10.2 Gradient matching
3.10.3 Phase correlation
3.11 Compression and requantizing
References

Chapter 4 Audio compression
4.1 Introduction
4.2 The ear
4.3 The cochlea
4.4 Level and loudness
4.5 Frequency discrimination
4.6 Critical bands
4.7 Beats
4.8 Codec level calibration
4.9 Quality measurement
4.10 The limits
4.11 Compression applications
4.12 History of MPEG audio coding
4.13 MPEG audio compression tools
4.14 Transform coding
4.15 MPEG Layer I audio coding
4.16 MPEG Layer II audio coding
4.17 MPEG Layer III audio coding
4.18 Dolby AC-3
4.19 Compression in stereo
References

Chapter 5 MPEG-2 video compression
5.1 The eye
5.2 Dynamic resolution
5.3 Contrast
5.4 Colour vision
5.5 Colour difference signals
5.6 Progressive or interlaced scan?
5.7 Spatial and temporal redundancy in MPEG
5.8 I and P coding
5.9 Bidirectional coding
5.10 Coding applications
5.11 Spatial compression
5.12 Scanning and run-length/variable-length coding
5.13 A bidirectional coder
5.14 Slices
5.15 Handling interlaced pictures
5.16 An MPEG-2 coder
5.17 The Elementary Stream
5.18 An MPEG-2 decoder
5.19 Coding artifacts
5.20 Processing MPEG-2 and concatenation
References

Chapter 6 Program and transport streams
6.1 Introduction
6.2 Packets and time stamps
6.3 Transport streams
6.4 Clock references
6.5 Program Specific Information (PSI)
6.6 Multiplexing
6.7 Remultiplexing
Reference

Glossary

Index
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`Preface
`
`Digital compression technology has been employed for a long time, but
`until recently the technology was too complex for everyday applications.
The onward march of LSI technology means that increasingly complex
`processes become available at moderate cost. Digital compression has
now reached the stage where it can economically be applied to video and
`audio systems on a wide scale. MPEG-2 has assisted in the process by
`providing interoperability standards for audio and video compression
`and communication.
`
`Economic compression does more than make existing processes
`cheaper. It allows new developments which were impossible without it.
`This book recognizes the wide application of MPEG by treating the
`subject from first principles without assuming any particular background
`for the reader. MPEG is not the only compression technique and other
`approaches will be mentioned here for completeness. While compression
is traditionally described mathematically, it is the author's view that
`mathematics is no more than a technical form of shorthand describing a
`process. As such it cannot explain anything and has no place in a book of
`this kind where instead the explanations are in plain English.
`An introductory chapter is included which suggests some applications
`of MPEG and how it works in a simplified form. Additionally a
`fundamentals chapter contains all the background necessary to follow the
`rest of the book.
`
`Compression theory is balanced with a great deal of practice. The
`creation of MPEG Elementary Streams and their multiplexing into
`transport streams is dealt with in detail along with the problems involved
`in synchronizing all the signals in a multiplex. The practical testing of
`MPEG transport streams is also considered.
`
`
`Throughout this book the reader will find notes of caution and outlines
`of various pitfalls for the unwary. Compression is a useful tool in certain
`applications but it is a dangerous master and if used indiscriminately the
`results are bound to be disappointing. Unfortunately there are signs that
`this is happening. Various descriptions of kinds of artifacts and
`impairments which result from the misuse of compression are included
`here. Compression is rather like a box of fireworks; used wisely a pleasing
`display results — used carelessly it can blow up in your face.
`To suggest that compression should never be used to ensure that
`disasters are avoided is as naive as trying to ban fireworks. The solution
`is the same: to explain the dangers and propose safe practices. Retire to a
`safe place and read this book.
`
`John Watkinson
`
`Burghfield Common 1999
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`Acknowledgements
`
Information for this book has come from a range of sources to whom I am
indebted. I would specifically mention the publications of the ISO, the
AES and SMPTE which provided essential groundwork. I have had
numerous discussions with Peter de With of Philips, Steve Lyman of CBC,
Bruce Devlin and Mike Knee of Snell and Wilcox and Peter Kraniauskas
which have all been extremely useful. The assistance of Microsoft Corp.
and Tektronix Inc. is also appreciated.
`
`
`
`
`
`
1 Introduction to compression
`
1.1 What is MPEG-2?
`
`MPEG is actually an acronym for the moving pictures experts group
`which was formed by the ISO (International Standards Organization) to
`set standards for audio and video compression and transmission. The
first compression standard for audio and video was MPEG-1 [1, 2] but this
was of limited application and the subsequent MPEG-2 standard was
considerably broader in scope and of wider appeal [3]. For example, MPEG-2
supports interlace whereas MPEG-1 did not.
`Compression is summarized in Figure 1.1. It will be seen in (a) that the
`data rate is reduced at source by the compressor. The compressed data are
`then passed through a communication channel and returned to the
`original rate by the expander. The ratio between the source data rate and
`the channel data rate is called the compression factor. The term coding gain
`is also used. Sometimes a compressor and expander in series are referred
`to as a compander. The compressor may equally well be referred to as a
`coder and the expander a decoder in which case the tandem pair may be
`called a codec.
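As a minimal worked illustration of these terms (the rates below are invented for the example and are not taken from any standard), the arithmetic is simply a ratio:

```python
# Illustrative sketch only: hypothetical source and channel rates.
def compression_factor(source_rate_bps: float, channel_rate_bps: float) -> float:
    """Ratio of source data rate to channel data rate; also quoted as the coding gain."""
    return source_rate_bps / channel_rate_bps

source = 270e6   # e.g. a 270 Mbit/s studio digital video interface
channel = 6e6    # e.g. a 6 Mbit/s transmission channel
print(f"Compression factor {compression_factor(source, channel):.0f}:1")  # 45:1
```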
`
`In audio and video compression, where the encoder is more complex
`than the decoder the system is said to be asymmetrical. Figure 1.1(b)
`shows that MPEG works in this way. The encoder needs to be algorithmic
`or adaptive whereas the decoder is ’dumb’ and carries out fixed actions.
`This is advantageous in applications such as broadcasting where the
`number of expensive complex encoders is small but the number of simple
inexpensive decoders is large. In point-to-point applications the advan-
`tage of asymmetrical coding is not so great.
`The approach of the ISO to standardization in MPEG is novel because
`it is not the encoder which is standardized. Figure 1.2(a) shows that
`instead the way in which a decoder shall interpret the bitstream is
`
`
Figure 1.1 (a) A compression system consists of a compressor or coder, a
transmission channel and a matching expander or decoder. The combination of
coder and decoder is known as a codec. (b) MPEG is asymmetrical since the
encoder is much more complex than the decoder.
`
`defined. A decoder which can successfully interpret the bitstream is said
`to be compliant. Figure 1.2(b) shows that the advantage of standardizing
`the decoder is that over time encoding algorithms can improve yet
`compliant decoders will continue to function with them.
`Manufacturers can supply encoders using algorithms which are
`proprietary and their details do not need to be published. A useful result
`is that there can be competition between different encoder designs which
`means that better designs will evolve. The user will have greater choice
`because different levels of cost and complexity can exist in a range of
`coders yet a compliant decoder will operate with them all.
`MPEG is, however, much more than a compression scheme as it also
`standardizes the protocol and syntax under which it
`is possible to
`combine or multiplex audio data with video data to produce a digital
`equivalent of a television program. Many such programs can be
`combined in a single multiplex and MPEG defines the way in which such
`multiplexes can be created and transported. The definitions include the
`metadata which decoders require to demultiplex correctly and which
`users will need to locate programs of interest.
`
`
`
`
Figure 1.2 (a) MPEG defines the protocol of the bitstream between encoder and
decoder. The decoder is defined by implication; the encoder is left very much to
the designer. (b) This approach allows future encoders of better performance to
remain compatible with existing decoders. (c) This approach also allows an
encoder to produce a standard bitstream while its technical operation remains a
commercial secret.
`
`As with all video systems there is a requirement for synchronizing or
`genlocking and this is particularly complex when a multiplex is
`assembled from many signals which are not necessarily synchronized to
`one another.
`
1.2 Why compression is necessary
`
`Compression, bit rate reduction and data reduction are all terms which
`mean basically the same thing in this context. In essence the same (or
`nearly the same) information is carried using a smaller quantity or rate of
data. It should be pointed out that, in audio, compression traditionally
`means a process in which the dynamic range of the sound is reduced. In
`
`
`the context of MPEG the same word means that the bit rate is reduced,
`ideally leaving the dynamics of the signal unchanged. Provided the
`context is clear, the two meanings can co-exist without a great deal of
`confusion.
`
`There are several reasons why compression techniques are popular:
`
`1. Compression extends the playing time of a given storage device.
`2. Compression allows miniaturization. With less data to store, the same
`playing time is obtained with smaller hardware. This is useful in ENG
`(electronic news gathering) and consumer devices.
`3. Tolerances can be relaxed. With less data to record, storage density can
`be reduced making equipment which is more resistant to adverse
`environments and which requires less maintenance.
`4. In transmission systems, compression allows a reduction in bandwidth
`which will generally result in a reduction in cost. It may make possible
`some process which would be impracticable without it.
`5. If a given bandwidth is available to an uncompressed signal,
compression allows faster-than-real-time transmission in the same
`bandwidth.
`
`6. If a given bandwidth is available, compression allows a better-quality
`signal in the same bandwidth.
`
1.3 Some applications of MPEG-2
`
The applications of audio and video compression are limitless and the
`ISO has done well to provide standards which are appropriate to the
`wide range of possible compression products. MPEG-2 embraces video
`pictures from the tiny screen of a videophone to the high-definition
`images needed for electronic cinema. Audio coding stretches from speech
`grade mono to multichannel surround sound.
`Figure 1.3 shows the use of a codec with a recorder. The playing time
`of the medium is extended in proportion to the compression factor. In the
`case of tapes, the access time is improved because the length of tape
`needed for a given recording is reduced and so it can be rewound more
`quickly.
`In the case of DVD (Digital Video Disk aka Digital Versatile Disk) the
`challenge was to store an entire movie on one 12 cm disk. The storage
`density available with today’s optical disk technology is such that
`recording of conventional uncompressed video would be out of the
`question.
`In communications, the cost of data links is often roughly proportional
`to the data rate and so there is simple economic pressure to use a high
`compression factor. However, it should be borne in mind that implement-
`
`
`
`
`
`
ing the codec also has a cost which rises with compression factor and so
a degree of compromise will be inevitable.
In the case of Video-on-Demand, technology exists to convey full
bandwidth video to the home, but to do so for a single individual at the
moment would be prohibitively expensive. Without compression, HDTV
(high-definition television) requires too much bandwidth. With compres-
sion, HDTV can be transmitted to the home in a similar bandwidth to an
existing analog SDTV channel. Compression does not make Video-on-
Demand or HDTV possible, it makes them economically viable.
`
Figure 1.3 Compression can be used around a recording medium. The storage
capacity may be increased or the access time reduced according to the
application.
`
In workstations designed for the editing of audio and/or video, the
source material is stored on hard disks for rapid access. While top-grade
systems may function without compression, many systems use compres-
sion to offset the high cost of disk storage. When a workstation is used for
off-line editing, a high compression factor can be used and artifacts will be
visible in the picture.
This is of no consequence as the picture is only seen by the editor who
uses it to make an EDL (Edit Decision List) which is no more than a list
of actions and the timecodes at which they occur. The original
uncompressed material is then conformed to the EDL to obtain a high-
quality edited work. When on-line editing is being performed, the output
of the workstation is the finished product and clearly a lower compres-
sion factor will have to be used.
`
`Perhaps it is in broadcasting where the use of compression will have its
`greatest impact. There is only one electromagnetic spectrum and pressure
`from other services such as cellular telephones makes efficient use of
bandwidth mandatory. Analog television broadcasting is an old technol-
`ogy and makes very inefficient use of bandwidth. Its replacement by a
`compressed digital transmission will be inevitable for the practical reason
`that the bandwidth is needed elsewhere.
`
`
`Fortunately, in broadcasting there is a mass market for decoders and
these can be implemented as low-cost integrated circuits. Fewer encoders
`are needed and so it is less important if these are expensive. While the
`cost of digital storage goes down year on year, the cost of electromagnetic
`spectrum goes up. Consequently in the future the pressure to use
`compression in recording will ease whereas the pressure to use it in radio
`communications will increase.
`
1.4 Lossless and perceptive coding
`
`Although there are many different coding techniques, all of them fall into
`one or other of these categories. In lossless coding, the data from the
expander are identical bit-for-bit with the original source data. The so-
`called ’stacker’ programs which increase the apparent capacity of disk
`drives in personal computers use lossless codecs. Clearly with computer
`programs the corruption of a single bit can be catastrophic. Lossless
coding is generally restricted to compression factors of around 2:1.
`It is important to appreciate that a lossless coder cannot guarantee a
`particular compression factor and the communications link or recorder
`used with it must be able to function with the variable output data rate.
`Source data which result in poor compression factors on a given codec are
`described as difficult. It should be pointed out that the difficulty is often
`a function of the codec. In other words, data which one codec finds
`
`difficult may not be found difficult by another. Lossless codecs can be
`included in bit-error-rate testing schemes. It is also possible to cascade or
`concatenate lossless codecs without any special precautions.
`In lossy coding data from the expander are not identical bit-for-bit with
`the source data and as a result comparing the input with the output is
`bound to reveal differences. Lossy codecs are not suitable for computer
`data, but are used in MPEG as they allow greater compression factors
`than lossless codecs. Successful lossy codecs are those in which the errors
`are arranged so that a human viewer or listener finds them subjectively
`difficult to detect. Thus lossy codecs must be based on an understanding
`of psychoacoustic and psychovisual perception and are often called
`perceptive codes.
`In perceptive coding, the greater the compression factor required, the
`more accurately must the human senses be modelled. Perceptive coders
`can be forced to operate at a fixed compression factor. This is convenient
`for practical transmission applications where a fixed data rate is easier to
`handle than a variable rate. The result of a fixed compression factor is that
`the subjective quality can vary with the ’difficulty’ of the input material.
`Perceptive codecs should not be concatenated indiscriminately especially
`if they use different algorithms. As the reconstructed signal from a
`
`
`
`
`
`
perceptive codec is not bit-for-bit accurate, clearly such a codec cannot be
included in any bit-error-rate testing system as the coding differences
would be indistinguishable from real errors.
Although the adoption of digital techniques is recent, compression
itself is as old as television. Figure 1.4 shows some of the compression
techniques used in traditional television systems.
`
Figure 1.4 Compression is as old as television. (a) Interlace is a primitive way of
halving the bandwidth. (b) Colour difference working invisibly reduces colour
resolution. (c) Composite video transmits colour in the same bandwidth as
monochrome.
`
`One of the oldest techniques is interlace which has been used in analog
`television from the very beginning as a primitive way of reducing
`bandwidth. As will be seen in Chapter 5, interlace is not without its
`problems, particularly in motion rendering. MPEG-2 supports interlace
`simply because legacy interlaced signals exist and there is a requirement
`to compress them. This should not be taken to mean that it is a good
`idea.
`
`The generation of colour difference signals from RGB in video
`represents an application of perceptive coding. The human visual system
`(HVS) sees no change in quality although the bandwidth of the colour
`difference signals is reduced. This is because human perception of detail
`in colour changes is much less than in brightness changes. This approach
`is sensibly retained in MPEG.
Composite video systems such as PAL, NTSC and SECAM are all
`analog compression schemes which embed a subcarrier in the luminance
`signal so that colour pictures are available in the same bandwidth as
`monochrome.
`In comparison with a progressive scan RGB picture,
`interlaced composite video has a compression factor of 6:1.
`
`
In a sense MPEG-2 can be considered to be a modern digital equivalent
of analog composite video as it has most of the same attributes. For
example, the eight-field sequence of PAL subcarrier which makes editing
difficult has its equivalent in the GOP (group of pictures) of MPEG.
`
1.5 Compression principles
`
`In a PCM digital system the bit rate is the product of the sampling rate
`and the number of bits in each sample and this is generally constant.
`Nevertheless the information rate of a real signal varies. In all real signals,
`part of the signal is obvious from what has gone before or what may come
`later and a suitable receiver can predict that part so that only the true
`information actually has to be sent. If the characteristics of a predicting
`receiver are known, the transmitter can omit parts of the message in the
`knowledge that the receiver has the ability to recreate it. Thus all encoders
`must contain a model of the decoder.
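To put figures to that constant PCM bit rate (the parameters below are typical examples rather than anything mandated by MPEG), the arithmetic is a straightforward product:

```python
# Sketch of the PCM arithmetic: rate = sampling rate x wordlength (x channels).
# The parameter values are illustrative examples only.
def pcm_rate(sampling_rate_hz: float, bits_per_sample: int, channels: int = 1) -> float:
    return sampling_rate_hz * bits_per_sample * channels

audio = pcm_rate(48_000, 16, channels=2)                      # stereo audio
video = pcm_rate(13_500_000, 8) + 2 * pcm_rate(6_750_000, 8)  # 4:2:2 component video (Y, Cb, Cr)
print(f"Stereo PCM audio   : {audio / 1e6:.3f} Mbit/s")       # about 1.5 Mbit/s
print(f"4:2:2 digital video: {video / 1e6:.0f} Mbit/s")       # 216 Mbit/s, whatever the picture content
```

The point is that these rates are fixed by the sampling parameters alone, not by how much is actually happening in the signal.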
`
One definition of information is that it is the unpredictable or
surprising element of data. Newspapers are a good example of
information because they only mention items which are surprising.
Newspapers never carry items about individuals who have not been
involved in an accident as this is the normal case. Consequently the
phrase 'no news is good news' is remarkably true because if an
information channel exists but nothing has been sent then it is most likely
that nothing remarkable has happened.
`The unpredictability of the punch line is a useful measure of how funny a
`joke is. Often the build-up paints a certain picture in the listener’s
`imagination, which the punch line destroys utterly. One of the author’s
`favourites is the one about the newly married couple who didn’t know the
`difference between putty and petroleum jelly — their windows fell out.
`The difference between the information rate and the overall bit rate is
`
`known as the redundancy. Compression systems are designed to eliminate
`as much of that redundancy as practicable or perhaps affordable. One way
`in which this can be done is to exploit statistical predictability in signals.
`The information content or entropy of a sample is a function of how
`different it is from the predicted value. Most signals have some degree of
`predictability. A sine wave is highly predictable, because all cycles look the
`same. According to Shannon’s theory, any signal which is totally
`predictable carries no information. In the case of the sine wave this is clear
`because it represents a single frequency and so has no bandwidth.
At the opposite extreme a signal such as noise is completely
unpredictable and as a result all codecs find noise difficult. There are two
consequences of this characteristic. First, a codec which is designed using
the statistics of real material should not be tested with random noise
`
`
`
`
`
`
`
`because it is not a representative test. Second, a codec which performs
`well with clean source material may perform badly with source material
`containing superimposed noise. Most practical compression units require
`some form of pre-processing before the compression stage proper and
`appropriate noise reduction should be incorporated into the pre-
`processing if noisy signals are anticipated. It will also be necessary to
`restrict the degree of compression applied to noisy signals.
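The idea that predictability, rather than bandwidth or wordlength alone, determines the true information rate can be shown with a small experiment. The sketch below is illustrative only; the 3-bit signals and the crude previous-sample predictor are arbitrary choices made for the example:

```python
import math
import random
from collections import Counter

def entropy_bits(values):
    # First-order Shannon entropy, in bits per value, of a list of discrete values.
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def residual(x):
    # A crude predictor: guess that each sample equals the previous one and code only the error.
    return [b - a for a, b in zip(x, x[1:])]

# Two 8-level (3-bit) quantized signals, 4096 samples each.
sine  = [round(3.5 + 3.5 * math.sin(2 * math.pi * i / 32)) for i in range(4096)]
noise = [random.randrange(8) for _ in range(4096)]

print(f"sine : raw {entropy_bits(sine):.2f}, residual {entropy_bits(residual(sine)):.2f} bits/sample")
print(f"noise: raw {entropy_bits(noise):.2f}, residual {entropy_bits(residual(noise)):.2f} bits/sample")
```

Sample by sample the two signals look equally busy, but once the predictor has done its work the sine wave needs roughly half the bits while the noise needs, if anything, slightly more; noise cannot be predicted, which is why every codec finds it difficult.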
`All real signals fall part way between the extremes of total predictabil-
`ity and total unpredictability or noisiness. If the bandwidth (set by the
`sampling rate) and the dynamic range (set by the wordlength) of the
`transmission system are used to delineate an area, this sets a limit on the
`information capacity of the system. Figure 1.5(a) shows that most real
`signals only occupy part of that area. The signal may not contain all
`frequencies, or it may not have full dynamics at certain frequencies.
`
`
`
Figure 1.5 (a) A perfect coder removes only the redundancy from the input
signal and results in subjectively lossless coding. If the remaining entropy is beyond
the capacity of the channel some of it must be lost and the codec will then be
lossy. An imperfect coder will also be lossy as it fails to keep all entropy. (b) As the
compression factor rises, the complexity must also rise to maintain quality. (c) High
compression factors also tend to increase latency or delay through the system.
`
`
`Entropy can be thought of as a measure of the actual area occupied by
`the signal. This is the area that must be transmitted if there are to be no
`subjective differences or artifacts in the received signal. The remaining
`area is called the redundancy because it adds nothing to the information
`conveyed. Thus an ideal coder could be imagined which miraculously
`sorts out the entropy from the redundancy and only sends the former. An
ideal decoder would then recreate the original impression of the
information quite perfectly.
`As the ideal is approached, the coder complexity and the latency or
`delay both rise. Figure 1.5(b) shows how complexity increases with
`compression factor and (c) shows how increasing the codec latency can
`improve the compression factor. Obviously we would have to provide a
`channel which could accept whatever entropy the coder extracts in order
to have transparent quality. As a result moderate coding gains which only
`remove redundancy need not cause artifacts and result in systems which
`are described as subjectively lossless.
`If the channel capacity is not sufficient for that, then the coder will have
`to discard some of the entropy and with it useful information. Larger
`coding gains which remove some of the entropy must result in artifacts.
`It will also be seen from Figure 1.5 that an imperfect coder will fail to
`separate the redundancy and may discard entropy instead, resulting in
`artifacts at a sub-optimal compression factor.
`A single variable-rate transmission or recording channel is inconven-
`ient and unpopular with channel providers because it is difficult to
`police. The requirement can be overcome by combining several com-
`pressed channels into one constant rate transmission in a way which
`flexibly allocates data rate between the channels. Provided the material is
`unrelated, the probability of all channels reaching peak entropy at once is
`very small and so those channels which are at one instant passing easy
`material will free up transmission capacity for those channels which are
handling difficult material. This is the principle of statistical multi-
`plexing.
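A toy simulation conveys the idea. The channel count and the rates below are invented for illustration; a real multiplexer measures the actual demand of each coder rather than drawing random numbers:

```python
import random

random.seed(1)
CHANNELS, INSTANTS = 10, 10_000

def demand_mbps():
    # Hypothetical behaviour: a channel usually needs 2 Mbit/s but occasionally 8 Mbit/s.
    return 8.0 if random.random() < 0.05 else 2.0

rates = [[demand_mbps() for _ in range(INSTANTS)] for _ in range(CHANNELS)]

each_alone = sum(max(channel) for channel in rates)                      # every channel sized for its own peak
shared_mux = max(sum(ch[t] for ch in rates) for t in range(INSTANTS))    # one multiplex sized for the joint peak

print(f"Capacity if each channel carries its own peak: {each_alone:.0f} Mbit/s")
print(f"Capacity needed by the shared multiplex      : {shared_mux:.0f} Mbit/s")
```

Because the ten channels almost never peak together, the shared figure comes out well below the sum of the individual peaks, and that difference is the capacity which statistical multiplexing frees up.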
`Where the same type of source material is used consistently, e.g.
`English text, then it is possible to perform a statistical analysis on the
frequency with which particular letters are used. Variable-length coding
`is used in which frequently used letters are allocated short codes and
`letters which occur infrequently are allocated long codes. This results in
`a lossless code. The well-known Morse code used for telegraphy is an
`example of this approach. The letter e is the most frequent in English and
`is sent with a single dot.
`An infrequent letter such as z is allocated a long complex pattern. It
`should be clear that codes of this kind which rely on a prior knowledge
`of the statistics of the signal are only effective with signals actually having
`those statistics.
`If Morse code is used with another language,
`the
`
`
`
`
`
`
`transmission becomes significantly less efficient because the statistics are
quite different; the letter z, for example, is quite common in Czech.
The Huffman code [4] is one which is designed for use with a data source
`having known statistics and shares the same principles with the Morse
`code. The probability of the different code values to be transmitted is
`studied, and the most frequent codes are arranged to be transmitted with
`short wordlength symbols. As the probability of a code value falls, it will
`be allocated longer wordlength. The Huffman code is used in conjunction
`with a number of compression techniques and is shown in Figure 1.6.
`
Figure 1.6 The Huffman code achieves compression by allocating short codes to
frequent values. To aid deserializing, the short codes are not prefixes of longer
codes.
`
`The input or source codes are assembled in order of descending
`probability. The two lowest probabilities are distinguished by a single
`code bit and their probabilities are combined. The process of combining
`probabilities is continued until unity is reached and at each stage a bit is
`used to distinguish the path. The bit will be a zero for the most probable
`path and one for the least. The compressed output is obtained by reading
`the bits which describe which path to take going from right to left.
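The construction just described can be written down in a few lines. The sketch below is one possible implementation rather than anything defined by MPEG, and the symbol probabilities are invented for the example:

```python
import heapq
from itertools import count

def huffman_codes(probabilities):
    """Build a Huffman code from a {symbol: probability} mapping by repeatedly
    combining the two least probable entries until unity is reached."""
    tie = count()  # tie-breaker so equal probabilities never force a comparison of symbol groups
    heap = [(p, next(tie), [(sym, "")]) for sym, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p_low, _, low = heapq.heappop(heap)    # least probable of the pair -> branch bit 1
        p_high, _, high = heapq.heappop(heap)  # more probable of the pair  -> branch bit 0
        merged = [(s, "0" + bits) for s, bits in high] + [(s, "1" + bits) for s, bits in low]
        heapq.heappush(heap, (p_low + p_high, next(tie), merged))
    return dict(heap[0][2])

probs = {"A": 0.40, "B": 0.20, "C": 0.15, "D": 0.15, "E": 0.10}  # made-up statistics
for symbol, code in sorted(huffman_codes(probs).items()):
    print(symbol, code)  # frequent symbols receive short codes; none is a prefix of another
```

Reading the branch bits from the root outwards reproduces the right-to-left read-out described above, and decoding is unambiguous precisely because no short code is the prefix of a longer one.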
`In the case of computer data, there is no control over the data statistics.
`Data to be recorded could be instructions, images, tables, text files and so
`on; each having their own code value distributions. In this case a coder
`relying on fixed