`H.264 and MPEG-4 Video
`Compression
`Video Coding for Next-generation Multimedia
`
`Iain E. G. Richardson
`The Robert Gordon University, Aberdeen, UK
`
`
`
Copyright © 2003
`
`John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester,
`West Sussex PO19 8SQ, England
`
Telephone (+44) 1243 779777
`
`Email (for orders and customer service enquiries): cs-books@wiley.co.uk
`Visit our Home Page on www.wileyeurope.com or www.wiley.com
`
`All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval system
`or transmitted in any form or by any means, electronic, mechanical, photocopying, recording,
`scanning or otherwise, except under the terms of the Copyright, Designs and Patents Act 1988
`or under the terms of a licence issued by the Copyright Licensing Agency Ltd, 90 Tottenham
`Court Road, London W1T 4LP, UK, without the permission in writing of the Publisher.
`Requests to the Publisher should be addressed to the Permissions Department, John Wiley &
`Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ, England, or emailed
`to permreq@wiley.co.uk, or faxed to (+44) 1243 770620.
`
`This publication is designed to provide accurate and authoritative information in regard to
`the subject matter covered. It is sold on the understanding that the Publisher is not engaged
`in rendering professional services. If professional advice or other expert assistance is
`required, the services of a competent professional should be sought.
`
`Other Wiley Editorial Offices
`
`John Wiley & Sons Inc., 111 River Street, Hoboken, NJ 07030, USA
`
`Jossey-Bass, 989 Market Street, San Francisco, CA 94103-1741, USA
`
`Wiley-VCH Verlag GmbH, Boschstr. 12, D-69469 Weinheim, Germany
`
`John Wiley & Sons Australia Ltd, 33 Park Road, Milton, Queensland 4064, Australia
`
`John Wiley & Sons (Asia) Pte Ltd, 2 Clementi Loop #02-01, Jin Xing Distripark, Singapore 129809
`
`John Wiley & Sons Canada Ltd, 22 Worcester Road, Etobicoke, Ontario, Canada M9W 1L1
`
`Wiley also publishes its books in a variety of electronic formats. Some content that appears
`in print may not be available in electronic books.
`
`British Library Cataloguing in Publication Data
`
`A catalogue record for this book is available from the British Library
`
`ISBN 0-470-84837-5
`
`Typeset in 10/12pt Times roman by TechBooks, New Delhi, India
`Printed and bound in Great Britain by Antony Rowe, Chippenham, Wiltshire
`This book is printed on acid-free paper responsibly manufactured from sustainable forestry
`in which at least two trees are planted for each one used for paper production.
`
`
`
`To Phyllis
`
`
`
`
`
`Contents
`
`About the Author
`
`Foreword
`
`Preface
`
`Glossary
`
`1 Introduction
`1.1 The Scene
`1.2 Video Compression
`1.3 MPEG-4 and H.264
`1.4 This Book
`1.5 References
`
`2 Video Formats and Quality
`2.1 Introduction
`2.2 Natural Video Scenes
`2.3 Capture
`2.3.1 Spatial Sampling
`2.3.2 Temporal Sampling
`2.3.3 Frames and Fields
`2.4 Colour Spaces
`2.4.1 RGB
`2.4.2 YCbCr
`2.4.3 YCbCr Sampling Formats
`2.5 Video Formats
`2.6 Quality
`2.6.1 Subjective Quality Measurement
`2.6.2 Objective Quality Measurement
`2.7 Conclusions
`2.8 References
`
`3 Video Coding Concepts
`3.1 Introduction
`3.2 Video CODEC
`3.3 Temporal Model
`3.3.1 Prediction from the Previous Video Frame
`3.3.2 Changes due to Motion
`3.3.3 Block-based Motion Estimation and Compensation
`3.3.4 Motion Compensated Prediction of a Macroblock
`3.3.5 Motion Compensation Block Size
`3.3.6 Sub-pixel Motion Compensation
`3.3.7 Region-based Motion Compensation
`3.4 Image model
`3.4.1 Predictive Image Coding
`3.4.2 Transform Coding
`3.4.3 Quantisation
`3.4.4 Reordering and Zero Encoding
`3.5 Entropy Coder
`3.5.1 Predictive Coding
`3.5.2 Variable-length Coding
`3.5.3 Arithmetic Coding
`3.6 The Hybrid DPCM/DCT Video CODEC Model
`3.7 Conclusions
`3.8 References
`
`4 The MPEG-4 and H.264 Standards
`4.1 Introduction
`4.2 Developing the Standards
`4.2.1 ISO MPEG
`4.2.2 ITU-T VCEG
`4.2.3 JVT
`4.2.4 Development History
`4.2.5 Deciding the Content of the Standards
`4.3 Using the Standards
`4.3.1 What the Standards Cover
`4.3.2 Decoding the Standards
`4.3.3 Conforming to the Standards
`4.4 Overview of MPEG-4 Visual/Part 2
`4.5 Overview of H.264 / MPEG-4 Part 10
`4.6 Comparison of MPEG-4 Visual and H.264
`4.7 Related Standards
`4.7.1 JPEG and JPEG2000
`4.7.2 MPEG-1 and MPEG-2
`4.7.3 H.261 and H.263
`4.7.4 Other Parts of MPEG-4
`4.8 Conclusions
`4.9 References
`
`5 MPEG-4 Visual
`5.1 Introduction
`5.2 Overview of MPEG-4 Visual (Natural Video Coding)
`5.2.1 Features
`5.2.2 Tools, Objects, Profiles and Levels
`5.2.3 Video Objects
`5.3 Coding Rectangular Frames
`5.3.1 Input and Output Video Format
`5.3.2 The Simple Profile
`5.3.3 The Advanced Simple Profile
`5.3.4 The Advanced Real Time Simple Profile
`5.4 Coding Arbitrary-shaped Regions
`5.4.1 The Core Profile
`5.4.2 The Main Profile
`5.4.3 The Advanced Coding Efficiency Profile
`5.4.4 The N-bit Profile
`5.5 Scalable Video Coding
`5.5.1 Spatial Scalability
`5.5.2 Temporal Scalability
`5.5.3 Fine Granular Scalability
`5.5.4 The Simple Scalable Profile
`5.5.5 The Core Scalable Profile
`5.5.6 The Fine Granular Scalability Profile
`5.6 Texture Coding
`5.6.1 The Scalable Texture Profile
`5.6.2 The Advanced Scalable Texture Profile
`5.7 Coding Studio-quality Video
`5.7.1 The Simple Studio Profile
`5.7.2 The Core Studio Profile
`5.8 Coding Synthetic Visual Scenes
`5.8.1 Animated 2D and 3D Mesh Coding
`5.8.2 Face and Body Animation
`5.9 Conclusions
`5.10 References
`
`6 H.264/MPEG-4 Part 10
`6.1 Introduction
`6.1.1 Terminology
`6.2 The H.264 CODEC
`6.3 H.264 structure
`6.3.1 Profiles and Levels
`6.3.2 Video Format
`6.3.3 Coded Data Format
`6.3.4 Reference Pictures
`6.3.5 Slices
`6.3.6 Macroblocks
`6.4 The Baseline Profile
`6.4.1 Overview
`6.4.2 Reference Picture Management
`6.4.3 Slices
`6.4.4 Macroblock Prediction
`6.4.5 Inter Prediction
`6.4.6 Intra Prediction
`6.4.7 Deblocking Filter
`6.4.8 Transform and Quantisation
6.4.9 4 × 4 Luma DC Coefficient Transform and Quantisation (16 × 16 Intra-mode Only)
6.4.10 2 × 2 Chroma DC Coefficient Transform and Quantisation
6.4.11 The Complete Transform, Quantisation, Rescaling and Inverse Transform Process
`6.4.12 Reordering
`6.4.13 Entropy Coding
`6.5 The Main Profile
`6.5.1 B Slices
`6.5.2 Weighted Prediction
`6.5.3 Interlaced Video
`6.5.4 Context-based Adaptive Binary Arithmetic Coding (CABAC)
`6.6 The Extended Profile
`6.6.1 SP and SI slices
`6.6.2 Data Partitioned Slices
`6.7 Transport of H.264
`6.8 Conclusions
`6.9 References
`
`7 Design and Performance
`7.1 Introduction
`7.2 Functional Design
`7.2.1 Segmentation
`7.2.2 Motion Estimation
`7.2.3 DCT/IDCT
`7.2.4 Wavelet Transform
`7.2.5 Quantise/Rescale
`7.2.6 Entropy Coding
`7.3 Input and Output
`7.3.1 Interfacing
`7.3.2 Pre-processing
`7.3.3 Post-processing
`7.4 Performance
`7.4.1 Criteria
`7.4.2 Subjective Performance
`7.4.3 Rate–distortion Performance
`7.4.4 Computational Performance
`7.4.5 Performance Optimisation
`7.5 Rate control
`7.6 Transport and Storage
`7.6.1 Transport Mechanisms
`7.6.2 File Formats
`7.6.3 Coding and Transport Issues
`7.7 Conclusions
`7.8 References
`
`8 Applications and Directions
`8.1 Introduction
`8.2 Applications
`8.3 Platforms
`8.4 Choosing a CODEC
`8.5 Commercial issues
`8.5.1 Open Standards?
`8.5.2 Licensing MPEG-4 Visual and H.264
`8.5.3 Capturing the Market
`8.6 Future Directions
`8.7 Conclusions
`8.8 References
`
`Bibliography
`
`Index
`
`
`
`
`
`
`About the Author
`
`Iain Richardson is a lecturer and researcher at The Robert Gordon University, Aberdeen,
`Scotland. He was awarded the degrees of MEng (Heriot-Watt University) and PhD (The
`Robert Gordon University) in 1990 and 1999 respectively. He has been actively involved in
`research and development of video compression systems since 1993 and is the author of over
40 journal and conference papers and two previous books. He leads the Image Communication
Technology Research Group at The Robert Gordon University and advises a number of
`companies on video compression technology issues.
`
`
`
`
`
`Foreword
`
Work on the emerging “Advanced Video Coding” standard now known as ITU-T Recommendation
H.264 and as ISO/IEC 14496 (MPEG-4) Part 10 has dominated the video coding
standardization community for roughly the past three years. The work has been stimulating,
intense, dynamic, and all-consuming for those of us most deeply involved in its design. The
time has arrived to see what has been accomplished.
`Although not a direct participant, Dr Richardson was able to develop a high-quality,
`up-to-date, introductory description and analysis of the new standard. The timeliness of this
`book is remarkable, as the standard itself has only just been completed.
`The new H.264/AVC standard is designed to provide a technical solution appropriate
`for a broad range of applications, including:
`
• Broadcast over cable, satellite, cable modem, DSL, terrestrial.
• Interactive or serial storage on optical and magnetic storage devices, DVD, etc.
• Conversational services over ISDN, Ethernet, LAN, DSL, wireless and mobile networks,
  modems.
• Video-on-demand or multimedia streaming services over cable modem, DSL, ISDN, LAN,
  wireless networks.
• Multimedia messaging services over DSL, ISDN.
`
`The range of bit rates and picture sizes supported by H.264/AVC is correspondingly broad,
`addressing video coding capabilities ranging from very low bit rate, low frame rate, “postage
`stamp” resolution video for mobile and dial-up devices, through to entertainment-quality
`standard-definition television services, HDTV, and beyond. A flexible system interface for the
`coded video is specified to enable the adaptation of video content for use over this full variety
`of network and channel-type environments. However, at the same time, the technical design
`is highly focused on providing the two limited goals of high coding efficiency and robustness
`to network environments for conventional rectangular-picture camera-view video content.
`Some potentially-interesting (but currently non-mainstream) features were deliberately left out
`(at least from the first version of the standard) because of that focus (such as support of
`arbitrarily-shaped video objects, some forms of bit rate scalability, 4:2:2 and 4:4:4 chroma
`formats, and color sampling accuracies exceeding eight bits per color component).
`
`
`
`In the work on the new H.264/AVC standard, a number of relatively new technical
`developments have been adopted. For increased coding efficiency, these include improved
`prediction design aspects as follows:
`
• Variable block-size motion compensation with small block sizes,
• Quarter-sample accuracy for motion compensation,
• Motion vectors over picture boundaries,
• Multiple reference picture motion compensation,
• Decoupling of referencing order from display order,
• Decoupling of picture representation methods from the ability to use a picture for reference,
• Weighted prediction,
• Improved “skipped” and “direct” motion inference,
• Directional spatial prediction for intra coding, and
• In-the-loop deblocking filtering.
`
`In addition to improved prediction methods, other aspects of the design were also enhanced
`for improved coding efficiency, including:
`
• Small block-size transform,
• Hierarchical block transform,
• Short word-length transform,
• Exact-match transform,
• Arithmetic entropy coding, and
• Context-adaptive entropy coding.
`
`And for robustness to data errors/losses and flexibility for operation over a variety of network
`environments, some key design aspects include:
`
• Parameter set structure,
• NAL unit syntax structure,
• Flexible slice size,
• Flexible macroblock ordering,
• Arbitrary slice ordering,
• Redundant pictures,
• Data partitioning, and
• SP/SI synchronization switching pictures.
`
Prior to the H.264/AVC project, the big recent video coding activity was the MPEG-4 Part 2
(Visual) coding standard. That specification introduced a new degree of creativity and flexibility
to the capabilities of the representation of digital visual content, especially with its
coding of video “objects”, its scalability features, extended N-bit sample precision and 4:4:4
color format capabilities, and its handling of synthetic visual scenes. It introduced a number
of design variations (called “profiles” and currently numbering 19 in all) for a wide variety
of applications. The H.264/AVC project (with only 3 profiles) returns to the narrower and
more traditional focus on efficient compression of generic camera-shot rectangular video pictures
with robustness to network losses – making no attempt to cover the ambitious breadth of
MPEG-4 Visual. MPEG-4 Visual, while not quite as “hot off the press”, establishes a landmark
in recent technology development, and its capabilities are yet to be fully explored.
`
`
`
`Most people first learn about a standard in publications other than the standard itself.
`My personal belief is that if you want to know about a standard, you should also obtain a
`copy of it, read it, and refer to that document alone as the ultimate authority on its content,
`its boundaries, and its capabilities. No tutorial or overview presentation will provide all of the
`insights that can be obtained from careful analysis of the standard itself.
At the same time, no standardized specification document (at least for video coding) can
`be a complete substitute for a good technical book on the subject. Standards specifications are
`written primarily to be precise, consistent, complete, and correct and not to be particularly
`readable. Standards tend to leave out information that is not absolutely necessary to comply
`with them. Many people find it surprising, for example, that video coding standards say almost
`nothing about how an encoder works or how one should be designed. In fact an encoder is
`essentially allowed to do anything that produces bits that can be correctly decoded, regardless
`of what picture quality comes out of that decoding process. People, however, can usually only
`understand the principles of video coding if they think from the perspective of the encoder, and
`nearly all textbooks (including this one) approach the subject from the encoding perspective.
`A good book, such as this one, will tell you why a design is the way it is and how to make
`use of that design, while a good standard may only tell you exactly what it is and abruptly
`(deliberately) stop right there.
`In the case of H.264/AVC or MPEG-4 Visual, it is highly advisable for those new to the
`subject to read some introductory overviews such as this one, and even to get a copy of an
`older and simpler standard such as H.261 or MPEG-1 and try to understand that first. The
`principles of digital video codec design are not too complicated, and haven’t really changed
`much over the years – but those basic principles have been wrapped in layer-upon-layer of
`technical enhancements to the point that the simple and straightforward concepts that lie at
`their core can become obscured. The entire H.261 specification was only 25 pages long, and
`only 17 of those pages were actually required to fully specify the technology that now lies at
`the heart of all subsequent video coding standards. In contrast, the H.264/AVC and MPEG-4
Visual specifications are more than 250 and 500 pages long, respectively, with a high
`density of technical detail (despite completely leaving out key information such as how to
`encode video using their formats). They each contain areas that are difficult even for experts
`to fully comprehend and appreciate.
Dr Richardson’s book is not a completely exhaustive treatment of the subject. However,
his approach is highly informative and provides a good initial understanding of the key concepts,
and it is conceptually superior (and in some aspects more objective) to other treatments of
video coding. This and the remarkable timeliness of the subject matter make this book a strong
contribution to the technical literature of our community.
`
`Gary J. Sullivan
`
`Biography of Gary J. Sullivan, PhD
`
`Gary J. Sullivan is the chairman of the Joint Video Team (JVT) for the development of the latest
`international video coding standard known as H.264/AVC, which was recently completed as a
`joint project between the ITU-T video coding experts group (VCEG) and the ISO/IEC moving
`picture experts group (MPEG).
`
`
`
He is also the Rapporteur of Advanced Video Coding in the ITU-T, where he has
led VCEG (ITU-T Q.6/SG16) for about seven years. In addition, he is the ITU-T video liaison
representative to MPEG and served as MPEG’s (ISO/IEC JTC1/SC29/WG11) video chairman
from March 2001 to May 2002.
He is currently a program manager of video standards and technologies in the eHome A/V
platforms group of Microsoft Corporation. At Microsoft he designed and remains active in
the extension of the DirectX® Video Acceleration API/DDI feature of the Microsoft Windows®
operating system platform.
`
`
`
`Preface
`
With the widespread adoption of technologies such as digital television, Internet streaming
video and DVD-Video, video compression has become an essential component of broadcast
and entertainment media. The success of digital TV and DVD-Video is based upon the
10-year-old MPEG-2 standard, a technology that has proved its effectiveness but is now
looking distinctly old-fashioned. It is clear that the time is right to replace MPEG-2 video
compression with a more effective and efficient technology that can take advantage of recent
progress in processing power. For some time there has been a running debate about which
technology should take up MPEG-2’s mantle. The leading contenders are the International
Standards known as MPEG-4 Visual and H.264.
This book aims to provide a clear, practical and unbiased guide to these two standards
to enable developers, engineers, researchers and students to understand and apply them
effectively. Video and image compression is a complex and extensive subject and this book keeps
an unapologetically limited focus, concentrating on the standards themselves (and in the case
of MPEG-4 Visual, on the elements of the standard that support coding of ‘real world’ video
material) and on video coding concepts that directly underpin the standards. The book takes an
application-based approach and places particular emphasis on tools and features that are
helpful in practical applications, in order to provide practical and useful assistance to developers
and adopters of these standards.
`I am grateful to a number of people who helped to shape the content of this book. I
`received many helpful comments and requests from readers of my book Video Codec Design.
`Particular thanks are due to Gary Sullivan for taking the time to provide helpful and detailed
`comments, corrections and advice and for kindly agreeing to write a Foreword; to Harvey
`Hanna (Impact Labs Inc), Yafan Zhao (The Robert Gordon University) and Aitor Garay for
`reading and commenting on sections of this book during its development; to members of the
`Joint Video Team for clarifying many of the details of H.264; to the editorial team at John
`Wiley & Sons (and especially to the ever-helpful, patient and supportive Kathryn Sharples);
`to Phyllis for her constant support; and finally to Freya and Hugh for patiently waiting for the
`long-promised trip to Storybook Glen!
`I very much hope that you will find this book enjoyable, readable and above all useful.
`Further resources and links are available at my website, http://www.vcodex.com/. I always
`appreciate feedback, comments and suggestions from readers and you will find contact details
`at this website.
`
`Iain Richardson
`
`
`
`
`
`Glossary
`
4:2:0 (sampling): Sampling method in which the chrominance components have half the horizontal and vertical resolution of the luminance component
4:2:2 (sampling): Sampling method in which the chrominance components have half the horizontal resolution of the luminance component
4:4:4 (sampling): Sampling method in which the chrominance components have the same resolution as the luminance component
arithmetic coding: Coding method to reduce redundancy
artefact: Visual distortion in an image
ASO: Arbitrary Slice Order, in which slices may be coded out of raster sequence
BAB: Binary Alpha Block, indicates the boundaries of a region (MPEG-4 Visual)
BAP: Body Animation Parameters
Block: Region of macroblock (8 × 8 or 4 × 4) for transform purposes
block matching: Motion estimation carried out on rectangular picture areas
blocking: Square or rectangular distortion areas in an image
B-picture (slice): Coded picture (slice) predicted using bidirectional motion compensation
CABAC: Context-based Adaptive Binary Arithmetic Coding
CAE: Context-based Arithmetic Encoding
CAVLC: Context Adaptive Variable Length Coding
chrominance: Colour difference component
CIF: Common Intermediate Format, a colour image format
CODEC: COder / DECoder pair
colour space: Method of representing colour images
DCT: Discrete Cosine Transform
Direct prediction: A coding mode in which no motion vector is transmitted
DPCM: Differential Pulse Code Modulation
DSCQS: Double Stimulus Continuous Quality Scale, a scale and method for subjective quality measurement
DWT: Discrete Wavelet Transform
entropy coding: Coding method to reduce redundancy
error concealment: Post-processing of a decoded image to remove or reduce visible error effects
Exp-Golomb: Exponential Golomb variable length codes
FAP: Facial Animation Parameters
FBA: Face and Body Animation
FGS: Fine Granular Scalability
field: Odd- or even-numbered lines from an interlaced video sequence
flowgraph: Pictorial representation of a transform algorithm (or the algorithm itself)
FMO: Flexible Macroblock Order, in which macroblocks may be coded out of raster sequence
Full Search: A motion estimation algorithm
GMC: Global Motion Compensation, motion compensation applied to a complete coded object (MPEG-4 Visual)
GOP: Group Of Pictures, a set of coded video images
H.261: A video coding standard
H.263: A video coding standard
H.264: A video coding standard
HDTV: High Definition Television
Huffman coding: Coding method to reduce redundancy
HVS: Human Visual System, the system by which humans perceive and interpret visual images
hybrid (CODEC): CODEC model featuring motion compensation and transform
IEC: International Electrotechnical Commission, a standards body
Inter (coding): Coding of video frames using temporal prediction or compensation
interlaced (video): Video data represented as a series of fields
intra (coding): Coding of video frames without temporal prediction
I-picture (slice): Picture (or slice) coded without reference to any other frame
ISO: International Organization for Standardization, a standards body
ITU: International Telecommunication Union, a standards body
JPEG: Joint Photographic Experts Group, a committee of ISO (also an image coding standard)
JPEG2000: An image coding standard
latency: Delay through a communication system
Level: A set of conformance parameters (applied to a Profile)
loop filter: Spatial filter placed within encoding or decoding feedback loop
Macroblock: Region of frame coded as a unit (usually 16 × 16 pixels in the original frame)
Macroblock partition: Region of macroblock with its own motion vector (H.264)
Macroblock sub-partition: Region of macroblock with its own motion vector (H.264)
media processor: Processor with features specific to multimedia coding and processing
motion compensation: Prediction of a video frame with modelling of motion
motion estimation: Estimation of relative motion between two or more video frames
motion vector: Vector indicating a displaced block or region to be used for motion compensation
MPEG: Moving Picture Experts Group, a committee of ISO/IEC
MPEG-1: A multimedia coding standard
MPEG-2: A multimedia coding standard
MPEG-4: A multimedia coding standard
NAL: Network Abstraction Layer
objective quality: Visual quality measured by algorithm(s)
OBMC: Overlapped Block Motion Compensation
Picture (coded): Coded (compressed) video frame
P-picture (slice): Coded picture (or slice) using motion-compensated prediction from one reference frame
profile: A set of functional capabilities (of a video CODEC)
progressive (video): Video data represented as a series of complete frames
PSNR: Peak Signal to Noise Ratio, an objective quality measure
QCIF: Quarter Common Intermediate Format
quantise: Reduce the precision of a scalar or vector quantity
rate control: Control of bit rate of encoded video signal
rate–distortion: Measure of CODEC performance (distortion at a range of coded bit rates)
RBSP: Raw Byte Sequence Payload
RGB: Red/Green/Blue colour space
ringing (artefacts): ‘Ripple’-like artefacts around sharp edges in a decoded image
RTP: Real-time Transport Protocol, a transport protocol for real-time data
RVLC: Reversible Variable Length Code
scalable coding: Coding a signal into a number of layers
SI slice: Intra-coded slice used for switching between coded bitstreams (H.264)
slice: A region of a coded picture
SNHC: Synthetic Natural Hybrid Coding
SP slice: Inter-coded slice used for switching between coded bitstreams (H.264)
sprite: Texture region that may be incorporated in a series of decoded frames (MPEG-4 Visual)
statistical redundancy: Redundancy due to the statistical distribution of data
studio quality: Lossless or near-lossless video quality
subjective quality: Visual quality as perceived by human observer(s)
subjective redundancy: Redundancy due to components of the data that are subjectively insignificant
sub-pixel (motion compensation): Motion-compensated prediction from a reference area that may be formed by interpolating between integer-valued pixel positions
test model: A software model and document that describe a reference implementation of a video coding standard
Texture: Image or residual data
Tree-structured motion compensation: Motion compensation featuring a flexible hierarchy of partition sizes (H.264)
TSS: Three Step Search, a motion estimation algorithm
VCEG: Video Coding Experts Group, a committee of ITU
VCL: Video Coding Layer
video packet: Coded unit suitable for packetisation
VLC: Variable Length Code
VLD: Variable Length Decoder
VLE: Variable Length Encoder
VLSI: Very Large Scale Integrated circuit
VO: Video Object
VOP: Video Object Plane
VQEG: Video Quality Experts Group
Weighted prediction: Motion compensation in which the prediction samples from two references are scaled
YCbCr: Luminance, Blue chrominance, Red chrominance colour space
YUV: A colour space (see YCbCr)
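The three sampling formats defined above can be made concrete with a short sketch. The Python below is an illustrative assumption, not code from either standard: it maps each format to the factors by which the two chroma planes are subsampled relative to luma, then computes the size of each chroma plane and the total number of samples in one frame. The function names `chroma_size` and `samples_per_frame` are hypothetical, and the example uses the CIF luma size of 352 × 288 samples.

```python
from __future__ import annotations

# Divisors (horizontal, vertical) applied to the luma size to obtain the
# size of each chroma plane (Cb and Cr). Illustrative sketch only: assumes
# progressive frames and no padding or alignment.
SUBSAMPLING = {
    "4:4:4": (1, 1),  # chroma at the same resolution as luma
    "4:2:2": (2, 1),  # chroma at half the horizontal resolution
    "4:2:0": (2, 2),  # chroma at half the horizontal and vertical resolution
}


def chroma_size(width: int, height: int, fmt: str) -> tuple[int, int]:
    """Return (width, height) of one chroma plane for a width x height luma plane."""
    dx, dy = SUBSAMPLING[fmt]
    if width % dx or height % dy:
        raise ValueError(f"{width}x{height} cannot be subsampled as {fmt}")
    return width // dx, height // dy


def samples_per_frame(width: int, height: int, fmt: str) -> int:
    """Total luma + Cb + Cr samples in one frame."""
    cw, ch = chroma_size(width, height, fmt)
    return width * height + 2 * cw * ch


if __name__ == "__main__":
    # CIF luma resolution: 352 x 288 samples.
    for fmt in ("4:4:4", "4:2:2", "4:2:0"):
        print(fmt, chroma_size(352, 288, fmt), samples_per_frame(352, 288, fmt))
```

For a CIF frame this gives 304 128 samples in 4:4:4, 202 752 in 4:2:2 and 152 064 in 4:2:0, so a 4:2:0 frame carries half as many samples as a 4:4:4 frame of the same luma size.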
`
`
`
1 Introduction
`
`1.1 THE SCENE
`
`Scene 1: Your avatar (a realistic 3D model with your appearance and voice) walks through
`a sophisticated virtual world populated by other avatars, product advertisements and video
`walls. On one virtual video screen is a news broadcast from your favourite channel; you want
`to see more about the current financial situation and so you interact with the broadcast and
`pull up the latest stock market figures. On another screen you call up a videoconference link
`with three friends. The video images of the other participants, neatly segmented from their
`backgrounds, are presented against yet another virtual backdrop.
`Scene 2: Your new 3G vidphone rings; you flip the lid open and answer the call. The face
`of your friend appears on the screen and you greet each other. Each sees a small, clear image
`of the other on the phone’s screen, without any of the obvious ‘blockiness’ of older-model
`video phones. After the call has ended, you call up a live video feed from a football match. The
`quality of the basic-rate stream isn’t too great and you switch seamlessly to the higher-quality
`(but more expensive) ‘premium’ stream. For a brief moment the radio signal starts to break
`up but all you notice is a slight, temporary distortion in the video picture.
These two scenarios illustrate different visions of the next generation of multimedia
applications. The first is a vision of MPEG-4 Visual: a rich, interactive on-line world bringing
together synthetic, natural, video, image, 2D and 3D ‘objects’. The second is a vision
of H.264/AVC: highly efficient and reliable video communications, supporting two-way,
‘streaming’ and broadcast applications and robust to channel transmission problems. The
two standards, each with their advantages and disadvantages and each with their supporters
and critics, are contenders in the race to provide video compression for next-generation
communication applications.
`Turn on the television and surf through tens or hundreds of digital channels. Play your
`favourite movies on the DVD player and breathe a sigh of relief that you can throw out your
`antiquated VHS tapes. Tune in to a foreign TV news broadcast on the web (still just a postage-
`stamp video window but the choice and reliability of video streams is growing all the time).
`Chat to your friends and family by PC videophone. These activities are now commonplace and
`unremarkable, demonstrating that digital video is well on the way to becoming a ubiquitous
`and essential component of the entertainment, computing, broadcasting and communications
`industries.
Pervasive, seamless, high-quality digital video has been the goal of companies, researchers
and standards bodies over the last two decades. In some areas (for example broadcast
`television and consumer video storage), digital video has clearly captured the market, whilst
`in others (videoconferencing, video email, mobile video), market success is perhaps still too
`early to judge. However, there is no doubt that digital video is a globally important industry
`which will continue to pervade businesses, networks and homes. The continuous evolution of
`the digital video industry is being driven by commercial and technical forces. The commercial
`drive comes from the huge revenue potential of persuading consumers and businesses (a) to
`replace analogue technology and older digital technology with new, efficient, high-quality
`digital video products and (b) to adopt new communication and entertainment products that
have been made possible by the move to digital video. The technical drive comes from continuing
improvements in processing performance, the availability of higher-capacity storage
`and transmission mechanisms and research and development of video and image processing
`technology.
`Getting digital video from its source (a camera or a stored clip) to its destination (a dis-
`play) involves a chain of components or processes. Key to this chain are the processes of
`compression (encoding) and decompression (decoding), in which bandwidth-intensive ‘raw’
`digital video is reduced to a manageable size for transmission or storage, then reconstructed for
`display. Getting the compression and decompression processes ‘right’ can give a significant
technical and commercial edge to a product, by providing better image quality, greater reliability
and/or more flexibility than competing solutions. There is therefore a keen interest in the
`continuing development and improvement of video compression and decompression methods
`and systems. The interested parties include entertainment, communication and broadcasting
`companies, software and hardware developers, researchers and holders of potentially lucrative
`patents on new compression algorithms.
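To put a rough number on ‘bandwidth-intensive’, the worked arithmetic below assumes, purely for illustration, a CIF sequence (352 × 288 luma samples) in 4:2:0 format with 8 bits per sample at 25 frames per second. None of these parameters is taken from the text above; they are simply a common, modest example:

$$
\begin{aligned}
\text{samples per frame} &= 352 \times 288 + 2 \times (176 \times 144) = 101\,376 + 50\,688 = 152\,064 \\
\text{bits per frame} &= 152\,064 \times 8 = 1\,216\,512 \\
\text{raw bit rate} &= 1\,216\,512 \times 25 = 30\,412\,800 \ \text{bit/s} \approx 30.4 \ \text{Mbit/s}
\end{aligned}
$$

Even at this small picture size, uncompressed video would need roughly 30 Mbit/s, which is why the encoding and decoding stages are central to any practical storage or transmission system.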
`The early successes in the digital video industry (notably broadcast digital television
and DVD-Video) were underpinned by the international standard ISO/IEC 13818 [1], popularly
known as ‘MPEG-2’ (after the working group that developed the standard, the Moving Picture
Experts Group). Anticipation of a need for better compression tools has led to the development
of two further standards for video compression, known as ISO/IEC 14496 Part 2 (‘MPEG-4
Visual’) [2] and ITU-T Recommendation H.264/ISO/IEC 14496 Part 10 (‘H.264’) [3].
MPEG-4 Visual and H.264 share the same ancestry and some common features (they both draw on
`well-proven techniques from earlier standards) but have not