`
`ITU-T
`
`TELECOMMUNICATION
`STANDARDIZATION SECTOR
`OF ITU
`
`H.262
`
`(07/95)
`
`TRANSMISSION OF NON-TELEPHONE SIGNALS
`
`INFORMATION TECHNOLOGY -
`GENERIC CODING OF MOVING PICTURES AND
`ASSOCIATED AUDIO INFORMATION: VIDEO
`
`ITU-T Recommendation H.262
`
`(Previously "CCITT Recommendation")
`
`Vedanti Systems Limited - Ex. 2012
`Page 1
`
`
`
`FOREWORD
`
`ITU (International Telecommunication Union) is the United Nations Specialized Agency in the field of
`telecommunications. The ITU Telecommunication Standardization Sector (ITU-T) is a permanent organ of the ITU.
`Some 179 member countries, 84 telecom operating entities, 145 scientific and industrial organizations and
`38 international organizations participate in ITU-T, which is the body that sets world telecommunications standards
`(Recommendations).
`
`The approval of Recommendations by the Members of ITU-T is covered by the procedure laid down in WTSC
`Resolution No. 1 (Helsinki, 1993). In addition, the World Telecommunication Standardization Conference (WTSC),
`which meets every four years, approves Recommendations submitted to it and establishes the study programme for the
`following period.
`
`In some areas of information technology which fall within ITU-T's purview, the necessary standards are prepared on a
`collaborative basis with ISO and IEC. The text of ITU-T Recommendation H.262 was approved on 10 July 1995.
`The identical text is also published as ISO/IEC International Standard 13818-2.
`
`In this Recommendation, the expression "Administration" is used for conciseness to indicate both a telecommunication
`administration and a recognized operating agency.
`
`NOTE
`
`All rights reserved. No part of this publication may be reproduced or utilized in any form or by any means, electronic or
`mechanical, including photocopying and microfilm, without permission in writing from the ITU.
`
`© ITU 1996
`
`
`
`
`CONTENTS
`
`Summary
`Introduction
`    Intro. 1 Purpose
`    Intro. 2 Application
`    Intro. 3 Profiles and levels
`    Intro. 4 The scalable and the non-scalable syntax
`1 Scope
`2 Normative references
`3 Definitions
`4 Abbreviations and symbols
`    4.1 Arithmetic operators
`    4.2 Logical operators
`    4.3 Relational operators
`    4.4 Bitwise operators
`    4.5 Assignment
`    4.6 Mnemonics
`    4.7 Constants
`5 Conventions
`    5.1 Method of describing bitstream syntax
`    5.2 Definition of functions
`    5.3 Reserved, forbidden and marker_bit
`    5.4 Arithmetic precision
`6 Video bitstream syntax and semantics
`    6.1 Structure of coded video data
`    6.2 Video bitstream syntax
`    6.3 Video bitstream semantics
`7 The video decoding process
`    7.1 Higher syntactic structures
`    7.2 Variable length decoding
`    7.3 Inverse scan
`    7.4 Inverse quantisation
`    7.5 Inverse DCT
`    7.6 Motion compensation
`    7.7 Spatial scalability
`    7.8 SNR scalability
`    7.9 Temporal scalability
`    7.10 Data partitioning
`    7.11 Hybrid scalability
`    7.12 Output of the decoding process
`8 Profiles and levels
`    8.1 ISO/IEC 11172-2 compatibility
`    8.2 Relationship between defined profiles
`    8.3 Relationship between defined levels
`    8.4 Scalable layers
`    8.5 Parameter values for defined profiles, levels and layers
`
`ITU-T Rec. H.262 (1995 E)
`
`i
`
`
`
`
`Annex A - Discrete cosine transform
`Annex B - Variable length code tables
`    B.1 Macroblock addressing
`    B.2 Macroblock type
`    B.3 Macroblock pattern
`    B.4 Motion vectors
`    B.5 DCT coefficients
`Annex C - Video buffering verifier
`Annex D - Features supported by the algorithm
`    D.1 Overview
`    D.2 Video formats
`    D.3 Picture quality
`    D.4 Data rate control
`    D.5 Low delay mode
`    D.6 Random access/channel hopping
`    D.7 Scalability
`    D.8 Compatibility
`    D.9 Differences between this Specification and ISO/IEC 11172-2
`    D.10 Complexity
`    D.11 Editing encoded bitstreams
`    D.12 Trick modes
`    D.13 Error resilience
`    D.14 Concatenated sequences
`Annex E - Profile and level restrictions
`    E.1 Syntax element restrictions in profiles
`    E.2 Permissible layer combinations
`Annex F - Bibliography
`
`
`
`
`Summary
`
`This Recommendation | International Standard specifies coded representation of video data and the decoding process
`required to reconstruct pictures. It provides a generic video coding scheme which serves a wide range of applications,
`bit rates, picture resolutions and qualities. Its basic coding algorithm is a hybrid of motion compensated prediction
`and DCT. Pictures to be coded can be either interlaced or progressive. Necessary algorithmic elements are integrated
`into a single syntax, and a limited number of subsets are defined in terms of Profile (functionalities) and
`Level (parameters) to facilitate practical use of this generic video coding standard.
`
`
`
`
`Introduction
`
`Intro. 1 Purpose
`
`This Part of this Specification was developed in response to the growing need for a generic coding method of moving
`pictures and of associated sound for various applications such as digital storage media, television broadcasting and
`communication. The use of this Specification means that motion video can be manipulated as a form of computer data
`and can be stored on various storage media, transmitted and received over existing and future networks and distributed
`on existing and future broadcasting channels.
`
`Intro. 2 Application
`
`The applications of this Specification cover, but are not limited to, such areas as listed below:
`
`BSS   Broadcasting Satellite Service (to the home)
`CATV  Cable TV Distribution on optical networks, copper, etc.
`CDAD  Cable Digital Audio Distribution
`DSB   Digital Sound Broadcasting (terrestrial and satellite broadcasting)
`DTTB  Digital Terrestrial Television Broadcasting
`EC    Electronic Cinema
`ENG   Electronic News Gathering (including SNG, Satellite News Gathering)
`FSS   Fixed Satellite Service (e.g. to head ends)
`HTT   Home Television Theatre
`IPC   Interpersonal Communications (videoconferencing, videophone, etc.)
`ISM   Interactive Storage Media (optical disks, etc.)
`MMM   Multimedia Mailing
`NCA   News and Current Affairs
`NDB   Networked Database Services (via ATM, etc.)
`RVS   Remote Video Surveillance
`SSM   Serial Storage Media (digital VTR, etc.)
`
`Intro. 3 Profiles and levels
`
`This Specification is intended to be generic in the sense that it serves a wide range of applications, bitrates, resolutions,
`qualities and services. Applications should cover, among other things, digital storage media, television broadcasting and
`communications. In the course of creating this Specification, various requirements from typical applications have been
`considered, necessary algorithmic elements have been developed, and they have been integrated into a single syntax.
`Hence, this Specification will facilitate the bitstream interchange among different applications.
`
`Considering the practicality of implementing the full syntax of this Specification, however, a limited number of subsets
`of the syntax are also stipulated by means of "profile" and "level". These and other related terms are formally defined in
`clause 3.
`
`A "profile" is a defined subset of the entire bitstream syntax that is defined by this Specification. Within the bounds
`imposed by the syntax of a given profile it is still possible to require a very large variation in the performance of
`encoders and decoders depending upon the values taken by parameters in the bitstream. For instance, it is possible to
`specify frame sizes as large as (approximately) 2^14 samples wide by 2^14 lines high. It is currently neither practical nor
`economic to implement a decoder capable of dealing with all possible frame sizes.
`
`In order to deal with this problem, "levels" are defined within each profile. A level is a defined set of constraints
`imposed on parameters in the bitstream. These constraints may be simple limits on numbers. Alternatively they may take
`the form of constraints on arithmetic combinations of the parameters (e.g. frame width multiplied by frame height
`multiplied by frame rate).
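The two kinds of level constraint described above (simple limits, and limits on an arithmetic combination of parameters) can be sketched as follows. This is only an illustration: the class name, the function name and the numeric limits are assumptions chosen for the example and are not taken from the parameter tables of this Specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LevelLimits:
    """Upper bounds for one hypothetical level (illustrative values only)."""
    max_width: int         # samples per line
    max_height: int        # lines per frame
    max_frame_rate: float  # frames per second
    max_sample_rate: int   # bound on width * height * frame_rate


def conforms(width, height, frame_rate, limits):
    """Return True if the parameters satisfy every constraint of the level."""
    # Simple limits on individual numbers.
    simple_ok = (width <= limits.max_width
                 and height <= limits.max_height
                 and frame_rate <= limits.max_frame_rate)
    # Constraint on an arithmetic combination of the parameters.
    combined_ok = width * height * frame_rate <= limits.max_sample_rate
    return simple_ok and combined_ok


# Hypothetical level used only for this example.
EXAMPLE_LEVEL = LevelLimits(max_width=720, max_height=576,
                            max_frame_rate=30.0, max_sample_rate=10_368_000)

print(conforms(720, 480, 30.0, EXAMPLE_LEVEL))  # True
print(conforms(720, 576, 30.0, EXAMPLE_LEVEL))  # False
```

In the second call every individual limit is met, yet the product of width, height and frame rate exceeds the combined bound, which is the situation the combined constraint is meant to catch.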
`
`Bitstreams complying with this Specification use a common syntax. In order to achieve a subset of the complete syntax,
`flags and parameters are included in the bitstream that signal the presence or otherwise of syntactic elements that occur
`later in the bitstream. In order to specify constraints on the syntax (and hence define a profile) it is thus only necessary to
`constrain the values of these flags and parameters that specify the presence of later syntactic elements.
`
`
`
`
`Intro. 4 The scalable and the non-scalable syntax
`
`The full syntax can be divided into two major categories. One is the non-scalable syntax, which is structured as a
`superset of the syntax defined in ISO/IEC 11172-2. The main feature of the non-scalable syntax is the extra compression
`tools for interlaced video signals. The second is the scalable syntax, the key property of which is to enable the
`reconstruction of useful video from pieces of a total bitstream. This is achieved by structuring the total bitstream in
`two or more layers, starting from a standalone base layer and adding a number of enhancement layers. The base layer
`can use the non-scalable syntax, or in some situations conform to the ISO/IEC 11172-2 syntax.
`
`Intro. 4.1 Overview of the non-scalable syntax
`
`The coded representation defined in the non-scalable syntax achieves a high compression ratio while preserving good
`image quality. The algorithm is not lossless as the exact sample values are not preserved during coding. Obtaining good
`image quality at the bitrates of interest demands very high compression, which is not achievable with intra picture
`coding alone. The need for random access, however, is best satisfied with pure intra picture coding. The choice of the
`techniques is based on the need to balance a high image quality and compression ratio with the requirement to make
`random access to the coded bitstream.
`
`A number of techniques are used to achieve high compression. The algorithm first uses block-based motion
`compensation to reduce the temporal redundancy. Motion compensation is used both for causal prediction of the current
`picture from a previous picture, and for non-causal, interpolative prediction from past and future pictures. Motion
`vectors are defined for each 16-sample by 16-line region of the picture. The prediction error is further compressed using
`the Discrete Cosine Transform (DCT) to remove spatial correlation before it is quantised in an irreversible process that
`discards the less important information. Finally, the motion vectors are combined with the quantised DCT information,
`and encoded using variable length codes.
`
`Intro. 4.1.1 Temporal processing
`
`Because of the conflicting requirements of random access and highly efficient compression, three main picture types are
`defined. Intra Coded Pictures (I-Pictures) are coded without reference to other pictures. They provide access points to
`the coded sequence where decoding can begin, but are coded with only moderate compression. Predictive Coded
`Pictures (P-Pictures) are coded more efficiently using motion compensated prediction from a past intra or predictive
`coded picture and are generally used as a reference for further prediction. Bidirectionally-predictive Coded Pictures
`(B-Pictures) provide the highest degree of compression but require both past and future reference pictures for motion
`compensation. Bidirectionally-predictive coded pictures are never used as references for prediction (except in the case
`that the resulting picture is used as a reference in a spatially scalable enhancement layer). The organisation of the three
`picture types in a sequence is very flexible. The choice is left to the encoder and will depend on the requirements of the
`application. Figure Intro. 1 illustrates an example of the relationship among the three different picture types.
`
`Figure Intro. 1 - Example of temporal picture structure (figure not reproduced; label in figure: Bidirectional Interpolation)
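Because a B-Picture needs both a past and a future reference, the order in which pictures are transmitted and decoded can differ from the display order. The sketch below shows one way to derive such an order, under the assumption that each reference picture (I or P) is sent ahead of the B-Pictures that precede it in display order. The function name and the example sequence are illustrative only.

```python
def display_to_coded_order(display_types):
    """Reorder picture types from display order into a coded order.

    Assumption for this sketch: each I- or P-Picture is emitted before the
    B-Pictures that come just before it in display order, so that both
    references are available when those B-Pictures are decoded.
    Returns a list of (display_index, picture_type) pairs.
    """
    coded = []
    pending_b = []
    for index, ptype in enumerate(display_types):
        if ptype == "B":
            pending_b.append((index, ptype))   # wait for the future reference
        else:
            coded.append((index, ptype))       # reference picture goes first
            coded.extend(pending_b)            # then the B-Pictures it enables
            pending_b = []
    coded.extend(pending_b)  # trailing B-Pictures with no later reference
    return coded


coded = display_to_coded_order(list("IBBPBBP"))
print(" ".join(f"{ptype}{index}" for index, ptype in coded))
# I0 P3 B1 B2 P6 B4 B5
```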
`
`
`
`
`Intro. 4.1.2 Coding interlaced video
`
`Each frame of interlaced video consists of two fields which are separated by one field-period. The Specification allows
`either the frame to be encoded as a picture or the two fields to be encoded as two pictures. Frame encoding or field
`encoding can be adaptively selected on a frame-by-frame basis. Frame encoding is typically preferred when the video
`scene contains significant detail with limited motion. Field encoding, in which the second field can be predicted from the
`first, works better when there is fast movement.
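The relationship between a frame and its two fields can be illustrated with a small sketch. It assumes a frame is held as a list of lines and that the top field holds the even-numbered lines (counting from 0), so that each bottom-field line sits immediately below the corresponding top-field line. The function names are illustrative only.

```python
def split_fields(frame):
    """Split a frame (list of lines) into its top and bottom fields."""
    top = frame[0::2]     # lines 0, 2, 4, ...
    bottom = frame[1::2]  # lines 1, 3, 5, ...
    return top, bottom


def merge_fields(top, bottom):
    """Interleave two fields of equal height back into a frame."""
    frame = []
    for top_line, bottom_line in zip(top, bottom):
        frame.append(top_line)
        frame.append(bottom_line)
    return frame


frame = [[0, 0], [1, 1], [2, 2], [3, 3]]
top, bottom = split_fields(frame)
print(top)     # [[0, 0], [2, 2]]
print(bottom)  # [[1, 1], [3, 3]]
```

An encoder choosing field encoding would code `top` and `bottom` as two pictures; one choosing frame encoding would code `frame` as a single picture.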
`
`Intro. 4.1.3 Motion representation - Macroblocks
`
`As in ISO/IEC 11172-2, the choice of 16 by 16 macroblocks for the motion-compensation unit is a result of the trade-off
`between the coding gain provided by using motion information and the overhead needed to represent it. Each
`macroblock can be temporally predicted in one of a number of different ways. For example, in frame encoding, the
`prediction from the previous reference frame can itself be either frame-based or field-based. Depending on the type of
`the macroblock, motion vector information and other side information is encoded with the compressed prediction error
`in each macroblock. The motion vectors are encoded differentially with respect to the last encoded motion vectors using
`variable length codes. The maximum length of the motion vectors that may be represented can be programmed, on a
`picture-by-picture basis, so that the most demanding applications can be met without compromising the performance of
`the system in more normal situations.
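Differential coding of motion vectors can be sketched as below. This is a simplified illustration: it assumes the predictor starts at (0, 0), and it leaves out the variable length codes, the predictor reset rules and the handling of the programmable vector range that the normative clauses define. The function names are hypothetical.

```python
def encode_differential(vectors):
    """Replace each motion vector by its difference from the previous one."""
    previous = (0, 0)  # assumed initial predictor for this sketch
    deltas = []
    for vector in vectors:
        deltas.append((vector[0] - previous[0], vector[1] - previous[1]))
        previous = vector
    return deltas


def decode_differential(deltas):
    """Rebuild the motion vectors by accumulating the differences."""
    previous = (0, 0)
    vectors = []
    for delta in deltas:
        previous = (previous[0] + delta[0], previous[1] + delta[1])
        vectors.append(previous)
    return vectors


vectors = [(3, -1), (4, -1), (4, 0), (2, 0)]
deltas = encode_differential(vectors)
print(deltas)  # [(3, -1), (1, 0), (0, 1), (-2, 0)]
```

Neighbouring macroblocks often move in similar ways, so the differences tend to be small values, which suit short variable length codes.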
`
`It is the responsibility of the encoder to calculate appropriate motion vectors. This Specification does not specify how
`this should be done.
`
`Intro. 4.1.4 Spatial redundancy reduction
`
`Both source pictures and prediction errors have high spatial redundancy. This Specification uses a block-based DCT
`method with visually weighted quantisation and run-length coding. After motion compensated prediction or
`interpolation, the resulting prediction error is split into 8 by 8 blocks. These are transformed into the DCT domain where
`they are weighted before being quantised. After quantisation many of the DCT coefficients are zero in value and so
`two-dimensional run-length and variable length coding is used to encode the remaining DCT coefficients efficiently.
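The chain described above (transform, weighted quantisation, scan, run-length events) can be sketched for one 8 by 8 block. This is an illustration under stated assumptions: the quantiser simply divides by weight times scale and rounds, the weighting matrix is flat, and a conventional zigzag scan is assumed. None of this reproduces the normative arithmetic or tables of this Specification, and the final variable length coding step is omitted.

```python
import math

N = 8  # block size: 8 rows by 8 columns


def dct_2d(block):
    """Forward 8x8 DCT (type II, orthonormal scaling) of a list-of-lists block."""
    def c(k):
        return math.sqrt(0.5) if k == 0 else 1.0
    coeffs = [[0.0] * N for _ in range(N)]
    for v in range(N):        # vertical frequency
        for u in range(N):    # horizontal frequency
            total = 0.0
            for y in range(N):
                for x in range(N):
                    total += (block[y][x]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            coeffs[v][u] = (2.0 / N) * c(u) * c(v) * total
    return coeffs


def quantise(coeffs, weights, scale):
    """Divide each coefficient by weight * scale and round (simplified)."""
    return [[round(coeffs[v][u] / (weights[v][u] * scale)) for u in range(N)]
            for v in range(N)]


def zigzag_order():
    """Conventional zigzag scan: walk the anti-diagonals, alternating direction."""
    cells = [(v, u) for v in range(N) for u in range(N)]
    return sorted(cells, key=lambda p: (p[0] + p[1],
                                        p[0] if (p[0] + p[1]) % 2 else -p[0]))


def run_level(scanned):
    """Turn a scanned coefficient list into (zero_run, level) pairs."""
    pairs, run = [], 0
    for level in scanned:
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    return pairs  # trailing zeros are dropped; an end-of-block code would follow


flat = [[128] * N for _ in range(N)]       # block with no spatial detail
weights = [[16] * N for _ in range(N)]     # hypothetical flat weighting matrix
levels = quantise(dct_2d(flat), weights, scale=1)
scanned = [levels[v][u] for v, u in zigzag_order()]
print(run_level(scanned))  # [(0, 64)]
```

A block without detail collapses to a single event, which is why this combination is efficient once most quantised coefficients are zero.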
`
`Intro. 4.1.5 Chrominance formats
`
`In addition to the 4:2:0 format supported in ISO/IEC 11172-2 this Specification supports 4:2:2 and 4:4:4 chrominance
`formats.
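The practical effect of the chrominance format can be sketched by counting the 8 by 8 blocks in a 16 by 16 macroblock. The sketch relies on the usual meaning of the three formats: 4:2:0 halves the chrominance resolution in both directions, 4:2:2 halves it horizontally only, and 4:4:4 keeps full resolution. The table and function names are illustrative.

```python
# (horizontal divisor, vertical divisor) applied to each chrominance component
SUBSAMPLING = {
    "4:2:0": (2, 2),
    "4:2:2": (2, 1),
    "4:4:4": (1, 1),
}


def blocks_per_macroblock(chroma_format):
    """Count 8x8 blocks in a 16x16 macroblock: 4 luminance plus chrominance."""
    h_div, v_div = SUBSAMPLING[chroma_format]
    # Chrominance area inside the macroblock, measured in 8x8 blocks.
    chroma_blocks = ((16 // h_div) // 8) * ((16 // v_div) // 8)
    return 4 + 2 * chroma_blocks  # two chrominance components, Cb and Cr


for fmt in ("4:2:0", "4:2:2", "4:4:4"):
    print(fmt, blocks_per_macroblock(fmt))
# 4:2:0 6
# 4:2:2 8
# 4:4:4 12
```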
`
`Intro. 4.2 Scalable extensions
`
`The scalability tools in this Specification are designed to support applications beyond those supported by single layer
`video. Among the noteworthy application areas addressed are video telecommunications, video on Asynchronous
`Transfer Mode (ATM) networks, interworking of video standards, video service hierarchies with multiple spatial,
`temporal and quality resolutions, HDTV with embedded TV, systems allowing migration to higher temporal resolution
`HDTV, etc. Although a simple solution to scalable video is the simulcast technique, which is based on
`transmission/storage of multiple independently coded reproductions of video, a more efficient alternative is scalable
`video coding, in which the bandwidth allocated to a given reproduction of video can be partially re-utilised in coding of
`the next reproduction of video. In scalable video coding, it is assumed that given a coded bitstream, decoders of various
`complexities can decode and display appropriate reproductions of coded video. A scalable video encoder is likely to
`have increased complexity when compared to a single layer encoder. However, this Recommendation | International
`Standard provides several different forms of scalabilities that address non-overlapping applications with corresponding
`complexities. The basic scalability tools offered are:
`
`- data partitioning;
`
`- SNR scalability;
`
`- spatial scalability; and
`
`- temporal scalability.
`
`Moreover, combinations of these basic scalability tools are also supported and are referred to as hybrid scalability. In the
`case of basic scalability, two layers of video referred to as the lower layer and the enhancement layer are allowed,
`whereas in hybrid scalability up to three layers are supported. Tables Intro. 1 to Intro. 3 provide a few example
`applications of various scalabilities.
`
`
`
`
`Table Intro. 1 - Applications of SNR scalability
`
`Lower layer                    Enhancement layer                            Application
`Recommendation ITU-R BT.601    Same resolution and format as lower layer    Two quality service for Standard TV (SDTV)
`High Definition                Same resolution and format as lower layer    Two quality service for HDTV
`4:2:0 high definition          4:2:2 chroma simulcast                       Video production / distribution
`
`Table Intro. 2 - Applications of spatial scalability
`
`Base                   Enhancement            Application
`Progressive (30 Hz)    Progressive (30 Hz)
`Interlace (30 Hz)      Interlace (30 Hz)      HDTV/SDTV scalability
`Progressive (30 Hz)    Interlace (30 Hz)      ISO/IEC 11172-2/compatibility with this Specification
`Interlace (30 Hz)      Progressive (60 Hz)    Migration to high resolution progressive HDTV
`
`Table Intro. 3 - Applications of temporal scalability
`
`Base                   Enhancement            Higher                 Application
`Progressive (30 Hz)    Progressive (30 Hz)    Progressive (60 Hz)    Migration to high resolution progressive HDTV
`Interlace (30 Hz)      Interlace (30 Hz)      Progressive (60 Hz)    Migration to high resolution progressive HDTV
`
`Intro. 4.2.1 Spatial scalable extension
`
`Spatial scalability is a tool intended for use in video applications involving telecommunications, interworking of video
`standards, video database browsing, interworking of HDTV and TV, etc., i.e. video systems with the primary common
`feature that a minimum of two layers of spatial resolution are necessary. Spatial scalability involves generating two
`spatial resolution video layers from a single video source such that the lower layer is coded by itself to provide the basic
`spatial resolution and the enhancement layer employs the spatially interpolated lower layer and carries the full spatial
`resolution of the input video source. The lower and the enhancement layers may either both use the coding tools in this
`Specification, or the ISO/IEC 11172-2 Standard for the lower layer and this Specification for the enhancement layer.
`The latter case achieves a further advantage by facilitating interworking between video coding standards. Moreover,
`spatial scalability offers flexibility in choice of video formats to be employed in each layer. An additional advantage of
`spatial scalability is its ability to provide resilience to transmission errors as the more important data of the lower layer
`can be sent over a channel with better error performance, while the less critical enhancement layer data can be sent over a
`channel with poor error performance.
`
`Intro. 4.2.2 SNR scalable extension
`
`SNR scalability is a tool intended for use in video applications involving telecommunications, video services with
`multiple qualities, standard TV and HDTV, i.e. video systems with the primary common feature that a minimum of two
`layers of video quality are necessary. SNR scalability involves generating two video layers of the same spatial resolution but
`different video qualities from a single video source such that the lower layer is coded by itself to provide the basic video
`quality and the enhancement layer is coded to enhance the lower layer. The enhancement layer when added back to the
`lower layer regenerates a higher quality reproduction of the input video. The lower and the enhancement layers may
`either both use this Specification, or use the ISO/IEC 11172-2 Standard for the lower layer and this Specification for the
`enhancement layer. An additional advantage of SNR scalability is its ability to provide a high degree of resilience to
`transmission errors, as the more important data of the lower layer can be sent over a channel with better error
`performance, while the less critical enhancement layer data can be sent over a channel with poor error performance.
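The idea of an enhancement layer that is added back to the lower layer can be sketched on a short list of coefficient values. This is a minimal illustration, assuming a coarse uniform quantiser for the lower layer and a finer uniform quantiser applied to the remaining error for the enhancement layer. The step sizes and function names are assumptions and do not reproduce the normative process.

```python
def quantise(values, step):
    """Uniform quantiser: divide by the step and round to the nearest integer."""
    return [round(value / step) for value in values]


def dequantise(levels, step):
    """Inverse of quantise, up to the quantisation error."""
    return [level * step for level in levels]


def encode_two_layers(coeffs, coarse_step, fine_step):
    """Return (lower_layer_levels, enhancement_layer_levels)."""
    lower = quantise(coeffs, coarse_step)
    lower_recon = dequantise(lower, coarse_step)
    # The enhancement layer codes what the lower layer failed to represent.
    residual = [c - r for c, r in zip(coeffs, lower_recon)]
    enhancement = quantise(residual, fine_step)
    return lower, enhancement


def decode(lower, coarse_step, enhancement=None, fine_step=None):
    """Decode the lower layer alone, or add the enhancement layer back to it."""
    recon = dequantise(lower, coarse_step)
    if enhancement is not None:
        refinement = dequantise(enhancement, fine_step)
        recon = [r + e for r, e in zip(recon, refinement)]
    return recon


coeffs = [100, -37, 12, 5]
lower, enhancement = encode_two_layers(coeffs, coarse_step=16, fine_step=4)
print(decode(lower, 16))                  # [96, -32, 16, 0]
print(decode(lower, 16, enhancement, 4))  # [100, -36, 12, 4]
```

A decoder that receives only the lower layer still produces usable values; one that receives both layers gets closer to the input.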
`
`Intro. 4.2.3 Temporal scalable extension
`
`Temporal scalability is a tool intended for use in a range of diverse video applications from telecommunications
`to HDTV for which migration to higher temporal resolution systems from that of lower temporal resolution systems may
`be necessary. In many cases, the lower temporal resolution video systems may be either the existing systems or the less
`expensive early generation systems, with the motivation of introducing more sophisticated systems gradually. Temporal
`scalability involves partitioning of video frames into layers, where the lower layer is coded by itself to provide the
`basic temporal rate and the enhancement layer is coded with temporal prediction with respect to the lower layer. These
`layers, when decoded and temporally multiplexed, yield the full temporal resolution of the video source. The lower temporal
`resolution systems may only decode the lower layer to provide basic temporal resolution, whereas more sophisticated
`systems of the future may decode both layers and provide high temporal resolution video while maintaining
`interworking with earlier generation systems. An additional advantage of temporal scalability is its ability to provide
`resilience to transmission errors, as the more important data of the lower layer can be sent over a channel with better error
`performance, while the less critical enhancement layer can be sent over a channel with poor error performance.
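The partitioning and temporal multiplexing of frames can be sketched as below. The sketch assumes the simplest possible split, in which alternate frames go to the lower and enhancement layers. How frames are assigned to layers is a choice made for this example, the function names are hypothetical, and the temporal prediction between layers is not modelled.

```python
def split_temporal(frames):
    """Assign alternate frames to the lower and the enhancement layer."""
    lower = frames[0::2]        # frames 0, 2, 4, ... give the basic temporal rate
    enhancement = frames[1::2]  # frames 1, 3, 5, ... fill in the gaps
    return lower, enhancement


def multiplex(lower, enhancement):
    """Interleave the decoded layers to restore the full temporal resolution."""
    frames = []
    for index, frame in enumerate(lower):
        frames.append(frame)
        if index < len(enhancement):
            frames.append(enhancement[index])
    return frames


frames = list(range(7))  # stand-ins for seven decoded frames
lower, enhancement = split_temporal(frames)
print(lower)        # [0, 2, 4, 6]
print(enhancement)  # [1, 3, 5]
```

A lower temporal resolution system would display `lower` only, while a system that decodes both layers would display the result of `multiplex(lower, enhancement)`.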
`
`Intro. 4.2.4 Data partitioning extension
`
`Data partitioning is a tool intended for use when two channels are available for transmission and/or storage of a
`video bitstream, as may be the case in ATM networks, terrestrial broadcast, magnetic media, etc. The bitstream is
`partitioned between these channels such that more critical parts of the bitstream (such as headers, motion vectors, low
`frequency DCT coefficients) are transmitted in the channel with the better error performance, and less critical data (such
`as higher frequency DCT coefficients) is transmitted in the channel with poor error performance. Thus, degradation due to
`channel errors is minimised since the critical parts of a bitstream are better protected. Data from neither channel may be
`decoded on a decoder that is not intended for decoding data partitioned bitstreams.
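The split between more critical and less critical data can be sketched for a single block. This is an illustration only: the dictionary layout, the `split_at` parameter and the zero-filling used when the second channel is lost are assumptions made for the example, not the syntax or the concealment behaviour defined by this Specification.

```python
def partition(header, motion_vectors, scanned_coeffs, split_at):
    """Split one block's data into a high and a low priority part.

    split_at is a hypothetical position in scan order: coefficients before
    it (low frequencies) travel with the header and motion vectors.
    """
    high_priority = {
        "header": header,
        "motion_vectors": motion_vectors,
        "coeffs": scanned_coeffs[:split_at],
    }
    low_priority = {"coeffs": scanned_coeffs[split_at:]}
    return high_priority, low_priority


def recombine(high_priority, low_priority, block_size=64):
    """Rebuild the coefficient list before decoding.

    If the low priority part was lost, the missing high frequency
    coefficients are set to zero (an assumption made for this sketch).
    """
    coeffs = list(high_priority["coeffs"])
    if low_priority is not None:
        coeffs += low_priority["coeffs"]
    coeffs += [0] * (block_size - len(coeffs))
    return coeffs


coeffs = [64, 5, -3, 0, 2, 0, 0, 1]
high, low = partition("hdr", [(1, 0)], coeffs, split_at=3)
print(recombine(high, low, block_size=8))   # [64, 5, -3, 0, 2, 0, 0, 1]
print(recombine(high, None, block_size=8))  # [64, 5, -3, 0, 0, 0, 0, 0]
```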
`
`
`
`
`INTERNATIONAL STANDARD
`
`ITU-T RECOMMENDATION
`
`ISO/IEC 13818-2 : 1995 (E)
`
`INFORMATION TECHNOLOGY -
`GENERIC CODING OF MOVING PICTURES AND
`ASSOCIATED AUDIO INFORMATION: VIDEO
`
`1 Scope
`
`This Recommendation | International Standard specifies the coded representation of picture information for digital
`storage media and digital video communication and specifies the decoding process. The representation supports constant
`bitrate transmission, variable bitrate transmission, random access, channel hopping, scalable decoding, bitstream editing,
`as well as special functions such as fast forward playback, fast reverse playback, slow motion, pause and still pictures.
`This Recommendation | International Standard is forward compatible with ISO/IEC 11172-2 and upward or downward
`compatible with EDTV, HDTV, SDTV formats.
`
`This Recommendation | International Standard is primarily applicable to digital storage media, video broadcast and
`communication. The storage media may be directly connected to the decoder, or via communications means such as
`busses, LANs, or telecommunications links.
`
`2 Normative references
`
`The following Recommendations and International Standards contain provisions which through reference in this text,
`constitute provisions of this Recommendation | International Standard. At the time of publication, the editions indicated
`were valid. All Recommendations and Standards are subject to revision, and parties to agreements based on this
`Recommendation | International Standard are encouraged to investigate the possibility of applying the most recent
`edition of the Recommendations and Standards indicated below. Members of IEC and ISO maintain registers of
`currently valid International Standards. The Telecommunication Standardization Bureau of the ITU maintains a list of
`currently valid ITU-T Recommendations.
`
`- Recommendations and Reports of the CCIR, 1990, XVIIth Plenary Assembly, Dusseldorf 1990,
`Volume XI - Part 1 Broadcasting Service (Television) - Recommendation ITU-R BT.601-3 Encoding
`parameters of digital television for studios.
`
`- CCIR Volume X and XI Part 3 - Recommendation ITU-R BR.648 Recording of audio signals.
`
`- CCIR Volume X and XI Part 3 - Report ITU-R 955-2 Satellite sound broadcasting to vehicular, portable
`and fixed receivers in the range 500 - 3000 MHz.
`
`- ISO/IEC 11172-1:1993, Information technology - Coding of moving pictures and associated audio for
`digital storage media at up to about 1,5 Mbit/s - Part 1 : Systems.
`
`- ISO/IEC 11172-2:1993, Information technology - Coding of moving pictures and associated audio for
`digital storage media at up to about 1,5 Mbit/s - Part 2 : Video.
`
`- ISO/IEC 11172-3:1993, Information technology - Coding of moving pictures and associated audio for
`digital storage media at up to about 1,5 Mbit/s - Part 3 : Audio.
`
`- IEEE Standard Specifications for the Implementations of 8 by 8 Inverse Discrete Cosine Transform, IEEE
`Std 1180-1990, December 6, 1990.
`
`- IEC Publication 908:1987, Compact disc digital audio system.
`
`- IEC Publication 461:1986, Time and control code for video tape recorders.
`
`- ITU-T Recommendation H.261 (1993), Video codec for audiovisual services at p × 64 kbit/s.
`
`- CCITT Recommendation T.81 (1992) (JPEG) | ISO/IEC 10918-1:1994, Information technology - Digital
`compression and coding of continuous-tone still images - Requirements and guidelines.
`
`
`3 Definitions
`
`For the purposes of this Recommendation | International Standard, the following definitions apply.
`
`3.1 AC coefficient: Any DCT coefficient for which the frequency in one or both dimensions is non-zero.
`
`3.2 big picture: A coded picture that would cause VBV buffer underflow as defined in C.7. Big pictures can only
`occur in sequences where low_delay is equal to 1. "Skipped picture" is a term that is sometimes used to describe the
`same concept.
`
`3.3 B-field picture: A field structure B-Picture.
`
`3.4 B-frame picture: A frame structure B-Picture.
`
`3.5 B-picture; bidirectionally predictive-coded picture: A picture that is coded using motion compensated
`prediction from past and/or future reference fields or frames.
`
`3.6 backward compatibility: A newer coding standard is backward compatible with an older coding standard if
`decoders designed to operate with the older coding standard are able to continue to operate by decoding all or part of a
`bitstream produced according to the newer coding standard.
`
`3.7 backward motion vector: A motion vector that is used for motion compensation from a reference frame or
`reference field at a later time in display order.
`
`3.8 backward prediction: Prediction from the future reference frame (field).
`
`3.9 base layer: First, independently decodable layer of a scalable hierarchy.
`
`3.10 bitstream; stream: An ordered series of bits that forms the coded representation of the data.
`
`3.11 bitrate: The rate at which the coded bitstream is delivered from the storage medium to the input of a decoder.
`
`3.12 block: An 8-row by 8-column matrix of samples, or 64 DCT coefficients (source, quantised or dequantised).
`
`3.13 bottom field: One of two fields that comprise a frame. Each line of a bottom field is spatially located
`immediately below the corresponding line of the top field.
`
`3.14 byte aligned: A bit in a coded bitstream is byte-aligned if its position is a multiple of 8 bits from the first bit in
`the stream.
`
`3.15 byte: Sequence of 8 bits.
`
`3.16 channel: A digital medium that stores or transports a bitstream constructed according to ITU-T Rec. H.262 |
`ISO/IEC 13818-2.
`
`3.17 chrominance format: Defines the number of chrominance blocks in a macroblock.
`
`3.18 chroma simulcast: A type of scalability (which is a subset of SNR scalability) where the enhancement layer(s)
`contain only coded refinement data for the DC coefficients, and all the data for the AC coefficients, of the chrominance
`components.
`
`3.19 chrominance component: A matrix, block or single sample representing one of the two colour difference
`signals related to the primary colours in the manner defined in the bitstream. The symbols used for the chrominance
`signals are Cr and Cb.
`
`3.20 coded B-frame: A B-frame picture or a pair of B-field pictures.
`
`3.21 coded frame: A coded frame is a coded I-frame, a coded P-frame or a coded B-frame.
`
`3.22 coded I-frame: An I-frame picture or a pair of field pictures, where the first field picture is an I-picture and
`the second field picture is an I-picture or a P-picture.
`
`3.23 coded P-frame: A P-frame picture or a pair of P-field pictures.
`
`3.24 coded picture: A coded picture is made of a picture header, the optional extensions immediately following it,
`and the following picture data. A coded picture may be a coded frame or a coded field.
`
`
`3.25 coded video bitstream: A coded representation of a series of one or more pictures as defined in ITU-T
`Rec. H.262 | ISO/IEC 13818-2.
`
`3.26 coded order: The order in which the pictures are transmitted and decoded. This order is not necessarily the
`same as the display order.
`
`3.27 coded representation: A data element as represented in its encoded form.
`
`3.28 coding parameters: The set of user-definable parameters that characterise a coded video bitstream. Bitstreams
`are characterised by coding parameters. Decoders are characterised by the bitstreams that they are capable of decoding.
`
`3.29 component: A matrix, block or single sample from one of the three matrices (luminance and two
`chrominance) that make up a picture.
`
`3.30 compression: Reduction in the number of bits used to represent an item of data.
`
`3.31 constant bitrate coded video: A coded video bitstream with a constant bitrate.
`
`3.32 constant bitrate: Operation where the bitrate is constant from start to finish of the coded bitstream.
`
`3.33 data element: An item of data as represented before encoding and after decoding.
`
`3.34 data partitioning: A method for dividing a bitstream into two separate bitstreams for error resilience
`purposes. The two bitstreams have to be recombined before decoding.
`
`3.35 D-Picture: A type of picture that shall not be used except in ISO/IEC 11172-2.
`
`3.36 DC coefficient: The DCT coefficient for which the frequency is zero in both dimensions.
`
`3.37 DCT coefficient: The amplitude of a specific cosine basis function.
`
`3.38 decoder input buffer: The First-In First-Out (FIFO) buffer specified in the video buffering verifier.
`
`3.39 decoder: An embodiment of a decoding process.
`
`3.40 decoding (process): The process defined in ITU-T Rec. H.262 | ISO/IEC 13818-2 that reads an input coded
`bitstream and produces decoded pictures.
`
`3.41 dequantisation: The process of rescaling the quantised DCT coefficients after their representation in the
`bitstream has been decoded and before they are presented to the inverse DCT.
`
`3.42 digital storage media; DSM: A digital storage or transmission device