ITU-T

TELECOMMUNICATION
STANDARDIZATION SECTOR
OF ITU

H.262
(07/95)

TRANSMISSION OF NON-TELEPHONE SIGNALS

INFORMATION TECHNOLOGY –
GENERIC CODING OF MOVING
PICTURES AND ASSOCIATED AUDIO
INFORMATION: VIDEO

ITU-T Recommendation H.262
`
`(Previously “CCITT Recommendation”)
`
`VIMEO/IAC EXHIBIT 1028
`VIMEO ET AL., v. BT, IPR2019-00833
`
`
`
`FOREWORD
`
ITU (International Telecommunication Union) is the United Nations Specialized Agency in the field of
telecommunications. The ITU Telecommunication Standardization Sector (ITU-T) is a permanent organ of the ITU.
`Some 179 member countries, 84 telecom operating entities, 145 scientific and industrial organizations and
`38 international organizations participate in ITU-T which is the body which sets world telecommunications standards
`(Recommendations).
`
`The approval of Recommendations by the Members of ITU-T is covered by the procedure laid down in WTSC
`Resolution No. 1 (Helsinki, 1993). In addition, the World Telecommunication Standardization Conference (WTSC),
`which meets every four years, approves Recommendations submitted to it and establishes the study programme for the
`following period.
`
`In some areas of information technology which fall within ITU-T’s purview, the necessary standards are prepared on a
`collaborative basis with ISO and IEC. The text of ITU-T Recommendation H.262 was approved on 10th of July 1995.
`The identical text is also published as ISO/IEC International Standard 13818-2.
`
`___________________
`
`In this Recommendation, the expression “Administration” is used for conciseness to indicate both a telecommunication
`administration and a recognized operating agency.
`
`NOTE
`
`All rights reserved. No part of this publication may be reproduced or utilized in any form or by any means, electronic or
`mechanical, including photocopying and microfilm, without permission in writing from the ITU.
`
© ITU 1996
`
`
CONTENTS

Summary
Introduction
   Intro. 1 Purpose
   Intro. 2 Application
   Intro. 3 Profiles and levels
   Intro. 4 The scalable and the non-scalable syntax
1 Scope
2 Normative references
3 Definitions
4 Abbreviations and symbols
   4.1 Arithmetic operators
   4.2 Logical operators
   4.3 Relational operators
   4.4 Bitwise operators
   4.5 Assignment
   4.6 Mnemonics
   4.7 Constants
5 Conventions
   5.1 Method of describing bitstream syntax
   5.2 Definition of functions
   5.3 Reserved, forbidden and marker_bit
   5.4 Arithmetic precision
6 Video bitstream syntax and semantics
   6.1 Structure of coded video data
   6.2 Video bitstream syntax
   6.3 Video bitstream semantics
7 The video decoding process
   7.1 Higher syntactic structures
   7.2 Variable length decoding
   7.3 Inverse scan
   7.4 Inverse quantisation
   7.5 Inverse DCT
   7.6 Motion compensation
   7.7 Spatial scalability
   7.8 SNR scalability
   7.9 Temporal scalability
   7.10 Data partitioning
   7.11 Hybrid scalability
   7.12 Output of the decoding process
8 Profiles and levels
   8.1 ISO/IEC 11172-2 compatibility
   8.2 Relationship between defined profiles
   8.3 Relationship between defined levels
   8.4 Scalable layers
   8.5 Parameter values for defined profiles, levels and layers
`
`ITU-T Rec. H.262 (1995 E)
`
`
`
`
Annex A – Discrete cosine transform
Annex B – Variable length code tables
   B.1 Macroblock addressing
   B.2 Macroblock type
   B.3 Macroblock pattern
   B.4 Motion vectors
   B.5 DCT coefficients
Annex C – Video buffering verifier
Annex D – Features supported by the algorithm
   D.1 Overview
   D.2 Video formats
   D.3 Picture quality
   D.4 Data rate control
   D.5 Low delay mode
   D.6 Random access/channel hopping
   D.7 Scalability
   D.8 Compatibility
   D.9 Differences between this Specification and ISO/IEC 11172-2
   D.10 Complexity
   D.11 Editing encoded bitstreams
   D.12 Trick modes
   D.13 Error resilience
   D.14 Concatenated sequences
Annex E – Profile and level restrictions
   E.1 Syntax element restrictions in profiles
   E.2 Permissible layer combinations
Annex F – Bibliography
`
`
`
`
`Summary
`
This Recommendation | International Standard specifies the coded representation of video data and the decoding process
`required to reconstruct pictures. It provides a generic video coding scheme which serves a wide range of applications,
`bit rates, picture resolutions and qualities. Its basic coding algorithm is a hybrid of motion compensated prediction
`and DCT. Pictures to be coded can be either interlaced or progressive. Necessary algorithmic elements are integrated
`into a single syntax, and a limited number of subsets are defined in terms of Profile (functionalities) and
`Level (parameters) to facilitate practical use of this generic video coding standard.
`
`
`
`
`Introduction
`
Intro. 1 Purpose
`
`This Part of this Specification was developed in response to the growing need for a generic coding method of moving
`pictures and of associated sound for various applications such as digital storage media, television broadcasting and
`communication. The use of this Specification means that motion video can be manipulated as a form of computer data
`and can be stored on various storage media, transmitted and received over existing and future networks and distributed
`on existing and future broadcasting channels.
`
Intro. 2 Application
`
The applications of this Specification cover, but are not limited to, the areas listed below:
`
BSS    Broadcasting Satellite Service (to the home)
CATV   Cable TV Distribution on optical networks, copper, etc.
CDAD   Cable Digital Audio Distribution
DSB    Digital Sound Broadcasting (terrestrial and satellite broadcasting)
DTTB   Digital Terrestrial Television Broadcasting
EC     Electronic Cinema
ENG    Electronic News Gathering (including SNG, Satellite News Gathering)
FSS    Fixed Satellite Service (e.g. to head ends)
HTT    Home Television Theatre
IPC    Interpersonal Communications (videoconferencing, videophone, etc.)
ISM    Interactive Storage Media (optical disks, etc.)
MMM    Multimedia Mailing
NCA    News and Current Affairs
NDB    Networked Database Services (via ATM, etc.)
RVS    Remote Video Surveillance
SSM    Serial Storage Media (digital VTR, etc.)
`
Intro. 3 Profiles and levels
`
`This Specification is intended to be generic in the sense that it serves a wide range of applications, bitrates, resolutions,
`qualities and services. Applications should cover, among other things, digital storage media, television broadcasting and
`communications. In the course of creating this Specification, various requirements from typical applications have been
`considered, necessary algorithmic elements have been developed, and they have been integrated into a single syntax.
Hence, this Specification facilitates bitstream interchange among different applications.
`
`Considering the practicality of implementing the full syntax of this Specification, however, a limited number of subsets
`of the syntax are also stipulated by means of “profile” and “level”. These and other related terms are formally defined in
`clause 3.
`
`A “profile” is a defined subset of the entire bitstream syntax that is defined by this Specification. Within the bounds
`imposed by the syntax of a given profile it is still possible to require a very large variation in the performance of
encoders and decoders depending upon the values taken by parameters in the bitstream. For instance, it is possible to
specify frame sizes as large as (approximately) 2^14 samples wide by 2^14 lines high. It is currently neither practical nor
economic to implement a decoder capable of dealing with all possible frame sizes.
`
`In order to deal with this problem, “levels” are defined within each profile. A level is a defined set of constraints
`imposed on parameters in the bitstream. These constraints may be simple limits on numbers. Alternatively they may take
`the form of constraints on arithmetic combinations of the parameters (e.g. frame width multiplied by frame height
`multiplied by frame rate).
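As an illustration of the two kinds of level constraint described above, the following minimal sketch checks simple per-parameter limits and one combined limit on width × height × frame rate. The LevelLimits class, the conforms function and the numeric limits are hypothetical: the values were chosen to resemble a standard-definition level but are illustrative only, and the normative values are those given in clause 8.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LevelLimits:
    """Hypothetical limit set; the normative per-level values are in clause 8."""
    max_width: int            # samples per line
    max_height: int           # lines per frame
    max_frame_rate: float     # frames per second
    max_sample_rate: float    # luminance samples per second (width * height * rate)


def conforms(width: int, height: int, frame_rate: float, limits: LevelLimits) -> bool:
    # Simple limits: each parameter is bounded on its own.
    simple_ok = (width <= limits.max_width
                 and height <= limits.max_height
                 and frame_rate <= limits.max_frame_rate)
    # Combined limit: an arithmetic combination of the parameters is bounded.
    combined_ok = width * height * frame_rate <= limits.max_sample_rate
    return simple_ok and combined_ok


# Illustrative values only, not quoted from this Specification.
EXAMPLE_LEVEL = LevelLimits(max_width=720, max_height=576,
                            max_frame_rate=30.0, max_sample_rate=10_368_000)
```

With these illustrative limits, 720 × 576 at 30 Hz passes every simple limit but fails the combined one (12 441 600 samples/s), which is exactly the case a constraint on a product of parameters is meant to catch.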
`
`Bitstreams complying with this Specification use a common syntax. In order to achieve a subset of the complete syntax,
`flags and parameters are included in the bitstream that signal the presence or otherwise of syntactic elements that occur
`later in the bitstream. In order to specify constraints on the syntax (and hence define a profile) it is thus only necessary to
`constrain the values of these flags and parameters that specify the presence of later syntactic elements.
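The mechanism described above, in which a flag signals whether a later syntax element is present, can be sketched as follows. The 1-bit flag, the 8-bit optional field that follows it and the "restricted profile" rule are all hypothetical and do not correspond to any particular header of this Specification. The point is only that constraining the flag is enough to exclude the optional element.

```python
class BitReader:
    """Reads bit fields, most significant bit first, from a byte string."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current bit position

    def read(self, n: int) -> int:
        value = 0
        for _ in range(n):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def parse_header(data: bytes):
    """Hypothetical header: a 1-bit presence flag, then an 8-bit field only if the flag is set."""
    reader = BitReader(data)
    extension_flag = reader.read(1)
    extension_value = reader.read(8) if extension_flag else None
    return extension_flag, extension_value


def allowed_in_restricted_profile(header) -> bool:
    """A hypothetical profile that forbids the optional field only has to require flag == 0."""
    extension_flag, _ = header
    return extension_flag == 0
```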
`
`
`
`
Intro. 4 The scalable and the non-scalable syntax
`
The full syntax can be divided into two major categories. One is the non-scalable syntax, which is structured as a superset
of the syntax defined in ISO/IEC 11172-2; its main addition is a set of extra compression tools
for interlaced video signals. The second is the scalable syntax, the key property of which is to enable the reconstruction
of useful video from pieces of a total bitstream. This is achieved by structuring the total bitstream in two or more layers,
starting from a standalone base layer and adding a number of enhancement layers. The base layer can use the non-scalable
syntax or, in some situations, conform to the ISO/IEC 11172-2 syntax.
`
Intro. 4.1 Overview of the non-scalable syntax
`
`The coded representation defined in the non-scalable syntax achieves a high compression ratio while preserving good
`image quality. The algorithm is not lossless as the exact sample values are not preserved during coding. Obtaining good
`image quality at the bitrates of interest demands very high compression, which is not achievable with intra picture
coding alone. The need for random access, however, is best satisfied with pure intra picture coding. The choice of the
techniques is based on the need to balance a high image quality and compression ratio with the requirement to allow
random access to the coded bitstream.
`
`A number of techniques are used to achieve high compression. The algorithm first uses block-based motion
`compensation to reduce the temporal redundancy. Motion compensation is used both for causal prediction of the current
`picture from a previous picture, and for non-causal, interpolative prediction from past and future pictures. Motion
vectors are defined for each 16-sample by 16-line region of the picture. The prediction error is further compressed using
`the Discrete Cosine Transform (DCT) to remove spatial correlation before it is quantised in an irreversible process that
`discards the less important information. Finally, the motion vectors are combined with the quantised DCT information,
`and encoded using variable length codes.
`
`Intro. 4.1.1 Temporal processing
`
`Because of the conflicting requirements of random access and highly efficient compression, three main picture types are
`defined. Intra Coded Pictures (I-Pictures) are coded without reference to other pictures. They provide access points to
`the coded sequence where decoding can begin, but are coded with only moderate compression. Predictive Coded
`Pictures (P-Pictures) are coded more efficiently using motion compensated prediction from a past intra or predictive
`coded picture and are generally used as a reference for further prediction. Bidirectionally-predictive Coded Pictures
`(B-Pictures) provide the highest degree of compression but require both past and future reference pictures for motion
`compensation. Bidirectionally-predictive coded pictures are never used as references for prediction (except in the case
`that the resulting picture is used as a reference in a spatially scalable enhancement layer). The organisation of the three
`picture types in a sequence is very flexible. The choice is left to the encoder and will depend on the requirements of the
`application. Figure Intro. 1 illustrates an example of the relationship among the three different picture types.
`
[Figure: picture sequence I B B P B B B P, with arrows labelled "Prediction" and "Bidirectional Interpolation"]

Figure Intro. 1 – Example of temporal picture structure
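Because a B-picture needs its future reference picture before it can be decoded, an encoder typically transmits each I- or P-picture ahead of the B-pictures that precede it in display order. The sketch below shows one such reordering. The function name is hypothetical, and B-pictures with no following reference picture are simply appended at the end, a simplification not taken from this Specification.

```python
def display_to_coded_order(picture_types: str) -> list:
    """Return display indices in a transmission order where each reference picture
    (I or P) precedes the B-pictures that are displayed before it."""
    coded = []
    pending_b = []  # B-pictures waiting for their future reference picture
    for index, ptype in enumerate(picture_types):
        if ptype == "B":
            pending_b.append(index)
        else:  # "I" or "P": a reference picture
            coded.append(index)
            coded.extend(pending_b)
            pending_b = []
    # Simplification: trailing B-pictures without a later reference are appended as-is.
    coded.extend(pending_b)
    return coded
```

For the display sequence "IBBPBBP" this yields the indices 0, 3, 1, 2, 6, 4, 5: each P-picture is sent before the two B-pictures that are interpolated from it.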
`
`
`
`
`Intro. 4.1.2 Coding interlaced video
`
Each frame of interlaced video consists of two fields which are separated by one field-period. The Specification allows
either the frame to be encoded as a single picture or the two fields to be encoded as two pictures. Frame encoding or field
`encoding can be adaptively selected on a frame-by-frame basis. Frame encoding is typically preferred when the video
`scene contains significant detail with limited motion. Field encoding, in which the second field can be predicted from the
`first, works better when there is fast movement.
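A frame can be viewed as two interleaved fields. The sketch below separates a frame into its two fields and re-interleaves them. It assumes a frame stored as a list of rows with an even number of lines, and takes the top field to be the field containing the first line; the function names are hypothetical.

```python
def split_fields(frame):
    """Split a frame (list of rows) into its top field (lines 0, 2, 4, ...)
    and bottom field (lines 1, 3, 5, ...)."""
    return frame[0::2], frame[1::2]


def merge_fields(top, bottom):
    """Re-interleave two fields of equal height into a frame."""
    frame = []
    for top_line, bottom_line in zip(top, bottom):
        frame.append(top_line)
        frame.append(bottom_line)
    return frame
```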
`
`Intro. 4.1.3 Motion representation – Macroblocks
`
`As in ISO/IEC 11172-2, the choice of 16 by 16 macroblocks for the motion-compensation unit is a result of the trade-off
`between the coding gain provided by using motion information and the overhead needed to represent it. Each
`macroblock can be temporally predicted in one of a number of different ways. For example, in frame encoding, the
`prediction from the previous reference frame can itself be either frame-based or field-based. Depending on the type of
the macroblock, motion vector information and other side information are encoded with the compressed prediction error
`in each macroblock. The motion vectors are encoded differentially with respect to the last encoded motion vectors using
`variable length codes. The maximum length of the motion vectors that may be represented can be programmed, on a
`picture-by-picture basis, so that the most demanding applications can be met without compromising the performance of
`the system in more normal situations.
`
`It is the responsibility of the encoder to calculate appropriate motion vectors. This Specification does not specify how
`this should be done.
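Differential coding of motion vectors transmits each vector as its difference from the previously coded one, so smoothly varying motion produces small values that suit short variable length codes. In this minimal sketch the predictor starts at (0, 0) and is never reset. The normative predictor reset rules of clause 7.6, the programmable maximum vector range and the variable length codes themselves are not modelled, and the function names are hypothetical.

```python
def encode_mv_differences(vectors):
    """Replace each (x, y) motion vector by its difference from the previous vector."""
    predictor = (0, 0)  # assumed starting predictor; reset rules are not modelled
    differences = []
    for mv in vectors:
        differences.append((mv[0] - predictor[0], mv[1] - predictor[1]))
        predictor = mv
    return differences


def decode_mv_differences(differences):
    """Invert encode_mv_differences by accumulating the differences."""
    predictor = (0, 0)
    vectors = []
    for dx, dy in differences:
        predictor = (predictor[0] + dx, predictor[1] + dy)
        vectors.append(predictor)
    return vectors
```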
`
Intro. 4.1.4 Spatial redundancy reduction
`
`Both source pictures and prediction errors have high spatial redundancy. This Specification uses a block-based DCT
`method with visually weighted quantisation and run-length coding. After motion compensated prediction or
`interpolation, the resulting prediction error is split into 8 by 8 blocks. These are transformed into the DCT domain where
`they are weighted before being quantised. After quantisation many of the DCT coefficients are zero in value and so
`two-dimensional run-length and variable length coding is used to encode the remaining DCT coefficients efficiently.
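The chain described above (8 by 8 DCT, weighted quantisation, then run-length coding of the remaining coefficients) can be sketched as follows. The forward DCT uses the common orthonormal scaling 2/N · C(u) · C(v) with C(0) = 1/√2. The quantiser is a hypothetical uniform quantiser that divides by a weight and a scale factor; it is not the normative inverse quantisation of clause 7.4. The scan is the classic zigzag order, and the variable length codes that would finally represent each (run, level) pair are not shown.

```python
import math

N = 8  # block size


def dct_2d(block):
    """Forward 8x8 DCT-II with orthonormal scaling (2/N) * C(u) * C(v)."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0
    out = [[0.0] * N for _ in range(N)]
    for v in range(N):          # vertical frequency
        for u in range(N):      # horizontal frequency
            s = 0.0
            for y in range(N):
                for x in range(N):
                    s += (block[y][x]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[v][u] = (2 / N) * c(u) * c(v) * s
    return out


def quantise(coeffs, weights, scale):
    """Hypothetical weighted uniform quantiser: larger weights discard more detail."""
    return [[int(round(coeffs[v][u] / (weights[v][u] * scale))) for u in range(N)]
            for v in range(N)]


def zigzag_order(n=N):
    """(row, col) positions in classic zigzag order, from low to high frequency."""
    order = []
    for s in range(2 * n - 1):
        diag = [(y, s - y) for y in range(n) if 0 <= s - y < n]
        if s % 2 == 0:
            diag.reverse()
        order.extend(diag)
    return order


def run_level_pairs(levels):
    """(zero-run, nonzero level) pairs in zigzag order; trailing zeros are dropped
    because an end-of-block code would signal them."""
    pairs, run = [], 0
    for row, col in zigzag_order():
        value = levels[row][col]
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    return pairs
```

For a flat block of value 10, the only nonzero coefficient is the DC term (80 with this scaling), so after quantisation with a weight of 16 a single pair (0, 5) remains. This shows why, after quantisation, many coefficients are zero and run-length coding pays off.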
`
`Intro. 4.1.5 Chrominance formats
`
In addition to the 4:2:0 format supported in ISO/IEC 11172-2, this Specification supports the 4:2:2 and 4:4:4 chrominance
formats.
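The three chrominance formats differ in how far each chrominance component is subsampled relative to luminance: by two horizontally and vertically (4:2:0), by two horizontally only (4:2:2), or not at all (4:4:4). The sketch below derives the chrominance plane size and the number of 8 by 8 blocks in a 16 by 16 macroblock. It assumes picture dimensions that are multiples of 16, and the function names are hypothetical.

```python
# (horizontal, vertical) subsampling factors of each chrominance component
CHROMA_SUBSAMPLING = {"4:2:0": (2, 2), "4:2:2": (2, 1), "4:4:4": (1, 1)}


def chroma_plane_size(width, height, chroma_format):
    """Size of one chrominance component for a luminance picture of width x height."""
    dx, dy = CHROMA_SUBSAMPLING[chroma_format]
    return width // dx, height // dy


def blocks_per_macroblock(chroma_format):
    """Number of 8x8 blocks in a 16x16 macroblock: 4 luminance plus both chrominance components."""
    dx, dy = CHROMA_SUBSAMPLING[chroma_format]
    chroma_blocks_per_component = (16 // dx) * (16 // dy) // 64
    return 4 + 2 * chroma_blocks_per_component
```

This gives 6, 8 and 12 blocks per macroblock for 4:2:0, 4:2:2 and 4:4:4 respectively.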
`
Intro. 4.2 Scalable extensions
`
The scalability tools in this Specification are designed to support applications beyond those supported by single layer
video. Among the noteworthy application areas addressed are video telecommunications, video on Asynchronous
Transfer Mode networks (ATM), interworking of video standards, video service hierarchies with multiple spatial,
temporal and quality resolutions, HDTV with embedded TV, systems allowing migration to higher temporal resolution
HDTV, etc. Although a simple solution to scalable video is the simulcast technique, which is based on
transmission/storage of multiple independently coded reproductions of video, a more efficient alternative is scalable
video coding, in which the bandwidth allocated to a given reproduction of video can be partially re-utilised in coding of
the next reproduction of video. In scalable video coding, it is assumed that, given a coded bitstream, decoders of various
complexities can decode and display appropriate reproductions of coded video. A scalable video encoder is likely to
have increased complexity when compared to a single layer encoder. However, this Recommendation | International
Standard provides several different forms of scalability that address non-overlapping applications with corresponding
complexities. The basic scalability tools offered are:
`
– data partitioning;
– SNR scalability;
– spatial scalability; and
– temporal scalability.
`
`Moreover, combinations of these basic scalability tools are also supported and are referred to as hybrid scalability. In the
`case of basic scalability, two layers of video referred to as the lower layer and the enhancement layer are allowed,
`whereas in hybrid scalability up to three layers are supported. Tables Intro. 1 to Intro. 3 provide a few example
`applications of various scalabilities.
`
`
`
`
Table Intro. 1 – Applications of SNR scalability

Lower layer                  | Enhancement layer                         | Application
Recommendation ITU-R BT.601  | Same resolution and format as lower layer | Two quality service for Standard TV (SDTV)
High Definition              | Same resolution and format as lower layer | Two quality service for HDTV
4:2:0 high definition        | 4:2:2 chroma simulcast                    | Video production / distribution

Table Intro. 2 – Applications of spatial scalability

Base                 | Enhancement          | Application
Progressive (30 Hz)  | Progressive (30 Hz)  |
Interlace (30 Hz)    | Interlace (30 Hz)    | HDTV/SDTV scalability
Progressive (30 Hz)  | Interlace (30 Hz)    | ISO/IEC 11172-2/compatibility with this Specification
Interlace (30 Hz)    | Progressive (60 Hz)  | Migration to high resolution progressive HDTV

Table Intro. 3 – Applications of temporal scalability

Base                 | Enhancement          | Higher               | Application
Progressive (30 Hz)  | Progressive (30 Hz)  | Progressive (60 Hz)  | Migration to high resolution progressive HDTV
Interlace (30 Hz)    | Interlace (30 Hz)    | Progressive (60 Hz)  | Migration to high resolution progressive HDTV
`
Intro. 4.2.1 Spatial scalable extension
`
`Spatial scalability is a tool intended for use in video applications involving telecommunications, interworking of video
`standards, video database browsing, interworking of HDTV and TV, etc., i.e. video systems with the primary common
`feature that a minimum of two layers of spatial resolution are necessary. Spatial scalability involves generating two
`spatial resolution video layers from a single video source such that the lower layer is coded by itself to provide the basic
`spatial resolution and the enhancement layer employs the spatially interpolated lower layer and carries the full spatial
`resolution of the input video source. The lower and the enhancement layers may either both use the coding tools in this
`Specification, or the ISO/IEC 11172-2 Standard for the lower layer and this Specification for the enhancement layer.
`The latter case achieves a further advantage by facilitating interworking between video coding standards. Moreover,
`spatial scalability offers flexibility in choice of video formats to be employed in each layer. An additional advantage of
spatial scalability is its ability to provide resilience to transmission errors, as the more important data of the lower layer
can be sent over a channel with better error performance, while the less critical enhancement layer data can be sent over a
channel with poor error performance.
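The layering described above can be sketched on a small picture: the lower layer is a half-resolution version of the source, and the enhancement layer carries the difference between the source and the spatially interpolated lower layer. The sketch assumes a picture stored as a list of rows with even dimensions, a 2 × 2 averaging decimation and nearest-neighbour interpolation; both filters are hypothetical choices, and the normative upsampling process of clause 7.7 is not reproduced. Neither layer is quantised here, so reconstruction is exact.

```python
def downsample_2x(img):
    """Hypothetical decimation: average each 2x2 group of samples (integer floor division)."""
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) // 4
             for x in range(len(img[0]) // 2)]
            for y in range(len(img) // 2)]


def upsample_2x(img):
    """Hypothetical interpolation: repeat each sample horizontally and each row vertically."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def enhancement_residual(full, base):
    """Enhancement layer data: source minus the interpolated lower layer."""
    up = upsample_2x(base)
    return [[f - u for f, u in zip(frow, urow)] for frow, urow in zip(full, up)]


def reconstruct(base, residual):
    """Full-resolution output: interpolated lower layer plus enhancement residual."""
    up = upsample_2x(base)
    return [[u + r for u, r in zip(urow, rrow)] for urow, rrow in zip(up, residual)]
```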
`
Intro. 4.2.2 SNR scalable extension
`
SNR scalability is a tool intended for use in video applications involving telecommunications, video services with
multiple qualities, standard TV and HDTV, i.e. video systems with the primary common feature that a minimum of two
layers of video quality are necessary. SNR scalability involves generating two video layers of the same spatial resolution but
different video qualities from a single video source such that the lower layer is coded by itself to provide the basic video
quality and the enhancement layer is coded to enhance the lower layer. The enhancement layer, when added back to the
lower layer, regenerates a higher quality reproduction of the input video. Either both layers may use this Specification, or
the lower layer may use ISO/IEC 11172-2 and the enhancement layer this Specification. An additional advantage of SNR
scalability is its ability to provide a high degree of resilience to transmission errors, as the more important data of the
lower layer can be sent over a channel with better error performance, while the less critical enhancement layer data can be
sent over a channel with poor error performance.
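The idea of a coarse lower layer refined by an enhancement layer at the same resolution can be sketched on a list of DCT coefficients. This minimal sketch assumes simple uniform quantisers with hypothetical step sizes; the normative SNR scalability decoding process of clause 7.8 is not reproduced, and the function names are hypothetical. Python's round() rounds halves to even, which is incidental here.

```python
def quantise(values, step):
    return [round(v / step) for v in values]


def dequantise(levels, step):
    return [level * step for level in levels]


def snr_encode(coeffs, base_step, enhancement_step):
    """Coarse lower layer, plus an enhancement layer that requantises its error more finely."""
    base_levels = quantise(coeffs, base_step)
    base_recon = dequantise(base_levels, base_step)
    residual = [c - b for c, b in zip(coeffs, base_recon)]
    return base_levels, quantise(residual, enhancement_step)


def snr_decode(base_levels, enhancement_levels, base_step, enhancement_step):
    """Decode the lower layer only when enhancement_levels is None."""
    base = dequantise(base_levels, base_step)
    if enhancement_levels is None:
        return base
    refinement = dequantise(enhancement_levels, enhancement_step)
    return [b + r for b, r in zip(base, refinement)]
```

A decoder that receives only the lower layer still produces a usable, coarser result; adding the enhancement layer bounds the error of every coefficient by half the finer step size.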
`
`Intro. 4.2.3 Temporal scalable extension
`
`Temporal scalability is a tool intended for use in a range of diverse video applications from telecommunications
`to HDTV for which migration to higher temporal resolution systems from that of lower temporal resolution systems may
`be necessary. In many cases, the lower temporal resolution video systems may be either the existing systems or the less
`expensive early generation systems, with the motivation of introducing more sophisticated systems gradually. Temporal
scalability involves partitioning video frames into layers: the lower layer is coded by itself to provide the
basic temporal rate, and the enhancement layer is coded with temporal prediction with respect to the lower layer. These
layers, when decoded and temporally multiplexed, yield the full temporal resolution of the video source. Lower temporal
resolution systems may decode only the lower layer to provide basic temporal resolution, whereas more sophisticated
systems of the future may decode both layers and provide high temporal resolution video while maintaining
interworking with earlier generation systems. An additional advantage of temporal scalability is its ability to provide
resilience to transmission errors, as the more important data of the lower layer can be sent over a channel with better error
performance, while the less critical enhancement layer can be sent over a channel with poor error performance.
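The partitioning and temporal multiplexing described above can be sketched as follows, assuming a hypothetical two-to-one split in which the lower layer carries alternate frames. Frames are treated as opaque items, and the temporal prediction of the enhancement layer from the lower layer is not modelled; the function names are hypothetical.

```python
def split_temporal(frames):
    """Lower layer: frames 0, 2, 4, ...; enhancement layer: frames 1, 3, 5, ..."""
    return frames[0::2], frames[1::2]


def multiplex_temporal(base, enhancement):
    """Interleave decoded layers; with an empty enhancement layer only the base rate is output."""
    out = []
    for i, frame in enumerate(base):
        out.append(frame)
        if i < len(enhancement):
            out.append(enhancement[i])
    return out
```

A lower temporal resolution system corresponds to calling multiplex_temporal with an empty enhancement layer, which returns only the base-rate frames.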
`
`Intro. 4.2.4 Data partitioning extension
`
`Data partitioning is a tool intended for use when two channels are available for transmission and/or storage of a
`video bitstream, as may be the case in ATM networks, terrestrial broadcast, magnetic media, etc. The bitstream is
`partitioned between these channels such that more critical parts of the bitstream (such as headers, motion vectors, low
`frequency DCT coefficients) are transmitted in the channel with the better error performance, and less critical data (such
as higher frequency DCT coefficients) is transmitted in the channel with poor error performance. Thus, degradation due to
channel errors is minimised, since the critical parts of a bitstream are better protected. Data from neither channel can be
decoded by a decoder that is not intended for decoding data-partitioned bitstreams.
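The split of a block's coefficients between the two channels can be sketched as follows. The sketch assumes a hypothetical breakpoint that counts how many coefficients, in scan order, go into the better-protected partition; the normative partitioning rules of clause 7.10 are not reproduced. It also shows one possible degradation path for a decoder that does handle data-partitioned bitstreams: if the low-priority partition is lost, the missing coefficients are taken as zero.

```python
def partition_block(coeffs_in_scan_order, breakpoint):
    """Split one block's coefficients: low frequencies to the protected channel, the rest to the other."""
    return coeffs_in_scan_order[:breakpoint], coeffs_in_scan_order[breakpoint:]


def merge_block(high_priority, low_priority, length=64):
    """Rebuild a block; if low_priority is None (lost), missing coefficients are set to zero."""
    merged = list(high_priority)
    if low_priority is not None:
        merged += list(low_priority)
    merged += [0] * (length - len(merged))
    return merged
```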
`
`
`
`
INTERNATIONAL STANDARD ISO/IEC 13818-2 : 1995 (E)
ITU-T RECOMMENDATION H.262
`
`INFORMATION TECHNOLOGY –
`GENERIC CODING OF MOVING PICTURES AND
`ASSOCIATED AUDIO INFORMATION: VIDEO
`
1 Scope
`
`This Recommendation | International Standard specifies the coded representation of picture information for digital
`storage media and digital video communication and specifies the decoding process. The representation supports constant
`bitrate transmission, variable bitrate transmission, random access, channel hopping, scalable decoding, bitstream editing,
`as well as special functions such as fast forward playback, fast reverse playback, slow motion, pause and still pictures.
`This Recommendation | International Standard is forward compatible with ISO/IEC 11172-2 and upward or downward
`compatible with EDTV, HDTV, SDTV formats.
`
`This Recommendation | International Stan