19 ITU-T Video Coding Standards H.261 and H.263

This chapter introduces the ITU-T video coding standards H.261 and H.263, which are established mainly for videophony and videoconferencing. The basic technical detail of H.261 is presented. The technical improvements with which H.263 achieves high coding efficiency are discussed. Features of H.263+, H.263++, and H.26L are presented.
`
19.1 INTRODUCTION
`
Very low bit rate video coding has found many industry applications such as wireless and network communications. The rapid convergence of standardization of digital video-coding standards is the reflection of several factors: the maturity of technologies in terms of algorithmic performance, hardware implementation with VLSI technology, and the market need for rapid advances in wireless and network communications. As stated in the previous chapters, these standards include JPEG for still image coding and MPEG-1/2 for CD-ROM storage and digital television applications. In parallel with the ISO/IEC development of the MPEG-1/2 standards, the ITU-T has developed H.261 (ITU-T, 1993) for videotelephony and videoconferencing applications in an ISDN environment.
`
`19.2 H.261 VIDEO-CODING STANDARD
`
The H.261 video-coding standard was developed by ITU-T Study Group XV during 1988 to 1993. It was adopted in 1990 and the final revision approved in 1993. It is also referred to as the p x 64 standard because it encodes digital video signals at bit rates of p x 64 Kbps, where p is an integer from 1 to 30, i.e., at bit rates from 64 Kbps to 1.92 Mbps.
`
19.2.1 OVERVIEW OF H.261 VIDEO-CODING STANDARD
`
The H.261 video-coding standard has many features in common with the MPEG-1 video-coding standard. However, since they target different applications, there exist many differences between the two standards, such as data rates, picture quality, end-to-end delay, and others. Before indicating the differences between the two coding standards, we describe the major similarities between H.261 and MPEG-1/2. First, both standards are used to code similar video formats. H.261 is mainly used to code video with the common intermediate format (CIF) or quarter-CIF (QCIF) spatial resolution for teleconferencing applications. MPEG-1 uses CIF, SIF, or higher spatial resolution for CD-ROM applications. The original motivation for developing the H.261 video-coding standard was to provide a standard that could be used for both PAL and NTSC television signals. But later, H.261 was mainly used for videoconferencing while MPEG-1/2 was used for digital television (DTV), VCD (video CD), and DVD (digital video disc). The two TV systems, PAL and NTSC, use different line and picture rates. The NTSC, which is used in North America and Japan, uses 525 lines per interlaced picture at 30 frames/second. The PAL system is used in most other countries, and it uses 625 lines per interlaced picture at 25 frames/second. For this purpose, the CIF was adopted as the source video format for the H.261 video coder. The CIF format consists of 352 pixels/line, 288 lines/frame, and 30 frames/second. This format represents half the active
`
`
IPR2018-01413
`
`Sony EX1008 Page 455
`
`
`Image and Video Compression for Multimedia Engineering
`
lines of the PAL signal and the same picture rate as the NTSC signal. PAL systems need only perform a picture rate conversion and NTSC systems need only perform a line-number conversion. Color pictures consist of one luminance and two color-difference components (referred to as Y Cb Cr format) as specified by the CCIR601 standard. The Cb and Cr components are half-size in both horizontal and vertical directions and have 176 pixels/line and 144 lines/frame. The other format, QCIF, is used for very low bit rate applications. The QCIF has half the number of pixels and half the number of lines of the CIF format. Second, the key coding algorithms of H.261 and MPEG-1 are very similar. Both H.261 and MPEG-1 use DCT-based coding to remove intraframe redundancy and motion compensation to remove interframe redundancy.
Now let us describe the main differences between the two coding standards with respect to coding algorithms. The main differences include:

- H.261 uses only I- and P-macroblocks but no B-macroblocks, while MPEG-1 uses three macroblock types, I-, P-, and B-macroblocks (an I-macroblock is an intraframe-coded macroblock, a P-macroblock is a predictive-coded macroblock, and a B-macroblock is a bidirectionally coded macroblock), as well as three picture types, I-, P-, and B-pictures, as defined in Chapter 16 for the MPEG-1 standard.
- There is a constraint in H.261 that for every 132 interframe-coded macroblocks, which correspond to 4 GOBs (groups of blocks) or to one-third of a CIF picture, there must be at least one intraframe-coded macroblock. To obtain better coding performance in low-bit-rate applications, most encoding schemes of H.261 prefer not to use intraframe coding on all the macroblocks of a picture, but only on a few macroblocks in every picture with a rotational scheme. MPEG-1 uses the GOP (group of pictures) structure, where the size of a GOP (the distance between two I-pictures) is not specified.
- The end-to-end delay is not a critical issue for MPEG-1, but it is critical for H.261. The video encoder and video decoder delays of H.261 need to be known to allow audio compensation delays to be fixed when H.261 is used in interactive applications. This will allow lip synchronization to be maintained.
- The accuracy of motion compensation in MPEG-1 is up to a half-pixel, but it is only a full pixel in H.261. However, H.261 uses a loop filter to smooth the previous frame. This filter attempts to minimize the prediction error.
- In H.261, a fixed picture aspect ratio of 4:3 is used. In MPEG-1, several picture aspect ratios can be used, and the picture aspect ratio is defined in the picture header.
- Finally, in H.261, the encoded picture rate is restricted to allow up to three skipped frames. This gives the control mechanism in the encoder some flexibility to control the encoded picture quality and satisfy the buffer regulation. Although MPEG-1 has no restriction on skipped frames, the encoder usually does not perform frame skipping. Rather, the syntax for B-frames is exploited, as B-frames require many fewer bits than P-pictures.
`
19.2.2 TECHNICAL DETAIL OF H.261
`
The key technologies used in the H.261 video-coding standard are the DCT and motion compensation. The main components in the encoder include DCT, prediction, quantization (Q), inverse DCT (IDCT), inverse quantization (IQ), loop filter, frame memory, variable-length coding, and the coding control unit. A typical encoder structure is shown in Figure 19.1.

The input video source is first converted to the CIF frame and then stored in the frame memory. The CIF frame is then partitioned into GOBs. A GOB contains 33 macroblocks, which are 1/12 of a CIF picture or 1/3 of a QCIF picture. Each macroblock consists of six 8 x 8 blocks, among which four are luminance (Y) blocks and two are chrominance blocks (one of Cb and one of Cr).
`
`
`
`
`
FIGURE 19.1 Block diagram of a typical H.261 video encoder. (From ITU-T Recommendation H.261, March 1993. With permission.)
`
For the intraframe mode, each 8 x 8 block is first transformed with the DCT and then quantized. Variable-length coding (VLC) is applied to the quantized DCT coefficients with a zigzag scanning order such as in MPEG-1. The resulting bits are sent to the encoder buffer to form a bitstream.

For the interframe-coding mode, frame prediction is performed with motion estimation in a similar manner to that in MPEG-1, but only P-macroblocks and P-pictures, no B-macroblocks and B-pictures, are used. Each 8 x 8 block of differences or prediction residues is coded by the same DCT coding path as for intraframe coding. In the motion-compensated predictive coding, the encoder should perform the motion estimation with the reconstructed pictures instead of the original video data, as will be done in the decoder. Therefore, the IQ and IDCT blocks are included in the motion compensation loop to reduce the error propagation drift. Since the VLC operation is lossless, there is no need to include the VLC block in the motion compensation loop. The role of the spatial filter is to minimize the prediction error by smoothing the previous frame that is used for motion compensation.
The loop filter is a separable 2-D spatial filter that operates on an 8 x 8 block. The corresponding 1-D filters are nonrecursive with coefficients 1/4, 1/2, 1/4. At block boundaries, the coefficients are 0, 1, 0 to avoid the taps falling outside the block. It should be noted that MPEG-1 uses subpixel-accurate motion vectors instead of a loop filter to smooth the anchor frame. A performance comparison of the two methods should be interesting.
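The separable loop filter just described can be sketched as follows. This is a minimal illustration assuming integer arithmetic with rounding; the standard defines the exact rounding behavior:

```python
# Sketch of the H.261 loop filter on one 8x8 block: a separable nonrecursive
# filter with 1-D taps (1/4, 1/2, 1/4), falling back to (0, 1, 0) at block
# boundaries so no tap falls outside the block.

def filter_row(row):
    out = [row[0]]                                 # boundary: taps (0, 1, 0)
    for i in range(1, len(row) - 1):
        out.append((row[i - 1] + 2 * row[i] + row[i + 1] + 2) // 4)
    out.append(row[-1])                            # boundary again
    return out

def loop_filter(block):
    """Apply the 1-D filter horizontally, then vertically (separable 2-D)."""
    rows = [filter_row(r) for r in block]
    cols = [filter_row(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

flat = [[100] * 8 for _ in range(8)]
assert loop_filter(flat) == flat                   # a flat block is unchanged
```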
The role of coding control includes the rate control, the buffer control, the quantization control, and the frame rate control. These parameters are intimately related. The coding control is not part of the standard; however, it is an important part of the encoding process. For a given target bit rate, the encoder has to control several parameters to reach the rate target and at the same time provide reasonable coded picture quality.
Since H.261 is a predictive coder and VLCs are used everywhere, such as for coding quantized DCT coefficients and motion vectors, a single transmission error may cause a loss of synchronization and consequently cause problems for the reconstruction. To enhance the performance of the H.261 video coder in noisy environments, the transmitted bitstream of H.261 can optionally contain a BCH (Bose, Chaudhuri, and Hocquenghem) (511,493) forward error-correction code.

The H.261 video decoder performs the inverse operations of the encoder. After optional error-correction decoding, the compressed bitstream enters the decoder buffer and then is parsed by the variable-length decoder (VLD). The output of the VLD is applied to the IQ and IDCT, where the data are converted to values in the spatial domain. For the interframe-coding mode, the motion
`
`
FIGURE 19.2 Arrangement of macroblocks in a GOB. (From ITU-T Recommendation H.261, March 1993. With permission.)
`
compensation is performed and the data from the macroblocks in the anchor frame are added to the current data to form the reconstructed data.
`
19.2.3 SYNTAX DESCRIPTION

The syntax of H.261 video coding has a hierarchical layered structure. From the top to the bottom, the layers are the picture layer, GOB layer, macroblock layer, and block layer.
`
19.2.3.1 Picture Layer

The picture layer begins with a 20-bit picture start code (PSC). Following the PSC, there are the temporal reference (5-bit), picture type information (PTYPE, 6-bit), extra insertion information (PEI, 1-bit), and spare information (PSPARE). Then the data for the GOBs follow.
`
19.2.3.2 GOB Layer

A GOB corresponds to 176 pixels by 48 lines of Y and 88 pixels by 24 lines of Cb and Cr. The GOB layer contains the following data in order: 16-bit GOB start code (GBSC), 4-bit group number (GN), 5-bit quantization information (GQUANT), 1-bit extra insertion information (GEI), and spare information (GSPARE). The number of bits for GSPARE is variable depending on the set of GEI bits. If GEI is set to "1," then 9 bits follow, consisting of 8 bits of data and another GEI bit to indicate whether a further 9 bits follow, and so on. Data of the GOB header are then followed by data for the macroblocks.
`
19.2.3.3 Macroblock Layer

Each GOB contains 33 macroblocks, which are arranged as in Figure 19.2. A macroblock consists of 16 pixels by 16 lines of Y that spatially correspond to 8 pixels by 8 lines each of Cb and Cr. Data in the bitstream for a macroblock consist of a macroblock header followed by data for blocks. The macroblock header may include the macroblock address (MBA) (variable length), type information (MTYPE) (variable length), quantizer (MQUANT) (5 bits), motion vector data (MVD) (variable length), and coded block pattern (CBP) (variable length). The MBA information is always present and is coded by VLC. The VLC table for macroblock addressing is shown in Table 19.1. The presence of the other items depends on the macroblock type information, which is shown in VLC Table 19.2.
`
19.2.3.4 Block Layer

Data in the block layer consist of the transformed coefficients followed by an end-of-block (EOB) marker (code 10). The data of the transform coefficients (TCOEFF) are first converted to pairs of RUN and LEVEL according to the zigzag scanning order. The RUN represents the number of successive zeros and the LEVEL represents the value of a nonzero coefficient. The pairs of RUN and LEVEL are then encoded with VLCs. The DC coefficient of an intrablock is coded by a fixed-length code with 8 bits. All VLC tables can be found in the standard document (ITU-T, 1993).
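The zigzag scan and RUN/LEVEL pairing described above can be sketched as follows; the codeword tables themselves are omitted, and the function names are illustrative:

```python
# Zigzag scan of an 8x8 block of quantized coefficients, followed by the
# RUN/LEVEL pairing: each nonzero value is emitted together with the count
# of zeros that precede it along the scan.

def zigzag_order(n=8):
    # Enumerate (row, col) positions along anti-diagonals, alternating direction.
    order = []
    for s in range(2 * n - 1):
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        order.extend(diag if s % 2 else reversed(diag))
    return order

def run_level_pairs(block):
    pairs, run = [], 0
    for r, c in zigzag_order():
        v = block[r][c]
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    return pairs  # followed by the EOB marker in the bitstream

blk = [[0] * 8 for _ in range(8)]
blk[0][0], blk[0][1], blk[2][0] = 12, 5, -3
print(run_level_pairs(blk))  # [(0, 12), (0, 5), (1, -3)]
```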
`
`
`
`
TABLE 19.1
VLC Table for Macroblock Addressing

MBA   Code          MBA   Code             MBA            Code
1     1             13    0000 1000        25             0000 0100 000
2     011           14    0000 0111        26             0000 0011 111
3     010           15    0000 0110        27             0000 0011 110
4     0011          16    0000 0101 11     28             0000 0011 101
5     0010          17    0000 0101 10     29             0000 0011 100
6     0001 1        18    0000 0101 01     30             0000 0011 011
7     0001 0        19    0000 0101 00     31             0000 0011 010
8     0000 111      20    0000 0100 11     32             0000 0011 001
9     0000 110      21    0000 0100 10     33             0000 0011 000
10    0000 1011     22    0000 0100 011    MBA stuffing   0000 0001 111
11    0000 1010     23    0000 0100 010    Start code     0000 0000 0000 0001
12    0000 1001     24    0000 0100 001
`
TABLE 19.2
VLC Table for Macroblock Type

Prediction      MQUANT   MVD   CBP   TCOEFF   VLC
Intra                                x        0001
Intra           x                    x        0000 001
Inter                          x     x        1
Inter           x              x     x        0000 1
Inter+MC                 x                    0000 0000 1
Inter+MC                 x     x     x        0000 0001
Inter+MC        x        x     x     x        0000 0000 01
Inter+MC+FIL             x                    001
Inter+MC+FIL             x     x     x        01
Inter+MC+FIL    x        x     x     x        0000 01

Notes:
1. "x" means that the item is present in the macroblock.
2. It is possible to apply the filter in a non-motion-compensated macroblock by declaring it as MC+FIL but with a zero vector.
`
`
19.3 H.263 VIDEO-CODING STANDARD

The H.263 video-coding standard (ITU-T, 1996) is specifically designed for very low bit rate applications such as practical video telecommunication. Its technical content was completed in late 1995 and the standard was approved in early 1996.
`
19.3.1 OVERVIEW OF H.263 VIDEO CODING

The basic configuration of the video source coding algorithm of H.263 is based on that of H.261. Several important features that are different from H.261 include the following new options: unrestricted motion vectors, syntax-based arithmetic coding, advanced prediction, and PB-frames. All these features can be used together or separately for improving the coding efficiency. The H.263
`
`
`
`
TABLE 19.3
Number of Pixels per Line and the Number of Lines for Each Picture Format

Picture     Number of Pixels      Number of Lines       Number of Pixels         Number of Lines
Format      for Luminance (dx)    for Luminance (dy)    for Chrominance (dx/2)   for Chrominance (dy/2)
Sub-QCIF    128                   96                    64                       48
QCIF        176                   144                   88                       72
CIF         352                   288                   176                      144
4CIF        704                   576                   352                      288
16CIF       1408                  1152                  704                      576
`
video standard can be used for both 625-line and 525-line television standards. The source coder operates on noninterlaced pictures at a picture rate of about 30 pictures/second. The pictures are coded as luminance and two color-difference components (Y, Cb, and Cr). The source coder is based on a CIF. Actually, there are five standardized formats, which include sub-QCIF, QCIF, CIF, 4CIF, and 16CIF. The detail of the formats is shown in Table 19.3.

It is noted that for each format the chrominance is a quarter the size of the luminance picture; i.e., the chrominance pictures are half the size of the luminance picture in both horizontal and vertical directions. This is defined by the ITU-R 601 format. For the CIF format, the number of pixels/line is compatible with sampling the active portion of the luminance and color-difference signals from a 525- or 625-line source at 6.75 and 3.375 MHz, respectively. These frequencies have a simple relationship to those defined by the ITU-R 601 format.
`
19.3.2 TECHNICAL FEATURES OF H.263

The H.263 encoder structure is similar to the H.261 encoder with the exception that there is no loop filter in the H.263 encoder. The main components of the encoder include block transform, motion-compensated prediction, block quantization, and VLC. Each picture is partitioned into groups of blocks, which are referred to as GOBs. A GOB contains a multiple of 16 lines, k x 16 lines, depending on the picture format (k = 1 for sub-QCIF, QCIF, and CIF; k = 2 for 4CIF; k = 4 for 16CIF). Each GOB is divided into macroblocks that are the same as in H.261, and each macroblock consists of four 8 x 8 luminance blocks and two 8 x 8 chrominance blocks. Compared with H.261, H.263 has several new technical features for the enhancement of coding efficiency in very low bit rate applications. These new features include picture-extrapolating motion vectors (for the unrestricted motion vector mode), motion compensation with half-pixel accuracy, advanced prediction (which includes variable-block-size motion compensation and overlapped-block motion compensation), syntax-based arithmetic coding, and the PB-frame mode.
`
19.3.2.1 Half-Pixel Accuracy

In H.263 video coding, half-pixel-accuracy motion compensation is used. The half-pixel values are found using bilinear interpolation, as shown in Figure 19.3.

Note that H.263 uses subpixel accuracy for motion compensation instead of using a loop filter to smooth the anchor frames as in H.261. This is also done in other coding standards, such as MPEG-1 and MPEG-2, which also use half-pixel accuracy for motion compensation. In MPEG-4 video, quarter-pixel accuracy for motion compensation has been adopted as a tool for version 2.
`
19.3.2.2 Unrestricted Motion Vector Mode

Usually motion vectors are limited within the coded picture area of the anchor frames. In the unrestricted motion vector mode, the motion vectors are allowed to point outside the pictures. When the values
`
`
X: integer pixel position (A, B, C, D)
O: half-pixel position (a, b, c, d)

a = A
b = (A + B + 1)/2
c = (A + C + 1)/2
d = (A + B + C + D + 2)/4

"/" indicates division by truncation.

FIGURE 19.3 Half-pixel prediction by bilinear interpolation.
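The four interpolation formulas of Figure 19.3 can be written out directly; this is a sketch, with "//" standing in for the figure's division by truncation:

```python
# Bilinear half-pixel interpolation: A, B, C, D are the integer-position
# pixels surrounding the half-pixel positions a, b, c, d.

def half_pixel_values(A, B, C, D):
    a = A
    b = (A + B + 1) // 2          # horizontal half-pixel
    c = (A + C + 1) // 2          # vertical half-pixel
    d = (A + B + C + D + 2) // 4  # diagonal half-pixel
    return a, b, c, d

print(half_pixel_values(100, 102, 98, 104))  # (100, 101, 99, 101)
```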
`
of the motion vectors exceed the boundary of the anchor frame in the unrestricted motion vector mode, the picture-extrapolating method is used: the values of reference pixels outside the picture boundary take the values of the boundary pixels. An extension of the motion vector range also applies in the unrestricted motion vector mode. In the default prediction mode, the motion vectors are restricted to the range [-16, 15.5]. In the unrestricted mode, the maximum range for motion vectors is extended to [-31.5, 31.5] under certain conditions.
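The picture-extrapolating rule can be sketched as a coordinate clamp; the function name is illustrative:

```python
# When a motion vector points outside the anchor frame, each reference
# coordinate is clamped so that the pixel takes the value of the nearest
# boundary pixel (picture extrapolation).

def ref_pixel(frame, x, y):
    """frame: 2-D list indexed [y][x]; x and y may fall outside the picture."""
    h, w = len(frame), len(frame[0])
    xc = min(max(x, 0), w - 1)
    yc = min(max(y, 0), h - 1)
    return frame[yc][xc]

frame = [[1, 2], [3, 4]]
assert ref_pixel(frame, -5, 0) == 1   # left of the picture -> left boundary
assert ref_pixel(frame, 3, 3) == 4    # beyond the bottom-right corner
```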
`
19.3.2.3 Advanced Prediction Mode

Generally, the decoder will accept no more than one motion vector per macroblock for the baseline algorithm of the H.263 video-coding standard. However, in the advanced prediction mode, the syntax allows up to four motion vectors to be used per macroblock. The decision to use one or four vectors is indicated by the macroblock type and coded block pattern for chrominance (MCBPC) codeword for each macroblock. How to make this decision is the task of the encoding process.

The following example gives the steps of motion estimation and coding mode selection for the advanced prediction mode in the encoder.
`
Step 1. Integer pixel motion estimation:

    SAD_N(x, y) = SUM(i=0..N-1) SUM(j=0..N-1) |original(i, j) - previous(i + x, j + y)|,        (19.1)

where SAD is the sum of absolute differences, the values of (x, y) are within the search range, N is equal to 16 for a 16 x 16 block, and N is equal to 8 for an 8 x 8 block.

    SAD_4x8 = SUM of the four SAD_8(x, y),        (19.2)

    SAD_inter = min(SAD_16(x, y), SAD_4x8).        (19.3)

Step 2. Intra/inter mode decision:
If A < (SAD_inter - 500), this macroblock is coded as intra-MB; otherwise, it is coded as inter-MB, where SAD_inter is determined in Step 1 and

    A = SUM(i=0..15) SUM(j=0..15) |original(i, j) - MB_mean|,        (19.4)
`
`
    MB_mean = (1/256) SUM(i=0..15) SUM(j=0..15) original(i, j).

If this macroblock is determined to be coded as inter-MB, go to Step 3.

Step 3. Half-pixel search:
In this step, the half-pixel search is performed for both the 16 x 16 block and the 8 x 8 blocks, as shown in Figure 19.3.

Step 4. Decision on 16 x 16 or four 8 x 8 (one motion vector or four motion vectors per macroblock):
If SAD_4x8 < SAD_16 - 100, four motion vectors per macroblock will be used; each of the motion vectors is then used for all pixels in one of the four luminance blocks in the macroblock. Otherwise, one motion vector will be used for all pixels in the macroblock.

Step 5. Differential coding of motion vectors for each 8 x 8 luminance block is performed as in Figure 19.4.
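Steps 1, 2, and 4 can be sketched as follows. The motion search loop and the half-pixel refinement of Step 3 are omitted (the same displacement is assumed for all blocks), and the function names are illustrative, not from the standard; the constants 500 and 100 are the biases from the text:

```python
# SAD-based mode selection for one 16x16 luminance macroblock.
# `cur` and `ref` are 16x16 lists of lists of pixel values.

def sad(a, b):
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def mode_decision(cur, ref):
    sad16 = sad(cur, ref)                               # Eq. (19.1), N = 16
    # Eq. (19.2): sum of the four 8x8 SADs (same displacement assumed here)
    sad4x8 = sum(
        sad([row[cx:cx + 8] for row in cur[cy:cy + 8]],
            [row[cx:cx + 8] for row in ref[cy:cy + 8]])
        for cy in (0, 8) for cx in (0, 8))
    sad_inter = min(sad16, sad4x8)                      # Eq. (19.3)
    mean = sum(map(sum, cur)) // 256                    # MB_mean
    a = sum(abs(v - mean) for row in cur for v in row)  # Eq. (19.4)
    if a < sad_inter - 500:                             # Step 2
        return "intra"
    return "inter-4v" if sad4x8 < sad16 - 100 else "inter-1v"  # Step 4
```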
`
When it has been decided to use four motion vectors, the motion vector for both chrominance blocks is derived by calculating the sum of the four luminance vectors and dividing by 8. The component values of the resulting 1/16-pixel resolution vectors are modified toward the positions indicated in Table 19.4.
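The chrominance-vector derivation can be sketched as below. The rounding map used here (fractional position 0 stays 0, 1 through 13 round to the half-pixel position, 14 and 15 round to the full pixel) follows Table 19.4 as reconstructed from a damaged scan; treat it as an approximation of the normative table:

```python
# Derive one chrominance motion vector component from the four luminance
# components (all in half-pixel units). The sum has sixteenth-pixel
# resolution, which is then rounded back to the half-pixel grid.

def chroma_component(luma_components):
    total = sum(luma_components)          # sixteenth-pixel resolution
    sixteenths = abs(total) % 16
    whole = abs(total) // 16              # whole pixels
    frac = 0 if sixteenths == 0 else (1 if sixteenths <= 13 else 2)
    mag = 2 * whole + frac                # back to half-pixel units
    return -mag if total < 0 else mag

# Four vectors of +1.5 pixels (3 half-pixel units) average to 0.75 pixel,
# which rounds to the half-pixel position:
assert chroma_component([3, 3, 3, 3]) == 1
```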
Another advanced prediction mode is overlapped motion compensation for luminance. Actually, this idea is also used by MPEG-4, which has been described in Chapter 18. In the overlapped motion compensation mode, each pixel in an 8 x 8 luminance block is a weighted sum of three values divided by 8 with rounding. The three values are obtained by motion compensation with three motion vectors: the motion vector of the current luminance block and two of four "remote"
`
`
`
MVD_x = MV_x - P_x
MVD_y = MV_y - P_y
P_x = Median(MV1_x, MV2_x, MV3_x)
P_y = Median(MV1_y, MV2_y, MV3_y)
P_x = P_y = 0, if the MB is intracoded or the block is outside the picture boundary

FIGURE 19.4 Differential coding of motion vectors.
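The differential coding of Figure 19.4 can be sketched as follows; the function names are illustrative:

```python
# Each component of the current motion vector is predicted by the median of
# the three candidate vectors, and only the difference MVD is coded.

def median3(a, b, c):
    return sorted((a, b, c))[1]

def mv_difference(mv, mv1, mv2, mv3, intra_or_outside=False):
    """All vectors are (x, y) tuples; units may be half-pixels."""
    if intra_or_outside:          # P_x = P_y = 0 in that case
        px, py = 0, 0
    else:
        px = median3(mv1[0], mv2[0], mv3[0])
        py = median3(mv1[1], mv2[1], mv3[1])
    return mv[0] - px, mv[1] - py

print(mv_difference((3, -2), (1, 0), (4, -1), (2, -3)))  # (1, -1)
```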
`
`
`
`
TABLE 19.4
Modification of 1/16 Pixel Resolution Chrominance Vector Components

1/16 pixel position:   0  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15
Resulting position:    0  1  1  1  1  1  1  1  1  1   1   1   1   1   2   2   (in half-pixel units)
`
vectors. These remote vectors include the motion vector of the block to the left or right of the current block and the motion vector of the block above or below the current block. Remote motion vectors from other GOBs are used in the same way as remote motion vectors inside the current GOB. For each pixel to be coded in the current block, the remote motion vectors of the blocks at the two nearest block borders are used; i.e., for the upper half of the block the motion vector corresponding to the block above the current block is used, while for the lower half of the block the motion vector corresponding to the block below the current block is used. Similarly, the left half of the block uses the motion vector of the block at the left side of the current block, and the right half uses the one at the right side of the current block. To make this clearer, let (MV0_x, MV0_y) be the motion vector for the current block, (MV1_x, MV1_y) be the motion vector for the block either above or below, and (MV2_x, MV2_y) be the motion vector of the block either to the left or right of the current block. Then the value of each pixel p(x, y) in the current 8 x 8 luminance block is given by
`
    p(x, y) = (q(x, y) * H0(x, y) + r(x, y) * H1(x, y) + s(x, y) * H2(x, y) + 4)/8,        (19.5)

where

    q(x, y) = p(x + MV0_x, y + MV0_y),
    r(x, y) = p(x + MV1_x, y + MV1_y),
    s(x, y) = p(x + MV2_x, y + MV2_y).        (19.6)

H0 is the weighting matrix for prediction with the current block motion vector, H1 is the weighting matrix for prediction with the top or bottom block motion vector, and H2 is the weighting matrix for prediction with the left or right block motion vector. This applies to the luminance block only. The values of H0, H1, and H2 are shown in Figure 19.5.
`
FIGURE 19.5 Weighting matrices for overlapped motion compensation.
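The per-pixel blend of Eq. (19.5) can be sketched as follows. The uniform placeholder weights used in the example are not the values from Figure 19.5; the only property relied on is that the three weights sum to 8 at every position:

```python
# One pixel of overlapped motion compensation: a weighted sum of three
# motion-compensated values, divided by 8 with rounding (Eq. 19.5).

def obmc_pixel(q, r, s, h0, h1, h2):
    """q, r, s: predictions with the current, top/bottom, and left/right
    motion vectors for one pixel; h0, h1, h2: the weights at that pixel."""
    return (q * h0 + r * h1 + s * h2 + 4) // 8

# With placeholder weights 4, 2, 2 the result is a rounded blend:
assert obmc_pixel(100, 100, 100, 4, 2, 2) == 100  # identical predictions pass through
assert obmc_pixel(120, 80, 80, 4, 2, 2) == 100    # (480 + 160 + 160 + 4) // 8
```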
`
`
It should be noted that the above coding scheme is not optimized in the selection of the coding mode, since the decision depends only on the values of the prediction residues. Optimized mode-decision techniques that include the above possibilities for prediction have been considered by Wiegand (1996).
`
19.3.2.4 Syntax-Based Arithmetic Coding

As in other video-coding standards, H.263 uses variable-length coding and decoding (VLC/VLD) to remove the redundancy in the video data. The basic principle of VLC is to encode a symbol with a specific table based on the syntax of the coder. The symbol is mapped to an entry of the table in a table-lookup operation, and then the binary codeword specified by that entry is sent to a bitstream buffer for transmission to the decoder. In the decoder, the inverse operation, VLD, is performed to reconstruct the symbol by a table-lookup operation based on the same syntax of the coder. The tables in the decoder must be the same as the ones used in the encoder for encoding the current symbol. To obtain better performance, the tables are generated in a statistically optimized way (such as with a Huffman coder) from a large number of training sequences. This VLC/VLD process implies that each symbol is encoded into a fixed-integral number of bits. An optional feature of H.263 is the use of arithmetic coding to remove this restriction of a fixed-integral number of bits per symbol. This syntax-based arithmetic coding mode may result in bit rate reductions.
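The gain can be illustrated numerically: a VLC spends a whole number of bits per symbol, while the source entropy, which arithmetic coding can approach, is generally fractional. The symbol set, probabilities, and code lengths below are illustrative, not taken from the H.263 tables:

```python
# Average VLC code length vs. entropy for a toy three-symbol source.
from math import log2

probs = {"EOB": 0.7, "run1": 0.2, "run2": 0.1}
vlc_len = {"EOB": 1, "run1": 2, "run2": 2}   # a Huffman-style code: 0, 10, 11

entropy = -sum(p * log2(p) for p in probs.values())
vlc_avg = sum(probs[s] * vlc_len[s] for s in probs)

print(f"entropy  = {entropy:.3f} bits/symbol")   # about 1.157
print(f"VLC cost = {vlc_avg:.3f} bits/symbol")   # 1.300
assert vlc_avg >= entropy  # arithmetic coding can close this gap
```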
`
19.3.2.5 PB-Frames

The PB-frame is a new feature of H.263 video coding. A PB-frame consists of two pictures, one P-picture and one B-picture, coded as one unit, as shown in Figure 19.6. Since H.261 does not have B-pictures, the concept of a B-picture comes from the MPEG video-coding standards. In a PB-frame, the P-picture is predicted from the previously decoded I- or P-picture, and the B-picture is bidirectionally predicted both from the previously decoded I- or P-picture and from the P-picture in the PB-frame unit, which is currently being decoded.

Several detailed issues have to be addressed at the macroblock level in PB-frame mode:

- If a macroblock in the PB-frame is intracoded, the P-macroblock in the PB-unit is intracoded and the B-macroblock in the PB-unit is intercoded. The motion vector of the intercoded PB-macroblock is used for the B-macroblock only.
- A macroblock in a PB-frame contains 12 blocks for the 4:2:0 format, six (four luminance blocks and two chrominance blocks) from the P-frame and six from the B-frame. The data for the six P-blocks are transmitted first and then the data for the six B-blocks.
- Different parts of a B-block in a PB-frame can be predicted with different modes. For pixels where the backward vector points inside the coded P-macroblock, bidirectional prediction is used. For all other pixels, forward prediction is used.
`
FIGURE 19.6 Prediction in PB-frames mode. (From ITU-T Recommendation H.263, May 1996. With permission.)
`
`
19.4 H.263 VIDEO-CODING STANDARD VERSION 2

19.4.1 OVERVIEW OF H.263 VERSION 2

The H.263 version 2 (ITU-T, 1998) video-coding standard, also known as H.263+, was approved in January 1998 by the ITU-T. H.263 version 2 includes a number of new optional features based on the H.263 video-coding standard. These new optional features are added to broaden the application range of H.263 and to improve its coding efficiency. The main features are flexible video format, scalability, and backward-compatible supplemental enhancement information. Among these new optional features, five are intended to improve the coding efficiency and three are proposed to address the needs of mobile video and other noisy transmission environments. The scalability features provide the capability of generating layered bitstreams: spatial scalability, temporal scalability, and signal-to-noise ratio (SNR) scalability, similar to those defined by the MPEG-2 video-coding standard. There are also other modes of H.263 version 2 that provide some enhancement functions. We will describe these features in the following section.
`
19.4.2 NEW FEATURES OF H.263 VERSION 2

H.263 version 2 includes a number of new features. In the following, we briefly describe the key techniques used for these features.
`
19.4.2.1 Scalability

The scalability function allows for encoding the video sequences in a hierarchical way that partitions the pictures into one basic layer and one or more enhancement layers. The decoders have the option of decoding only the base layer bitstream to obtain lower-quality reconstructed pictures, or of further decoding the enhancement layers to obtain higher-quality decoded pictures. There are three types of scalability in H.263, all similar to the ones in the MPEG-2 video-coding standard: temporal scalability, SNR scalability, and spatial scalability.

Temporal scalability (Figure 19.7) is achieved by using B-pictures as the enhancement layer. The B-pictures are predicted from either or both a previous and a subsequent decoded picture in the base layer.

In SNR scalability (Figure 19.8), the pictures are first encoded with coarse quantization in the base layer. The differences, or coding error pictures, between a reconstructed picture and its original in the base layer encoder are then encoded in the enhancement layer and sent to the decoder, providing an enhancement of SNR. In the enhancement layer there are two types of pictures. If a picture in the enhancement layer is only predicted from the base layer, it is referred to as an EI picture. It is a bidirectionally predicted picture if it uses both a prior enhancement layer picture and a temporally simultaneous base layer reference picture for prediction. Note that the prediction
`
FIGURE 19.7 Temporal scalability. (From ITU-T Recommendation H.263, May 1996. With permission.)
`
`
FIGURE 19.8 SNR scalability. (From ITU-T Recommendation H.263, May 1996. With permission.)
`
FIGURE 19.9 Spatial scalability. (From ITU-T Recommendation H.263, May 1996. With permission.)
`
from the reference layer uses no motion vectors. However, EP (enhancement P) pictures use motion vectors when predicted from their temporally prior reference picture in the same layer. Also, if more than two layers are used, the reference may be the lower layer instead of the base layer.

In spatial scalability (Figure 19.9), lower-resolution pictures are encoded in the base layer or a lower layer. The differences, or error pictures, between the up-sampled decoded base layer pictures and their original pictures are encoded in the enhancement layer and sent to the decoder, providing the spatial enhancement pictures. As in MPEG-2, spatial interpolation filters are used for the spatial scalability. There are also two types of pictures in the enhancement layer: EI and EP. If a decoder is able to perform spatial scalability, it may also need to be able to use a custom picture format. For example, if the base layer is sub-QCIF (128 x 96), the enhancement layer picture would be 256 x 192, which does not belong to a standard picture format.
`Scalability in H.263 can be perform