
19 ITU-T Video Coding Standards H.261 and H.263

This chapter introduces the ITU-T video coding standards H.261 and H.263, which were established mainly for videophony and videoconferencing. The basic technical detail of H.261 is presented. The technical improvements with which H.263 achieves high coding efficiency are discussed. Features of H.263+, H.263++, and H.26L are presented.
19.1 INTRODUCTION
Very low bit rate video coding has found many industry applications such as wireless and network communications. The rapid convergence of standardization of digital video-coding standards is the reflection of several factors: the maturity of technologies in terms of algorithmic performance, hardware implementation with VLSI technology, and the market need for rapid advances in wireless and network communications. As stated in the previous chapters, these standards include JPEG for still image coding and MPEG-1/2 for CD-ROM storage and digital television applications. In parallel with the ISO/IEC development of the MPEG-1/2 standards, the ITU-T has developed H.261 (ITU-T, 1993) for videotelephony and videoconferencing applications in an ISDN environment.
19.2 H.261 VIDEO-CODING STANDARD
The H.261 video-coding standard was developed by ITU-T Study Group XV during 1988 to 1993. It was adopted in 1990 and the final revision approved in 1993. It is also referred to as the P x 64 standard because it encodes digital video signals at bit rates of P x 64 Kbps, where P is an integer from 1 to 30, i.e., at bit rates from 64 Kbps to 1.92 Mbps.
19.2.1 OVERVIEW OF H.261 VIDEO-CODING STANDARD
The H.261 video-coding standard has many features in common with the MPEG-1 video-coding standard. However, since they target different applications, there exist many differences between the two standards, such as data rates, picture quality, end-to-end delay, and others. Before indicating the differences between the two coding standards, we describe the major similarities between H.261 and MPEG-1/2. First, both standards are used to code similar video formats. H.261 is mainly used to code video with the common intermediate format (CIF) or quarter-CIF (QCIF) spatial resolution for teleconferencing applications. MPEG-1 uses CIF, SIF, or higher spatial resolution for CD-ROM applications. The original motivation for developing the H.261 video-coding standard was to provide a standard that could be used for both PAL and NTSC television signals. But later, H.261 was mainly used for videoconferencing, while MPEG-1/2 was used for digital television (DTV), VCD (video CD), and DVD (digital video disk). The two TV systems, PAL and NTSC, use different line and picture rates. NTSC, which is used in North America and Japan, uses 525 lines per interlaced picture at 30 frames/second. The PAL system is used in most other countries, and it uses 625 lines per interlaced picture at 25 frames/second. For this purpose, the CIF was adopted as the source video format for the H.261 video coder. The CIF format consists of 352 pixels/line, 288 lines/frame, and 30 frames/second. This format represents half the active
429

IPR2018-01413

Sony EX1008 Page 455

Image and Video Compression for Multimedia Engineering
lines of the PAL signal and the same picture rate as the NTSC signal. PAL systems need only perform a picture rate conversion, and NTSC systems need only perform a line number conversion. Color pictures consist of one luminance and two color-difference components (referred to as the Y Cb Cr format), as specified by the CCIR601 standard. The Cb and Cr components are half-size in both horizontal and vertical directions and have 176 pixels/line and 144 lines/frame. The other format, QCIF, is used for very low bit rate applications. The QCIF has half the number of pixels and half the number of lines of the CIF format. Second, the key coding algorithms of H.261 and MPEG-1 are very similar. Both H.261 and MPEG-1 use DCT-based coding to remove intraframe redundancy and motion compensation to remove interframe redundancy.

Now let us describe the main differences between the two coding standards with respect to coding algorithms. The main differences include:
`
- H.261 uses only I- and P-macroblocks but no B-macroblocks, while MPEG-1 uses three macroblock types, I-, P-, and B-macroblocks (an I-macroblock is an intraframe-coded macroblock, a P-macroblock is a predictive-coded macroblock, and a B-macroblock is a bidirectionally coded macroblock), as well as three picture types, I-, P-, and B-pictures, as defined in Chapter 16 for the MPEG-1 standard.
- There is a constraint in H.261 that for every 132 interframe-coded macroblocks, which corresponds to 4 GOBs (groups of blocks) or to one-third of a CIF picture, at least one intraframe-coded macroblock is required. To obtain better coding performance in low-bit-rate applications, most encoding schemes of H.261 prefer not to use intraframe coding on all the macroblocks of a picture, but only on a few macroblocks in every picture with a rotational scheme. MPEG-1 uses the GOP (group of pictures) structure, where the size of the GOP (the distance between two I-pictures) is not specified.
- The end-to-end delay is not a critical issue for MPEG-1, but is critical for H.261. The video encoder and video decoder delays of H.261 need to be known to allow audio compensation delays to be fixed when H.261 is used in interactive applications. This will allow lip synchronization to be maintained.
- The accuracy of motion compensation in MPEG-1 is up to a half-pixel, but is only a full pixel in H.261. However, H.261 uses a loop filter to smooth the previous frame. This filter attempts to minimize the prediction error.
- In H.261, a fixed picture aspect ratio of 4:3 is used. In MPEG-1, several picture aspect ratios can be used, and the picture aspect ratio is defined in the picture header.
- Finally, in H.261 the encoded picture rate is restricted to allow up to three skipped frames. This gives the control mechanism in the encoder some flexibility to control the encoded picture quality and satisfy the buffer regulation. Although MPEG-1 has no restriction on skipped frames, the encoder usually does not perform frame skipping. Rather, the syntax for B-frames is exploited, as B-frames require far fewer bits than P-pictures.
`
19.2.2 TECHNICAL DETAIL OF H.261
The key technologies used in the H.261 video-coding standard are the DCT and motion compensation. The main components in the encoder include DCT, prediction, quantization (Q), inverse DCT (IDCT), inverse quantization (IQ), loop filter, frame memory, variable-length coding, and the coding control unit. A typical encoder structure is shown in Figure 19.1.

The input video source is first converted to the CIF frame and then stored in the frame memory. The CIF frame is then partitioned into GOBs. A GOB contains 33 macroblocks, which is 1/12 of a CIF picture or 1/3 of a QCIF picture. Each macroblock consists of six 8 x 8 blocks, among which four are luminance (Y) blocks and two are chrominance blocks (one of Cb and one of Cr).
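As a rough illustration of this partitioning, the following sketch (the function names and structure are ours, not from the standard) derives the GOB count of a picture from the macroblock counts just given:

```python
# Illustrative sketch: how many GOBs a picture contains, given that a
# macroblock covers 16 x 16 luminance pixels and a GOB holds 33 macroblocks.

MB_SIZE = 16
MBS_PER_GOB = 33

def gob_count(width, height):
    """Number of GOBs in a picture of the given luminance dimensions."""
    mbs_x = width // MB_SIZE
    mbs_y = height // MB_SIZE
    return (mbs_x * mbs_y) // MBS_PER_GOB

print(gob_count(352, 288))   # CIF: 12 GOBs
print(gob_count(176, 144))   # QCIF: 3 GOBs
```

This confirms the ratios quoted above: a GOB is 1/12 of a CIF picture (396 macroblocks) and 1/3 of a QCIF picture (99 macroblocks).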
`

`
[Figure: encoder block diagram with coding control, motion compensation, and motion estimation blocks.]

FIGURE 19.1 Block diagram of a typical H.261 video encoder. (From ITU-T Recommendation H.261, March 1993. With permission.)
`
For the intraframe mode, each 8 x 8 block is first transformed with the DCT and then quantized. Variable-length coding (VLC) is applied to the quantized DCT coefficients with a zigzag scanning order, as in MPEG-1. The resulting bits are sent to the encoder buffer to form a bitstream.
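The zigzag scanning order itself is easy to generate programmatically. The sketch below (an illustration, not normative code) orders the 64 positions of an 8 x 8 block antidiagonal by antidiagonal, alternating the direction of travel:

```python
# Illustrative generation of the zigzag scan order for an 8 x 8 block of
# quantized DCT coefficients. Positions sharing the same antidiagonal
# (r + c) are visited together; the traversal direction alternates with
# the parity of the antidiagonal.

def zigzag_order(n=8):
    """Return (row, col) pairs of an n x n block in zigzag scan order."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

order = zigzag_order()
print(order[:6])   # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
```

Reading coefficients in this order groups the low-frequency terms first, which is what makes the run-length coding of the trailing high-frequency zeros effective.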
`
For the interframe-coding mode, frame prediction is performed with motion estimation in a manner similar to that in MPEG-1, but only P-macroblocks and P-pictures, no B-macroblocks and B-pictures, are used. Each 8 x 8 block of differences, or prediction residues, is coded by the same DCT coding path as for intraframe coding. In motion-compensated predictive coding, the encoder should perform the motion estimation with the reconstructed pictures instead of the original video data, as will be done in the decoder. Therefore, the IQ and IDCT blocks are included in the motion compensation loop to reduce error propagation drift. Since the VLC operation is lossless, there is no need to include the VLC block in the motion compensation loop. The role of the spatial filter is to minimize the prediction error by smoothing the previous frame that is used for motion compensation.
The loop filter is a separable 2-D spatial filter that operates on an 8 x 8 block. The corresponding 1-D filters are nonrecursive with coefficients 1/4, 1/2, 1/4. At block boundaries, the coefficients are 0, 1, 0 to avoid the taps falling outside the block. It should be noted that MPEG-1 uses subpixel-accurate motion vectors instead of a loop filter to smooth the anchor frame. A performance comparison of the two methods should be interesting.
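A concrete sketch of this separable filter follows (our own illustration; rounding by adding 2 before the truncating divide is an assumption of this sketch, not a quotation of the standard):

```python
# Sketch of a separable H.261-style loop filter: 1-D taps (1/4, 1/2, 1/4)
# inside the block, (0, 1, 0) at block boundaries, applied first along the
# rows and then along the columns of an 8 x 8 block.

def filter_1d(samples):
    out = list(samples)                      # boundary samples keep (0, 1, 0)
    for i in range(1, len(samples) - 1):
        out[i] = (samples[i - 1] + 2 * samples[i] + samples[i + 1] + 2) // 4
    return out

def loop_filter(block):
    rows = [filter_1d(r) for r in block]         # filter each row
    cols = [filter_1d(c) for c in zip(*rows)]    # then each column
    return [list(r) for r in zip(*cols)]         # transpose back to rows

flat = [[10] * 8 for _ in range(8)]
assert loop_filter(flat) == flat   # a flat block passes through unchanged
```

Because the taps sum to one, smooth regions are untouched while isolated prediction-error spikes in the reference block are attenuated.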
The role of coding control includes the rate control, the buffer control, the quantization control, and the frame rate control. These parameters are intimately related. The coding control is not part of the standard; however, it is an important part of the encoding process. For a given target bit rate, the encoder has to control several parameters to reach the rate target and at the same time provide reasonable coded picture quality.
Since H.261 is a predictive coder and VLCs are used everywhere, such as in coding quantized DCT coefficients and motion vectors, a single transmission error may cause a loss of synchronization and consequently cause problems for the reconstruction. To enhance the performance of the H.261 video coder in noisy environments, the transmitted bitstream of H.261 can optionally contain a BCH (Bose, Chaudhuri, and Hocquenghem) (511,493) forward error-correction code.
The H.261 video decoder performs the inverse operations of the encoder. After optional error-correction decoding, the compressed bitstream enters the decoder buffer and is then parsed by the variable-length decoder (VLD). The output of the VLD is applied to the IQ and IDCT, where the data are converted to values in the spatial domain. For the interframe-coding mode, the motion

`
FIGURE 19.2 Arrangement of macroblocks in a GOB. (From ITU-T Recommendation H.261, March 1993. With permission.)
`
compensation is performed, and the data from the macroblocks in the anchor frame are added to the current data to form the reconstructed data.
`
19.2.3 SYNTAX DESCRIPTION

The syntax of H.261 video coding has a hierarchical layered structure. From the top to the bottom, the layers are the picture layer, GOB layer, macroblock layer, and block layer.
`
19.2.3.1 Picture Layer

The picture layer begins with a 20-bit picture start code (PSC). Following the PSC, there are a temporal reference (5-bit), picture type information (PTYPE, 6-bit), extra insertion information (PEI, 1-bit), and spare information (PSPARE). The data for the GOBs then follow.
`
19.2.3.2 GOB Layer

A GOB corresponds to 176 pixels by 48 lines of Y and 88 pixels by 24 lines of Cb and Cr. The GOB layer contains the following data in order: 16-bit GOB start code (GBSC), 4-bit group number (GN), 5-bit quantization information (GQUANT), 1-bit extra insertion information (GEI), and spare information (GSPARE). The number of bits for GSPARE is variable, depending on the set of GEI bits. If GEI is set to "1," then 9 bits follow, consisting of 8 bits of data and another GEI bit to indicate whether a further 9 bits follow, and so on. Data of the GOB header are then followed by data for macroblocks.
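The GEI/GSPARE mechanism just described can be sketched as a small parsing loop (a hypothetical illustration with names of our choosing, not code from the standard):

```python
# Sketch of GEI/GSPARE parsing: while the current GEI bit is 1, read 8
# spare bits plus the next GEI bit, which says whether more spare follows.

def parse_gspare(bits):
    """bits: iterator yielding 0/1, positioned at the GEI bit.
    Returns the list of spare bytes (GSPARE)."""
    spare = []
    gei = next(bits)
    while gei == 1:
        byte = 0
        for _ in range(8):
            byte = (byte << 1) | next(bits)   # 8 bits of spare data
        spare.append(byte)
        gei = next(bits)                      # another GEI bit follows
    return spare

# e.g. GEI = 1, one spare byte 0xA5 (1010 0101), then GEI = 0
stream = iter([1, 1, 0, 1, 0, 0, 1, 0, 1, 0])
assert parse_gspare(stream) == [0xA5]
```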
`
19.2.3.3 Macroblock Layer

Each GOB contains 33 macroblocks, which are arranged as in Figure 19.2. A macroblock consists of 16 pixels by 16 lines of Y that spatially correspond to 8 pixels by 8 lines each of Cb and Cr. Data in the bitstream for a macroblock consist of a macroblock header followed by data for blocks. The macroblock header may include the macroblock address (MBA) (variable length), type information (MTYPE) (variable length), quantizer (MQUANT) (5 bits), motion vector data (MVD) (variable length), and coded block pattern (CBP) (variable length). The MBA information is always present and is coded by VLC. The VLC table for macroblock addressing is shown in Table 19.1. The presence of the other items depends on the macroblock type information, which is shown in VLC Table 19.2.
`
19.2.3.4 Block Layer

Data in the block layer consist of the transformed coefficients followed by an end-of-block (EOB) marker (10 bits). The data of transform coefficients (TCOEFF) are first converted to pairs of RUN and LEVEL according to the zigzag scanning order. The RUN represents the number of successive zeros, and the LEVEL represents the value of a nonzero coefficient. The pairs of RUN and LEVEL are then encoded with VLCs. The DC coefficient of an intrablock is coded by a fixed-length code with 8 bits. All VLC tables can be found in the standard document (ITU-T, 1993).
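The RUN/LEVEL conversion can be written out directly; this is a small illustrative sketch, not text from the standard:

```python
# Illustrative run-level conversion for a zigzag-ordered coefficient list:
# each nonzero coefficient is paired with the count of zeros preceding it.

def run_level_pairs(coeffs):
    pairs, run = [], 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))
            run = 0
    return pairs   # trailing zeros are implied by the EOB marker

assert run_level_pairs([5, 0, 0, -2, 1, 0, 0, 0]) == [(0, 5), (2, -2), (0, 1)]
```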
`

TABLE 19.1
VLC Table for Macroblock Addressing

MBA  Code           MBA  Code            MBA           Code
1    1              13   0000 1000       25            0000 0100 000
2    011            14   0000 0111       26            0000 0011 111
3    010            15   0000 0110       27            0000 0011 110
4    0011           16   0000 0101 11    28            0000 0011 101
5    0010           17   0000 0101 10    29            0000 0011 100
6    0001 1         18   0000 0101 01    30            0000 0011 011
7    0001 0         19   0000 0101 00    31            0000 0011 010
8    0000 111       20   0000 0100 11    32            0000 0011 001
9    0000 110       21   0000 0100 10    33            0000 0011 000
10   0000 1011      22   0000 0100 011   MBA stuffing  0000 0001 111
11   0000 1010      23   0000 0100 010   Start code    0000 0000 0000 0001
12   0000 1001      24   0000 0100 001
`
TABLE 19.2
VLC Table for Macroblock Type

Prediction      MQUANT  MVD  CBP  TCOEFF  VLC
Intra                              x      0001
Intra           x                  x      0000 001
Inter                         x    x      1
Inter           x             x    x      0000 1
Inter+MC                x                 0000 0000 1
Inter+MC                x     x    x      0000 0001
Inter+MC        x       x     x    x      0000 0000 01
Inter+MC+FIL            x                 001
Inter+MC+FIL            x     x    x      01
Inter+MC+FIL    x       x     x    x      0000 01

Notes:
1. "x" means that the item is present in the macroblock.
2. It is possible to apply the filter in a non-motion-compensated macroblock by declaring it as MC+FIL but with a zero vector.
`
`
19.3 H.263 VIDEO-CODING STANDARD

The H.263 video-coding standard (ITU-T, 1996) is specifically designed for very low bit rate applications such as practical video telecommunication. Its technical content was completed in late 1995, and the standard was approved in early 1996.
`
19.3.1 OVERVIEW OF H.263 VIDEO CODING
`
The basic configuration of the video source coding algorithm of H.263 is based on that of H.261. Several important features that differ from H.261 include the following new options: unrestricted motion vectors, syntax-based arithmetic coding, advanced prediction, and PB-frames. All these features can be used together or separately to improve the coding efficiency. The H.263
`

TABLE 19.3
Number of Pixels per Line and the Number of Lines for Each Picture Format

Picture     Number of Pixels     Number of Lines      Number of Pixels        Number of Lines
Format      for Luminance (dx)   for Luminance (dy)   for Chrominance (dx/2)  for Chrominance (dy/2)
Sub-QCIF    128                  96                   64                      48
QCIF        176                  144                  88                      72
CIF         352                  288                  176                     144
4CIF        704                  576                  352                     288
16CIF       1408                 1152                 704                     576
`
video standard can be used for both 625-line and 525-line television standards. The source coder operates on noninterlaced pictures at a picture rate of about 30 pictures/second. The pictures are coded as luminance and two color difference components (Y, Cb, and Cr). The source coder is based on a CIF. Actually, there are five standardized formats, which include sub-QCIF, QCIF, CIF, 4CIF, and 16CIF. The details of the formats are shown in Table 19.3.

It is noted that for each format, the chrominance is a quarter the size of the luminance picture; i.e., the chrominance pictures are half the size of the luminance picture in both horizontal and vertical directions. This is defined by the ITU-R 601 format. For the CIF format, the number of pixels/line is compatible with sampling the active portion of the luminance and color difference signals from a 525- or 625-line source at 6.75 and 3.375 MHz, respectively. These frequencies have a simple relationship to those defined by the ITU-R 601 format.
`
19.3.2 TECHNICAL FEATURES OF H.263

The H.263 encoder structure is similar to the H.261 encoder, with the exception that there is no loop filter in the H.263 encoder. The main components of the encoder include block transform, motion-compensated prediction, block quantization, and VLC. Each picture is partitioned into groups of blocks, which are referred to as GOBs. A GOB contains a multiple of 16 lines, k x 16 lines, depending on the picture format (k = 1 for sub-QCIF, QCIF, and CIF; k = 2 for 4CIF; k = 4 for 16CIF). Each GOB is divided into macroblocks that are the same as in H.261, and each macroblock consists of four 8 x 8 luminance blocks and two 8 x 8 chrominance blocks. Compared with H.261, H.263 has several new technical features for the enhancement of coding efficiency for very low bit rate applications. These new features include picture-extrapolating motion vectors (for the unrestricted motion vector mode), motion compensation with half-pixel accuracy, advanced prediction (which includes variable-block-size motion compensation and overlapped block motion compensation), syntax-based arithmetic coding, and PB-frame mode.
`
19.3.2.1 Half-Pixel Accuracy

In H.263 video coding, half-pixel accuracy motion compensation is used. The half-pixel values are found using bilinear interpolation, as shown in Figure 19.3.

Note that H.263 uses subpixel accuracy for motion compensation instead of using a loop filter to smooth the anchor frames as in H.261. This is also done in other coding standards, such as MPEG-1 and MPEG-2, which also use half-pixel accuracy for motion compensation. In MPEG-4 video, quarter-pixel accuracy for motion compensation has been adopted as a tool for version 2.
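The bilinear interpolation of Figure 19.3 can be written out directly. This small sketch uses Python's floor division, which matches "division by truncation" for the non-negative pixel values involved:

```python
# The four half-pixel interpolation formulas of Figure 19.3, for integer
# pixel values A (top-left), B (top-right), C (bottom-left), D (bottom-right).

def half_pixel_values(A, B, C, D):
    a = A                            # integer position itself
    b = (A + B + 1) // 2             # horizontal half-pixel
    c = (A + C + 1) // 2             # vertical half-pixel
    d = (A + B + C + D + 2) // 4     # diagonal half-pixel
    return a, b, c, d

assert half_pixel_values(10, 20, 30, 40) == (10, 15, 20, 25)
```

Adding 1 (or 2) before the truncating divide rounds the average to the nearest integer, biased upward on ties.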
`
19.3.2.2 Unrestricted Motion Vector Mode

Usually, motion vectors are limited to within the coded picture area of anchor frames. In the unrestricted motion vector mode, the motion vectors are allowed to point outside the pictures. When the values
`

`
The integer pixel positions A, B, C, and D surround the half-pixel positions a, b, c, and d, whose values are obtained by bilinear interpolation:

a = A
b = (A + B + 1) / 2
c = (A + C + 1) / 2
d = (A + B + C + D + 2) / 4

where "/" indicates division by truncation.

FIGURE 19.3 Half-pixel prediction by bilinear interpolation.
`
of the motion vectors exceed the boundary of the anchor frame in the unrestricted motion vector mode, the picture-extrapolating method is used: the values of reference pixels outside the picture boundary take the values of the boundary pixels. An extension of the motion vector range is also applied in the unrestricted motion vector mode. In the default prediction mode, the motion vectors are restricted to the range [-16, 15.5]. In the unrestricted mode, the maximum range for motion vectors is extended to [-31.5, 31.5] under certain conditions.
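The picture-extrapolating method amounts to clamping reference coordinates to the picture boundary. A minimal sketch, with names of our choosing:

```python
# Sketch of boundary extrapolation: a reference pixel addressed outside the
# picture takes the value of the nearest boundary pixel.

def ref_pixel(frame, x, y):
    """frame: list of rows of pixel values; (x, y) may lie outside."""
    h, w = len(frame), len(frame[0])
    x = min(max(x, 0), w - 1)   # clamp horizontally to the picture
    y = min(max(y, 0), h - 1)   # clamp vertically to the picture
    return frame[y][x]

frame = [[1, 2],
         [3, 4]]
assert ref_pixel(frame, -5, 0) == 1   # left of the picture -> left edge
assert ref_pixel(frame, 3, 3) == 4    # below right -> corner pixel
```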
`
19.3.2.3 Advanced Prediction Mode

Generally, the decoder will accept no more than one motion vector per macroblock for the baseline algorithm of the H.263 video-coding standard. However, in the advanced prediction mode, the syntax allows up to four motion vectors to be used per macroblock. The decision to use one or four vectors is indicated by the macroblock type and coded block pattern for chrominance (MCBPC) codeword for each macroblock. How to make this decision is the task of the encoding process.

The following example gives the steps of motion estimation and coding mode selection for the advanced prediction mode in the encoder.
`
Step 1. Integer pixel motion estimation:

    SAD_N(x, y) = sum_{i=0}^{N-1} sum_{j=0}^{N-1} |original - previous|,    (19.1)

where SAD is the sum of absolute differences, the values of (x, y) are within the search range, N is equal to 16 for a 16 x 16 block, and N is equal to 8 for an 8 x 8 block.

    SAD_4x8 = sum SAD_8(x, y),    (19.2)

    SAD_inter = min(SAD_16(x, y), SAD_4x8).    (19.3)
`
Step 2. Intra/inter mode decision:
If A < (SAD_inter - 500), this macroblock is coded as intra-MB; otherwise, it is coded as inter-MB, where SAD_inter is determined in Step 1, and

    A = sum_{i=0}^{15} sum_{j=0}^{15} |original - MB_mean|    (19.4)
`

`
with

    MB_mean = (1/256) sum_{i=0}^{15} sum_{j=0}^{15} original.
`
If this macroblock is determined to be coded as inter-MB, go to Step 3.

Step 3. Half-pixel search:
In this step, the half-pixel search is performed for both the 16 x 16 block and the 8 x 8 blocks, as shown in Figure 19.3.

Step 4. Decision on 16 x 16 or four 8 x 8 (one motion vector or four motion vectors per macroblock):
If SAD_4x8 < SAD_16 - 100, four motion vectors per macroblock will be used; one of the motion vectors is used for all pixels in one of the four luminance blocks in the macroblock. Otherwise, one motion vector will be used for all pixels in the macroblock.

Step 5. Differential coding of the motion vectors for each 8 x 8 luminance block is performed as in Figure 19.4.
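Steps 2 and 4 combine into a compact decision rule. The sketch below is our own illustration: the thresholds 500 and 100 are the ones quoted above, and the SAD values are assumed to be precomputed by the searches of Steps 1 and 3:

```python
# Sketch of the intra/inter and one-vs-four motion vector decisions.
# A        : sum of |original - MB_mean| over the 16 x 16 macroblock
# sad16    : best 16 x 16 SAD; sad4x8: sum of the four best 8 x 8 SADs

def choose_mode(A, sad16, sad4x8):
    sad_inter = min(sad16, sad4x8)
    if A < sad_inter - 500:
        return ("intra", 0)                      # Step 2: intra wins
    n_vectors = 4 if sad4x8 < sad16 - 100 else 1 # Step 4: 4 MVs must win
    return ("inter", n_vectors)                  # by a clear margin

assert choose_mode(100, 2000, 1900) == ("intra", 0)
assert choose_mode(5000, 2000, 1850) == ("inter", 4)
assert choose_mode(5000, 2000, 1950) == ("inter", 1)
```

The margins bias the encoder toward the cheaper choices (inter coding, a single vector) unless the alternative clearly reduces the residue.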
`
When it has been decided to use four motion vectors, the motion vector for both chrominance blocks is derived by calculating the sum of the four luminance vectors and dividing by 8. The component values of the resulting 1/16 pixel resolution vectors are modified toward the position indicated in Table 19.4.
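A sketch of this derivation for one vector component follows. It is our own illustration: the luminance components are held as integers in half-pixel units (so their sum is a displacement in sixteenths of a pixel), the mapping row follows our reading of Table 19.4, and negative vectors are not treated specially here:

```python
# Sketch of chrominance vector derivation from four luminance vectors.
# Fractional sixteenth-pixel positions are snapped to 0, half, or full pixel
# (values 0, 1, 2 in half-pixel units), per the Table 19.4 mapping assumed here.
SIXTEENTH_TO_HALF = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2]

def chroma_component(v1, v2, v3, v4):
    """v1..v4: one component of the four luminance MVs, in half-pixel units."""
    total = v1 + v2 + v3 + v4            # displacement in sixteenth pixels
    whole, frac = divmod(total, 16)      # whole pixels + 0..15 sixteenths
    return 2 * whole + SIXTEENTH_TO_HALF[frac]   # result in half-pixel units

print(chroma_component(1, 2, 3, 4))   # 10/16 pixel snaps to a half pixel: 1
```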
Another advanced prediction mode is overlapped motion compensation for luminance. Actually, this idea is also used by MPEG-4, which has been described in Chapter 18. In the overlapped motion compensation mode, each pixel in an 8 x 8 luminance block is a weighted sum of three values divided by 8 with rounding. The three values are obtained by motion compensation with three motion vectors: the motion vector of the current luminance block and two of four "remote"
`
`
MVDx = MVx - Px
MVDy = MVy - Py

Px = Median(MV1x, MV2x, MV3x)
Py = Median(MV1y, MV2y, MV3y)

Px = Py = 0, if the MB is intracoded or the block is outside of the picture boundary

FIGURE 19.4 Differential coding of motion vectors.
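The median prediction of Figure 19.4 is applied per component. A minimal sketch (the zeroing rules for intracoded or out-of-picture candidates are assumed to be handled by the caller):

```python
# Sketch of differential motion vector coding with a per-component
# median predictor built from three neighboring candidate vectors.

def median(a, b, c):
    return sorted((a, b, c))[1]

def mv_difference(mv, mv1, mv2, mv3):
    """mv: current vector (x, y); mv1..mv3: candidate predictor vectors."""
    px = median(mv1[0], mv2[0], mv3[0])
    py = median(mv1[1], mv2[1], mv3[1])
    return (mv[0] - px, mv[1] - py)   # the MVD actually transmitted

assert mv_difference((3, -2), (1, 0), (4, -1), (2, -3)) == (1, -1)
```

The median makes the predictor robust to a single outlier among the three neighbors, so the transmitted differences stay small in smooth motion fields.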
`

`
`
`
TABLE 19.4
Modification of 1/16 Pixel Resolution Chrominance Vector Components

1/16 pixel position:  0  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15
Resulting position:   0  0  0  1  1  1  1  1  1  1   1   1   1   1   2   2
`
vectors. These remote vectors include the motion vector of the block to the left or right of the current block and the motion vector of the block above or below the current block. Remote motion vectors from other GOBs are used in the same way as remote motion vectors inside the current GOB. For each pixel to be coded in the current block, the remote motion vectors of the blocks at the two nearest block borders are used; i.e., for the upper half of the block, the motion vector corresponding to the block above the current block is used, while for the lower half of the block, the motion vector corresponding to the block below the current block is used. Similarly, the left half of the block uses the motion vector of the block on the left side of the current block, and the right half uses the one on the right side of the current block. To make this clearer, let (MV0x, MV0y) be the motion vector for the current block, (MV1x, MV1y) be the motion vector for the block either above or below, and (MV2x, MV2y) be the motion vector of the block either to the left or right of the current block. Then the value of each pixel p(x, y) in the current 8 x 8 luminance block is given by
`
    p(x, y) = (q(x, y) * H0(x, y) + r(x, y) * H1(x, y) + s(x, y) * H2(x, y) + 4) / 8,    (19.5)
`
where

    q(x, y) = p(x + MV0x, y + MV0y),
    r(x, y) = p(x + MV1x, y + MV1y),    (19.6)
    s(x, y) = p(x + MV2x, y + MV2y),

and H0 is the weighting matrix for prediction with the current block motion vector, H1 is the weighting matrix for prediction with the top or bottom block motion vector, and H2 is the weighting matrix for prediction with the left or right block motion vector. This applies to the luminance block only. The values of H0, H1, and H2 are shown in Figure 19.5.
`
FIGURE 19.5 Weighting matrices for overlapped motion compensation.
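For a single pixel, Equation 19.5 can be sketched as below. The weights used in the example are illustrative only (the actual position-dependent matrices are those of Figure 19.5); the sketch relies solely on the property that the three weights sum to 8:

```python
# Sketch of the overlapped motion compensation of Equation 19.5 for one
# pixel: q, r, s are the three motion-compensated values, w0/w1/w2 the
# weights taken from H0, H1, H2 at this pixel position.

def omc_pixel(q, r, s, w0, w1, w2):
    assert w0 + w1 + w2 == 8        # the weights must sum to 8
    return (q * w0 + r * w1 + s * w2 + 4) // 8   # divide by 8 with rounding

assert omc_pixel(100, 100, 100, 4, 2, 2) == 100  # agreement is preserved
assert omc_pixel(96, 104, 104, 4, 2, 2) == 100   # disagreements are blended
```

Blending the three predictions smooths the discontinuities that would otherwise appear at block borders when neighboring blocks use different motion vectors.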
`

`
It should be noted that the above coding scheme is not optimized in the selection of the coding mode, since the decision depends only on the values of the predictive residues. Optimized mode decision techniques that include the above possibilities for prediction have been considered by Wiegand (1996).
`
19.3.2.4 Syntax-Based Arithmetic Coding

As in other video-coding standards, H.263 uses variable-length coding and decoding (VLC/VLD) to remove the redundancy in the video data. The basic principle of VLC is to encode a symbol with a specific table based on the syntax of the coder. The symbol is mapped to an entry of the table in a table lookup operation, and then the binary codeword specified by the entry is sent to a bitstream buffer for transmission to the decoder. In the decoder, the inverse operation, VLD, is performed to reconstruct the symbol by a table lookup operation based on the same syntax of the coder. The tables in the decoder must be the same as the ones used in the encoder for encoding the current symbol. To obtain better performance, the tables are generated in a statistically optimized way (such as with a Huffman coder) using a large number of training sequences. This VLC/VLD process implies that each symbol is encoded into a fixed integral number of bits. An optional feature of H.263 is to use arithmetic coding to remove this restriction of a fixed integral number of bits per symbol. This syntax-based arithmetic coding mode may result in bit rate reductions.
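The table-lookup VLC/VLD process can be illustrated with a toy example using the first few macroblock-address codes from Table 19.1 (the function names are ours):

```python
# Toy illustration of table-lookup VLC encoding and VLD decoding, using
# the first five macroblock-address codewords of Table 19.1. Because the
# codes are prefix-free, the decoder can match them greedily bit by bit.

MBA_CODES = {1: "1", 2: "011", 3: "010", 4: "0011", 5: "0010"}
DECODE = {code: mba for mba, code in MBA_CODES.items()}

def vlc_encode(symbols):
    return "".join(MBA_CODES[s] for s in symbols)

def vld_decode(bits):
    symbols, code = [], ""
    for b in bits:
        code += b
        if code in DECODE:        # a complete codeword has been seen
            symbols.append(DECODE[code])
            code = ""
    return symbols

assert vld_decode(vlc_encode([1, 3, 2, 5])) == [1, 3, 2, 5]
```

Each symbol here costs a whole number of bits, which is exactly the restriction that the syntax-based arithmetic coding mode removes.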
`
19.3.2.5 PB-Frames

The PB-frame is a new feature of H.263 video coding. A PB-frame consists of two pictures, one P-picture and one B-picture, coded as one unit, as shown in Figure 19.6. Since H.261 does not have B-pictures, the concept of a B-picture comes from the MPEG video-coding standards. In a PB-frame, the P-picture is predicted from the previously decoded I- or P-picture, and the B-picture is bidirectionally predicted both from the previously decoded I- or P-picture and from the P-picture in the PB-frame unit, which is currently being decoded.
Several detailed issues have to be addressed at the macroblock level in PB-frame mode:

- If a macroblock in the PB-frame is intracoded, the P-macroblock in the PB-unit is intracoded and the B-macroblock in the PB-unit is intercoded. The motion vector of the intercoded PB-macroblock is used for the B-macroblock only.
- A macroblock in a PB-frame contains 12 blocks for the 4:2:0 format: six (four luminance blocks and two chrominance blocks) from the P-frame and six from the B-frame. The data for the six P-blocks are transmitted first, and then the data for the six B-blocks.
- Different parts of a B-block in a PB-frame can be predicted with different modes. For pixels where the backward vector points inside the coded P-macroblock, bidirectional prediction is used. For all other pixels, forward prediction is used.
`
FIGURE 19.6 Prediction in PB-frames mode. (From ITU-T Recommendation H.263, May 1996. With permission.)
`

`
19.4 H.263 VIDEO CODING STANDARD VERSION 2

19.4.1 OVERVIEW OF H.263 VERSION 2

The H.263 version 2 (ITU-T, 1998) video-coding standard, also known as H.263+, was approved in January 1998 by the ITU-T. H.263 version 2 includes a number of new optional features based on the H.263 video-coding standard. These new optional features are added to broaden the application range of H.263 and to improve its coding efficiency. The main features are flexible video format, scalability, and backward-compatible supplemental enhancement information. Among these new optional features, five are intended to improve the coding efficiency, and three are proposed to address the needs of mobile video and other noisy transmission environments. The scalability features provide the capability of generating layered bitstreams: spatial scalability, temporal scalability, and signal-to-noise ratio (SNR) scalability, similar to those defined by the MPEG-2 video-coding standard. There are also other modes of H.263 version 2 that provide some enhancement functions. We will describe these features in the following section.
`
19.4.2 NEW FEATURES OF H.263 VERSION 2

H.263 version 2 includes a number of new features. In the following, we briefly describe the key techniques used for these features.
`
19.4.2.1 Scalability

The scalability function allows for encoding video sequences in a hierarchical way that partitions the pictures into one base layer and one or more enhancement layers. The decoders have the option of decoding only the base layer bitstream to obtain lower-quality reconstructed pictures, or of further decoding the enhancement layers to obtain higher-quality decoded pictures. There are three types of scalability in H.263: temporal scalability, SNR scalability, and spatial scalability.
All three types of scalability are similar to the ones in the MPEG-2 video-coding standard. Temporal scalability (Figure 19.7) is achieved by using B-pictures as the enhancement layer. The B-pictures are predicted from either or both a previous and a subsequent decoded picture in the base layer.

In SNR scalability (Figure 19.8), the pictures are first encoded with coarse quantization in the base layer. The differences, or coding error pictures, between a reconstructed picture and its original in the base layer encoder are then encoded in the enhancement layer and sent to the decoder, providing an enhancement of SNR. In the enhancement layer there are two types of pictures. If a picture in the enhancement layer is only predicted from the base layer, it is referred to as an EI picture. It is a bidirectionally predicted picture if it uses both a prior enhancement layer picture and a temporally simultaneous base layer reference picture for prediction. Note that the prediction
`
FIGURE 19.7 Temporal scalability. (From ITU-T Recommendation H.263, May 1996. With permission.)
`
IPR2018-01413
Sony EX1008 Page 465
`
440          Image and Video Compression for Multimedia Engineering
`
FIGURE 19.8 SNR scalability. (From ITU-T Recommendation H.263, May 1996. With permission.)
`
FIGURE 19.9 Spatial scalability. (From ITU-T Recommendation H.263, May 1996. With permission.)
`
from the reference layer uses no motion vectors. However, EP (enhancement P) pictures use motion vectors when predicted from their temporally prior reference picture in the same layer. Also, if more than two layers are used, the reference may be the lower layer instead of the base layer.
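The coarse-then-refine structure of SNR scalability can be illustrated numerically. This is a minimal sketch under a stated assumption: a uniform scalar quantizer stands in for the DCT-domain quantizer H.263 actually uses, and the sample values are arbitrary.

```python
import numpy as np

def quantize(x, step):
    # Uniform quantize/dequantize; a stand-in for the DCT-domain
    # quantizer used by the real codec.
    return np.round(x / step) * step

original = np.array([17.0, 42.0, 63.0, 8.0])       # sample coefficients

# Base layer: coarse quantization gives a lower-quality reconstruction.
base_recon = quantize(original, step=16)

# Enhancement layer: encode the base-layer coding error with a finer step.
residual = original - base_recon
enh_recon = base_recon + quantize(residual, step=2)

# Decoding base + enhancement yields a smaller error (higher SNR).
err_base = np.mean((original - base_recon) ** 2)
err_enh = np.mean((original - enh_recon) ** 2)
print(err_base > err_enh)                          # True
```

The enhancement layer thus spends its bits only on the base layer's coding error, which is exactly the quantity Figure 19.8 shows being fed to the enhancement-layer encoder.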
In spatial scalability (Figure 19.9), lower-resolution pictures are encoded in the base layer or a lower layer. The differences, or error pictures, between the up-sampled decoded base layer pictures and their original pictures are encoded in the enhancement layer and sent to the decoder, providing the spatial enhancement pictures. As in MPEG-2, spatial interpolation filters are used for spatial scalability. There are also two types of pictures in the enhancement layer: EI and EP. If a decoder is able to perform spatial scalability, it may also need to be able to use a custom picture format. For example, if the base layer is sub-QCIF (128 × 96), the enhancement layer picture would be 256 × 192, which does not belong to a standard picture format.
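The resolution doubling in the sub-QCIF example can be sketched as follows. Note the assumption: a nearest-neighbour up-sampler stands in for the defined interpolation filter, and `upsample_2x` is an illustrative name, not a standard function.

```python
import numpy as np

def upsample_2x(pic):
    # Nearest-neighbour 2x interpolation. MPEG-2/H.263 define proper
    # spatial interpolation filters; this is a simplification.
    return pic.repeat(2, axis=0).repeat(2, axis=1)

# Base layer at sub-QCIF resolution (128 x 96, stored as height x width).
base_recon = np.zeros((96, 128), dtype=np.uint8)

up = upsample_2x(base_recon)
print(up.shape)   # (192, 256): the 256 x 192 custom enhancement format

# The enhancement layer would then encode the spatial residual,
# i.e., the original high-resolution picture minus `up`.
```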
Scalability in H.263 can be perform
