(12) Patent Application Publication (10) Pub. No.: US 2011/0170594 A1
Budagavi et al. (43) Pub. Date: Jul. 14, 2011

(54) METHOD AND SYSTEM FOR INTRACODING IN VIDEO ENCODING
`
`(76)
`
`Inventors:
`
`Madhukar Budagavi, Plano, TX
`(3:); Minhua Zhou, Plano, TX
`
`.
`(21) Appl. NO"
`
`12/982’482
`
`(22)
`
`Filed:
`
`Dec. 30: 2010
`
Related U.S. Application Data

(60) Provisional application No. 61/295,075, filed on Jan. 14, 2010; provisional application No. 61/295,911, filed on Jan. 18, 2010.

Publication Classification

(51) Int. Cl.
     H04N 7/50    (2006.01)
(52) U.S. Cl. ............................ 375/240.13; 375/E07.211
`(57)
`ABSTRACT
`A method of intra-coding blocks of pixels in a digital Video
`sequence is provided that includes selecting a block trans-
`form of a plurality of block transforms according to a spatial
`prediction mode used in generating a block of pixel residual
`values from a block of pixels, wherein the block transform is
`based on a single directional transform matrix predetermined
`for the spatial prediction mode and is a same size as the block
`of pixel values, applying the block transform to the block of
`pixel residual values to generate transform coefiicients of the
`residual pixel values, and entropy coding the generated trans-
`form coefficients.
`
`200
`
`[
`
`206
`
`MV QP HDR
`
`ENTROPY
`ENCODEFIS
`
` 234
`
`
`_ _ _ _ _ _
`:— COMPRESSED I
`I
`VIDEO
`V236
`I
`I
`L _B'_T‘_ST_REA_M_ J
`
`INPUT FRAMES
`
`
`
`
`
`
`
`
`
`
`__________________________________ _ _ _ _ _|
`
`
`
`INVERSE SCAN
`
`MOTION
`COMPENSATION
`
`STORAGE
`
`
`IN—LOOP FILTER
`
`
`MOTION
`ESTIMATION
`
`220
`
`Page 1 of 21
`
`SAMSUNG EXHIBIT 1005
`
`SAMSUNG EXHIBIT 1005
`
`Page 1 of 21
`
`
`
[Sheet 1 of 8, FIG. 1: block diagram of a digital system; a source digital system (100) with video capture (104), video encoder (106), and transmitter (108) components, and a destination digital system (102) with receiver (110), video decoder (112), and display (114) components.]
`
[FIG. 4: flow diagram of an encoding method in accordance with one or more embodiments.
400: Generate a spatially predicted prediction block based on a spatial prediction mode.
402: Compute a residual prediction block between the predicted prediction block and the current prediction block.
404: Select a block transform based on the spatial prediction mode.
406: Apply the selected block transform to the residual prediction block to generate transform coefficients.
408: Entropy code the generated transform coefficients.]
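Read as code, the flow of FIG. 4 is short. The following Python sketch is non-normative and illustrative only: the `predict` and `entropy_code` callables and the per-mode `transforms` table are hypothetical stand-ins for components described in the detailed description, and the transform step assumes the single-matrix form Y = S X S^T used by the embodiments described below.

```python
import numpy as np

def intra_encode_block(current, mode, transforms, predict, entropy_code):
    """Sketch of FIG. 4 (steps 400-408) for one prediction block."""
    predicted = predict(mode)                        # 400: spatial prediction
    residual = current.astype(np.int32) - predicted  # 402: residual block
    S = transforms[mode]                             # 404: mode-dependent transform
    coeffs = S @ residual @ S.T                      # 406: transform coefficients
    return entropy_code(coeffs)                      # 408: entropy coding
```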
`
`
`
[Sheet 2 of 8, FIG. 2: block diagram of a video encoder in accordance with one or more embodiments; input frames (200) feed a coding chain with combiner (202), transform component (204), storage (218), motion estimation (220), motion compensation (222), intra prediction (224), selector switch (226), and entropy encoders (234) producing the compressed video bitstream (236).]
`
`
`
`
`
[Sheet 3 of 8, FIG. 3: block diagram of a video decoder in accordance with one or more embodiments, including entropy decoding, inverse scan and inverse quantization, inverse transform, spatial prediction, motion compensation, an inter/intra mode switch, in-loop filter, and decoded frame storage.]
`
`
`
`
[Sheet 4 of 8, FIG. 5: flow diagram of a decoding method in accordance with one or more embodiments.
500: Reverse the entropy coding of the transform coefficients of an intra prediction block.
502: Apply an inverse block transform, selected based on the spatial prediction mode used for the intra prediction block, to generate a residual prediction block.
504: Generate a spatially predicted prediction block based on the spatial prediction mode used for the encoded prediction block.
506: Add the residual prediction block to the predicted prediction block to generate the decoded prediction block.]
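A matching decoder-side sketch of FIG. 5, under the same assumptions as the encoder sketch above; it further assumes the transform matrices are orthonormal (true of DCT and KLT bases), so the inverse transform is the transpose.

```python
import numpy as np

def intra_decode_block(coded, mode, transforms, predict, entropy_decode):
    """Sketch of FIG. 5 (steps 500-506) for one prediction block."""
    coeffs = entropy_decode(coded)       # 500: reverse the entropy coding
    S = transforms[mode]                 # inverse transform selected by mode
    residual = S.T @ coeffs @ S          # 502: inverse of Y = S @ X @ S.T
    predicted = predict(mode)            # 504: same spatial prediction as encoder
    return predicted + residual          # 506: decoded prediction block
```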
`
`
`
[Sheet 5 of 8, FIG. 6: per-sequence coding-performance results for the directional transform matrices and their approximations.]
`
`
`
[Sheet 6 of 8, FIG. 7: further per-sequence coding-performance results for the directional transform matrices and their approximations.]
`
`
`
[Sheet 7 of 8, FIG. 8: illustrative digital system, e.g., a digital media processor with processors and coprocessors, internal and external memories and interfaces, flash/ROM, peripherals, and video signal processing blocks.]
`
`
`
`
`
`
[Sheet 8 of 8, FIGS. 9 and 10: further illustrative digital systems, e.g., a mobile cellular handset with antenna, RF transceiver, keypad, touch screen and LCD controller, CCD/CMOS camera interface, microphone and stereo audio, and video codecs.]
`
`
`
`
`METHOD AND SYSTEM FOR INTRACODING
`IN VIDEO ENCODING
`
`CROSS-REFERENCE TO RELATED
`APPLICATIONS
`
[0001] This application claims benefit of U.S. Provisional Patent Application Ser. No. 61/295,075, filed Jan. 14, 2010, and of U.S. Provisional Patent Application Ser. No. 61/295,911, filed Jan. 18, 2010, both of which are incorporated herein by reference in their entirety.
`
`BACKGROUND OF THE INVENTION
`
[0002] The demand for digital video products continues to increase. Some examples of applications for digital video include video communication, security and surveillance, industrial automation, and entertainment (e.g., DV, HDTV, satellite TV, set-top boxes, Internet video streaming, video gaming devices, digital cameras, cellular telephones, video jukeboxes, high-end displays and personal video recorders). Further, video applications are becoming increasingly mobile as a result of higher computation power in handsets, advances in battery technology, and high-speed wireless connectivity.
[0003] Video compression, i.e., video coding, is an essential enabler for digital video products as it enables the storage and transmission of digital video. In general, video compression techniques apply prediction, transformation, quantization, and entropy coding to sequential blocks of pixels, i.e., coding blocks, in a video sequence to compress, i.e., encode, the video sequence. A coding block is a subset of a frame or a portion of a frame, e.g., a slice or a block of 64×64 pixels, in a video sequence, and a coding block and a frame may be inter-coded or intra-coded. For encoding, a coding block may be divided into prediction blocks, e.g., 4×4, 8×8, or 16×16 blocks of pixels. Prediction blocks may be inter-coded or intra-coded as well. In an intra-coded coding block, all prediction blocks are intra-coded. In an inter-coded coding block, the prediction blocks may be either intra-coded or inter-coded.
`
[0004] For intra-coded prediction blocks, spatial prediction is performed using different spatial prediction modes that specify the direction, e.g., horizontal, vertical, diagonal, etc., in which pixels are predicted. For example, the H.264/AVC video coding standard provides nine 4×4 spatial prediction modes, nine 8×8 spatial prediction modes, and four 16×16 spatial prediction modes for spatial prediction in the luminance space, and four 8×8 prediction modes in the chrominance space. Future standards may provide more spatial prediction modes and/or larger sizes of prediction blocks. In general, spatial prediction predicts a current prediction block, i.e., an actual prediction block in a coding block of a frame, based on surrounding pixels in the same frame using each of the spatial prediction modes, and selects for output the prediction mode and predicted prediction block that most closely resemble the pixels in the current prediction block. The predicted prediction block is then subtracted from the current prediction block to compute a residual prediction block, and transform coding is applied to the residual prediction block to reduce redundancy.
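As a non-normative illustration of this mode search, the sketch below tries every available mode and keeps the closest prediction; the sum of absolute differences (SAD) used as the closeness measure is a common choice assumed for the example, and the `predictors` mapping is a hypothetical name.

```python
import numpy as np

def choose_spatial_mode(current, predictors):
    """Exhaustive spatial-prediction mode search for one prediction block.

    predictors: dict mapping mode index -> callable that returns the
    predicted block for that mode from surrounding, already-coded pixels.
    """
    cur = current.astype(np.int32)
    best = None
    for mode, predict in predictors.items():
        predicted = predict()
        cost = int(np.abs(cur - predicted).sum())  # SAD as the closeness measure
        if best is None or cost < best[0]:
            best = (cost, mode, predicted)
    cost, mode, predicted = best
    return mode, predicted, cur - predicted        # mode, prediction, residual
```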
`[0005]
`Prediction mode dependent directional transforms
`may be used in transform coding of spatially predicted i.e.,
`intra-coded, prediction blocks. In one technique for using
`prediction mode dependent transforms, referred to as Mode-
`Dependent Directional Transform (MDDT), a set of prede-
`
`, n—l, is
`.
`.
`termined, trained transform matrices (B1, A1), i:0, .
`provided, one for each of n spatial prediction modes. The
`transform coding selects which of the transform matrices to
`use based on the spatial prediction mode selected by the
`spatial prediction. More specifically, if a residual prediction
`block X results from using prediction mode i, the transformed
`version of X, i.e., the 2D transform coefficients of X, is given
`by: Y:BZ.XAZ.T where B. and Al. are column and row trans-
`forms. In H.264, Bi::M, where M is a Discrete Cosine Trans-
`form (DCT) transform matrix. Further, a form of a Karhunen-
`Loeve Transform (KLT) is used to determine B. and Ai. More
`specifically, singular value decomposition (SVD) is per-
`formed on cross-correlated residual blocks of each prediction
`mode i collected from training Video sequences to determine
`B. and Ai.
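In code, the MDDT forward transform is two full matrix multiplies with per-mode trained matrices. This minimal sketch simply restates Y = B_i X A_i^T and the two-matrices-per-mode storage it implies; the container names are hypothetical.

```python
import numpy as np

def mddt_forward(X, mode, B, A):
    """Prior-art MDDT: Y = B_i @ X @ A_i.T with two trained matrices per mode.

    B and A hold n column/row transform matrices, so n prediction modes
    require storing 2*n matrices in total.
    """
    return B[mode] @ X @ A[mode].T
```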
`[0006]
`To use MDDT, two transform matrices must be
`stored for each spatial prediction mode. For example, if there
`are twenty-two spatial prediction modes as in H.264/AVC,
`forty-four transform matrices are required. Further, using
`transform matrices as generated for MDDT is computation-
`ally complex, especially as compared to the more commonly
`used DCT, since it may require a full matrix multiply. That is,
`transform coding of an N><N block may require 2><N><N><N
`multiplications and 2><N><N><(N—1) additions. Thus, using
`these transform matrices may not be well suited for encoding
`on resource limited devices. Additional information regard-
`ing MDDT may be found in the following documents pub-
`lished by the ITU-Telecommunications Standardization Sec-
`tor of the Video Coding Experts Group (VCEG): VCEG-
`AGl 1, VCEG-AM20, and VCEG-AF15, and in JCTVC-
`B024 published by the Joint Collaborative Team on Video
`Coding (JVT—VC) of ITU-T SG16 WP3 and ISO/IEC JTCl/
`SC29/WG1 1.
`
BRIEF DESCRIPTION OF THE DRAWINGS

[0007] Particular embodiments in accordance with the invention will now be described, by way of example only, and with reference to the accompanying drawings:
[0008] FIG. 1 shows a block diagram of a digital system in accordance with one or more embodiments of the invention;
[0009] FIG. 2 shows a block diagram of a video encoder in accordance with one or more embodiments of the invention;
[0010] FIG. 3 shows a block diagram of a video decoder in accordance with one or more embodiments of the invention;
[0011] FIGS. 4 and 5 show flow diagrams of methods in accordance with one or more embodiments of the invention;
[0012] FIGS. 6 and 7 show graphs in accordance with one or more embodiments of the invention; and
[0013] FIGS. 8-10 show illustrative digital systems in accordance with one or more embodiments of the invention.
`
DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION

[0014] Specific embodiments of the invention will now be described in detail with reference to the accompanying figures. Like elements in the various figures are denoted by like reference numerals for consistency.
[0015] Certain terms are used throughout the following description and the claims to refer to particular system components. As one skilled in the art will appreciate, components in digital systems may be referred to by different names and/or may be combined in ways not shown herein without departing from the described functionality.
`
This document does not intend to distinguish between components that differ in name but not function. In the following discussion and in the claims, the terms "including" and "comprising" are used in an open-ended fashion, and thus should be interpreted to mean "including, but not limited to . . . ." Also, the term "couple" and derivatives thereof are intended to mean an indirect, direct, optical, and/or wireless electrical connection. Thus, if a first device couples to a second device, that connection may be through a direct electrical connection, through an indirect electrical connection via other devices and connections, through an optical electrical connection, and/or through a wireless electrical connection.
`[0016]
`In the following detailed description of embodi-
`ments of the invention, numerous specific details are set forth
`in order to provide a more thorough understanding of the
`invention. However, it will be apparent to one ofordinary skill
`in the art that the invention may be practiced without these
`specific details. In other instances, well-known features have
`not been described in detail to avoid unnecessarily compli-
`cating the description. In addition, although method steps
`may be presented and described herein in a sequential fash-
`ion, one or more of the steps shown and described may be
`omitted, repeated, performed concurrently, and/or performed
`in a different order than the order shown in the figures and/or
`described herein. Accordingly, embodiments ofthe invention
`should not be considered limited to the specific ordering of
`steps shown in the figures and/or described herein. Further,
`although reference may be made to the H.264/AVC video
`coding standard herein for purpose of illustration, embodi-
`ments ofthe invention should not be considered limited to any
`particular video coding standard.
`[0017]
`In general, embodiments of the invention provide
`for prediction mode dependent directional transform coding
`based on a single directional transform matrix instead of the
`two transform matrices required in the prior art. The single
`directional transform matrix for a prediction mode is deter-
`mined using a form of KLT is which eigenvalue decomposi-
`tion (EVD), also referred to as eigen-decomposition or spec-
`tral decomposition, is performed on auto-correlated residual
`blocks of each prediction mode collected from training video
`sequences. In some embodiments of the invention, the prede-
`termined trained single directional transform matrices for the
`prediction modes are used in encoding, requiring the storage
`of one matrix per prediction mode. Note that the prior art
`MDDT requires the storage of two matrices. Further, as is
`explained in more detail herein, the use of the single direc-
`tional transform matrices reduces the size of a hardware
`
`implementation as compared to the prior art use of two trans-
`form matrices. Moreover, evaluations of the use of the single
`directional transform matrices in encoding as compared to the
`use of the two transform matrices have shown similar coding
`efiiciency.
`[0018]
`In some embodiments of the invention, rather than
`using the single directional transform matrices, each direc-
`tional transform matrix is approximated as a product of a
`DCT matrix, e. g., a DCT matrix defined in an encoding stan-
`dard for use in intra-coding, and a sparse matrix derived from
`the directional transform matrix. The same DCT matrix is
`
`used for all prediction modes of the same block size. For
`example, for 4x4 prediction modes, a 4x4 DCT matrix is used
`and for 8x8 prediction modes, an 8x8 DCT matrix is used.
`Thus, in some such embodiments, the storage requirements
`are the DCT matrices for each prediction block size and
`representations of the sparse matrices for the prediction
`
`modes. The representations of the sparse matrices may be the
`entire matrices or non-zero elements of the sparse matrices.
`For example, if the sparse matrices are diagonal matrices, the
`diagonal elements of the matrices may be stored rather than
`the entire matrices. As is explained in more detail herein, the
`use of the approximations may further reduce the size of a
`hardware implementation. Moreover, evaluations of the use
`of the approximations of the single directional transform
`matrices in encoding as compared to the use of the single
`directional transform matrices have shown small coding effi-
`ciency losses, which may be acceptable in view ofthe reduced
`computational complexity.
`[0019]
`FIG. 1 shows a block diagram of a digital system in
`accordance with one or more embodiments of the invention.
`
`The system includes a source digital system (100) that trans-
`mits encoded video sequences to a destination digital system
`(102) via a communication channel (116). The source digital
`system (100) includes a video capture component (104), a
`video encoder component (106) and a transmitter component
`(108). The video capture component (104) is configured to
`provide a video sequence to be encoded by the video encoder
`component (106). The video capture component (104) may
`be for example, a video camera, a video archive, or a video
`feed from a video content provider. In some embodiments of
`the invention, the video capture component (104) may gen-
`erate computer graphics as the video sequence, or a combi-
`nation of live video and computer-generated video.
[0020] The video encoder component (106) receives a video sequence from the video capture component (104) and encodes it for transmission by the transmitter component (108). In general, the video encoder component (106) receives the video sequence from the video capture component (104) as a sequence of frames, divides the frames into coding blocks which may be a whole frame or a part of a frame, divides the coding blocks into prediction blocks, and encodes the video data in the coding blocks based on the prediction blocks. During the encoding process, a method for prediction mode dependent directional transform coding in accordance with one or more of the embodiments described herein may be performed. The functionality of embodiments of the video encoder component (106) is described in more detail below in reference to FIG. 2.
`
[0021] The transmitter component (108) transmits the encoded video data to the destination digital system (102) via the communication channel (116). The communication channel (116) may be any communication medium, or combination of communication media, suitable for transmission of the encoded video sequence, such as, for example, wired or wireless communication media, a local area network, or a wide area network.
`
[0022] The destination digital system (102) includes a receiver component (110), a video decoder component (112), and a display component (114). The receiver component (110) receives the encoded video data from the source digital system (100) via the communication channel (116) and provides the encoded video data to the video decoder component (112) for decoding. In general, the video decoder component (112) reverses the encoding process performed by the video encoder component (106) to reconstruct the coding blocks of the video sequence. During the decoding process, a method for prediction mode dependent directional transform decoding in accordance with one or more of the embodiments described herein may be performed.
`
The functionality of embodiments of the video decoder component (112) is described in more detail below in reference to FIG. 3.
`
[0023] The reconstructed video sequence may then be displayed on the display component (114). The display component (114) may be any suitable display device such as, for example, a plasma display, a liquid crystal display (LCD), a light emitting diode (LED) display, etc.
`[0024]
`In some embodiments of the invention, the source
`digital system (100) may also include a receiver component
`and a Video decoder component and/or the destination digital
`system (102) may include a transmitter component and a
`Video encoder component
`for
`transmission of Video
`sequences both directions for Video steaming, Video broad-
`casting, and Video telephony. Further, the Video encoder com-
`ponent (106) and the Video decoder component (112) may
`perform encoding and decoding in accordance with one or
`more Video compression standards such as, for example, the
`MOVing Picture Experts Group (MPEG) Video compression
`standards, e.g., MPEG- 1, MPEG-2, and MPEG-4, the ITU-T
`Video compressions standards, e.g., H.263 and H.264, the
`Society of Motion Picture and TeleVision Engineers
`(SMPTE) 421 M Video CODEC standard (commonly
`referred to as “VC-l”),
`the Video compression standard
`defined by the Audio Video Coding Standard Workgroup of
`China (commonly referred to as “AVS”), ITU-T/ISO High
`Efficiency Video Coding (HEVC) standard, etc. The Video
`encoder component (106) and the Video decoder component
`(112) may be implemented in any suitable combination of
`software, firmware, and hardware, such as, for example, one
`or more digital signal processors (DSPs), microprocessors,
`discrete
`logic,
`application specific
`integrated circuits
`(ASICs), field-programmable gate arrays (FPGAs), etc.
`[0025]
`FIG. 2 shows a block diagram of a Video encoder,
`e.g., the Video encoder (114) of FIG. 1, in accordance with
`one or more embodiments of the inVention. In the Video
`
`input frames (200) for encoding are
`encoder of FIG. 2,
`diVided into coding blocks and the coding blocks are pr0Vided
`as one input of a motion estimation component (220), as one
`input ofan intra prediction component (224), and to a positiVe
`input of a combiner (202) (e.g., adder or subtractor or the
`like). Further, although not specifically shown, a prediction
`mode, i.e., inter-prediction or intra-prediction, for each input
`frame is selected and pr0Vided to a mode selector component
`and the entropy encoders (234).
[0026] The storage component (218) provides reference data to the motion estimation component (220) and to the motion compensation component (222). The reference data may include one or more previously encoded and decoded coding blocks, i.e., reconstructed coding blocks.
[0027] The motion estimation component (220) provides motion estimation information to the motion compensation component (222) and the entropy encoders (234). More specifically, the motion estimation component (220) performs tests on coding blocks based on multiple temporal prediction modes using reference data from storage (218) to choose the best motion vector(s)/prediction mode based on a coding cost. To test the prediction modes, the motion estimation component (220) may divide a coding block into prediction blocks according to the block size of a prediction mode. The motion estimation component (220) provides the selected motion vector (MV) or vectors and the selected prediction mode to the motion compensation component (222) and the selected motion vector (MV) to the entropy encoders (234). The motion compensation component (222) provides motion compensated inter prediction information to a selector switch (226) that includes motion compensated inter prediction blocks and the selected temporal prediction modes. The coding costs of the inter prediction blocks are also provided to the mode selector component.
[0028] The intra prediction component (224) provides intra prediction information to the selector switch (226) that includes intra prediction blocks and the corresponding spatial prediction modes. That is, the intra prediction component (224) performs spatial prediction in which tests based on multiple spatial prediction modes are performed on the coding block using previously encoded neighboring blocks of the frame from the buffer (228) to choose the best spatial prediction mode for generating an intra prediction block based on a coding cost. To test the spatial prediction modes, the intra prediction component (224) may divide a coding block into prediction blocks according to the block size of a prediction mode. Although not specifically shown, the spatial prediction mode of each intra prediction block provided to the selector switch (226) is also provided to the transform component (204). Further, the coding costs of the intra prediction blocks are also provided to the mode selector component.
[0029] The selector switch (226) selects between the motion-compensated inter prediction blocks from the motion compensation component (222) and the intra prediction blocks from the intra prediction component (224) based on the difference metrics of the blocks and a frame prediction mode provided by the mode selector component. The output of the selector switch (226), i.e., the predicted prediction block, is provided to a negative input of the combiner (202) and to a delay component (230). The output of the delay component (230) is provided to another combiner (i.e., an adder) (238). The combiner (202) subtracts the predicted prediction block from the current prediction block of the current coding block to provide a residual prediction block to the transform component (204). The resulting residual prediction block is a set of pixel difference values that quantify differences between pixel values of the original prediction block and the predicted prediction block.
[0030] The transform component (204) performs a block transform on the residual prediction blocks to convert the residual pixel values to transform coefficients and outputs the transform coefficients. Further, the transform component (204) applies block transforms to intra-coded residual prediction blocks based on the spatial prediction mode used. That is, the block transform used for an intra-coded residual prediction block is selected based on the spatial prediction mode provided by the intra prediction component (224). In some embodiments of the invention, the transform component (204) may select the block transform to be applied from among a number of different types of block transforms, such as, for example, DCTs, integer transforms, wavelet transforms, directional transforms, or combinations thereof, based on the spatial prediction mode. In some embodiments of the invention, for one or more of the spatial prediction modes, the block transform selected may be based on predetermined single directional transform matrices trained for each of the spatial prediction modes. For example, if the spatial prediction mode is one with limited directionality, the block transform selected may be a DCT, and if the spatial prediction mode is one with significant directionality, the block transform selected may be a directional transform. The directional transforms for the spatial prediction modes with significant directionality may be based on predetermined single directional transform matrices trained for each of these spatial prediction modes.
`Page 12 of21
`
`Page 12 of 21
`
`
`
`US 2011/0170594 A1
`
`Jul. 14, 2011
`
In some embodiments of the invention, directional transforms based on predetermined single directional transform matrices may be provided for all spatial prediction modes regardless of the amount of directionality.
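One way this mode-dependent selection might look in code is sketched below. The partition of modes into limited-directionality and significant-directionality sets, and the orthonormal DCT-II construction standing in for a standard-defined DCT matrix, are assumptions of the example rather than requirements of the text.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal type-II DCT matrix (stand-in for a standard-defined DCT)."""
    k = np.arange(n)
    C = np.cos(np.pi * (2.0 * k[None, :] + 1.0) * k[:, None] / (2.0 * n))
    C *= np.sqrt(2.0 / n)
    C[0, :] /= np.sqrt(2.0)   # scale the DC row for orthonormality
    return C

def select_block_transform(mode, block_size, directional, low_dir_modes):
    """Return the matrix S to apply as Y = S @ X @ S.T for this mode."""
    if mode in low_dir_modes:          # limited directionality: plain DCT
        return dct_matrix(block_size)
    return directional[mode]           # significant directionality: trained matrix
```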
[0031] Each of the predetermined directional transform matrices may be trained, i.e., empirically determined, using a set of training video sequences. For example, for a prediction mode i, prediction residuals for the mode i are determined from the training video sequences, are auto-correlated, and eigenvalue decomposition is performed on the result of the auto-correlation to generate the directional transform matrix. More specifically, the rows or columns of the prediction residuals for the prediction mode i are assembled in a matrix, an auto-correlation matrix of this matrix is computed, and eigenvalue decomposition is performed on the auto-correlation matrix to determine a row or column directional transform matrix. Only one of the row or column directional transform matrices is required to be computed, as the row and column directional transform matrices computed in this way are transposes of each other. The resulting directional transform matrix or its transpose may be the single directional transform matrix upon which the block transform for the prediction mode i is based. Note that the single directional transform matrix has the same size as the block size of the prediction mode i. That is, if prediction mode i is for a 4×4 block, the single directional transform matrix is 4×4.
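A minimal numpy sketch of this training step, assuming the rows of the collected N×N residual blocks are stacked into one tall matrix and that eigenvectors are ordered by decreasing eigenvalue (the usual KLT convention; the text does not fix an ordering):

```python
import numpy as np

def train_directional_matrix(residual_blocks):
    """Derive one N x N directional transform matrix for a prediction mode.

    residual_blocks: sequence of N x N residual prediction blocks collected
    for a single spatial prediction mode from training video sequences.
    """
    rows = np.vstack(residual_blocks)      # stack all rows of all residuals
    R = rows.T @ rows                      # N x N auto-correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)   # eigenvalue decomposition (EVD)
    order = np.argsort(eigvals)[::-1]      # strongest components first
    S = eigvecs[:, order].T                # rows of S form the KLT basis
    return S                               # the column transform is S.T
```

np.linalg.eigh is used because the auto-correlation matrix is symmetric, which also guarantees an orthonormal eigenvector basis.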
[0032] In some embodiments of the invention, if the trained directional transform matrices for two or more spatial prediction modes are sufficiently similar, rather than using a separate directional transform matrix for each of the spatial prediction modes, the training process may include generating one directional transform matrix to provide the basis for the block transform of all of the two or more spatial prediction modes. For example, tests may be executed using each of the two or more trained directional transform matrices for transforming blocks in each of the two or more prediction modes, and the trained directional transform matrix showing the best results may be selected as the one directional transform matrix for the two or more spatial modes. In another example, the one directional transform matrix may be derived from the two or more trained directional transform matrices. For example, the entries in the two or more trained directional transform matrices may be averaged to produce corresponding entries in the one directional transform matrix.
[0033] In some embodiments of the invention, the transform component (204) stores predetermined single directional transform matrices for the spatial prediction modes and performs the block transforms using the stored matrices. That is, the transform component (204) computes the transformed version Y of a residual prediction block X as Y = S_i X S_i^T, where S_i is the single directional transform matrix for prediction mode i. Note that, at most, half the number of matrices needs to be stored as compared to the prior art MDDT. Further, in a hardware implementation, half of the area of the prior art MDDT may be needed, as the same matrix multiplication logic can be used for both matrix multiplies in the transform.
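Because the row and column transforms are transposes of each other, one stored matrix serves both multiplies, which is the storage and hardware-area saving noted above. A sketch, again assuming an orthonormal KLT basis for the inverse:

```python
import numpy as np

def forward_single(X, S):
    """Forward transform with one matrix per mode: Y = S @ X @ S.T.

    The same matrix feeds both multiplies, so a hardware implementation can
    time-share a single matrix-multiplication unit across the two passes.
    """
    return S @ X @ S.T

def inverse_single(Y, S):
    """Inverse transform, assuming S is orthonormal: X = S.T @ Y @ S."""
    return S.T @ Y @ S
```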
`[0032]
`In some embodiments ofthe invention, ifthe trained
`directional transform matrices for two or more spatial predic-
`tion modes are sufficiently similar, rather than using a sepa-
`rate directional transform matrix for each of the spatial pre-
`diction modes, the training process may include generating
`one directional transform matrix to provide the basis for the
`block transform of all of the two or more spatial prediction
`modes. For example, tests may be executed using each of the
`two or more trained directional transform matrices for trans-
`
`forming blocks in each of the two or more prediction modes,
`and the trained directional transform matrix showing the best
`results may be selected as the one directional transform
`matrix for the two or more spatial modes. In another example,
`the one directional transform matrix may be derived from the
`two or more trained directional
`transform matrices. For
`
`example, the entries in the two or more trained directional
`transform entries may be averaged to produce corresponding
`entries in the one directional transform matrix.
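A sketch of the approximated path for the diagonal case follows; only the vector of +1 and −1 diagonal entries is stored per mode. The derivation shown, taking the signs of the diagonal of S_i inv(C), is one plausible reading of the example in the next paragraph, not a procedure the text mandates.

```python
import numpy as np

def approx_forward(X, C, d):
    """Approximated transform Y = D @ C @ X @ C.T @ D.T with D = diag(d).

    C: DCT matrix for the block size (shared by all modes of that size).
    d: stored +1/-1 diagonal entries for this mode (N values, not N*N).
    """
    D = np.diag(d)
    return D @ C @ X @ C.T @ D.T

def derive_diagonal(S, C):
    """Since S_i = D_i C, D_i = S_i @ inv(C); snap its diagonal to +/-1."""
    return np.sign(np.diag(S @ np.linalg.inv(C)))
```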
`
[0035] One possible derivation of a diagonal matrix D_i from S_i is now explained by way of example. Note that S_i = D_i C, thus D_i = S_i * inv(C). Assume that

0.4253  −0.7310  −0.4549  0.2791