`
`(12) United States Patent
`Kuriakin et al.
`
(10) Patent No.: US 7,298,782 B2
(45) Date of Patent: Nov. 20, 2007
`
`(54) METHOD AND APPARATUS FOR
`IMPROVED MEMORY MANAGEMENT OF
`VIDEO IMAGES
`(75) Inventors: Valery Kuriakin, Nizhny Novgorod
`(RU); Alexander Knyazev, Nizhny
`Novgorod (RU); Roman Belenov,
`Nizhny Novgorod (RU); Yen-Kuang
`Chen, Franklin Park, NJ (US)
`
(73) Assignee: Intel Corporation, Santa Clara, CA (US)
`
`(*) Notice:
`
`Subject to any disclaimer, the term of this
`patent is extended or adjusted under 35
`U.S.C. 154(b) by 893 days.
`
`(21) Appl. No.: 10/355,704
(22) Filed: Jan. 31, 2003
(65) Prior Publication Data
US 2003/0151610 A1    Aug. 14, 2003
`Related U.S. Application Data
`(62) Division of application No. 09/607,825, filed on Jun.
`30, 2000, now Pat. No. 6,961,063.
(51) Int. Cl.
     H04N 7/12            (2006.01)
(52) U.S. Cl. ...................................... 375/240.26
(58) Field of Classification Search ................ 348/420,
     348/413, 402, 421, 416, 400, 699, 405.1,
     348/419.1, 424, 441, 425, 453, 449; 382/107,
     382/235, 234, 238, 244; 386/109, 111; 375/240.03,
     375/240.21, 240.26, 240.25
     See application file for complete search history.
(56)                References Cited
`U.S. PATENT DOCUMENTS
`
5,907,500 A     5/1999  Nadehara
6,078,690 A     6/2000  Yamada et al.
6,208,350 B1    3/2001  Herrera
6,259,741 B1    7/2001  Chen et al.
6,326,984 B1 * 12/2001  Chow et al. ................ 715/764
`
`OTHER PUBLICATIONS
Bilas, Angelos et al., Real Time Parallel MPEG-2 Decoding in
Software, Princeton Univ., Depts. of Comp. Sci. and Elect. Eng., pp.
1-14, IPPS Apr. 1997.
Coelho, R. and Hawash, M., DirectX, RDX, RSX and MMX Tech-
nology, Chapter 22.10 Speed Up Graphics Writes with Write
Combining, Addison-Wesley, Reading, MA, 1998, pp. 369-371.
`(Continued)
Primary Examiner—Tung Vo
Assistant Examiner—Behrooz Senfi
(74) Attorney, Agent, or Firm—Blakely, Sokoloff, Taylor &
Zafman LLP
`
`(57)
`
`ABSTRACT
`
A novel storage format enabling a method for improved
memory management of video images is described. The
method includes receiving an image consisting of a plurality
of color components. Once received, the plurality of color
components is converted to a mixed format of planar format
and packed format. The mixed packed format is implemented
by storing one or more of the plurality of color components
in a planar format and storing one or more of the plurality
of color components in a packed format. A method for
writing out video images is also described utilizing a write
combining (WC) frame buffer. The decoding method motion
compensates groups of macroblocks in order to eliminate
partial writes from the WC frame buffer.
`
5,561,780 A * 10/1996  Glew et al. ................ 711/126
`
`6 Claims, 15 Drawing Sheets
`
[Front-page drawing: VIDEO DECODER 240, with BIT STREAM input, MOTION COMPENSATION BLOCK, and CONVERSION BLOCK]
`
`Unified Patents, LLC v. Elects. & Telecomm. Res. Inst., et al.
`
`Ex. 1027, p. 1
`
`
`
* cited by examiner
`
`
`
[Sheet 1 of 15: FIG. 1A, video image data stream 100 in YUY2 packed format; FIG. 1B, planar format in system memory 110]
`
`
`
[Sheet 2 of 15: FIG. 2A, YUV 4:2:0 (Luma Y 352x240, Chroma U 176x120, Chroma V 176x120); FIG. 2B, YUV 4:2:2 (Luma Y 352x240, Chroma U 176x240, Chroma V 176x240); FIG. 2C, YUV 4:4:4 (Luma Y 352x240, Chroma U 352x240, Chroma V 352x240)]
`
`
`
`
`
`
`
`
`
`
`
`
[Sheet 6 of 15: FIGS. 6A and 6B, motion compensation of decoded blocks; previous frame, bit stream 242, motion vector]
`Ex. 1027, p. 8
`
`
`
[Sheet 7 of 15: FIG. 7A, steps 400 for encoding video (uncompressed video, motion estimation, DCT, quantization, run length encoding, Huffman encoding, compressed video) and decoding video (compressed video, Huffman decoding, run length decoding, inverse quantization, inverse DCT, motion compensation, uncompressed video)]
`
`
`
[Sheet 8 of 15: FIG. 7B, VIDEO ENCODER 450 with motion estimation and motion compensation blocks]
`
`
`
`
`
`
[Sheet 10 of 15: FIG. 9, block-by-block decoding 520: VLD 522A-522D, DCT 524A-524D, M.C. 526A-526D]
`
`
`
`
`
`
`
`
`
`
`
`
[Sheet 14 of 15: FIG. 16, flowchart 708: receive a portion of an encoded bit stream representing an encoded block; variable length decode the encoded block to generate a quantized block; perform IQ on the quantized block to generate a frequency spectrum; perform IDCT on the quantized block using the frequency spectrum to generate a decoded block; plurality of macroblocks decoded?; motion compensate the plurality of macroblocks to generate a plurality of MC macroblocks; additional encoded blocks?]
`
`
`
`
`
`
`METHOD AND APPARATUS FOR
`IMPROVED MEMORY MANAGEMENT OF
`VIDEO IMAGES
`
This application is a divisional of U.S. patent application
Ser. No. 09/607,825, filed Jun. 30, 2000, now U.S. Pat. No.
6,961,063.
`
`FIELD OF THE INVENTION
`
`The present invention relates to video images, and, in
`particular, to a novel storage format for enabling improved
`memory management of video images.
`
`BACKGROUND OF THE INVENTION
`
`10
`
`15
`
`25
`
`30
`
`40
`
`45
`
`In accordance with the NTSC (National Television Stan
`dards Committee) and PAL (Phase Alternating Line) stan
`dard, video images are presented in the YUV color space.
`The Y signal represents a luminance value while the U and
`V signals represent color difference or chrominance values.
`YUV video image data may be transmitted in packed format
`or planar format. In packed format, all the data for a given
`set of pixels of the video image is transmitted before any
`data for another set of pixels is transmitted. As a result, in
packed format, YUV data is interleaved in the transmitted
pixel data stream 100, as depicted in FIG. 1A (YUY2 packed
format). In planar format, Y, U and V data values are stored
into separate Y, U and V memory areas (planes) in system
memory 110, as depicted in FIG. 1B.
`FIGS. 2A, 2B and 2C are diagrams illustrating three
`different formats for representing video images in the YUV
`color space. A video image frame may consist of three
rectangular matrices representing the luminance Y and the
two chrominance values U and V. Y matrices 120, 130 and
140 have an even number of rows and columns. In YUV
`4:2:0 color space format, chrominance component matrices
122 and 124 may be one half the size of Y matrix 120 in the
`horizontal and vertical directions as depicted in FIG. 2A. In
`YUV 4:2:2 format, chrominance component matrices 132
and 134 may be one half the size of Y matrix 130 in the
`horizontal direction and the same size in the vertical direc
`tion as depicted in FIG. 2B. Finally, in YUV 4:4:4 format,
`chrominance component matrices 142 and 144 may be the
`same size as Y matrix 140 in the horizontal and vertical
`directions as depicted in FIG. 2C.
`To store video data efficiently, conventional digital video
`systems contain a data compressor that compresses the video
`image data using compression techniques. Many conven
`tional compression techniques are based on compressing the
`Video image data by processing the different pixel compo
`nents separately. For example, in accordance with Motion
`Picture Experts Group (MPEG) or International Telecom
`munications Union (ITU) Video compression standards, a
`YUV-data compressor may encode the Y data independently
`of encoding U data and encoding V data. Such a compressor
`preferable receives video data in planar format, in which the
`Y. U, and V data for multiple pixels are separated and
`grouped together in three distinct data streams of Y only, U
`only and V only data, as described above (FIG. 1B).
`60
`Although planar format provides significant advantages
`for data compression, several disadvantages arise when
`storing or processing data received in planar format. For
`example, a video decoder that receives video image data in
YUV planar format requires three pointers to the Y, U and
`V component values. For basic DVD (Digital Versatile Disk)
and HDTV (High Definition Television) mode, each macroblock
has three blocks of pixels: Y: 16x16, U: 8x8 and
`V:8x8. In addition, the U and V components are located in
`different memory locations. In terms of code size, three
`blocks of code are required for conventional motion com
`pensation of the video image data. Moreover, a separate
`memory page usually must be opened for each YUV com
`ponent.
In terms of cache efficiency, for YUV video in the 4:2:0
`format (FIG. 2A), the useful area (in a cache line) for the
`Y-component is about sixteen-bytes per cache line. For the
`U and V components, the useful area is eight-bytes per line
`per color-component. Therefore, two rows in a macroblock
`potentially occupy four cache lines since the U and V
components are vertically and horizontally sub-sampled in
`4:2:0 format (two Y cache lines, one U cache line, and one
`V cache line). For YUV video in 4:2:2 format (FIG. 2B), six
`cache lines are required (2 Y cache lines, 2 U cache lines,
`and 2 V cache lines.) Although YUY2 packed format (FIG.
`1A), as described above, uses only two cache lines and could
`be used to overcome this cache inefficiency problem, con
`ventional motion compensation of data in YUY2 format is
`inefficient.
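The cache-line accounting above can be sketched as follows. This is an illustrative model only, assuming a 32-byte cache line, one byte per sample, a 16-pixel-wide macroblock, and plane rows that each begin on a cache-line boundary; the helper names are ours, not the patent's:

```python
import math

CACHE_LINE = 32   # bytes, assumed cache-line size
MB_WIDTH = 16     # macroblock width in pixels, one byte per sample

def lines_touched(bytes_per_row, rows):
    # distinct cache lines touched by `rows` rows of one plane,
    # each row assumed to start on its own cache line
    return rows * math.ceil(bytes_per_row / CACHE_LINE)

def lines_for_two_mb_rows(fmt):
    """Cache lines occupied by two rows of a macroblock."""
    if fmt == "4:2:0":
        # chroma halved both ways: one U row and one V row
        # serve two luma rows
        return (lines_touched(MB_WIDTH, 2)
                + 2 * lines_touched(MB_WIDTH // 2, 1))
    if fmt == "4:2:2":
        # chroma halved horizontally only: full vertical resolution
        return (lines_touched(MB_WIDTH, 2)
                + 2 * lines_touched(MB_WIDTH // 2, 2))
    if fmt == "YUY2":
        # packed: 2 bytes per pixel in a single interleaved stream
        return lines_touched(MB_WIDTH * 2, 2)
    raise ValueError(fmt)
```

Evaluating the model reproduces the counts in the text: four cache lines for 4:2:0, six for 4:2:2, and two for YUY2 packed.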
Therefore, there remains a need to overcome the limitations
in the above-described existing art, which is satisfied
by the inventive structure and method described hereinafter.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`Additional features and advantages of the invention will
`be more apparent from the following detailed description
`and appended claims when taken in conjunction with the
`drawings, in which:
FIG. 1A illustrates a video image data stream in YUY2
`packed format as known in the art.
`FIG. 1B illustrates video image data stored in YV12 pure
`planar format, as known in the art.
`FIGS. 2A-C illustrate the YUV 4:2:0 color space format,
`the YUV 4:2:2 color space format and the YUV 4:4:4 color
`space format, as known in the art.
`FIG. 3 illustrates a block diagram of a conventional
`computer system as known in the art.
`FIG. 4 is a block diagram illustrating a video decoder in
`accordance with one embodiment of the present invention.
`FIGS. 5A and 5B illustrate YUV color space storage
`formats as known in the art.
`FIG. 5C illustrates a block diagram of a mixed storage
`format according to an exemplary embodiment of the
`present invention.
`FIGS. 6A-6B illustrate motion compensation of decoded
`blocks according to a further embodiment of the present
`invention.
`FIG. 7A is a block diagram illustrating steps for encoding
`and decoding MPEG image data as known in the art.
`FIG. 7B is a block diagram illustrating a video encoder in
`accordance with a further embodiment of the invention.
`FIG. 8 is a block diagram illustrating a write combining
`(WC) buffer as known in the art.
`FIG. 9 illustrates method steps for decoding an encoded
`MPEG video bit stream as known in the art.
`FIG. 10 illustrates method steps for decoding an encoded
`MPEG video bit stream in accordance with the novel motion
`compensation technique as taught by the present invention.
`FIG. 11A illustrates a stride of an image frame represented
`in the mixed storage format as taught by the present inven
`tion.
`FIG. 11B is a block diagram illustrating a write combining
`(WC) buffer including a cache line containing a Y-compo
`
`Unified Patents, LLC v. Elects. & Telecomm. Res. Inst., et al.
`
`Ex. 1027, p. 18
`
`
`
`3
`nent and a UV component represented in the mixed storage
`format as taught by the present invention.
`FIG. 12 illustrates method steps for improved memory
`management of video images utilizing a mixed storage
`format according to an embodiment of the present invention.
`FIG. 13 illustrates additional method steps for improved
`memory management of video images utilizing a mixed
`storage format according to a further embodiment of the
`present invention.
`FIG. 14 illustrates additional method steps for improved
`memory management of video images utilizing a mixed
`storage format according to a further embodiment of the
`present invention.
`FIG. 15 illustrates additional method steps for improved
`memory management of video images utilizing a mixed
`storage format according to a further embodiment of the
`present invention.
`FIG. 16 illustrates method steps for decoding an encoded
`bit stream according to an embodiment of the present
`invention.
`FIG. 17 illustrates additional method steps for decoding
`an encoded bit stream according to a further embodiment of
`the present invention.
`FIG. 18 illustrates additional method steps for decoding
`an encoded bit stream utilizing a mixed storage format
`according to a further embodiment of the present invention.
`
`DETAILED DESCRIPTION
`
`System Architecture
`
`The present invention overcomes the problems in the
`existing art described above by providing a novel storage
`format enabling a method for improved memory manage
`ment of video images. In the following detailed description,
`numerous specific details are set forth in order to provide a
`thorough understanding of the present invention. However,
`one having ordinary skill in the art should recognize that the
`invention may be practiced without these specific details. In
`addition, the following description provides examples, and
`the accompanying drawings show various examples for the
`purposes of illustration. However, these examples should
`not be construed in a limiting sense as they are merely
`intended to provide examples of the present invention rather
`than to provide an exhaustive list of all possible implemen
`tations of the present invention. In some instances, well
`known structures, devices, and techniques have not been
`shown in detail to avoid obscuring the present invention.
`Referring to FIG. 3, a block diagram illustrating major
`components of a computer system 200 in which the inven
`tive storage format may be implemented is now described.
`The computer system 200 includes a display controller 220.
`The display controller 220 is, for example a Video Graphics
`Adapter (VGA), Super VGA (SVGA) or the like. Display
controller 220 generates pixel data for display 290, which is,
`for example, a CRT, flat panel display or the like. The pixel
`data is generated at a rate characteristic of the refresh of
`display 290 (e.g., 60 Hz, 72 Hz, 75 Hz or the like) and
horizontal and vertical resolution of a display image (e.g.,
`640x480 pixels, 1024x768 pixels, 800x600 or the like).
`Display controller 220 may generate a continuous stream of
`pixel data at the characteristic rate of display 290.
`Display controller 220 is also provided with a display
`memory 222, which stores pixel data in text, graphics, or
video modes for output to display 290. Host CPU 210 is
coupled to display controller 220 through bus 270 and
`updates the content of display memory 222 when a display
`
`4
`image for display 290 is altered. Bus 270 may comprise, for
`example, a PCI bus or the like. System memory 280 may be
`coupled to Host CPU 210 for storing data. Hardware video
`decoder 230 is provided to decode video data such as, for
`example, MPEG video data. MPEG video data is received
`from an MPEG video data source (e.g., CD-ROM or the
`like). Alternatively, the video decoder 230 is implemented
`as, for example, a conventional software decoder 282 stored
`in the system memory 280. As such, one of ordinary skill in
`the art will recognize that the teaching of the present
`invention may be implemented in either software or hard
ware video decoders. Once decoded, the decoded video data
is outputted to system memory 280 or directly to display
memory 222.
`Referring to FIG. 4, the components of a video decoder
`240 according to a first embodiment of present invention are
`further described. The video decoder 240 may be utilized as
`the hardware decoder 230 or software decoder 282 within
`the computer system 200. MPEG data received from an
`MPEG data source may be decoded and decompressed as
`follows. The video decoder 240 receives an MPEG bit
`stream 242 at a Variable Length Decoding (VLD) block 244.
The VLD block 244 decodes the MPEG bit stream 242 and
generates a quantized block 246 that is transferred to an
Inverse Quantization (IQ) block 266. The IQ block 266
performs inverse quantization on the quantized block 246 to
generate a frequency spectrum 268 for the quantized block.
An Inverse Discrete Cosine Transform (IDCT) block 250
performs inverse discrete cosine transformation of the
quantized block 246 using the frequency spectrum 268 to
generate a decoded block 252 that is transferred to the
motion compensation block (MCB) 254. Motion compensation
is performed by the MCB 254 to recreate the MPEG
data 256. Finally, color conversion block 262 converts the
MPEG data 256 into the Red, Green, Blue (RGB) color
space in order to generate pictures 264.
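The VLD, IQ, IDCT and MC dataflow of the decoder can be sketched as below. This is a minimal illustration, not the patented decoder: it uses hypothetical 4x4 blocks (MPEG uses 8x8), a single quantizer scale in place of a full quantization matrix, and a co-located (zero motion vector) reference block:

```python
import math

N = 4  # hypothetical block size to keep the arithmetic small

def inverse_quantize(qblock, qscale):
    # IQ stage: scale quantized coefficients back to a frequency
    # spectrum (a real decoder applies a per-coefficient matrix)
    return [[coef * qscale for coef in row] for row in qblock]

def idct(F):
    # IDCT stage: naive 2-D inverse discrete cosine transform
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0
    out = [[0.0] * N for _ in range(N)]
    for x in range(N):
        for y in range(N):
            s = sum(c(u) * c(v) * F[u][v]
                    * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                    * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                    for u in range(N) for v in range(N))
            out[x][y] = 2 / N * s
    return out

def motion_compensate(residue, reference):
    # MC stage: add the decoded residue to the reference block
    return [[r + p for r, p in zip(r_row, p_row)]
            for r_row, p_row in zip(residue, reference)]
```

For example, a DC-only quantized block with coefficient 8 and scale 2 yields a flat residue of 4.0 per pixel, which MC then adds to the reference block.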
Conventional MPEG decoders, such as hardware video
decoder 230 or software video decoder 282, decode a
`compressed MPEG bit stream into a storage format depend
`ing on the particular compression format used to encode the
`MPEG bit stream. For the reasons described above, YUV
`planar format is the preferred format for compression of
`MPEG images within conventional MPEG decoders. Con
`sequently, the decoded block 252 outputted by the IDCT
`block 250 as well as the MPEG data 256 outputted by the
`MCB 254 are generated in YUV planar format within
`conventional MPEG decoders. Unfortunately, YUV planar
`format is an inefficient format during motion compensation
`of the decoded block 252.
`Accordingly, FIG. 5C depicts a novel mixed storage
`format 300 described by the present invention that is utilized
by the video decoder 240. Careful review of FIGS. 5A-5C
illustrates that Y component values are stored in a planar
array 300A while the U and V components are interleaved
in a packed array 300B. Using the mixed storage format 300,
decoded block 252 received from the IDCT block 250 is
converted from planar format (FIG. 5B) to the mixed storage
format 300. Storage of reference frames 260 and MPEG data
256 in the mixed storage format 300 optimizes motion
compensation of the decoded block 252 as depicted in FIGS.
`6A and 6B.
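Concretely, converting planar chroma into the packed UV array 300B amounts to interleaving the U and V planes row by row, as sketched below. The U-before-V byte order is an assumption for illustration (the actual ordering is fixed by FIG. 5C); this mixed layout is essentially what later pixel-format conventions call NV12:

```python
def planar_to_mixed(y_plane, u_plane, v_plane):
    """Repack planar Y, U, V into the mixed storage format:
    Y stays planar (array 300A); U and V are interleaved into a
    single packed UV array (array 300B)."""
    uv_plane = []
    for u_row, v_row in zip(u_plane, v_plane):
        uv_row = []
        for u_sample, v_sample in zip(u_row, v_row):
            uv_row.extend((u_sample, v_sample))  # interleave U, V pairs
        uv_plane.append(uv_row)
    return y_plane, uv_plane
```

A decoder then needs only two pointers (Y and UV) instead of three, matching the page-open and pointer savings described below.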
`FIG. 6A depicts a previous frame 260 that is stored in a
`reference frames block 258. The previous frame 260
`includes a 320x280 pixel Y-component 310, a 160x140
`pixel U-component 312 and a 160x140 pixel V-component
`314, represented in planar format 304 (FIG. 5B). FIG. 6B
`depicts a portion of the Video Decoder 240. Together FIGS.
`
`6A and 6B depict the combination of the decoded block 252
`(252A, 252B, and 252C) and corresponding YUV compo
`nents, as determined by a motion vector (V) 248, of the
`previous frame 260 during motion compensation. During the
`decoding process, the VLD block 244 generates the motion
`vector (V) 248, which corresponds to the decoded block 252
`generated by the IDCT Block 250. Referring again to FIGS.
`6A and 6B, the motion vector (V) 248 identifies Y-block
`316, U-block 318 and V-block 320 of the previous frame
`260. In order to motion compensate the decoded block 252,
`Y data 252A is combined with Y-block 316, U data is
`combined with U-block 318 and V data is combined with
`V-block 320 in order to generate MPEG data 256 (322A,
`322B and 322C).
`However, if the previous frame 260 and the decoded block
`252 are stored in the mixed storage format 300 (FIG. 5C),
`the motion compensation process is streamlined. Referring
`again to FIG. 6A, the previous frame can be stored in the
`mixed storage format 300 (FIG. 5C) as indicated by the
`320x140 UV-component 330, such that a UV-block 335 is
formed. Referring again to FIG. 6B, the decoded block can
be stored in the mixed storage format 300 (FIG. 5C) as
indicated by UV data 340. The UV-block 335 is then
`combined with the UV data 340 in order to generate UV
`MPEG data 350.
`25
`Careful review of FIGS. 6A and 6B illustrates the advan
`tages of using the mixed packed format 300 (FIG. 5C)
`during motion compensation. Motion compensation using
`the planar format 304 (FIG. 5B) requires (1) opening three
`memory pages, (2) using three pointers, and (3) performing
`30
`three operations for each YUV component to motion com
`pensate the decoded block 252. In contrast, motion com
`pensation using the mixed storage format 300 (FIG. 5C)
`requires (1) opening two memory pages, (2) using two
`pointers, and (3) performing two operations for each Y and
`UV component to motion compensate the decoded block
`252. In addition, referring again to FIGS. 6A and 6B, storage
`of YUV data in cache memory (not shown) requires thirty
`two cache lines in planar format 304 (FIG. 5B) (eight-cache
`lines for the Y-component, eight cache lines for each U and
`40
`V component). In contrast, storage of the Y and UV com
`ponents in the mixed storage format 300 (FIG.5C) requires
`twenty-four cache lines (sixteen cache lines for the Y-com
`ponent and eight cache lines for the UV component).
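For a single 16x16 macroblock in 4:2:0 (one byte per sample, 32-byte cache lines, each plane row starting its own cache line), the thirty-two and twenty-four line counts quoted above can be reproduced as follows; the accounting model is ours, chosen only to match the figures in the text:

```python
import math

CACHE_LINE = 32  # bytes, assumed

def plane_lines(width_bytes, rows):
    # cache lines touched by `rows` rows of one plane
    return rows * math.ceil(width_bytes / CACHE_LINE)

# planar format 304: separate Y (16x16), U (8x8) and V (8x8) blocks
planar_lines = (plane_lines(16, 16)   # Y: 16 cache lines
                + plane_lines(8, 8)   # U: 8 cache lines
                + plane_lines(8, 8))  # V: 8 cache lines

# mixed storage format 300: Y block plus one packed UV block
# (8 rows of 8 interleaved U/V pairs, i.e. 16 bytes per row)
mixed_lines = (plane_lines(16, 16)    # Y unchanged
               + plane_lines(16, 8))  # UV: 8 cache lines
```

The eight-line saving comes entirely from the U and V samples sharing cache lines in the packed UV array.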
Moreover, benefits from using the mixed storage
format 300 (FIG. 5C) are not limited to video decoders. FIG.
7A depicts the steps for encoding and decoding video
images 400. Careful review of FIG. 7A illustrates that the
encoding 402 and decoding 420 of video images essentially
mirror each other. Consequently, referring to FIG. 7B, the
mixed storage format may be used in a Video Encoder 450
during motion estimation 452, motion compensation 454
and storage of reference frames 456. For the reasons
described above, use of the mixed storage format 300 (FIG.
5C) will reduce hardware costs of the video encoder (fewer
pointers) and speed up the encoding process (fewer operations
and memory accesses). In addition, cache line efficiency
results from using the mixed storage format 300 (FIG. 5C),
as described above.
Although use of the mixed storage format 300 (FIG. 5C)
`provides improved cache line efficiency, there remains a
`need to improve access speed to graphics memory. One
`technique for overcoming this problem is using a Write
`Combining (WC) buffer in order to accelerate writes to the
`video frame buffer, as depicted in FIG. 8. FIG. 8 depicts a
`memory address space 500 including a WC region 502. The
WC buffer 512 utilizes the fact that thirty-two-byte burst
writes are faster than individual byte or word writes, since
burst writes consume less bandwidth from the system bus.
Hence, applications can write thirty-two bytes of data to the
WC frame buffer 512 before burst writing the data to its final
destination 508.
`Nonetheless, not all applications take advantage of WC
`buffers. One problem associated with WC buffers is that WC
`buffers generally contain only four or six entries in their WC
region 502. Consequently, any memory store to an address
that is not included in the current WC buffer 512 will flush
out one or more entries in the WC buffer 512. As a result,
partial writes will occur and reduce system performance.
`However, by writing data sequentially and consecutively
`into the WC region 502 of the system memory 500, partial
`memory writes to the WC region 502 are eliminated. Refer
`ring again to FIG. 8, if we write the first pixel 504 in line one
`of an image, then the first pixel 506 in line two of the image,
`it is very likely that the WC buffer 512 (holding only one
`byte) will be flushed out. This occurs because we are writing
`to the WC region 502 in a vertical manner, such that the
`second write does not map to the same entry of the WC
`buffer 512. In contrast, when we write to the WC region 502
`in a sequential and consecutive manner, the first thirty-two
`pixels 508 of line one of the image may be written to the WC
`buffer 512 before the first thirty-two bytes of pixels 508 are
`burst written to their final destination in the WC region 502.
`Once we completely write the thirty-second pixel byte 508
`in the WC buffer 512, the entire thirty-two bytes of pixels
`508 can be burst written to their final destination.
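The effect of write order on the WC buffer can be illustrated with a toy model: a four-entry buffer of 32-byte lines with FIFO eviction, where flushing a line that was not completely written counts as a partial write. The entry count, eviction policy, and image size here are illustrative assumptions only, not a model of any particular processor:

```python
class WCBuffer:
    """Toy write-combining buffer: a few entries of 32-byte lines,
    FIFO eviction; flushing an incompletely written line counts
    as a partial write."""

    def __init__(self, entries=4, line=32):
        self.entries, self.line = entries, line
        self.resident = {}   # line base address -> set of written offsets
        self.partial = self.full = 0

    def store(self, addr):
        base = addr - addr % self.line
        if base not in self.resident:
            if len(self.resident) == self.entries:
                self._flush(next(iter(self.resident)))  # FIFO victim
            self.resident[base] = set()
        self.resident[base].add(addr - base)

    def _flush(self, base):
        if len(self.resident.pop(base)) == self.line:
            self.full += 1      # full 32-byte burst write
        else:
            self.partial += 1   # partial write

    def drain(self):
        for base in list(self.resident):
            self._flush(base)

def run(addresses):
    wc = WCBuffer()
    for addr in addresses:
        wc.store(addr)
    wc.drain()
    return wc.partial, wc.full

# one-byte stores to a 64-byte-wide, 8-line image region
WIDTH, ROWS = 64, 8
sequential = [r * WIDTH + c for r in range(ROWS) for c in range(WIDTH)]
vertical = [r * WIDTH + c for c in range(WIDTH) for r in range(ROWS)]
```

Replaying both streams, the sequential order produces only full thirty-two-byte bursts, while the vertical order (writing the first pixel of each line before the second, as in the example above) flushes a one-byte entry on nearly every store.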
`In MPEG video decoding, it is important to reduce partial
`writes during motion compensation and during frame output
`to graphics memory. A novel method for re-ordering of
`operations during motion compensation of video frames in
`order to reduce partial writes is now described with refer
`ence to FIG. 9 and FIG. 10.
`FIG. 9 is a block diagram depicting steps for decoding an
`MPEG bit stream. Instead of motion compensating a block
`after a block, we propose motion compensation of blocks in
`groups of four block (as shown in FIG. 10). MPEG video
`bits streams are generally decoded as follows: (1) VLD the
`motion vector and the DCT coefficients of a block 522; (2)
`IQ and IDCT of the DCT coefficients of the block 524; and
`(3) MC the residue of the block with the displaced reference
`block 526. One problem of this approach is that it causes
`many partial writes.
Generally, after a block is MC, the resulting block is
written back to the frame buffer. Assuming the video image
is stored in a linear fashion and the width of the video is
larger than the size of an entry in the WC buffer, the resulting
macroblock is written to the frame as follows. As the block
is written back to the frame buffer line after line, each
eight-byte write starts a new cache line. Thus, after storing
four lines, the application is forcing the WC buffer to flush
some of its entries. That is, partial writes (16 bytes out of
32 bytes) occur.
However, by motion compensating four blocks together,
partial writes are eliminated. That is, Step one (522) and
Step two (524) are repeated four times before Step three
(526) is performed, as depicted in FIG. 10. Furthermore,
instead of writing out the second line of a first block after the
first line of the first block, the first line of a second block is
written out. This is because the first line of the first block and
the first line of the second block belong to the same cache
lines. Consequently, the WC buffer 512 can easily combine
the write operations in a burst operation. However, those
skilled in the art will recognize that various numbers of
`
`35
`
`45
`
`55
`
`60
`
`65
`
`Unified Patents, LLC v. Elects. & Telecomm. Res. Inst., et al.
`
`Ex. 1027, p. 20
`
`
`
blocks may be chosen as the plurality of blocks, such that
the number of blocks is dependent on the line size of the
write-combining buffer.
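The re-ordered write pattern can be made concrete with addresses. For four adjacent 8x8 blocks (one byte per sample, and an assumed frame stride of 640 bytes for illustration), writing line y of all four blocks before line y+1 makes every group of thirty-two consecutive stores fall in one aligned 32-byte cache line, whereas the conventional block-after-block order never writes more than eight consecutive bytes:

```python
BLOCK, GROUP, STRIDE = 8, 4, 640  # block width, blocks per group, stride

def block_major(base):
    # conventional order: finish all 8 lines of one block,
    # then move to the next block
    return [base + b * BLOCK + y * STRIDE + x
            for b in range(GROUP) for y in range(BLOCK) for x in range(BLOCK)]

def line_major(base):
    # proposed order: line y of blocks 0..3 before line y + 1
    return [base + b * BLOCK + y * STRIDE + x
            for y in range(BLOCK) for b in range(GROUP) for x in range(BLOCK)]
```

With an aligned base, each image line of the four-block group is a single sequential 32-byte run, which the WC buffer can emit as one burst; the block-major stream jumps to a new cache line after every eight bytes.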
The real advantages of writing the data out in a multiple
of four blocks come from using the mixed storage format
300 (FIG. 5C). In this format, Y components of a video
image have the same stride as the UV components of the
video image. Referring to FIG. 11A, an image frame 550 is
depicted utilizing the mixed storage format 300. The image
550 includes a 320x280 pixel Y-component 552 and a
320x140 pixel UV-component 554. As such, a Y-block 556
has an equal stride 560 to a UV-block 558. As a result,
whenever we finish writing to a full cache line 572 (thirty-two
bytes) of a WC buffer 570 (FIG. 11B), we generate a
"full write" of Y-components 574 (thirty-two bytes) and
corresponding UV components 576 (thirty-two bytes). We
only need four blocks to guarantee a "full write" of the
cache line. That is, we don't need four extra blocks to
guarantee a "full write" of the cache line. This property
provides a distinct advantage over the previous pure planar
YV12 format. Procedural method steps for implementing
the inventive mixed storage format 300 (FIG. 5C) and a
modified method for decoding an encoded bit stream are
now described.
`
`10
`
`Operation
`
`25
`
`8
are written to a U-plane 304B of the planar arrays 304 and
the V component is written to a V-plane 304C of the planar
arrays 304. The planar format is, for example, chosen as one
`of YV12 planar format, YUV12 planar format, YUV16
`planar format, or YUV9 planar format. In addition, color
`components are presented in a color space chosen as, for
`example, one of a YUV color space, a YCrCb color space,
`a YIQ color space, or an RGB color space.
Referring now to FIG. 16, a method 700 is depicted for
decoding an encoded bit stream, for example, in the Video
Decoder 240 as depicted in FIG. 4. At step 704, a portion of
the encoded bit stream is received representing an encoded
block. Alternatively, a quantized block 246 may be
received. At step 706, the encoded block is variable length
decoded (VLD) to generate a quantized block. When the
quantized block is received at step 704, step 706 is not
performed. Those skilled in the art will appreciate that the
encoded block may be decoded in various ways and remain
within the scope of this invention. At step 708, inverse
quantization (IQ) is performed on the quantized block to
generate a frequency spectrum for the quantized block. At
step 710, inverse discrete cosine transformation (IDCT) of
the quantized block is performed using the frequency spectrum
to generate a decoded block. At step 712, steps 704
through 710 are repeated for a plurality of encoded blocks.
As a result, a plurality of decoded blocks, representing a
plurality of macroblocks, are formed. At step 714, the
plurality of macroblocks are motion compensated as a group
in order to generate a plurality of motion compensated (MC)
macroblocks. Finally, at step 740, steps 704 through 714 are
repeated for each encoded block represented by the encoded
bit stream. The encoded bit stream is, for example, an
encoded MPEG video bit stream.
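The control flow of method 700 can be sketched as a loop. The stage functions here are stand-ins (any VLD, IQ, and IDCT implementation could be plugged in), and the group size of four follows the description of FIG. 17; none of the function names below are from the patent:

```python
def decode_stream(encoded_blocks, vld, iq, idct, mc_group, group_size=4):
    """Decode blocks one at a time (steps 704-710) but defer motion
    compensation until `group_size` macroblocks are ready (step 714)."""
    pending, output = [], []
    for encoded in encoded_blocks:            # step 704: next encoded block
        quantized = vld(encoded)              # step 706: variable length decode
        spectrum = iq(quantized)              # step 708: inverse quantization
        pending.append(idct(spectrum))        # step 710: inverse DCT
        if len(pending) == group_size:        # step 712: group complete?
            output.extend(mc_group(pending))  # step 714: MC as a group
            pending = []
    return output
```

Grouping the MC stage is what allows the write-out of the motion-compensated blocks to be ordered for full cache-line bursts, as described above.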
`FIG. 17 depicts additional method steps 716 for motion
`compensating the plurality of macroblocks of step 714. At
`step 718, four blocks are used as the plurality of blocks.
Finally, at step 720, pixel data of the four MC blocks is
`written as a group and in a sequential manner to a frame
`buffer, such that prior to being burst written to the frame
`buffer, the pixel data is