`
`
Transactions Letters
`
`A Unified Approach to Restoration, Deinterlacing and Resolution Enhancement
`in Decoding MPEG-2 Video
`Bo Martins and Søren Forchhammer
`
`Abstract—The quality and spatial resolution of video can be
`improved by combining multiple pictures to form a single super-
`resolution picture. We address the special problems associated
`with pictures of variable but somehow parameterized quality
`such as MPEG-decoded video. Our algorithm provides a unified
`approach to restoration, chrominance upsampling, deinterlacing,
`and resolution enhancement. A decoded MPEG-2 sequence for
`interlaced standard definition television (SDTV) in 4 : 2 : 0 is
`converted to: 1) improved quality interlaced SDTV in 4 : 2 : 0; 2)
`interlaced SDTV in 4 : 4 : 4; 3) progressive SDTV in 4 : 4 : 4; 4)
interlaced high-definition TV (HDTV) in 4 : 2 : 0; and 5) progressive
HDTV in 4 : 2 : 0. These conversions also provide features such as
`freeze frame and zoom. The algorithm is mainly targeted at bit
`rates of 4–8 Mb/s. The algorithm is based on motion-compen-
`sated spatial upsampling from multiple images and decimation
`to the desired format. The processing involves an estimated
`quality of individual pixels based on MPEG image type and local
`quantization value. The mean-squared error (MSE) is reduced,
`compared to the directly decoded sequence, and annoying ringing
`artifacts including mosquito noise are effectively suppressed. The
`superresolution pictures obtained by the algorithm are of much
`higher visual quality and have lower MSE than superresolution
`pictures obtained by simple spatial interpolation.
`
`Index Terms—Deinterlacing, enhanced decoding, motion-com-
`pensated processing, MPEG-2, SDTV to HDTV conversion, video
`decoding.
`
`I. INTRODUCTION
`
MPEG-2 [1] is currently the most popular method for compressing
digital video. It is used for storing video on
`digital versatile disks (DVDs) and it is used in the contribu-
`tion and distribution of video for TV. We base this paper on the
`MPEG reference software encoder [2] for which a bit rate of
`5–7 Mb/s yields a quality which is equivalent to (analog) distri-
`bution phase alternating line (PAL) TV quality. Lower bit rates
`are also used in TV distribution to save bandwidth and because
`professional encoders may provide better quality than the refer-
`ence software.
`
`Manuscript received December 1, 1999; revised May 2, 2002. This work was
`supported in part by The Danish National Centre for IT Research. This paper
`was recommended by Associate Editor A. Tabatabai.
`B. Martins was with the Department of Telecommunication, Technical
`University of Denmark, DK-2800 Lyngby, Denmark. He is now with
Scientific-Atlanta Denmark A/S, DK-2860 Søborg, Denmark (e-mail:
bo.martins@sciatl.com).
`S. Forchhammer is with Research Center COM, 371, Technical University of
`Denmark, DK-2800 Lyngby, Denmark (e-mail: sf@com.dtu.dk).
`Publisher Item Identifier 10.1109/TCSVT.2002.803227.
`
`At these bit rates, a sequence decoded from an MPEG-2 bit-
`stream is of lower quality than the original digital sequence in
`terms of sharpness and color resolution but still acceptable (ex-
`cept for very demanding material). This overall reduction of
`quality is less annoying to a human observer than the artifacts
typically found in compressed video. The most annoying artifacts
are ringing artifacts¹ and in particular mosquito noise,
`which occurs when the appearance of the ringing changes from
`picture to picture.
`The primary goal of this paper is to improve MPEG-2 de-
`coding, or rather to postprocess the decoded sequence re-using
`information in the MPEG-2 bitstream to obtain a sequence of
`higher fidelity, especially with regard to the artifacts. The re-
`sulting output is a sequence in the same format as the directly
`decoded one, which in our case is interlaced standard TV in
`4 : 2 : 0. In addition, we demonstrate how the approach can be
`used to obtain progressive (deinterlaced) or high-definition TV
`(HDTV) from the same bitstream. This also facilitates features
`such as frame freeze and zoom.
`Previous work on postprocessing includes projections onto
`convex sets (POCS) [3] and regularization [4]. For low-bit-rate
(high compression) JPEG-compressed still images and
`MPEG-1-coded moving pictures, the main artifact is blocking,
`i.e., visible discontinuities at coding block boundaries. This
`artifact can be dealt with efficiently using the POCS frame-
`work [5], as well as by other methods [6]. By regularization,
`POCS constraints can be combined with “soft” assumptions
`about the sequence. Thus, Choi et al. [4] restored very-low
`bit-rate video encoded by H.261 and H.263 according to the
`following desired (soft) properties: 1) smoothness across block
`boundaries; 2) small distance between the directly decoded
`sequence and the reconstructed sequence; and 3) smoothness
`along motion trajectories. Elad and Feuer [7] presented a
`unified methodology for superresolution restoration requiring
explicit knowledge of parameters such as warping and blurring. As
`this knowledge is not available in our case, we do not take the
`risk of processing based on estimating such parameters. Patti et
`al. [8] also addressed the superresolution problem in a general
`setting modeling the system components. They applied POCS
`performing projections for each pixel of each reference image
`in each iteration. Recently [9] this approach was modified to
`obtain superresolution from images of an MPEG-1 sequence
`captured by a specific video camera. Projections were carried
`
¹ Ringing artifacts are caused by the quantization error of high-frequency
content, e.g., at edges. They appear as ringing adjacent to the edge.
`
`Authorized licensed use limited to: Cliff Reader. Downloaded on November 14,2023 at 06:50:12 UTC from IEEE Xplore. Restrictions apply.
`
`1051-8215/02$17.00 © 2002 IEEE
`
`
`IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 12, NO. 9, SEPTEMBER 2002
`
`out in the transform domain. Our goal is to develop simpler
`techniques (which could be combined with POCS).
`The starting point of our work is the sequence decoded by
`an ordinary MPEG-2 decoder [2]. The material to be processed
`in this paper is of higher quality than MPEG-1 material or the
`low-bit-rate material of [4]. Consequently, there is a higher risk
`of degrading the material. Enforcing assumptions of smoothness
`of the material will almost surely lead to a decrease of sharpness.
`The basic idea of our restoration scheme is to apply a conser-
`vative form of filtering along motion trajectories utilizing the
`assumed quality of the pixels on each trajectory. The assumed
`quality of each pixel in the decoded sequence is given by the
`MPEG picture structure (i.e., what type of motion compensation
`is applied) and the quantization step size for the corresponding
`macroblock.
`The algorithm has two steps. In the first step, a superreso-
`lution version (default is quadruple resolution) of each directly
decoded picture² is constructed. In the second step, the superresolution
picture is decimated to the desired format. Depending
`on the degree of decimation of the chrominance and luminance
`in the second step, the problem addressed is one of restoration,
`chrominance upsampling, deinterlacing, or resolution enhance-
`ment, e.g., conversion to HDTV. The aim in restoration is to en-
`hance the decoding quality. For the other applications, the reso-
`lution is also enhanced.
`In the first part of the upsampling, directly decoded pixels are
`placed very accurately in a superresolution picture before fur-
`ther processing. This approach is motivated by the fact that the
`individual pictures of the original sequence are undersampled
`[9], [10]. We do not want to trade resolution for improved peak
`signal-to-noise ratio (PSNR) by spatial filtering at this stage so
`the noise reducing filtering is deferred to the decimation step.
`The paper is organized as follows. In Section II, a quality
`value is assigned to each pixel in the decoded sequence. Part
`one (upsampling) of our enhancement algorithm is described in
`Section III. The second part (decimation) is described in Sec-
`tion IV. Results on a number of test sequences are presented in
`Section V.
`
`II. PROCESSING BASED ON MPEG-QUALITY
`
MPEG-2 [1] partitions a picture into 16 × 16 blocks of picture
`material (macroblocks). A macroblock is usually predicted from
`one or more reference pictures. The different types of pictures
`are referred to as I, P, and B pictures. I pictures are intracoded,
`i.e., no temporal prediction. Macroblocks in P pictures may be
`unidirectionally predicted and macroblocks in B pictures may
`be uni- or bidirectionally predicted. (Macroblocks in B and P
`pictures may also be intracoded as macroblocks in I-pictures.)
The error block, resulting from the prediction, is partitioned
into four luminance and two, four, or eight chrominance blocks
of 8 × 8 pixels, depending on the format. For the 4 : 2 : 0 format,
each macroblock has two chrominance blocks. The discrete cosine
transform (DCT) is applied to each 8 × 8 block. The DCT
coefficients are subjected to scalar quantization before being
coded to form the bitstream.
`
`A. Quality Measure for Pixels in an MPEG Sequence
`From the MPEG code stream, the type (I, P, or B) and the
`quantization step size are extracted for each macroblock. Based
`on this information, we shall estimate a quality parameter for
each pixel which is used in a motion-compensated (MC) filtering.
MPEG specifies the code-stream syntax but not the encoder itself.
Our work is based on the reference MPEG-2 software encoder [2], for
which the quantizers may be characterized as follows. The nonintra
quantizer used for DCT coefficient $c(i,j)$ is (very close to) a
uniform quantizer with quantization step $q(i,j)$ and a deadzone of
$2q(i,j)$ around zero. The intra quantizer used for DCT coefficient
$c(i,j)$ has a deadzone of $\tfrac{5}{4}q(i,j)$ around zero. For larger
values, it is a uniform quantizer with quantization step $q(i,j)$, and
the dequantizer reconstruction point has a bias of $\tfrac{1}{8}q(i,j)$
toward zero. In [2], as is usually the case, all DCT coefficients in
all blocks are quantized independently as scalars.
The mean-squared error (MSE) caused by the quantization
depends on the distribution of the DCT coefficient $c(i,j)$. This
distribution varies with the image content and is hard to estimate
accurately. We may approximate the expected error by the expression
for a uniform distribution of errors, within each quantization
interval, resulting from a uniform quantizer with quantization step
$q(i,j)$ applied to $c(i,j)$

$$E\big[(c(i,j) - \hat{c}(i,j))^2\big] \approx \frac{q(i,j)^2}{12} \qquad (1)$$

where $\hat{c}(i,j)$ denotes the dequantized coefficient. This
expression may underestimate the error as it neglects the
influence of the dead zone, and it may overestimate the error as
the distribution of $c(i,j)$ is usually quite peaked around zero,
especially for the high frequencies.
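As a rough numerical illustration of the uniform-error approximation, the sketch below quantizes a dense grid of values with an idealized uniform quantizer (no dead zone, reconstruction at the interval center) and compares the measured MSE with one twelfth of the squared step. The function names and the test grid are assumptions made for this illustration; the quantizers of [2] behave differently near zero.

```python
def expected_uniform_mse(step):
    """MSE of an error that is uniform within one quantization interval."""
    return step * step / 12.0


def empirical_mse(step, lo=-100.0, hi=100.0, n=20001):
    """Quantize n evenly spaced values in [lo, hi] and measure the MSE.

    Idealized uniform quantizer: round to the nearest multiple of `step`.
    """
    total = 0.0
    for k in range(n):
        x = lo + (hi - lo) * k / (n - 1)
        x_hat = step * round(x / step)
        total += (x - x_hat) ** 2
    return total / n
```

For a step of 8 the two functions agree closely, which is the behavior the approximation relies on when the coefficient distribution is not strongly peaked.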
The DCT transform is unitary (when appropriate scaling is
applied). Thus, the sum of squares over a block is the same in
the DCT and spatial domains. Applying this to the quantization
errors and introducing the expected values gives the following
relationship for each DCT block:

$$\sum_{i,j} E\big[(c(i,j) - \hat{c}(i,j))^2\big] = \sum_{x,y} E\big[(p(x,y) - \hat{p}(x,y))^2\big] \qquad (2)$$

where the DCT coefficients $c(i,j)$ are scaled as specified in [1]
(Annex A), $p(x,y)$ denotes the pixel value variables, and hats denote
the decoded values. As an approximation, we assume that the expected
squared quantization errors are the same for all the pixel positions
$(x,y)$ within the DCT block. Based on this assumption, the expected
value of the squared error for pixel $p(x,y)$ is given by

$$E\big[(p(x,y) - \hat{p}(x,y))^2\big] = \frac{1}{64} \sum_{i,j} E\big[(c(i,j) - \hat{c}(i,j))^2\big] \qquad (3)$$

for all $(x,y)$ within the DCT block having coefficients $c(i,j)$.

² In this paper, all pictures are field pictures.
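The energy argument can be checked numerically. The following sketch, which assumes an orthonormal 8 × 8 DCT-II and uses illustrative function names, verifies that the sum of squares is preserved by the transform and then spreads the per-coefficient error estimates equally over the 64 pixels.

```python
import math


def dct2_orthonormal(block):
    """2-D DCT-II of an 8x8 block with orthonormal (unitary) scaling."""
    n = 8

    def basis(u, x):
        scale = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
        return scale * math.cos((2 * x + 1) * u * math.pi / (2 * n))

    return [[sum(basis(u, x) * basis(v, y) * block[x][y]
                 for x in range(n) for y in range(n))
             for v in range(n)]
            for u in range(n)]


def energy(block):
    """Sum of squares over all 64 entries."""
    return sum(value * value for row in block for value in row)


def expected_pixel_mse(steps):
    """Per-pixel expected squared error: the step**2 / 12 estimates of the
    64 coefficients are summed and shared equally among the 64 pixels."""
    return sum(q * q / 12.0 for row in steps for q in row) / 64.0
```

With the same step size for every coefficient, the per-pixel estimate reduces to the single-coefficient estimate, as expected.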
`
`
`Fig. 1. MSE measured for sequence table as a function of the quantization step
`size q (depicted using natural logarithms). For intra pictures, q is defined as
`the quantization step size for the DCT coefficient at (1, 1).
`
Fig. 1 depicts the logarithm of the MSE as a function of the
quantization step size $q$ for the luminance component of I, B, and
P pictures. The figure reflects the fact that bidirectional
prediction is better than unidirectional prediction, and that intra
pictures and nonintra pictures are different. It is noted that we
can use the expression (1) as a general approximation for the MSE of
picture type $T$ as long as we replace $q$ with $k_T q$, where $k_T$
is a constant which depends on the picture type, i.e.,

$$\mathrm{MSE}_T \approx \frac{(k_T\, q)^2}{12}, \qquad T \in \{I, P, B\} \qquad (4)$$
`
From the data in Fig. 1, we measure the constants $k_I$, $k_P$, and
$k_B$ of the three picture types. These values are used in all the
experiments reported. The intra and nonintra quantization matrices
used [2] are different. This is, in part, addressed by the values of
$k_T$. [The value of $k_I$ was measured with $q$ defined as the
quantization step size for the DCT coefficient at (1, 1).] The
normalized quantization parameters, $k_T q$ in (4),
are used as the quality value we assign to each pixel within the
block. This measure is only used for relative comparisons and
not as an absolute measure. It could be improved by taking the
specific frequency content into account, as well as the precise
quantization for each coefficient.
In general, pixels in the interior of an 8 × 8 DCT block have
a smaller MSE than pixels on the border. We could assign a different
value of the picture-type constant for interior pixels and pixels on
the border. Experiments led to our decision to ignore the small
difference at our (high) bit rates and, as an approximation, use the
same quality value (4) for all pixels in a block.
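The per-pixel quality value can be sketched as a small lookup, assuming the normalized quantization parameter is the product of a picture-type constant and the macroblock step size. The constants below are hypothetical placeholders chosen only for illustration, not the values measured from Fig. 1, and the function names are likewise assumptions.

```python
# Hypothetical placeholder constants, one per picture type. These are NOT
# the measured values; they only make the example runnable.
K_BY_TYPE = {"I": 1.0, "P": 1.2, "B": 0.9}


def quality_value(picture_type, q, k_by_type=K_BY_TYPE):
    """Normalized quantization parameter k_T * q (smaller means better)."""
    return k_by_type[picture_type] * q


def estimated_mse(picture_type, q, k_by_type=K_BY_TYPE):
    """Uniform-error estimate with q replaced by k_T * q."""
    kq = quality_value(picture_type, q, k_by_type)
    return kq * kq / 12.0
```

Because the value is only used for relative comparisons, every pixel of a block simply inherits the value computed for its macroblock.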
`
`III. UPSAMPLING TO SUPERRESOLUTION USING
`MOTION COMPENSATION
`
To process a given (directly decoded) picture we combine the
information from the current frame and the $N$ previous frames
and the $N$ subsequent frames, where $N$ is a parameter and
each frame consists of two field pictures. We first describe how
to align pixels of the current picture at time $t$ with pixels of one
of the reference pictures using motion estimation. Section III-A
then describes how to combine the information from all the reference
pictures to form a single superresolution picture at time $t$. The
term superresolution picture is used to refer to the initial MC
upsampled high-resolution image. An overview of the algorithm is
given in Fig. 2.

Fig. 2. Overview block diagram. MC upsampling alternates between doubling
the resolution vertically and horizontally. The final step is decimation to the
desired format. Equation numbers are given in (). Dashed line marks control
flow.
The motion field, relative to one of the reference pictures, is
determined on the directly decoded sequence by block-based
motion estimation using blocks of size 8 × 8. This block size
is our compromise between larger blocks for robustness and
smaller blocks for accuracy, e.g., at object boundaries. A motion
vector is calculated at subpixel accuracy for each pixel $p$
of the current picture relative to the reference field picture
considered. Based on the position of $p$ and the associated motion
vector, one pixel $p_r$ shall be chosen in the reference picture.
The motion vector is found by searching the reference picture
for the best match of the 8 × 8 block, which has $p$ positioned as
the lower-right of the four center pixels. The displacements are
denoted by $(d_v + f_v,\ d_h + f_h)$, where $(d_v, d_h)$ is the integer
and $(f_v, f_h)$ the (positive) fractional part of the displacement
relative to the position in the current picture. $d_v + f_v$ is the
vertical displacement. For a given candidate vector
$(d_v + f_v,\ d_h + f_h)$, each pixel $p(y,x)$ of the 8 × 8 block is
matched against an estimated value $\tilde{p}(y,x)$ which is formed by
bilinear interpolation of four neighboring pixels $r(y',x')$,
$r(y',x'+1)$, $r(y'+1,x')$, and $r(y'+1,x'+1)$ in the reference picture

$$\tilde{p}(y,x) = (1-f_v)(1-f_h)\, r(y',x') + (1-f_v) f_h\, r(y',x'+1) + f_v (1-f_h)\, r(y'+1,x') + f_v f_h\, r(y'+1,x'+1) \qquad (5)$$

where $r(y',x')$, with $(y',x') = (y + d_v,\ x + d_h)$, is the pixel in
the reference picture displaced $(d_v, d_h)$ relative to the pixel
$p(y,x)$ in the current picture. (The coordinate systems of the two
pictures are aligned such that the positions of the pixels coincide
with the lattice given by the integer coordinates.) The subpixel
resolution of the motion field, specified vertically by $V$ and
horizontally by $H$, determines the allowed values of $f_v$ and $f_h$:
$f_v \in \{0, 1/V, \ldots, (V-1)/V\}$ and
$f_h \in \{0, 1/H, \ldots, (H-1)/H\}$. The best motion vector
is defined as the candidate vector that
minimizes the sum of the absolute differences (SAD) taken
over the 64 pixels of the block. (How the set of candidate vectors
is determined is described in Section III-C.) Let $(y_r, x_r)$
be the absolute coordinate of pixel $p_r$ in the reference picture
obtained by displacing the position $(y,x)$ of the current pixel $p$ by
the integer part $(d_v, d_h)$ of the best motion vector. The pixel
value of $p_r$ is now perceived as a (quantized) sample value of a
pixel at position $(V(y - f_v),\ H(x - f_h))$ in a
superresolution picture at time $t$ which has $V$ times the number
of pixels vertically and $H$ times the number horizontally relative
to the directly decoded picture.
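The matching step can be sketched as follows. This is a minimal illustration, assuming pictures stored as lists of rows, displacements split into an integer part and a nonnegative fractional part, and a caller-supplied list of integer candidates; the function names and default arguments are hypothetical, and picture-boundary handling is left out.

```python
def bilinear(ref, y, x, fy, fx):
    """Bilinear estimate at (y + fy, x + fx), with 0 <= fy, fx < 1."""
    return ((1 - fy) * (1 - fx) * ref[y][x]
            + (1 - fy) * fx * ref[y][x + 1]
            + fy * (1 - fx) * ref[y + 1][x]
            + fy * fx * ref[y + 1][x + 1])


def block_sad(cur, ref, top, left, dy, dx, fy, fx, size=8):
    """Sum of absolute differences between a block of `cur` and its
    bilinear prediction from `ref` displaced by (dy + fy, dx + fx)."""
    total = 0.0
    for y in range(top, top + size):
        for x in range(left, left + size):
            total += abs(cur[y][x] - bilinear(ref, y + dy, x + dx, fy, fx))
    return total


def best_vector(cur, ref, top, left, candidates, v=2, h=2, size=8):
    """Try every integer candidate with every fractional offset k/v, m/h.

    Returns (sad, dy, dx, fy, fx) for the smallest SAD found.
    """
    best = None
    for dy, dx in candidates:
        for k in range(v):
            for m in range(h):
                fy, fx = k / v, m / h
                sad = block_sad(cur, ref, top, left, dy, dx, fy, fx, size)
                if best is None or sad < best[0]:
                    best = (sad, dy, dx, fy, fx)
    return best
```

On a linear ramp the bilinear estimate is exact, so a block displaced by one and a half lines and two columns is recovered with zero SAD when the matching integer candidate is on the list.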
It is not sufficient, though, to find the best motion vector
according to the matching criterion as there is no guarantee this is
a good match. The following criterion is used to decide for each
reference pixel whether it shall actually be placed in the
superresolution picture. We may look at the problem as a lossless data
compression problem (inspired by the minimum description length
principle [11]). Let there be two alternative predictive descriptions
of the pixels of the current 8 × 8 block, one utilizing a block of the
reference picture and one which does not. If the best compression
method that utilizes the reference block is better than the best
method which does not, then we rely on the match. In practice,
we do not know the best data compression scheme, but instead
some of the best compression schemes in the literature may be
used. For lossless still-image coding, we use JPEG-LS [12]. For
lossless compression utilizing motion compensation, we chose
the technique in [13], which may be characterized as JPEG-LS
with motion compensation. For simplicity, the comparison is
based on the sum of absolute differences. The JPEG-LS predictor
[12] is given by

$$\hat{p} = \begin{cases} \min(a, b) & \text{if } c \ge \max(a, b) \\ \max(a, b) & \text{if } c \le \min(a, b) \\ a + b - c & \text{otherwise} \end{cases} \qquad (6)$$

where $a$ denotes the pixel to the left of $p$, $b$ denotes the pixel
on top of $p$, and $c$ the top-left pixel.
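The JPEG-LS predictor is compact enough to state directly in code. The sketch below implements the standard median edge detector of JPEG-LS and, as an illustrative helper with an assumed name, collects the absolute prediction errors of a block.

```python
def med_predict(left, above, above_left):
    """Median edge detector (MED) predictor used by JPEG-LS."""
    if above_left >= max(left, above):
        return min(left, above)
    if above_left <= min(left, above):
        return max(left, above)
    return left + above - above_left


def intra_residuals(image, top, left, size=8):
    """Absolute MED prediction errors for a block; the block must have at
    least one row above it and one column to its left inside `image`."""
    return [abs(image[y][x] - med_predict(image[y][x - 1],
                                          image[y - 1][x],
                                          image[y - 1][x - 1]))
            for y in range(top, top + size)
            for x in range(left, left + size)]
```

Summing the returned residuals gives the intra-picture sum of absolute differences for the block.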
We compare the (intra picture) JPEG-LS predictor and the
best MC bilinear predictor (5). If the former yields a better
prediction of the pixels of the surrounding 8 × 8 block, we leave the
superresolution pixel undefined (or unchanged) by not inserting
(or modifying) a MC pixel at the corresponding superresolution
position.
Checking the match reduces the risk of errors in the motion
compensation process, e.g., at occlusions. Occlusions are also
handled by performing the motion compensation in both directions
in time, and by performing motion compensation at
pixel level. This leads to a fairly robust handling of occlusions
to within 3–4 pixels of the edge.
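The acceptance test amounts to comparing two sums of absolute differences over the block. A minimal sketch, assuming both predictions have already been computed as 2-D lists of the same size as the block (the function names are illustrative):

```python
def sad(block, prediction):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(b - p)
               for block_row, pred_row in zip(block, prediction)
               for b, p in zip(block_row, pred_row))


def rely_on_match(block, intra_prediction, mc_prediction):
    """Reject the motion-compensated match only when the intra-picture
    predictor describes the block strictly better."""
    return sad(block, mc_prediction) <= sad(block, intra_prediction)
```

A rejected match simply leaves the corresponding superresolution pixel undefined, so it can be filled later from another reference picture or by interpolation.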
`
`A. Forming the Superresolution Picture
`The superresolution picture is initially formed by mapping
`pixels from each of the reference pictures as described above.
`The implemented block-based motion-compensation scheme
is described in Section III-C. If more than one reference pixel
maps to the same superresolution pixel, the superresolution
pixel is assigned the value of the reference pixel having the
smallest value of the normalized quantization parameter
obtained from the quantization step size and the picture type (4).
If the pixels are of equal quality, the superresolution pixel is set
equal to their average value. We do not define a MC superresolution
pixel if the best (i.e., smallest) normalized quantization parameter
is significantly larger than the
normalized quantization value of the current macroblock in the
directly decoded picture.
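The selection rule can be sketched as below. The threshold that decides when the best reference quality is "significantly larger" than that of the current macroblock is not specified in the text, so `max_ratio` is a hypothetical parameter, as are the function name and the data layout.

```python
def combine_candidates(candidates, current_quality, max_ratio=2.0):
    """Pick the value for one superresolution pixel.

    candidates: list of (pixel_value, quality) pairs from reference
        pictures, where a smaller quality number means a better pixel.
    current_quality: quality value of the current macroblock.
    max_ratio: hypothetical threshold for "significantly larger".
    Returns (value, quality), or None to leave the pixel undefined.
    """
    if not candidates:
        return None
    best_quality = min(quality for _, quality in candidates)
    if best_quality > max_ratio * current_quality:
        return None
    best_values = [value for value, quality in candidates
                   if quality == best_quality]
    return sum(best_values) / len(best_values), best_quality
```

Ties in quality are averaged, while a clearly worse reference pixel is discarded rather than allowed to degrade the picture.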
Pixels of the current directly decoded picture a priori have a
higher validity than the reference pixels because the exact location
in the current picture is known. Let $p$ be a pixel of the
directly decoded picture at time $t$ and $p_r$ a pixel from a
reference picture aligned with $p$ within the uncertainty of the motion
compensation. To estimate a new (superresolution) pixel value
at the original sample position of $p$, we calculate a weighted
value $\hat{s}$ of $p$ and $p_r$ by

$$\hat{s} = h_0\, p + h_1\, p_r \qquad (7)$$

The filter coefficients in (7) may be estimated in a training
session using original data. The (MSE) optimal linear filter is
given by solving the Wiener–Hopf equations

$$\begin{bmatrix} E[p\,p] & E[p\,p_r] \\ E[p\,p_r] & E[p_r\,p_r] \end{bmatrix} \begin{bmatrix} h_0 \\ h_1 \end{bmatrix} = \begin{bmatrix} E[s\,p] \\ E[s\,p_r] \end{bmatrix} \qquad (8)$$

where $s$, $p$, and $p_r$ are the stochastic variables of the pixels in
(7). The variables $p$ and $p_r$ represent quantized pixel values,
whereas $s$ represents a (superresolution) pixel at a sample position
in the picture with the original resolution. The Wiener filter
coefficients could, alternatively, be computed under the constraint
that $h_0 + h_1 = 1$ in order to preserve the mean value. In
our experiments on actual data applying (8), $h_0 + h_1$ was fairly
close to 1, so we just proceeded with these estimates. Given
enough training data, the second-order mean values in (8) could
be conditioned on the quality of $p$ and $p_r$, i.e., on their
normalized quantization parameters, and on the types of the pictures
of $p$ and $p_r$ as well as other MPEG parameters. In this paper, the
picture type is reflected by (4) and the number of free parameters is
reduced by fitting a smooth function to the samples of the Wiener
filter coefficient. We choose the function below as it is
monotonically increasing from 0 to 1 in the ratio of the quality
values of the two pixels and as its
behavior can be adjusted by just two parameters as follows:

(9)
(10)

The first parameter specifies the a priori weight that $p_r$ should
carry. The second parameter specifies how much the difference in the
qualities of $p$ and $p_r$ should influence the filter coefficient.
The filter (9) has the
`property that for
`,
`and
`, we have
`.
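Estimating a two-tap filter from training data reduces to a 2 × 2 linear system of second-order moments. The sketch below is one way to do this under the assumption that training triples of original, decoded, and aligned reference pixel values are available; the function name and data layout are illustrative, and no conditioning on pixel quality is attempted.

```python
def wiener_two_tap(samples):
    """Solve the 2x2 Wiener-Hopf (normal) equations.

    samples: iterable of (s, p, r) triples, where s is the original pixel,
        p the decoded pixel and r the aligned reference pixel.
    Returns (h0, h1) minimizing the mean of (s - h0*p - h1*r)**2.
    """
    n = 0
    pp = pr = rr = sp = sr = 0.0
    for s, p, r in samples:
        n += 1
        pp += p * p
        pr += p * r
        rr += r * r
        sp += s * p
        sr += s * r
    if n == 0:
        raise ValueError("no training samples")
    pp, pr, rr, sp, sr = (m / n for m in (pp, pr, rr, sp, sr))
    det = pp * rr - pr * pr
    if det == 0.0:
        raise ValueError("p and r are linearly dependent in the training data")
    # Cramer's rule for [[pp, pr], [pr, rr]] @ [h0, h1] = [sp, sr].
    h0 = (sp * rr - sr * pr) / det
    h1 = (sr * pp - sp * pr) / det
    return h0, h1
```

Running the estimate separately on subsets of the training data grouped by quality ratio would give the kind of sample points to which a smooth function can then be fitted.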
`The MC superresolution pixels, which do not coincide with
`the sample positions in the current image, maintain the quality
`value they were assigned in the reference picture. Pixels in the
original sample positions, determined by (7),
`are assigned the quality value
`
`(11)
`
`
Fig. 3. Block diagram of MC upsampling doubling the vertical or horizontal
resolution. Equation numbers are given in (). Averages as expressed by
(8)–(10) may also be used.

B. Completing the Superresolution Picture by Interpolation
A block diagram of the MC upsampling is given in Fig. 3.
Let $S_{MC}$ denote a superresolution picture created by MC
upsampling as described previously, where $V$ and $H$ specify the
resolution of the motion compensation (5). Usually, some of the
pixels of $S_{MC}$ are undefined because there was no accurate
match (of adequate quality) in any of the reference pictures.
These pixels are assigned values from an interpolated
superresolution picture $S_I$ having the same resolution
as $S_{MC}$. The resulting image is denoted $S$. The
picture $S_I$ is created by a 2 : 1 spatial interpolation of
the high-resolution picture of half the vertical resolution or the
high-resolution picture of half the horizontal resolution, depending
on the direction being doubled. This upsampling alternates between
horizontal and vertical 2 : 1 upsampling.
The upsampling process is first initialized by setting the initial
picture equal to the directly decoded picture which has the original
resolution. Thereafter, the initialization is completed by defining
a first interpolated picture, which is created by spatial
interpolation of the directly decoded picture. Hereafter, the
pictures of higher resolution, up to $V = H = 4$,³
may be created in turn building up the resolution, alternating
between horizontal and vertical 2 : 1 upsampling.
The odd samples being interpolated in the upsampled picture
are obtained with a symmetric finite-impulse response (FIR)
filter used in the software coder [2] for 4 : 2 : 2–4 : 4 : 4
conversion

(12)

Each pixel of the resulting superresolution picture
is assigned the attribute of whether it was determined by motion
compensation or interpolation. The MC
pixels also maintain their quality value determined by (4) [and
possibly modified by (11)] as an attribute.

³ The block-based motion-estimation method applied does not warrant higher
precision of the motion field.

C. Speedup of Motion Compensation
The following scheme is applied to speed up the estimation
of the $4N + 1$ high-resolution motion fields that are required
for the $4N + 1$ reference pictures relative to the current picture.
The very first motion field (estimating the displacement
of pixels of the other field of the current frame relative to the
current field picture) is found by an exhaustive search within a
small rectangular window (±3 vertically and ±7 horizontally).
For each of the remaining $4N$ reference pictures, we initially
predict the motion field before actually estimating the field by
a search over a reduced set of candidate motion vectors. The
motion field is initially predicted from the previously estimated
motion fields using linear prediction, simply extrapolating the
motion based on two motion vectors taken from two previous
fields. (The offset in relative pixel positions between fields of
different parity is taken into account in the extrapolation. After
this, the motion vectors implicitly take care of the parity issue.)
Having the predicted motion field (truncated to integer precision),
we collect a list of the most common motion vectors
appearing in the predicted motion field. Thereafter, the search is
restricted to the small set on this list for the integer part
of the motion vector in (5). All $V \cdot H$ fractional values of a
motion vector are combined with the integer vectors on the list.
Consequently, the final motion vector search consists of trying
out $V \cdot H$ vectors for each integer vector on the list. This way,
we hope to track the motion vectors at picture level without requiring
the tracking locally. Thus,
even with a small initial search area, between the two fields of
a frame, the magnitude of the motion vectors on the list may
increase considerably with no explicit limit to the magnitude.
Very fast motion, exceeding the initial search area between two
fields of the same frame, is not captured though. In the experiments,
we use a fixed-size candidate list. The size
of the list can be adjusted according to different criteria. As
an example, including all motion vectors on the list with an
occurrence count greater than some threshold in the predicted
motion field reduces the risk of overlooking the motion vector
of an object composed of more pixels than this threshold, as a motion
vector is estimated for each pixel. An additional increase in speed for
higher-resolution motion fields is obtained
by letting them be simple subpixel refinements of the motion
field found at a lower resolution. The processing time for creating
the high-resolution motion field is thereby reduced by approximately
a factor of four for the usual resolution $V = H = 4$. As the size of
the list with the updated vectors is fixed, the complexity is also
proportional to the number of pictures specified by $N$.
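The candidate-list construction of the motion-compensation speedup can be sketched as follows. This is a simplified illustration that assumes equal temporal spacing between fields and ignores the parity offset between fields; the function names and the data layout (one predicted vector per pixel) are assumptions.

```python
from collections import Counter


def extrapolate(mv_older, mv_newer):
    """Linear extrapolation of a motion vector from two previous fields,
    assuming equal temporal spacing between the fields."""
    return (2 * mv_newer[0] - mv_older[0], 2 * mv_newer[1] - mv_older[1])


def candidate_list(predicted_field, list_size):
    """Most common integer vectors of a predicted per-pixel motion field.

    predicted_field: iterable of (vertical, horizontal) vectors, one per
        pixel; fractional parts are truncated toward zero.
    """
    counts = Counter((int(dv), int(dh)) for dv, dh in predicted_field)
    return [vector for vector, _ in counts.most_common(list_size)]


def search_size(list_size, v, h):
    """Number of vectors tried: each integer candidate is combined with
    every one of the v * h fractional offsets."""
    return list_size * v * h
```

Since the list is rebuilt from the predicted field for every reference picture, the vectors on it can grow with temporal distance even though the initial search window is small.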
`
`
`
`
`In order to keep the algorithmic complexity down, we base the
`decisions in the enhancement algorithm on analysis of the lumi-
`nance component only, always performing the same operations
`on a chrominance pixel as the corresponding luminance pixel.
`Additionally, no special action is taken at the picture boundaries
`apart from zero padding. The original motion vectors coming
`with the bit stream were disregarded as a higher resolution is
`desired. They could be used though, e.g., by including them on
`the list of predicted motion vectors.
`
`IV. DECIMATION
`
`The upsampling procedure only performed quality-based
`filtering for pixels located on the same motion trajectory
`(within our accuracy). In this section, we state a downsampling
`scheme applying quality based spatial filtering of the super-
`resolution pictures. The filter coefficient for each pixel should
`reflect the quality and the spatial distance of the pixel. The
`quality attributes are dependent on the MPEG quantization (4)
`and whether the pixel is MC or interpolated. For all possible
`combinations of quality attributes within the filter window,
`the optimal filter could be determined given enough training
`data. Instead, we take the simpler approach of first assigning
`individual weights to each pixel depending on its attributes
`relative to the current pixel and then normalizing the filter
`coefficients.
A two-dimensional linear filter is applied to the samples
of the superresolution picture in the vicinity of
each sample position in the resulting output image of
lower resolution. The filter is a product of a symmetric vertical
filter, a symmetric horizontal filter and a function reflecting the
quality. The weight of a pixel at a given position in the
superresolution picture is

(13)

In this expression, the quality weight is a function of the
quality attributes of the pixels involved, and the leading factor is
a normalizing factor. The 1-D filters, reflecting the
spatial distance, are defined as follows:

(14)
(15)
(16)
`
It is noticed that the support of the low-pass filter, counted in
superresolution pixels, is approximately the area of one
low-resolution pixel. This very small region of support is chosen
to reduce the risk of blurring across edges in the decimation
process. Furthermore, the filter parameter should be quite small
because very often the individual pictures are undersampled. In
the experiments, we use a single fixed value of this parameter.
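The structure of the decimation filter, a separable spatial kernel multiplied by a quality weight and then normalized, can be sketched as below. The triangular kernel, the radius, and the function names are hypothetical stand-ins chosen for illustration; they are not the 1-D filters (14)–(16), and the quality weights are taken as given inputs.

```python
def triangle(distance, radius):
    """Hypothetical symmetric 1-D kernel: 1 at distance 0, 0 at radius."""
    return max(0.0, 1.0 - abs(distance) / radius)


def decimate_pixel(sr, qual_weight, cy, cx, radius=2):
    """Weighted average of superresolution pixels around (cy, cx).

    sr: 2-D list of superresolution pixel values.
    qual_weight: 2-D list of nonnegative quality weights, same shape.
    Each weight is vertical kernel * horizontal kernel * quality weight;
    dividing by the weight sum normalizes the filter coefficients.
    """
    num = den = 0.0
    for y in range(max(0, cy - radius + 1), min(len(sr), cy + radius)):
        for x in range(max(0, cx - radius + 1), min(len(sr[0]), cx + radius)):
            w = (triangle(y - cy, radius) * triangle(x - cx, radius)
                 * qual_weight[y][x])
            num += w * sr[y][x]
            den += w
    # Fall back to the center pixel if every weight in the window is zero.
    return num / den if den > 0.0 else sr[cy][cx]
```

Normalizing after the individual weights are assigned keeps the mean level unchanged regardless of how many pixels in the window are of low quality.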
The quality function in (13) depends on
whether the two pixels are MC superresolution pixels
or whether they were found through interpolation. When both
pixels are MC, their relative
quality parameters are used to determine the weight.
If one of the pixels is obtained by interpolation, a constant
is used for the weight

(17)

where the remaining quantities are parameters.
One of these parameters specifies the a priori
worth of a MC pixel compared to an interpolated
pixel. The last case in (17), where there
is no MC superresolution pixel at the output sample
position, may occur in conversion to HDTV and in
chrominance upsampling. When restoring SDTV, there will always
be the directly decoded pixel at the output sample position, ensuring
a defined pixel in the superresolution picture at that position.
One of the parameters is a global parameter (set to 0.5) whereas
another is inversely proportional to a local estimate (within a small
region) of the variance of the superresolution picture at the current
position. A further parameter is set to 6. The structurally simple
downsampling filter specified by (13)–(17) only has four
parameters. The downsampling filter also attenuates
noise, e.g., from (small) inaccuracies in the motion
compensation. (Larger inaccuracies in the motion compensation are
largely avoided by checking the matches and only operating on
a reduced list of candidate motion vectors.)
`
`V. RESULTS
`
`Four sequences were encoded: table, mobcal, tambour-sdtv,
`and tambour-hdtv. The extremely complex tambour sequence
`has been used both as interlaced SDTV and in HDTV format.
`For SDTV, the format is 4 : 2 : 0 PAL TV, i.e., the luminance
frame size is 720 × 576 and the frame rate is 25 frames/s. For
`HDTV, the resolution is doubled horizontally and vertically.
The parameters of the filter expression (9) are estimated
using a small number of frames of the sequence mobcal.
Calculating the Wiener filter (8), we assume implicitly that
the “original” pixels of the superresolution picture taken at
the sample positions are equal to the original (low-resolution)
pixels of the SDTV test sequence. This yields the curve
depicted in Fig. 4. Fitting the filter parameters of (9)
to this curve yields the two parameter values that
are used in the processing of all the test sequences. Besides the
curve based on average values over all $q$, curves
were recorded for different fixed values of $q$. These
curves differ from the average in shape, as well as in level, e.g.,
as expressed by the value at a fixed quality ratio. For most of the
occurrences, the quality ratio was close to 1. The irregular shape of
the curves for larger values of the quality ratio reflects the sparse
statistics and, due to this, the dependency on the specific data that
was used for estimating the Wiener filter. The overall level of the
filter coefficient $h$
was observed to increase with increasing $q$, reflecting the fact
that the motion estimation inaccuracy becomes less important
when the quantization error is large.
`
`
`Fig. 4. Wiener filter coefficient h as a function of q =q for a piece of mobcal.
`The smooth function is the filter expression obtained by fitting and
`curves are h (q =q ) for all q and for a small fixed value of q (=12).
`
`Fig. 6. Average improvement in PSNR for luminance and chrominance for all
`sequences (res) using parameters H = V = 4 and N = 5. The result of
`increasing the bit rate by 1 Mb/s (+1) is given for comparison.
`
`Fig. 5. PSNR of directly decoded sequences