`
`
`Stereoscopic Image Generation Based
`on Depth Images for 3D TV
`
`Liang Zhang and Wa James Tam
`
`Abstract—A depth-image-based rendering system for gener-
`ating stereoscopic images is proposed. One important aspect of
`the proposed system is that the depth maps are pre-processed
`using an asymmetric filter to smoothen the sharp changes in
`depth at object boundaries. In addition to ameliorating the ef-
`fects of blocky artifacts and other distortions contained in the
`depth maps, the smoothing reduces or completely removes newly
`exposed (disocclusion) areas where potential artifacts can arise
`from image warping which is needed to generate images from
`new viewpoints. The asymmetric nature of the filter reduces the
`amount of geometric distortion that might be perceived otherwise.
`We present some results to show that the proposed system provides
`an improvement in image quality of stereoscopic virtual views
`while maintaining reasonably good depth quality.
`
`Index Terms—Asymmetric filter, depth-image-based rendering,
`stereoscopic image, stereoscopic image generation, three-dimen-
`sional television.
`
`I. INTRODUCTION
`
Depth-image-based rendering (DIBR) techniques have recently
received much attention in the broadcast research community as a
promising technology for three-dimensional television (3D TV)
systems [1]–[3]. Whereas the classical approach requires the
transmission of two streams of video images [4], [5], one for each
eye, 3D TV systems based on DIBR require a single stream of
monoscopic images and a second stream of associated images, usually
termed depth images or depth maps, that convey per-pixel depth
information.
`A depth map is essentially a two-dimensional (2D) function that
`gives the depth, with respect to the camera position, of a point
`in the visual scene as a function of the image coordinates. Since
`the depth of every point in an original image is known, a virtual
`image of any nearby viewpoint can be rendered by projecting
`the pixels of the original image to their proper 3D locations
`and re-projecting them onto the virtual image plane. Thus,
`DIBR permits the creation of novel images, using information
`from the depth maps, as if they were captured with a camera
`from different viewpoints. A further advantage of the DIBR
`approach is that depth maps can be coded more efficiently than
`two streams of natural images, thereby reducing the bandwidth
required for transmission. The approach is therefore suitable not
only for 3D TV but also for other 3D applications such as multimedia
systems [6].
`
`Manuscript received August 30, 2004; revised December 22, 2004.
`The authors are with the Communications Research Centre Canada, Ottawa,
`Ontario, Canada, K2H 8S2 (e-mail: liang.zhang@crc.ca; james.tam@crc.ca).
`Digital Object Identifier 10.1109/TBC.2005.846190
`
`One disadvantage of the DIBR approach is that with this type
`of data representation, one or more “virtual” images of the 3D
`scene have to be generated at the receiver side in real time. In
`addition, it is not an easy task to create new, virtual, images with
`high image quality.
`The most significant problem in DIBR is how to deal with
`newly exposed areas (holes) appearing in the virtual images.
`Holes are due to the accretion (disocclusion) of portions/regions
`of objects or background that would have been visible only from
`the new viewpoint but not from the original location that was
`used in capturing the original image. There is no information in
`the original image for these disoccluded regions and, therefore,
`they would appear empty, like holes, in the new virtual image.
A simple way to ‘fill’ these holes is to map a pixel in the original
image to several pixels in the virtual image by simple interpolation
of pixel information in the foreground or background. More complex
extrapolation techniques might also be used [3].
`However, these filling techniques are known to produce visible
`disocclusion artifacts in the virtual images, whose severity de-
`pends on the scene layout.
To deal with these disocclusion artifacts in the virtual images,
several approaches have been suggested. One approach, termed
`the layered-depth-image (LDI) [7], uses a set of original im-
`ages of a scene and their associated depth maps. The images
`and depth maps store not only what is visible in the original
`image, but also what is behind the visible surface. Note that
`while this approach is likely to produce very accurate virtual
`images, it is more computationally demanding and it requires
`more bandwidth for transmission. An alternative approach in-
`volves pre-processing of the depth maps. Recently, we adopted
`this latter approach and pre-processed depth maps using a sym-
`metric 2D Gaussian filter, so that the disocclusion artifacts were
`incrementally removed as the smoothing of depth maps became
`stronger [8], [9]. Experimental results using formal subjective
`evaluation techniques indicated that this technique (symmetric
`smoothing) could be used to significantly improve the image
`quality of novel stereoscopic views especially when there are
`blocky artifacts or noise in the depth maps and potential distor-
`tions in the newly generated images as a result of disocclusion
`[8], [9]. The notion of smoothing depth maps to remove dis-
`occlusion artifacts has been advocated by other authors as well
`[10].
In this paper, we propose a new system for stereoscopic image
generation based on depth images to deal with the disocclusion
artifact in virtual images [11]. Different types of artifacts and
distortions that could appear in the virtual images are then
experimentally investigated for different system parameters. Based
on the investigation, we present a new concept of asymmetric
smoothing of depth maps for DIBR that can reduce artifacts and
distortions in the virtual images and provide an improvement in
the image quality.
The remaining portions of this paper are organized as follows.
In Section II, we illustrate the proposed rendering system.
Section III is devoted to experimental investigation with different
system setups. In Section IV we propose the concept of asymmetric
smoothing of depth maps. Section V provides a discussion of
experimental results using natural depth images. Conclusions can be
found in Section VI.

0018-9316/$20.00 © 2005 IEEE

Legend3D, Inc.
Exhibit 1015-0001

IEEE TRANSACTIONS ON BROADCASTING, VOL. 51, NO. 2, JUNE 2005

Fig. 1. Flowchart of the proposed depth-image-based rendering system.
`
`II. DEPTH-IMAGE-BASED RENDERING SYSTEM
`
`A flowchart describing the proposed depth-image-based ren-
`dering system is illustrated in Fig. 1. This system consists of
`three parts: (i) pre-processing of depth maps, (ii) 3D image
`warping and (iii) hole-filling. Note that part (iii) is not neces-
`sary if there are no holes to fill as a result of optimal pre-pro-
`cessing of depth maps. In the following, these three parts will
`be addressed in detail.
`
`A. Pre-Processing of Depth Maps
The pre-processing of depth maps includes two issues: choosing the
convergence distance $Z_c$ (the so-called zero-parallax setting
(ZPS)) [12] and smoothing the depth maps.
There are several methods that can be used to establish a ZPS
[12]. In the so-called toed-in approach, the ZPS is chosen by a
joint inward rotation of the left-eye and right-eye cameras. In
the so-called shift-sensor approach, a plane of convergence is
established by a small lateral shift of the CCD sensors in the
pair of parallel cameras. Different from these two methods, in
the present rendering system the ZPS is chosen by “shifting” the
depth map. Without loss of generality, we choose

$$Z_c = \frac{Z_{near} + Z_{far}}{2}$$

as the ZPS plane, where $Z_{near}$ and $Z_{far}$ are the nearest
clipping plane and the farthest clipping plane of the depth map. In
an 8-bit depth map, $Z_{near} = 255$ and $Z_{far} = 0$ (Fig. 2).
After that, the depth map is further normalized with the factor of
255, so that the values of the depth map lie in the interval
$[-0.5, 0.5]$, values that are required by the image-warping
algorithm.
The second issue in the pre-processing step is to smooth the
depth maps. To this end, different filter types can be used. For
simplicity, a Gaussian filter

$$g(x, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{x^{2}}{2\sigma^{2}}\right) \quad \text{for } -\frac{w}{2} \le x \le \frac{w}{2} \qquad (1)$$

is employed, where $w$ is the filter’s window size and $\sigma$ the
standard deviation. The value of $\sigma$ determines the depth
smoothing strength. Let $d(x, y)$ be a depth value in the depth map
at the pixel $(x, y)$. Then, the depth value $\hat{d}(x, y)$ after
smoothing using a Gaussian filter is equal to

$$\hat{d}(x, y) = \frac{\sum_{\upsilon=-w/2}^{w/2} \left\{ \sum_{\mu=-w/2}^{w/2} d(x-\mu,\, y-\upsilon)\, g(\mu, \sigma_\mu) \right\} g(\upsilon, \sigma_\upsilon)}{\sum_{\upsilon=-w/2}^{w/2} \left\{ \sum_{\mu=-w/2}^{w/2} g(\mu, \sigma_\mu) \right\} g(\upsilon, \sigma_\upsilon)} \qquad (2)$$

where $\sigma_\mu$ and $\sigma_\upsilon$ are the smoothing strengths
in the horizontal and vertical directions, respectively.
It is expected that different values of $w$ and $\sigma$ have
different impact on the quality of the virtual images generated from
the original center image. We will discuss this issue in the next
section. In the following experiment, we let the filter’s window size
$w$ be equal to $3\sigma$.

Fig. 2. The test image: “Interview.” The original image is on the top and its
associated unprocessed depth map is on the bottom. A lower luminance value
in the depth map means that the objects are farther away from the camera.

Fig. 3. Camera configuration used for generation of virtual stereoscopic
images.
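
The two pre-processing steps can be illustrated with a minimal sketch. It assumes an 8-bit depth map stored as a list of rows, a ZPS plane midway between the clipping planes, a window of roughly three standard deviations, and Gaussian weights that are renormalized at the image borders. The function names (`shift_and_normalize`, `smooth_1d`, `smooth_depth`) and the border handling are illustrative choices, not taken from the paper.

```python
import math


def shift_and_normalize(depth, z_near=255.0, z_far=0.0):
    """Shift the depth map so the assumed ZPS plane is 0, then scale to [-0.5, 0.5]."""
    z_c = (z_near + z_far) / 2.0   # assumed ZPS: midway between the clipping planes
    scale = z_near - z_far         # 255 for an 8-bit depth map
    return [[(d - z_c) / scale for d in row] for row in depth]


def smooth_1d(values, sigma):
    """Gaussian-weighted average over a window of about 3*sigma samples.

    Dividing by the sum of the weights actually used mirrors the
    normalization by the filter weights and also handles the borders.
    """
    half = max(1, int(1.5 * sigma))
    out = []
    for i in range(len(values)):
        num = den = 0.0
        for k in range(-half, half + 1):
            j = i + k
            if 0 <= j < len(values):
                g = math.exp(-(k * k) / (2.0 * sigma * sigma))
                num += g * values[j]
                den += g
        out.append(num / den)
    return out


def smooth_depth(depth, sigma_h, sigma_v):
    """Separable smoothing: horizontal pass with sigma_h, vertical pass with sigma_v."""
    rows = [smooth_1d(row, sigma_h) for row in depth]
    cols = [smooth_1d(list(col), sigma_v) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]
```

Calling `smooth_depth(depth, s, s)` gives symmetric smoothing, while different values for `sigma_h` and `sigma_v` give direction-dependent smoothing.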
`
`B. 3-D Image Warping
For simplicity, we only consider the commonly used parallel
camera configuration for generating virtual stereoscopic images
from one center image associated with one depth map for 3D TV
(Fig. 3). In this case, the vertical coordinate of the projection
of any 3D point on each image plane of the three cameras is the
same. Let $c_c$ be the viewpoint of the original center image, and
$c_l$ and $c_r$ the viewpoints of the virtual left-eye and right-eye
images to be generated. $f$ is the focal length of the three
cameras, and $t_x$ is the baseline distance between the two virtual
cameras.
Under this camera configuration, one point $p$ with the depth $Z$
in the world (of dimensions $X$, $Y$, $Z$) is projected onto the
image planes of the three cameras at pixels $(x_l, y)$, $(x_c, y)$
and $(x_r, y)$, respectively. From the geometry shown in Fig. 3, we
have

$$x_l = x_c + \frac{t_x}{2}\,\frac{f}{Z}, \qquad x_r = x_c - \frac{t_x}{2}\,\frac{f}{Z} \qquad (3)$$

where information about $x_c$ and $f/Z$ is given in the center image
and the associated depth map, respectively. Therefore, with
formulation (3) for 3D image warping, the virtual left-eye and
right-eye images can be generated from the original center
image and its depth map by providing the value of the baseline
distance $t_x$ and focal length $f$. Without loss of generality we
choose the focal length $f$ to be equal to one in the experiments.
Based on the ZPS defined in Section II-A and the pre-processing of
depth maps, the value of the baseline distance $t_x$ also indicates
the depth range appearing in the generated stereoscopic image
pair. According to the image warping formulation (3), the disparity
$x_l - x_r$ between corresponding pixels in the rendered left-eye
and right-eye images is proportional to the baseline distance $t_x$.
A large disparity value indicates that the object point in the real
world is far away from the ZPS, while a small value means that the
object point is close to the ZPS.
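
As a rough illustration of the warping step, the sketch below shifts each pixel of one scanline horizontally by half the baseline times the focal length times an inverse-depth value. It assumes that the normalized depth-map value (in [-0.5, 0.5], larger meaning nearer) stands in for the inverse depth, that target positions are rounded to the nearest pixel, and that the nearer pixel wins when two pixels land on the same position. The names `warp_column` and `warp_row_left` are hypothetical.

```python
def warp_column(x_c, inv_depth, baseline, focal=1.0):
    """Return (x_l, x_r) for a center-image column x_c.

    inv_depth is assumed to play the role of 1/Z; here it is the
    normalized depth-map value, with 0 on the ZPS plane.
    """
    shift = 0.5 * baseline * focal * inv_depth
    return x_c + shift, x_c - shift


def warp_row_left(colors, inv_depths, baseline, focal=1.0):
    """Forward-warp one scanline into the left-eye view; None marks a hole."""
    width = len(colors)
    out = [None] * width
    out_depth = [float("-inf")] * width
    for x_c in range(width):
        x_l, _ = warp_column(x_c, inv_depths[x_c], baseline, focal)
        x = int(round(x_l))
        # Keep the nearer pixel when several pixels map to the same target.
        if 0 <= x < width and inv_depths[x_c] > out_depth[x]:
            out[x] = colors[x_c]
            out_depth[x] = inv_depths[x_c]
    return out
```

For example, `warp_row_left(list("abcdefgh"), [0, 0, 0, 0.5, 0.5, 0, 0, 0], 4)` returns `['a', 'b', 'c', None, 'd', 'e', 'g', 'h']`: the two foreground pixels move one position to the right, cover a background pixel, and leave a one-pixel hole at the sharp depth edge.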
`
`C. Disocclusion and Hole-Filling
`
`Due to a difference in viewpoints, some areas that are oc-
`cluded in the original image might become visible in the virtual
`left-eye or the right-eye images. These newly exposed areas,
`referred to as “disocclusion” in the computer graphics litera-
`ture, have no texture after 3D image warping because informa-
`tion about the disocclusion area is available neither in the center
`image nor in the accompanying depth map. We fill in the newly
`exposed areas by averaging textures from neighborhood pixels,
`and this process is called hole-filling.
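
One simple way to average textures from neighboring pixels is sketched below for a single grayscale scanline. It assumes holes are marked with `None` and fills each hole with the mean of the nearest valid pixels to its left and right. The exact neighborhood used in the paper is not specified, so this particular choice and the name `fill_holes` are assumptions.

```python
def fill_holes(row):
    """Fill None entries with the mean of the nearest valid left/right neighbours."""
    filled = list(row)
    n = len(row)
    for i, value in enumerate(row):
        if value is not None:
            continue
        # Nearest valid pixel on each side, searched in the unfilled input row.
        left = next((row[j] for j in range(i - 1, -1, -1) if row[j] is not None), None)
        right = next((row[j] for j in range(i + 1, n) if row[j] is not None), None)
        neighbours = [p for p in (left, right) if p is not None]
        if neighbours:
            filled[i] = sum(neighbours) / len(neighbours)
    return filled
```

With this scheme `fill_holes([10, None, None, 40])` returns `[10, 25.0, 25.0, 40]`. A hole at the image margin has only one valid side and simply copies that neighbor, which is one reason wide holes tend to show visible texture artifacts.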
`
`III. INVESTIGATION OF DIFFERENT SYSTEM SETUPS
`
`This section is devoted to investigating the performance of
`the proposed rendering system using natural images. As an ex-
`ample, only results with the test image “Interview” (Fig. 2) are
`shown. The image and its corresponding depth map were gen-
`erously supplied by Fraunhofer HHI (Heinrich-Hertz-Institut),
`Germany. In the following investigation, the distance between
`the two virtual left-eye and right-eye cameras is fixed at 48
`pixels for illustration.
`First, we investigate the performance of this system without
`smoothing the depth maps. Fig. 4 shows an example of the
`results of the virtual left-eye image. The depth map without
`pre-processing is shown in Fig. 4(a). The image after 3D image
`warping is illustrated in Fig. 4(b). In the figure, the white areas,
`i.e., the holes, are the newly exposed areas. Recall that these
`holes are produced because information about the previously
`occluded areas is available neither in the monoscopic images
`nor in the accompanying depth maps.
`
`Fig. 4. Virtual left-eye image generated without smoothing of the depth map.
`(a) Depth map without smoothing; (b) image after 3D image warping.
`White areas represent newly exposed areas; (c) image after hole-filling; (d) and
`(e) artifacts clearly seen in enlarged segments of the image from (c).
`
`From the figure, we can see that the newly exposed areas
`are located mainly along the boundaries of objects and also the
`right margin of the whole image. After hole-filling, as shown
`in Fig. 4(c), significant texture artifacts appear at object bound-
`aries in the virtual image. Fig. 4(d) and (e) show more clearly
`the artifacts by enlarging segments of Fig. 4(c).
We then evaluate the performance of the proposed system
with smoothing of the depth maps. Similar to [8], [9], we let
the standard deviations $\sigma_\mu$ and $\sigma_\upsilon$ of the
two Gaussian filters that are applied separately along the
horizontal and vertical directions, respectively, have the same
value. We term this process symmetric smoothing.

Fig. 5. Intermediate steps in the generation of a virtual left-eye image using
symmetric smoothing of the depth map with $\sigma_\mu = \sigma_\upsilon = 30$.
(a) Depth map after symmetric smoothing; (b) image after 3D image warping.
White areas (right margin of image) are newly exposed areas; (c) image after
hole-filling; (d) and (e) enlarged segments of the image shown in (b). Notice
the curved table leg in (d) and the curved vertical lines in (e), even though
the overall output is significantly better than that without smoothing
[cf. Fig. 4(e)].

Fig. 5 shows the results of the virtual left-eye image with
$\sigma_\mu = \sigma_\upsilon = 30$. The depth map after symmetric
smoothing is shown in Fig. 5(a). The image after 3D image warping is
illustrated in Fig. 5(b). White areas represent newly exposed areas.
Compared to Fig. 4(b), we can see from Fig. 5(b) that the newly
exposed areas along object boundaries almost disappear except for
the area in the right margin of the whole image. This can be
explained as follows. Due to the smoothing of the depth map, there
are no more sharp depth discontinuities. In other words, the
disocclusion areas have become sparse because of smoothing and
even disappear as the smoothing becomes stronger. Fig. 6 shows
the relation between the newly exposed areas and the depth
smoothing strength (as determined by $\sigma$) in the virtual
left-eye image for different baseline distances. The newly exposed
areas are represented as the ratio of the number of newly exposed
pixels over the total number of pixels in the image. We term this
ratio the disocclusion ratio. As the depth smoothing strength
becomes stronger, the disocclusion ratio decreases gradually until
it reaches a constant value. This constant value is simply due
to the persistence of newly exposed areas at the image margins.
Also, from Fig. 6 it can be seen that the minimum depth
smoothing strength needed to reach a constant value of newly exposed
areas depends on the baseline distance $t_x$. For the test
image “Interview,” it is approximately one quarter of the baseline
distance.
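
The disocclusion ratio and the empirical quarter-baseline rule are simple enough to state in code. The sketch below assumes a warped image stored as a list of rows in which `None` marks a newly exposed pixel. The names `disocclusion_ratio` and `suggested_sigma` are illustrative, and the rule itself is only reported here for the “Interview” test image.

```python
def disocclusion_ratio(warped):
    """Number of newly exposed pixels (None) over the total number of pixels."""
    total = sum(len(row) for row in warped)
    holes = sum(1 for row in warped for pixel in row if pixel is None)
    return holes / total


def suggested_sigma(baseline):
    """Empirical starting point: smoothing strength of about a quarter of the baseline."""
    return baseline / 4.0
```

For the 48-pixel baseline used in this section, `suggested_sigma(48)` gives 12.0. The ratio that remains at that strength comes from the image margins, which smoothing cannot remove.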
Comparison of Fig. 4 to Fig. 5 shows that simple hole-filling
produces significant texture artifacts whereas symmetric
smoothing virtually eliminates these artifacts. However, symmetric
smoothing still produces some distortion. Specifically,
vertically straight object boundaries can now become curved,
depending on the depth in neighboring regions. This can be
seen more clearly in Fig. 5(d) and (e), which show enlarged
segments of the image in Fig. 5(c). We call this type of distortion
geometric distortion. Its origin can be explained as follows.
Let us examine the table leg in
`Fig. 5(d). In the unprocessed depth map, this table leg has the
`same depth value along its vertical length and, at the bottom of
`the leg, the legs of the man and the woman with relatively large
depth are in its neighborhood (as can be seen in Fig. 2). After
processing, due to smoothing of the depth map in the horizontal
direction at a level that is as strong as that in the vertical
direction, the bottom of the table leg has a slightly larger depth
value than its top [Fig. 5(a)]. This creates a curved table leg
after 3D image warping.

Fig. 6. Relation between depth smoothing strength and newly exposed areas
as a percentage of the total area for the test image “Interview.” Three graphs are
shown for the three baseline distances that were used: (a) 48 pixels, (b) 36 pixels,
and (c) 20 pixels. Note that in each figure the percentage of newly exposed area
decreases with smoothing strength.
`
`IV. ASYMMETRIC SMOOTHING OF DEPTH MAP
`
`The analysis of the underlying reason for the geometric
`distortions in the previous section suggests that the strength of
`the smoothing of depth maps in the horizontal direction should
`be less than that of the smoothing in the vertical direction, so
`that vertical objects, e.g. the table leg, have similar depth values
`throughout after depth smoothing. We call this asymmetric
`smoothing.
`The concept of asymmetric smoothing is consistent with
`known characteristics of the binocular system of the human
`eyes. The human visual system obtains depth cues from dis-
`parity mainly from horizontal differences rather than vertical
`differences between the images that are projected to the left
and the right eyes. This allows us to filter the depth map
more strongly in the vertical than in the horizontal direction. In
other words, we can use an asymmetric filter to smooth the sharp
depth changes in a manner that overcomes the disocclusion
problem while still providing reasonably good disparity cues.
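
The effect can be reproduced on a small synthetic depth map. The sketch below assumes a thin vertical object of constant depth (a stand-in for the table leg) with a nearer object next to its lower part, and compares how much the smoothed depth varies along the thin object for symmetric and asymmetric strengths. The scene, the strengths (6 and 6 versus 2 and 10) and all function names are made up for illustration. The separable smoothing with border renormalization is one possible reading of the filter, not the authors' implementation.

```python
import math


def smooth_1d(values, sigma):
    """Gaussian average over a window of about 3*sigma taps, renormalized at borders."""
    half = max(1, int(1.5 * sigma))
    out = []
    for i in range(len(values)):
        num = den = 0.0
        for k in range(-half, half + 1):
            j = i + k
            if 0 <= j < len(values):
                g = math.exp(-(k * k) / (2.0 * sigma * sigma))
                num += g * values[j]
                den += g
        out.append(num / den)
    return out


def smooth(depth, sigma_h, sigma_v):
    """Horizontal pass with sigma_h followed by a vertical pass with sigma_v."""
    rows = [smooth_1d(r, sigma_h) for r in depth]
    cols = [smooth_1d(list(c), sigma_v) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def column_spread(depth, column):
    """Max minus min depth along one column; 0 means the edge stays straight."""
    values = [row[column] for row in depth]
    return max(values) - min(values)


HEIGHT, WIDTH, LEG = 40, 30, 20
depth = [[0.0] * WIDTH for _ in range(HEIGHT)]
for y in range(HEIGHT):
    depth[y][LEG] = 0.2               # thin vertical object, constant depth
for y in range(30, HEIGHT):
    for x in range(5, 19):
        depth[y][x] = 0.5             # nearer object beside the bottom of the leg

spread_sym = column_spread(smooth(depth, 6, 6), LEG)     # symmetric smoothing
spread_asym = column_spread(smooth(depth, 2, 10), LEG)   # vertical 5x horizontal
print(spread_sym, spread_asym)
```

In this toy scene `spread_asym` is smaller than `spread_sym`: the weaker horizontal filter picks up less of the neighboring object, and the stronger vertical filter spreads what it does pick up along the leg. Because the horizontal shift in warping follows the depth value, a smaller spread means a straighter edge in the virtual view.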
`
`
Fig. 7. Virtual images generated using asymmetric smoothing of the depth
map with $\sigma_\mu = 10$ (horizontal) and $\sigma_\upsilon = 90$ (vertical).
(a) Depth map after asymmetric smoothing; (b) virtual left-eye image;
(c) virtual right-eye image; (d) and (e) enlarged segments from the image
shown in (b). Note that vertical lines are now straight compared to Fig. 5(e).
`
Fig. 7 shows the results of rendering using asymmetric
smoothing of the depth map with $\sigma_\mu = 10$ in the horizontal
direction and $\sigma_\upsilon = 90$ in the vertical direction. The
depth map after asymmetric smoothing is shown in Fig. 7(a).
The virtual left-eye and right-eye images generated from the
original center image (Fig. 2) and the processed depth map
are shown in Fig. 7(b) and (c). Two enlarged segments from
the left-eye image are shown in Fig. 7(d) and (e) for clarity. It
can be seen from Fig. 7(d) and (e) that geometric distortions
are strongly reduced compared to Fig. 5(d) and (e). Also,
no texture artifacts appear. In general, asymmetric smoothing
results in virtual images that have sharper texture and higher
image quality. When viewed on a stereoscopic display, they also
create reasonably good and stable depth.
`
`V. EXPERIMENTAL RESULTS AND DISCUSSIONS
`
`To further evaluate the performance of the proposed ren-
`dering system with asymmetric smoothing, experiments with
`additional natural depth image sequences were carried out.
`
`A. Test Depth Image Sequences
`
Samples of three additional stereo video sequences and their
corresponding depth maps used in the experiments are shown in
Fig. 8. From top to bottom are the test depth images: “Puppy,”
“Soccer,” and “Tulips.” The depth maps for the first two images
were obtained from the same institution that generously
provided the source images [Electronics and Telecommunications
Research Institute (ETRI), Korea]. These depth maps had
8 × 8-block depth resolution and were not as stable, in that there
were blocky artifacts that appeared and disappeared over time.
The depth map of the third sequence, “Tulips,” was estimated
using our own in-house software for disparity estimation [13].
It had pixel depth resolution with pel accuracy and
was relatively stable, although it contained some inaccuracies
appearing on the left side of the walking woman. The image size
of all of the depth maps was 720 × 480 pixels.

Fig. 8. Sample of three image sequences and their corresponding depth maps.
From top to bottom are “Puppy,” “Soccer,” and “Tulips.”
`
`B. Parameter Selection
`
The depth range for the rendered virtual stereoscopic images
was selected so that the image was comfortable to view. Several
studies suggest [14] that the maximum depth range that is still
comfortable for viewing is 1° disparity, or approximately 5% of
the width of a standard 4 × 3 image at a viewing distance of
4H (four times the image height). Therefore, we chose a baseline
distance of 36 pixels to render the virtual left-eye and right-eye
images for our images, which had a spatial resolution of
720 × 480. For the depth smoothing strength, we chose $\sigma$ to
be equal to 9, using the empirical relation found in Section III
that the depth smoothing strength is approximately equal to one
quarter of the baseline distance. In the case of symmetric
smoothing, the smoothing strength in the vertical direction was the
same as that in the horizontal direction. In the case of asymmetric
smoothing, the smoothing strength in the vertical direction was
chosen to be five times the value in the horizontal direction. In
both cases, the filter’s window size was set to 3 times the depth
smoothing strength $\sigma$.
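
The parameter choices follow from a few lines of arithmetic, sketched below. The sketch assumes that the disparity limit is one degree of visual angle, that the image has a 4:3 aspect ratio viewed from four image heights, and that the 5% figure is rounded to whole pixels. The function names are hypothetical.

```python
import math


def screen_fraction_for_angle(angle_deg, distance_in_heights=4.0, aspect=4.0 / 3.0):
    """Fraction of the image width subtended by angle_deg at the viewing distance."""
    extent = distance_in_heights * math.tan(math.radians(angle_deg))  # in image heights
    return extent / aspect


def comfortable_baseline(width_px, fraction=0.05):
    """Baseline in pixels for a disparity range of about `fraction` of the width."""
    return round(width_px * fraction)


baseline = comfortable_baseline(720)   # 5% of a 720-pixel-wide image
sigma_h = baseline / 4                 # quarter-of-baseline rule from Section III
sigma_v = 5 * sigma_h                  # asymmetric case: vertical is five times stronger
window_h = 3 * sigma_h                 # window size of three times the strength
print(baseline, sigma_h, sigma_v, window_h)
```

This reproduces the 36-pixel baseline and the strength of 9 quoted above, and gives a vertical strength of 45 for the asymmetric case. Under the stated assumptions, one degree of visual angle works out to slightly more than 5% of the image width, consistent with the approximation in the text.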
`
`
`Fig. 9. Rendered left-eye images (with a baseline distance of 36 pixels)
`based on the depth image “Puppy.” From top to bottom are the rendered results
`with no depth smoothing, symmetric depth smoothing, and asymmetric depth
`smoothing.
`
`C. Experimental Results
`
`Figs. 9, 10 and 11 show the rendered left-eye images with
`a baseline distance of 36 pixels based on the depth images:
`“Puppy,” “Soccer,” and “Tulips.” In each figure, three images
`separately demonstrate the results obtained with no depth
`smoothing, symmetric depth smoothing and asymmetric depth
smoothing. Figs. 12, 13 and 14 show enlarged segments of
the original image and the rendered images to allow comparison
of the image quality in further detail. In each figure, aside
from the segment of the original image, three segments cut from
the rendered images are shown: the rendered image without depth
smoothing, the rendered image with symmetric depth smoothing and
the rendered image with asymmetric depth smoothing. Comparison
of segment (b) to segments (c) and (d) in Figs. 12–14 shows
that texture artifacts in the rendered image are completely
eliminated by depth smoothing. Please compare the flower on the
top left in Fig. 12, the “ETRI” logo in Fig. 13 and the boundary
of the woman in Fig. 14. Comparing segment (c) to segment
(d) in Figs. 12–14, it can be seen that geometric distortions
are strongly reduced by asymmetric depth smoothing. Curved
boundaries are now straight, e.g., the letter behind the flower
on the top left in Fig. 12 and the “ETRI” logo in Fig. 13.
These visual comparisons indicate that with asymmetric depth
smoothing the quality of the rendered image can be further
improved compared to that obtained with symmetric depth
smoothing.

Fig. 10. Rendered left-eye images (with a baseline distance of 36 pixels)
based on the depth image “Soccer.” From top to bottom are the rendered results
with no depth smoothing, symmetric depth smoothing, and asymmetric depth
smoothing, respectively.
`
`D. Subjective Evaluation
The advantage of asymmetric smoothing over symmetric
smoothing with respect to image quality was confirmed by a
formal subjective assessment study [15]. Ten viewers rated
the image quality of stereoscopic sequences in which the view
to one eye consisted of rendered images based on either
symmetric or asymmetric smoothing of the depth maps; the
other view consisted of the original images. Viewers rated
the stereoscopic sequences using the double-stimulus
continuous-quality scale method, a standard procedure
described in ITU-R Recommendation 500 [16]. Ratings were
based on a scale of 0 to 100, ranging from “Bad” to “Excellent.”
The strength of smoothing was varied at three levels,
“None,” “Mild,” and “Strong,” with smoothing parameters as
shown in Table I. In general, asymmetric smoothing involved
a level of vertical smoothing that was three times that in the
horizontal direction. As shown in Table II, ratings based on
asymmetric smoothing were higher than those based on symmetric
smoothing. Not shown in the table is that the depth
quality of the stereoscopic images was unaffected by mild
smoothing. With strong smoothing, depth quality was reduced
but ratings were still significantly higher than the ratings for
nonstereoscopic reference images.

Fig. 11. Rendered left-eye images (with a baseline distance of 36 pixels)
based on the depth image “Tulips.” From top to bottom are the rendered results
with no depth smoothing, symmetric depth smoothing, and asymmetric depth
smoothing, respectively.

Fig. 12. Enlarged segments of the image “Puppy.” (a) Original image;
(b) rendered image without depth smoothing; (c) rendered image with
symmetric depth smoothing; and (d) rendered image with asymmetric depth
smoothing. Please compare the letter behind the flower on the top left of the
segment.
`
`E. Discussions
`Depth-image-based rendering has the inherent problem of
`having to deal with disocclusion areas. Filling in these “holes”
`so as to create new images with high image quality is not easy.
`In this paper we propose pre-processing of the depth maps to
`smooth the sharp changes in depth at object boundaries. In
`addition to ameliorating the effects of blocky artifacts and other
`distortions contained in the depth maps that might be caused
`by noise, depth estimation, or coding of the depth maps, the
`smoothing reduces or completely removes disocclusion areas
`where potential texture artifacts can arise from image warping.
`In a previous experimental study, we found that subjective
`ratings of image quality in the stereoscopic virtual views can be
`improved with symmetric smoothing [8], [9]. Results presented
`
in this paper demonstrate that asymmetric smoothing provides
a significant improvement in image quality over symmetric
smoothing by reducing the amount of geometric distortion that
might be present otherwise.
In addition to improving overall image quality of the virtual
views, smoothing depth maps can potentially lead to other
benefits:
1. Smoothing reduces the contrast of depth maps and, thus,
narrows the range of disparities contained in the rendered
images. This will lead to increased visual comfort for
viewing virtual stereoscopic images that are rendered
from depth maps with large disparities.
2. Smoothing reduces the sharp transitions at the edges and
borders of objects that are in front of a background and,
therefore, smooths out the depth at and near the outlines
of objects. Informal observations indicate that this
removal of “crispness” at the borders of objects reduces
the chances of perceiving the “cardboard effect,” in which
objects appear in depth but appear to be flat like a sheet of
cardboard.
Nevertheless, smoothing of depth maps, while attenuating
some artifacts, will lead to a diminution of the depth resolution
contained in the rendered stereoscopic views. Future studies
will be required to examine this trade-off more closely, although
based on the previous and current studies the benefits appear to
outweigh this disadvantage.

Fig. 13. Enlarged segments of the image “Soccer.” (a) Original image;
(b) rendered image without depth smoothing; (c) rendered image with
symmetric depth smoothing; and (d) rendered image with asymmetric depth
smoothing. Please compare the “ETRI” logo with respect to geometric
distortion.

Fig. 14. Enlarged segments of the image “Tulips.” (a) Original image;
(b) rendered image without depth smoothing; (c) rendered image with
symmetric depth smoothing; and (d) rendered image with asymmetric depth
smoothing.

TABLE I
PARAMETERS (IN PIXELS) USED FOR SMOOTHING DEPTH MAPS IN
SUBJECTIVE IMAGE QUALITY ASSESSMENT. H=HORIZONTAL DIRECTION,
V=VERTICAL DIRECTION

TABLE II
MEAN RATINGS OF IMAGE QUALITY AND STANDARD ERRORS (IN
PARENTHESES) FOR THE DIFFERENT LEVELS OF SMOOTHING, FOR BOTH
SYMMETRIC AND ASYMMETRIC CONDITIONS. SEE MAIN TEXT FOR DETAILS.

VI. CONCLUSIONS

In this paper, we propose an algorithm for depth-image-based
generation of virtual stereoscopic images. In order to minimize
texture artifacts appearing in the newly exposed (disocclusion)
areas of the virtual image, smoothing of the depth maps is
proposed. Experimental results indicate that symmetric smoothing
can create geometric distortions, leading to vertically straight
boundaries becoming curved. To reduce this distortion, asymmetric
smoothing of depth maps is proposed. We have shown
that asymmetric smoothing is an improved technique that can
significantly reduce geometric distortions while also
removing texture artifacts. Reasonably good depth quality can
be maintained with this algorithm.
The present results are significant in demonstrating that
smoothing of depth maps is beneficial not only in alleviating
problems from noise and blocky artifacts in depth maps but also
in reducing the percentage of disoccluded areas that are
required to be filled in the rendering process. This is important
because previous studies have worked on the premise that
smoothing of areas within objects is good but that smoothing
across the borders of objects (intra- vs. inter-regions) is
something to be avoided because it reduces the depth between
objects and their background [17], [18]. For this reason, there
are suggestions to reduce the smoothing at and around edges of
objects where there tend to be sharp transitions in depth [19].
In the case of depth-image-based rendering, we suggest that, in
addition to smoothing within objects, smoothing at borders can
be beneficial because of the resulting reduction in the size of the
areas that have to be filled. In turn, this leads to a reduction in
the number and visibility of potential artifacts from the rendering
and hole-filling process.
Finally, the present investigation has a significant implication
for 3D-TV and other stereoscopic display systems that are based
on depth-image-based rendering. It is often thought that the
spatial resolution of depth maps should be as high as possible, so
as to obtain rendered views of the highest quality. The present
results suggest that this need not be the case. We have shown
that smoothing depth maps before the rendering of new views
(a process that effectively reduces the spatial resolution of the
depth maps) actually helps improve the image quality of the
rendered images.

REFERENCES
[1] A. Redert, M. Op de Beeck, C. Fehn, W. IJsselsteijn, M. Pollefeys,
L. Van Gool, E. Ofek, I. Sexton, and P. Surman, “ATTEST—advanced
three-dimensional television system techniques,” in Proc. 3DPVT ’02,
Padova, Italy, Jun. 2002, pp. 313–319.
[2] J. Flack, P. Harman, and S. Fox, “Low bandwidth stereoscopic image
encoding and transmission,” in Proc. SPIE Conf. Stereoscopic
Displays and Virtual Reality Systems X, vol. 5006, CA, U.S.A., Jan.
2003, pp. 206–214.
[3] C. Fehn, “Depth-image-based rendering (DIBR), compression and
transmission for a new approach on 3D-TV,” in Proc. SPIE Conf.
Stereoscopic Displays and Virtual Reality Systems XI, vol. 5291, CA,
U.S.A., Jan. 2004, pp. 93–104.
[4] M. Ziegler, L. Falkenhagen, R. Horst, and D. Kalivas, “Evolution of
stereoscopic and three-dimensional video,” Signal Processing: Image
Communication, vol. 14, pp. 173–194, 1998.
`
`[5] Y. Luo, Z. Zhang, and P. An, “Stereo video coding based on frame es-
`timation and interpolation,” IEEE Trans. Broadcast., vol. 49, no. 1, pp.
`14–21, 2003.
`[6] H. Mitsumine, H. Noguchi, K. Enami, Y. Ninomiya, Y. Yamanoue, S.
`Yano, A. Hanazato, and M. Okui, “Virtual Museum-3-D fine art appre-
`ciation system,” IEEE Trans. Broadcast., vol. 42, no. 3, pp. 200–207,
`Sept. 1996.
`[7] J. Shade, S. Gortler, L. He, and R. Sz