
Research Assignment

Converting 2D to 3D: A Survey


Supervisors: Assoc. Prof. Dr. Ir. E. A. Hendriks
             Dr. Ir. P. A. Redert

Information and Communication Theory Group (ICT)
Faculty of Electrical Engineering, Mathematics and Computer Science
Delft University of Technology, the Netherlands


Qingqing Wei
Student Nr: 9936241
Email: weiqingqing@yahoo.com


December 2005

________________________________________________________________________

Title:      Converting 2D to 3D: A Survey

Author:     Q. Wei

Reviewers:  E. A. Hendriks (TU Delft)
            P. A. Redert

Customer:   The Digital Signal Processing Group of Philips Research

Project:    Research Assignment for the Master Program Media and Knowledge
            Engineering of Delft University of Technology
________________________________________________________________________

Keywords:   2D to 3D conversion, depth cue, depth map, survey, comparison,
            3D TV

Abstract:

This survey investigates the existing 2D to 3D conversion algorithms developed
over the past 30 years by computer vision research communities across the
world. According to the depth cues on which the algorithms rely, they are
classified into the following twelve categories: binocular disparity, motion,
defocus, focus, silhouette, atmosphere scattering, shading, linear perspective,
patterned texture, symmetric patterns, occlusion (curvature, simple transform)
and statistical patterns. The survey describes and analyzes algorithms that use
a single depth cue as well as several promising approaches that combine
multiple cues, establishing an overview of the field and evaluating the
relative position of each approach within it.

Conclusion:

The results of some 2D to 3D conversion algorithms are the 3D coordinates of
only a small set of points in the images; this group of algorithms is less
suitable for the 3D television application. Depth cues based on multiple images
in general yield more accurate results, while depth cues based on a single
still image are more versatile. A single solution that converts the entire
class of 2D images to 3D models does not exist. Combining depth cues enhances
the accuracy of the results. Machine learning has been observed to be a new and
promising research direction in 2D to 3D conversion, and it is also helpful to
explore alternatives to the conventional methods based on depth maps.
________________________________________________________________________

Content

1   Introduction .......................................................... 1

2   2D to 3D Conversion Algorithms ........................................ 3
    2.1   Binocular disparity .............................................. 3
    2.2   Motion ........................................................... 5
    2.3   Defocus using more than two images ............................... 7
    2.4   Focus ............................................................ 9
    2.5   Silhouette ....................................................... 9
    2.6   Defocus using a single image .................................... 11
    2.7   Linear perspective .............................................. 11
    2.8   Atmosphere scattering ........................................... 12
    2.9   Shading ......................................................... 13
    2.10  Patterned texture ............................................... 14
    2.11  Bilateral symmetric pattern ..................................... 16
    2.12  Occlusions ...................................................... 18
    2.13  Statistical patterns ............................................ 20
    2.14  Other depth cues ................................................ 21

3   Comparison ............................................................ 22

4   A New Trend: Pattern Recognition in Depth Estimation .................. 28

5   Discussion and Conclusion ............................................. 32

6   Bibliography .......................................................... 34


• The picture on the cover is taken from http://www.ddd.com.

1 Introduction

Three-dimensional television (3D-TV) is nowadays often seen as the next major
milestone in the ultimate visual experience of media. Although the concept of
stereoscopy has existed for a long time, the breakthrough from conventional 2D
broadcasting to real-time 3D broadcasting is still pending. However, in recent
years there has been rapid progress in the fields of image capture, coding and
display [1], which brings the realm of 3D closer to reality than ever before.

The world of 3D incorporates the third dimension of depth, which can be
perceived by human vision in the form of binocular disparity. The human eyes
are located at slightly different positions and therefore perceive slightly
different views of the real world. The brain is then able to reconstruct the
depth information from these different views. A 3D display takes advantage of
this phenomenon by creating two slightly different images of every scene and
presenting them to the individual eyes. With an appropriate disparity and
calibration of parameters, a correct 3D perception can be realized.

An important step in any 3D system is the 3D content generation. Several
special cameras have been designed to generate 3D models directly. For example,
a stereoscopic dual-camera makes use of a co-planar configuration of two
separate, monoscopic cameras, each capturing one eye's view, and depth
information is computed using binocular disparity. A depth-range camera is
another example. It is a conventional video camera enhanced with an add-on
laser element, which captures a normal two-dimensional RGB image and a
corresponding depth map. A depth map is a 2D function that gives the depth
(with respect to the viewpoint) of an object point as a function of the image
coordinates. Usually, it is represented as a gray-level image in which the
intensity of each pixel registers its depth. The laser element emits a light
wall towards the real-world scene, which hits the objects in the scene and is
reflected back. This reflection is subsequently registered and used for the
construction of a depth map.
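
As an illustration of this representation, the following minimal sketch encodes
an array of per-pixel depth values as the gray-level depth-map image described
above. The NumPy-based implementation and the near-is-bright convention are
assumptions made for illustration, not part of the original text.

    import numpy as np

    def depth_to_grayscale(depth, near_is_bright=True):
        """Encode a per-pixel depth array (e.g. in meters) as an 8-bit gray-level depth map."""
        d = depth.astype(np.float64)
        # Normalize the depth values into [0, 1] over the range present in this image.
        norm = (d - d.min()) / max(d.max() - d.min(), 1e-9)
        if near_is_bright:              # common convention: nearer objects appear brighter
            norm = 1.0 - norm
        return (255.0 * norm).astype(np.uint8)

    # Example: a toy 2x3 depth array (meters) becomes a gray-level image.
    depth = np.array([[1.0, 2.0, 4.0],
                      [1.5, 3.0, 8.0]])
    print(depth_to_grayscale(depth))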

Figure 1: A 2D image and its depth map (figure source:
http://www.extra.research.philips.com/euprojects/attest/)

All the techniques described above are used to directly generate 3D content,
which certainly contributes to the prevalence of 3D-TV. However, the tremendous
amount of current and past media material exists in 2D format, and it should
also be possible to view this material with a stereoscopic effect. This is
where 2D to 3D conversion methods come to the rescue. Such a method recovers
the depth information by analyzing and processing the 2D image structures.
Figure 1 shows the typical product of a 2D to 3D conversion algorithm: the
corresponding depth map of a conventional 2D image. A diversity of 2D to 3D
conversion algorithms has been developed by the computer vision community, each
with its own strengths and weaknesses. Most conversion algorithms make use of
certain depth cues to generate depth maps; examples of such cues are the
defocus or the motion that may be present in the images.

This survey describes and analyzes algorithms that use a single depth cue as
well as several promising approaches that combine multiple cues, establishing
an overview of the field and evaluating the relative position of each approach
within it. This may contribute to the development of novel depth cues and help
to build better algorithms using combined depth cues.

The structure of the survey is as follows. In Chapter 2, one or more
representative algorithms for every individual depth cue are selected and their
working principles are briefly reviewed. Chapter 3 compares these algorithms in
several respects. Taking this evaluation into consideration, one relatively
promising algorithm using certain of the investigated depth cues is chosen and
described in more detail in Chapter 4. Finally, Chapter 5 presents the
conclusions of the survey.

2 2D to 3D Conversion Algorithms

Depending on the number of input images, we can categorize the existing
conversion algorithms into two groups: algorithms based on two or more images
and algorithms based on a single still image. In the first case, the two or
more input images can be taken either by multiple fixed cameras located at
different viewing angles or by a single camera, with moving objects in the
scene. We call the depth cues used by the first group the multi-ocular depth
cues. The second group of depth cues operates on a single still image, and
these are referred to as the monocular depth cues. Table 1 summarizes the depth
cues used in 2D to 3D conversion algorithms and their representative works. A
review of the algorithms using each depth cue is given below.
Table 1: Depth Cues and Their Representative Algorithms

Number of Input Images        Depth Cues                     Representative Works
----------------------------------------------------------------------------------------------------
Two or more images            Binocular disparity            Correlation-based and feature-based
(binocular or multi-ocular)                                  correspondence; triangulation [2][3]
                              Motion                         Optical flow [2]; factorization [10];
                                                             Kalman filter [11]
                              Defocus                        Local image decomposition using the Hermite
                                                             polynomial basis [4]; inverse filtering [12];
                                                             S-transform [13]
                              Focus                          A set of images with different focus levels
                                                             and sharpness estimation [5]
                              Silhouette                     Voxel-based and deformable mesh model [6]
----------------------------------------------------------------------------------------------------
One single image              Defocus                        Second Gaussian derivative [7]
(monocular)                   Linear perspective             Vanishing line detection and gradient plane
                                                             assignment [8]
                              Atmosphere scattering          Light scattering model [15]
                              Shading                        Energy minimization [17]
                              Patterned texture              Frontal texel [19]
                              (incorporates relative size)
                              Symmetric patterns             Combination of photometric and geometric
                                                             constraints [21]
                              Occlusion - curvature          Smoothing curvature and isophote [22]
                              Occlusion - simple transform   Shortest path [23]
                              Statistical patterns           Color-based heuristics [8]; statistical
                                                             estimators [25]

2.1 Binocular disparity

With two images of the same scene captured from slightly different viewpoints,
the binocular disparity can be utilized to recover the depth of an object. This
is the main mechanism for depth perception. First, a set of corresponding
points in the image pair is found. Then, by means of triangulation, the depth
information can be retrieved with a high degree of accuracy (see Figure 2) when
all the parameters of the stereo system are known. When only the intrinsic
camera parameters are available, the depth can be recovered correctly up to a
scale factor. When no camera parameters are known at all, the resulting depth
is correct up to a projective transformation [2].

Figure 2: Disparity

Assume p_l and p_r are the projections of a 3D point P onto the left and right
images, and O_l and O_r are the origins of the camera coordinate systems of the
left and right cameras. Based on the relationship between the similar triangles
(P, O_l, O_r) and (P, p_l, p_r) shown in Figure 2, the depth value Z of the
point P can be obtained:

    Z = f T / d                                                        (2.1)

where f is the focal length, T is the baseline between the two camera centers,
and d = x_l - x_r is the disparity, which measures the difference in retinal
position between corresponding image points. The disparity value of a point is
often interpreted as the inverse of the distance to the observed object.
Therefore, finding the disparity map is essential for the construction of the
depth map.
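
To make equation (2.1) concrete, the following sketch converts a dense
disparity map of a rectified stereo pair into a depth map. The NumPy
implementation and the chosen focal length and baseline values are illustrative
assumptions, not taken from the original text.

    import numpy as np

    def disparity_to_depth(disparity, focal_px, baseline_m, min_disp=1e-6):
        """Apply Z = f*T/d per pixel for a rectified stereo pair (equation 2.1)."""
        d = np.maximum(disparity.astype(np.float64), min_disp)   # avoid division by zero
        return focal_px * baseline_m / d                         # depth in meters

    # Example with assumed camera parameters: f = 700 px, baseline T = 0.1 m.
    disparity = np.array([[35.0, 70.0],
                          [14.0,  7.0]])     # disparities d = x_l - x_r in pixels
    print(disparity_to_depth(disparity, focal_px=700.0, baseline_m=0.1))
    # Nearer objects have larger disparity and therefore smaller depth Z.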

The most time-consuming aspect of depth estimation algorithms based on
binocular disparity is the stereo correspondence problem. Stereo
correspondence, also known as stereo matching, is one of the most active
research areas in computer vision. Given an image point in the left image, how
can one find the matching image point in the right image? Due to inherent
ambiguities of the image pair, such as occlusion, the general stereo matching
problem is hard to solve. Several constraints have been introduced to make the
problem solvable. Epipolar geometry and camera calibration are the two most
frequently used constraints; with these two constraints, the image pair can be
rectified. Another widely accepted assumption is the photometric constraint,
which states that the intensities of corresponding pixels are similar to each
other. The ordering constraint states that the order of points in the image
pair is usually preserved. The uniqueness constraint claims that each feature
can have at most one match, and the smoothness constraint (also known as the
continuity constraint) says that disparity changes smoothly almost everywhere.
Some of these constraints are hard, for example the epipolar geometry, while
others, such as the smoothness constraint, are soft. The taxonomy of Scharstein
and Szeliski [3], together with their website, the "Middlebury stereo vision
page" [9], investigates the performance of approximately 40 stereo
correspondence algorithms running on pairs of rectified images. Different
algorithms impose different sets of constraints.

The current stereo correspondence algorithms are based on the correlation of
local windows, on the matching of a sparse set of image features, or on global
optimization. When comparing the correlation between windows in the two images,
the corresponding element is given by the window where the correlation is
maximized. A traditional similarity measure is the sum of squared differences
(SSD). These local algorithms generate a dense disparity map. Feature-based
methods are conceptually very similar to correlation-based methods, but they
only search for correspondences of a sparse set of image features, and the
similarity measure must be adapted to the type of feature used. Nowadays,
global optimization methods are becoming popular because of their good
performance. They make explicit use of the smoothness constraint and try to
find a disparity assignment that minimizes a global energy function. The global
energy is typically a combination of the matching cost and a smoothness term,
where the latter usually measures the differences between the disparities of
neighboring pixels. What differentiates these algorithms from each other is the
minimization step used, e.g. dynamic programming or graph cuts.
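
As a concrete illustration of the local, correlation-based approach described
above, the sketch below performs naive SSD block matching along the scan lines
of a rectified pair. The window size, search range and brute-force loops are
illustrative assumptions, not a description of any specific algorithm from the
survey.

    import numpy as np

    def ssd_block_matching(left, right, max_disp=16, half_win=2):
        """Naive dense stereo matching on a rectified pair using an SSD window cost."""
        h, w = left.shape
        disparity = np.zeros((h, w), dtype=np.float64)
        L = left.astype(np.float64)
        R = right.astype(np.float64)
        for y in range(half_win, h - half_win):
            for x in range(half_win, w - half_win):
                patch_l = L[y-half_win:y+half_win+1, x-half_win:x+half_win+1]
                best_cost, best_d = np.inf, 0
                # Search only along the same scan line (epipolar constraint).
                for d in range(0, min(max_disp, x - half_win) + 1):
                    patch_r = R[y-half_win:y+half_win+1, x-d-half_win:x-d+half_win+1]
                    cost = np.sum((patch_l - patch_r) ** 2)   # sum of squared differences
                    if cost < best_cost:
                        best_cost, best_d = cost, d
                disparity[y, x] = best_d                      # window with minimal SSD
        return disparity

Global optimization methods replace this per-pixel winner-take-all choice with
a smoothness-aware energy minimization, for instance via dynamic programming or
graph cuts.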

2.2 Motion

The relative motion between the viewing camera and the observed scene provides
an important cue to depth perception: near objects move faster across the
retina than far objects do. The extraction of 3D structure and camera motion
from image sequences is termed structure from motion. The motion may be seen as
a form of "disparity over time", represented by the concept of the motion
field. The motion field consists of the 2D velocity vectors of the image
points, induced by the relative motion between the viewing camera and the
observed scene. The basic assumptions for structure from motion are that the
objects do not deform and that their movements are linear. Suppose that there
is only one rigid relative motion, denoted by V, between the camera and the
scene. Let P = [X, Y, Z]^T be a 3D point in the conventional camera reference
frame. The relative motion V between P and the camera can be described as [2]:

    V = -T - ω × P                                                     (2.2)

where T and ω are the translational velocity vector and the angular velocity of
the camera, respectively. The connection between the depth of a 3D point and
its 2D motion field is incorporated in the basic equations of the motion field,
which combine equation (2.2) with the knowledge of perspective projection:

    v_x = (T_z x - T_x f)/Z - ω_y f + ω_z y + ω_x x y / f - ω_y x^2 / f    (2.3)

    v_y = (T_z y - T_y f)/Z + ω_x f - ω_z x - ω_y x y / f + ω_x y^2 / f    (2.4)

where v_x and v_y are the components of the motion field in the x and y
directions respectively; Z is the depth of the corresponding 3D point; and the
subscripts x, y and z indicate the components along the x-, y- and z-axis
directions. In order to solve these basic equations for the depth values,
various constraints and simplifications have been developed to lower the
degrees of freedom of the equations, which leads to different algorithms for
depth estimation, each suitable for solving the problem in a specific domain.
Some of them compute the motion field explicitly before recovering the depth
information; others estimate the 3D structure directly, with the motion field
integrated in the estimation process. An example of the latter is the
factorization algorithm [10], where the registered measurement matrix,
containing the normalized image point coordinates over several video frames, is
factored into a product of a shape matrix and a motion matrix. The shape matrix
registers the coordinates of the 3D object points, and the motion matrix
describes the rotation of this set of 3D points with respect to the camera. An
introduction to explicit motion estimation methods is given below.
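
A minimal sketch of the factorization idea follows: under an orthographic-style
camera model, the centered measurement matrix of tracked points over frames has
rank at most three and can be split by an SVD into motion and shape factors.
The orthographic assumption, the random test data and the omission of the
metric-upgrade step are simplifications made for illustration and are not
details taken from [10].

    import numpy as np

    def factorize(W):
        """Tomasi-Kanade-style rank-3 factorization of a 2F x N measurement matrix W.

        Rows hold the x then y image coordinates of N tracked points over F frames.
        Returns an affine motion matrix M (2F x 3) and a shape matrix S (3 x N).
        """
        W_centered = W - W.mean(axis=1, keepdims=True)   # remove per-frame centroid (translation)
        U, s, Vt = np.linalg.svd(W_centered, full_matrices=False)
        sqrt_s = np.sqrt(s[:3])
        M = U[:, :3] * sqrt_s                            # camera motion (up to an affine ambiguity)
        S = (Vt[:3, :].T * sqrt_s).T                     # 3D shape (up to the same ambiguity)
        return M, S

    # Toy example: 10 random 3D points projected affinely in 5 frames.
    rng = np.random.default_rng(0)
    points = rng.normal(size=(3, 10))
    W = np.vstack([rng.normal(size=(2, 3)) @ points for _ in range(5)])  # stacked 2-row projections
    M, S = factorize(W)
    print(np.allclose(M @ S, W - W.mean(axis=1, keepdims=True)))         # the rank-3 model fits exactly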

Dominant algorithms for motion field estimation are either optical flow based
or feature based. Optical flow, also known as the apparent motion of the image
brightness pattern, is considered to be an approximation of the motion field.
Optical flow is subject to the constraint that the apparent brightness of
moving objects remains constant, described by the image brightness constancy
equation:

    (∇E)^T v + E_t = 0                                                 (2.5)

where it is assumed that the image brightness E is a function of the image
coordinates and time; ∇E is the spatial gradient and E_t denotes the partial
derivative with respect to time. After computing the spatial and temporal
derivatives of the image brightness for a small N x N patch, we can solve (2.5)
to obtain the motion field for that patch. This method is notorious for its
noise sensitivity, which requires extra treatment such as tracking the motion
across a long image sequence or imposing more constraints. In general, current
optical flow methods yield dense but less accurate depth maps.
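
The patch-based solution of (2.5) can be sketched as a least-squares problem:
every pixel in a small patch contributes one brightness constancy equation, and
the patch is assumed to move with a single velocity (a Lucas-Kanade-style
estimator). The finite-difference derivatives and the 5 x 5 patch size are
assumptions made for illustration.

    import numpy as np

    def patch_flow(frame0, frame1, y, x, half=2):
        """Estimate the 2D motion (v_x, v_y) of a small patch by solving (2.5) in least squares."""
        I0 = frame0.astype(np.float64)
        I1 = frame1.astype(np.float64)
        Ey, Ex = np.gradient(I0)                  # spatial gradient ∇E (rows = y, cols = x)
        Et = I1 - I0                              # temporal derivative E_t (frame difference)
        sl = (slice(y - half, y + half + 1), slice(x - half, x + half + 1))
        A = np.column_stack([Ex[sl].ravel(), Ey[sl].ravel()])   # one row (E_x, E_y) per pixel
        b = -Et[sl].ravel()                                     # (∇E)^T v = -E_t
        v, *_ = np.linalg.lstsq(A, b, rcond=None)               # least-squares patch velocity
        return v                                                # [v_x, v_y] in pixels/frame

    # Toy example: a brightness ramp shifted by one pixel to the right between frames.
    xs = np.tile(np.arange(20, dtype=np.float64), (20, 1))
    print(patch_flow(xs, xs - 1.0, y=10, x=10))   # approximately [1, 0]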

Another group of motion estimation algorithms is based on tracking separate
features in the image sequence, generating sparse depth maps. The Kalman filter
[11], for example, is a frequently used technique: it is a recursive algorithm
that estimates the position and uncertainty of a moving feature point in the
subsequent frame.
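
A minimal sketch of such a recursive tracker is given below: a constant-velocity
Kalman filter that predicts where a feature point will appear in the next frame
and how uncertain that prediction is. The state layout, the noise levels and the
constant-velocity model are illustrative assumptions, not the specific design
of [11].

    import numpy as np

    # State: [x, y, vx, vy]; measurement: the feature position [x, y] in each frame.
    F = np.array([[1, 0, 1, 0],       # constant-velocity transition (unit frame interval)
                  [0, 1, 0, 1],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], dtype=float)
    H = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0]], dtype=float)
    Q = 0.01 * np.eye(4)              # assumed process noise
    R = 1.0 * np.eye(2)               # assumed measurement noise

    def kalman_step(x, P, z):
        """One predict/update cycle: returns the new state estimate and covariance."""
        x_pred = F @ x                              # predicted position and velocity
        P_pred = F @ P @ F.T + Q                    # predicted uncertainty
        S = H @ P_pred @ H.T + R
        K = P_pred @ H.T @ np.linalg.inv(S)         # Kalman gain
        x_new = x_pred + K @ (z - H @ x_pred)       # correct with the measured feature position
        P_new = (np.eye(4) - K @ H) @ P_pred
        return x_new, P_new

    # Track a feature moving one pixel per frame along x.
    x, P = np.zeros(4), 10.0 * np.eye(4)
    for t in range(1, 6):
        x, P = kalman_step(x, P, z=np.array([float(t), 0.0]))
    print(x[:2], np.diag(P)[:2])      # estimated position and its uncertainty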

It is worth noting that a sufficiently small average spatial disparity of
corresponding points in consecutive frames is beneficial to the stability and
robustness of 3D reconstruction based on the time integration of long frame
sequences. On the other hand, when the average disparity between frames is
large, the depth reconstruction can be done in the same way as for binocular
disparity (stereo). The motion field becomes equal to the stereo disparity map
only if the spatial and temporal variations between frames are sufficiently
small.

2.3 Defocus using more than two images

Depth-from-defocus methods generate a depth map from the degree of blurring
present in the images. In a thin lens system, objects that are in focus are
pictured sharply, whilst objects at other distances are defocused, i.e.
blurred. Figure 3 shows a thin lens model of an out-of-focus real-world point P
projected onto the image plane. Its projection is a circular blur patch with
constant brightness, centered at P'' and with a blur radius σ. The blur is
caused by the convolution of the ideal projected image with the camera point
spread function (PSF) g(x, y, σ(x, y)), where (x, y) are the coordinates of the
image point P''. To simplify the system, it is usually assumed that
σ(x, y) = σ, where σ is a constant for a given window, and a Gaussian function
is used to simulate the PSF:

    g(x, y) = 1/(2πσ^2) exp(-(x^2 + y^2)/(2σ^2))

In order to estimate the depth u, we need the following two equations. The
fundamental equation of thin lenses describes the relation between u, v and f
as:

    1/u + 1/v = 1/f                                                    (2.6)

Pentland [12] has derived a relationship between the distance u (Figure 3) and
the blur σ, given in equation (2.7):

    u = f s / (s - f - σ k f)    if u > v
                                                                       (2.7)
    u = f s / (s - f + σ k f)    if u < v

where u is the depth, v is the distance between the lens and the position of
perfect focus, s is the distance between the lens and the image plane, f is the
focal length of the lens, and k is a constant determined by the lens system. Of
these, s, f and k are camera parameters, which can be determined by camera
calibration. Note that the second case, u < v, can indeed occur: for example,
when f < u < 2f, the fundamental equation of thin lenses (2.6) gives v > 2f,
which thus yields u < v.

With equation (2.7), the problem of computing the depth u is converted into the
task of estimating the camera parameters and the blur parameter σ. When the
camera parameters are obtained from camera calibration, the depth u can be
computed from equation (2.7) once the blur parameter σ is known.
Depth-from-defocus algorithms thus focus on blur radius estimation techniques.
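
A minimal sketch of the final step of such a method is shown below: given an
already estimated blur radius σ and calibrated camera parameters, equation
(2.7) returns the depth. The parameter values and the "far side" branch choice
are illustrative assumptions; estimating σ itself is the hard part and is done
by the techniques cited in this section.

    def depth_from_blur(sigma, f, s, k, far_side=True):
        """Invert Pentland's relation (2.7): depth u from the blur radius sigma.

        f and s (meters) and k are calibrated camera parameters; far_side selects
        the u > v branch (object beyond the plane of perfect focus).
        """
        denom = s - f - sigma * k * f if far_side else s - f + sigma * k * f
        return f * s / denom

    # Assumed example values: f = 25 mm lens, image plane at s = 26 mm, k = 1.
    for sigma in (0.0, 0.0002, 0.0005):
        print(sigma, depth_from_blur(sigma, f=0.025, s=0.026, k=1.0))
    # sigma = 0 reproduces the distance that is in perfect focus at the image
    # plane, u = f*s/(s - f); on the far-side branch, larger blur gives larger depth.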


Figure 3: Thin lens model (figure source: reference [7])

Equation (2.7) also indicates that when the blur radius σ and all camera
parameters except the focal length f are known, the depth u cannot be exactly
determined: with the two unknowns u and f, equation (2.7) is under-constrained.
In this case, the observed signal can be the projection of an out-of-focus step
edge, of an in-focus smooth transition (e.g. a smooth texture), or of
infinitely many situations in between these two extremes [7]. This causes
ambiguity when estimating the blur parameter. To tackle the problem, most
depth-from-defocus algorithms rely on two or more images of the same scene,
taken from the same position with different camera focal settings, to determine
the blur radius. Once the blur radius is estimated and the camera parameters
are obtained from camera calibration, the depth can be computed with equation
(2.7).

The blur radius estimation techniques are based on, for example, inverse
filtering [12], where the blur is estimated by solving a linear regression
problem, or on the S-transform [13], which involves a spatial-domain
convolution/deconvolution transform. Another example is the approach proposed
by Ziou, Wang and Vaillancourt, which relies on a local image decomposition
technique using the Hermite polynomial basis [4]. It is based on the fact that
the depth can be computed once the camera parameters are available and the blur
difference between two images, taken with different focal lengths, is known.
The blur difference is retrieved by solving a set of equations, derived from
the observation that the coefficients of the Hermite polynomial estimated from
the more blurred image are a function of the partial derivatives of the less
blurred image and the blur difference.

2.4 Focus

The depth-from-focus approach is closely related to the family of
depth-from-defocus algorithms. The main difference is that depth-from-focus
requires a series of images of the scene at different focus levels, obtained by
varying and registering the distance between the camera and the scene, while
depth-from-defocus only needs two or more images taken with fixed object and
camera positions and different camera focal settings. Figure 4 illustrates the
principle of the depth-from-focus approach [5]. An object with an arbitrary
surface is placed on a translational stage, which moves towards the camera
(optics) starting from the reference plane. The focused plane is defined by the
optics: it is located at the position where all points on it are focused on the
camera sensor plane. Let 's' be a surface point on the object. When the stage
moves towards the focused plane, the image of 's' becomes more and more
focused, reaching its maximum sharpness when 's' lies in the focused plane.
Moving 's' further makes its image defocused again. During this process, the
displacements of the translational stage are registered. If we assume that the
displacement is d_focused when 's' is maximally focused, and that the distance
between the focused plane and the reference plane is d_f, then the depth value
of 's' relative to the stage is determined as d_s = d_focused - d_f. Applying
this procedure to all surface elements and interpolating the focus measures, a
dense depth map can be constructed.
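
The sketch below illustrates the core of this procedure on a focus stack: for
each pixel, a local sharpness measure is evaluated in every image of the stack,
and the registered stage displacement of the sharpest image gives the depth.
The variance-style Laplacian sharpness measure and the per-pixel argmax
(without the interpolation step mentioned above) are simplifying assumptions.

    import numpy as np

    def laplacian_sharpness(image):
        """Per-pixel focus measure: squared response of a discrete Laplacian."""
        img = image.astype(np.float64)
        lap = (np.roll(img, 1, 0) + np.roll(img, -1, 0) +
               np.roll(img, 1, 1) + np.roll(img, -1, 1) - 4.0 * img)
        return lap ** 2

    def depth_from_focus(stack, displacements, d_f):
        """stack: list of images taken at the registered stage displacements.

        Returns a depth map d_s = d_focused - d_f, where d_focused is the stage
        displacement at which each pixel is sharpest.
        """
        sharpness = np.stack([laplacian_sharpness(img) for img in stack])  # (frames, H, W)
        best = np.argmax(sharpness, axis=0)                                # sharpest frame per pixel
        d_focused = np.asarray(displacements, dtype=np.float64)[best]
        return d_focused - d_f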

Figure 4: Depth from focus (figure source: reference [5])

2.5 Silhouette

A silhouette of an object in an image refers to the contour separating the
object from the background. Shape-from-silhouette methods require multiple
views of the scene taken by cameras at different viewpoints. Such a process,
together with correct texturing, generates a full 3D model of the objects in
the scene, allowing viewers to observe a live scene from an arbitrary
viewpoint.
`
`Shape-from-silhouette requires accurate camera calibration. For each image, the
`silhouette of the target objects is segmented using background subtraction. The retrieved
`silhouettes are back projected to a common 3D space (see Figure 5) with projection
`centers equal to the camera locations. Back-projecting a silhouette produces a cone-like
`volume. The intersection of all the cones forms the visual hull of the target 3D object,
`which is often processed in the voxel representation. This 3D reconstruction procedure is
`referred to as shape-from-silhouette.
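
A minimal voxel-carving sketch of this intersection test follows: a voxel is
kept only if it projects inside the silhouette in every view. The camera model
(a simple 3 x 4 projection matrix per view) and the boolean silhouette masks
are assumptions made for illustration.

    import numpy as np

    def visual_hull(voxels, cameras, silhouettes):
        """Keep the voxels inside every silhouette (intersection of the back-projected cones).

        voxels:      (N, 3) array of 3D candidate points
        cameras:     list of 3x4 projection matrices, one per view
        silhouettes: list of boolean masks (H, W), True on the object
        """
        keep = np.ones(len(voxels), dtype=bool)
        hom = np.hstack([voxels, np.ones((len(voxels), 1))])          # homogeneous coordinates
        for P, mask in zip(cameras, silhouettes):
            proj = hom @ P.T                                          # project voxels into this view
            u = (proj[:, 0] / proj[:, 2]).round().astype(int)
            v = (proj[:, 1] / proj[:, 2]).round().astype(int)
            h, w = mask.shape
            inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
            hit = np.zeros(len(voxels), dtype=bool)
            hit[inside] = mask[v[inside], u[inside]]                  # inside this silhouette?
            keep &= hit                                               # carve away voxels missed by any view
        return voxels[keep]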

Figure 5: Silhouette volume intersection (figure source: reference [6])

Matsuyama [6] proposed an approach using parallel computing on a PC cluster.
Instead of computing the intersection of the 3D cones directly, the 3D voxel
space is partitioned into a group of parallel planes, and each PC is assigned
the task of computing the cross-section of the 3D object volume with one
specific plane. By stacking up these cross-sections, the voxel representation
of the 3D object shape is reconstructed. In this way, the 3D volume
intersection problem is decomposed into 2D intersection sub-problems that are
carried out concurrently by all PCs, which leads to a promising speed gain.
Furthermore, in order to capture the 3D object more accurately, Matsuyama
introduced a deformable mesh model, converting the 3D voxel volume into a
surface mesh composed of triangular patches. According to a set of constraints,
the surface mesh is deformed to fit the object surface. An example of such a
constraint is the 3D motion flow constraint, which requires that the mesh be
adapted dynamically in conformity with the object's actions.

A shape-from-silhouette algorithm is often followed by a texturing algorithm.
The visual hull is a geometry that encloses the captured object, but it does
not capture concave portions of the object that are not visible in the
silhouettes. Moreover, the number of views is often kept limited to make the
processing time reasonable, which leads to a coarse visual-hull geometry.
Texturing assigns colors to the voxels on the surface of the visual hull and is
therefore an indispensable step in creating realistic renderings.

2.6 Defocus using a single image

In Section 2.3, depth-from-defocus algorithms based on two or more images were
introduced. The reason for using more images is to eliminate the ambiguity in
the blur radius estimation when the focal setting of the camera is unknown. The
images this group of algorithms works with must be taken from a fixed camera
position and object position, but with different focal settings. Only a small
amount of 2D video material satisfies this condition; for example, the focus
settings are changed when it is necessary to redirect the audience's attention
from foreground to background or vice versa. To make defocus suitable as a
depth cue for conventional video content, where we have no control over the
focal settings of the camera, Wong and Ernst [7] have proposed a blur
estimation technique using a single image, based on the second derivative of a
Gaussian filter [14]. When an edge of blur radius σ is filtered with a second
derivative of a Gaussian of a certain standard deviation s, the response has a
positive and a negative peak. Denote the distance between the peaks as d, which
can be measured directly from the filtered image. The blur radius is then
computed according to the formula

    σ^2 = (d/2)^2 - s^2

(see Figure 6). With the estimated blur radius and the camera parameters
obtained from camera calibration, a depth map can be generated based on
equation (2.7). When the camera parameters are unknown, we can still estimate
the relative depth level of each pixel from its estimated blur radius by
mapping a large blur value to a higher depth level and a small blur value to a
lower depth level.

Figure 6: Blur radius estimation: (a) blurred edge; (b) second derivative of a
Gaussian filter with standard deviation s; (c) filter response; the distance d
between the peaks can be measured from the filtered image (figure source:
reference [7])

2.7 Linear perspective

Linear perspective refers to the fact that parallel lines, such as railroad
tracks, appear to converge with distance, eventually reaching a vanishing point
at the horizon. The more the lines converge, the farther away they appear to
be. A recent representative work is the gradient plane assignment approach
proposed by Battiato, Curti et al. [8]. Their method performs well for single
images containing a sufficient number of objects with a rigid, geometric
appearance. First, edge detection is employed to locate the predominant lines
in the image. Then, the intersection points of these lines are determined. The
intersection with