Image Mosaicing for Tele-Reality Applications

Richard Szeliski
Digital Equipment Corporation
Cambridge Research Lab

CRL 94/2

May, 1994
Digital Equipment Corporation has four research facilities: the Systems Research Center and the Western Research Laboratory, both in Palo Alto, California; the Paris Research Laboratory, in Paris; and the Cambridge Research Laboratory, in Cambridge, Massachusetts.

The Cambridge laboratory became operational in 1988 and is located at One Kendall Square, near MIT. CRL engages in computing research to extend the state of the computing art in areas likely to be important to Digital and its customers in future years. CRL's main focus is applications technology; that is, the creation of knowledge and tools useful for the preparation of important classes of applications.

CRL Technical Reports can be ordered by electronic mail. To receive instructions, send a message to one of the following addresses, with the word "help" in the Subject line:

On Digital's EASYnet: CRL::TECHREPORTS
On the Internet: techreports@crl.dec.com
This work may not be copied or reproduced for any commercial purpose. Permission to copy without payment is granted for non-profit educational and research purposes provided all such copies include a notice that such copying is by permission of the Cambridge Research Lab of Digital Equipment Corporation, an acknowledgment of the authors to the work, and all applicable portions of the copyright notice.

The Digital logo is a trademark of Digital Equipment Corporation.

Cambridge Research Laboratory
One Kendall Square
Cambridge, Massachusetts 02139
Abstract

While a large number of virtual reality applications, such as fluid flow analysis and molecular modeling, deal with simulated data, many newer applications attempt to recreate true reality as convincingly as possible. Building detailed models for such applications, which we call tele-reality, is a major bottleneck holding back their deployment. In this paper, we present techniques for automatically deriving realistic 2-D scenes and 3-D texture-mapped models from video sequences, which can help overcome this bottleneck. The fundamental technique we use is image mosaicing, i.e., the automatic alignment of multiple images into larger aggregates which are then used to represent portions of a 3-D scene. We begin with the easiest problems, those of flat scene and panoramic scene mosaicing, and progress to more complicated scenes, culminating in full 3-D models. We also present a number of novel applications based on tele-reality technology.

Keywords: image mosaics, image registration, image compositing, planar scenes, panoramic scenes, 3-D scene recovery, motion estimation, structure from motion, virtual reality, tele-reality.

© Digital Equipment Corporation 1994. All rights reserved.
Contents

1 Introduction
2 Related work
3 Basic imaging equations
4 Planar image mosaicing
   4.1 Local image registration
   4.2 Global image registration
   4.3 Results
5 Panoramic image mosaicing
   5.1 Results
6 Scenes with arbitrary depth
   6.1 Formulation
   6.2 Results
7 Full 3-D model recovery
8 Applications
   8.1 Planar mosaicing applications
   8.2 3-D model building applications
   8.3 End-user applications
9 Discussion
10 Conclusions
A 2-D projective transformations
List of Figures

1 Rigid, affine, and projective transformations
2 Whiteboard image mosaic example
3 Panoramic image mosaic example (bookshelf and cluttered desk)
4 Depth recovery example: table with stacks of papers
5 Depth recovery example: outdoor scene with trees
6 3-D model recovery example
1 Introduction
Virtual reality is currently creating a lot of excitement and interest in the computer graphics community. Typical virtual reality systems use immersive technologies such as head-mounted or stereo displays and data gloves. The intent is to convince users that they are interacting with an alternate physical world, and also often to give insight into computer simulations, e.g., for fluid flow analysis or molecular modeling [Earnshaw et al., 1993]. Realism and speed of rendering are therefore important issues.

Another class of virtual reality applications attempts to recreate true reality as convincingly as possible. Examples of such applications include flight simulators (which were among the earliest uses of virtual reality), various interactive multi-player games, medical simulators, and visualization tools. Since the reality being recreated is usually distant, we use the term tele-reality in this paper to refer to such virtual reality scenarios based on real imagery.¹ Tele-reality applications which could be built using the techniques described in this paper include scanning a whiteboard in your office to obtain a high-resolution image, walkthroughs of existing buildings for re-modeling or selling, participating in a virtual classroom, and browsing through your local supermarket aisles from home (see Section 8.)

¹The term telepresence is also often used, especially for applications such as tele-medicine where two-way interactivity (remote operation) is important [Earnshaw et al., 1993].

While most focus in virtual reality today is on input/output devices and the quality and speed of rendering, perhaps the biggest bottleneck standing in the way of widespread tele-reality applications is the slow and tedious model-building process [Adam, 1993]. This also affects many other areas of computer graphics, such as computer animation, special effects, and CAD. For small objects, say 0.2–2 m, laser-based scanners are a good solution. These scanners provide registered depth and colored texture maps, usually in either raster or cylindrical coordinates [Cyberware Laboratory Inc, 1990; Rioux and Bird, 1993]. They have been used extensively in the entertainment industry for special effects and computer animation. Unfortunately, laser-based scanners are fairly expensive, limited in resolution (typically 512 × 256 pixels), and most importantly, limited in range (e.g., they cannot be used to scan in a building.)

The image-based ranging techniques which we develop in this paper have the potential to overcome these limitations. Imagine walking through an environment such as a building interior and filming a video sequence of what you see. By registering and compositing the images in the video together into large mosaics of the scene, image-based ranging can achieve an essentially unlimited resolution.² Since the images can be acquired using any optical technology (from microscopy, through hand-held videocams, to satellite photography), the range or scale of the scene being reconstructed is not an issue. Finally, as desktop video becomes ubiquitous in our computing environments—initially for videoconferencing, and later as an advanced user interface tool [Gold, 1993; Rehg and Kanade, 1994; Blake and Isard, 1994]—image-based scene and model acquisition will become accessible to all users.

²The traditional use of the term image compositing [Porter and Duff, 1984; Foley et al., 1990] is for blending images which are already registered. We use the term image mosaicing in this paper for our work to avoid confusion with this previous work, although compositing (as in composite sketches) also seems appropriate.

In this paper, we develop novel techniques, based on image registration and compositing, for extracting large 2-D textures and 3-D models from image sequences, and also present some potential applications. After a review of related work (Section 2) and of the basic image formation equations (Section 3), we present our technique for registering pieces of a flat (planar) scene, which is the simplest interesting image mosaicing problem (Section 4). We then show how the same technique can be used to mosaic panoramic scenes obtained by rotating the camera around its center of projection (Section 5). Section 6 discusses how to recover depth in scenes. Section 7 discusses the most general and difficult problem, that of building full 3-D models from multiple images. Section 8 presents some novel applications of the tele-reality technology developed in this paper. Finally, we close with a comparison of our work with previous approaches, and a discussion of the significance of our results.
2 Related work

While 3-D models acquired with laser-based scanners are now commonly used in computer animations and for special effects, 3-D models derived directly from still or video images are still a rarity. One example of physically-based models derived from images is the dancing vegetables in the "Cooking with Kurt" video [Terzopoulos and Witkin, 1988]. Another, more recent example is the 3-D model of a building extracted from a video sequence in [Azarbayejani et al., 1993], which was then composited with a 3-D teapot. Somewhat related are graphics techniques which extract camera position information from images, such as [Gleicher and Witkin, 1992].
The extraction of geometric information from multiple images has long been one of the central problems in computer vision [Ballard and Brown, 1982; Horn, 1986] and photogrammetry [Moffitt and Mikhail, 1980]. However, many of the techniques used are based on feature extraction, produce sparse descriptions of shape, and often require manual assistance. Certain techniques, such as stereo, produce depth or elevation maps which are inadequate to model true 3-D objects. In computer vision, the recovered geometry is normally used either for object recognition or for grasping and navigation (robotics). Relatively little attention has been paid to building 3-D models registered with intensity (texture maps), which are necessary for computer graphics and virtual reality applications (but see [Mann, 1993] for recent work which addresses some of the same problems as this paper.)

In computer graphics, compositing multiple image streams together to create larger format (Omnimax) images is discussed in [Greene and Heckbert, 1986]. However, in this application, the relative position of the cameras was known in advance. The registration techniques developed in this paper are related to image warping [Wolberg, 1990], since once the images are registered, they can be warped into a common reference frame before being composited.³ While most current techniques require the manual specification of feature correspondences, some recent techniques [Beymer et al., 1993] as well as the techniques developed in this paper can be used to automate this process. Combinations of local image warping and compositing are now commonly used for special effects under the general rubric of morphing [Beier and Neely, 1992]. Recently, morphing based on z-buffer depths and camera motion has been applied to view interpolation as a quick alternative to full 3-D rendering [Chen and Williams, 1993]. The techniques we develop in Section 6 can be used to compute the depth maps necessary for this approach directly from real-world image sequences.

³One can view the image registration task as an inverse warping problem, since we are given two images and asked to recover the unknown warping function.

3 Basic imaging equations

Many of the ideas in this paper have their simplest expression using projective geometry [Semple and Kneebone, 1952]. However, rather than relying on these results, we will use more standard methods and notations from computer graphics [Foley et al., 1990], and prove whatever simple
results we require in Appendix A. Throughout, we will use homogeneous coordinates to represent points, i.e., we denote 2-D points in the image plane as x = (x, y, w), with (x/w, y/w) being the corresponding Cartesian coordinates [Foley et al., 1990]. Similarly, 3-D points with homogeneous coordinates p = (x, y, z, w) have Cartesian coordinates (x/w, y/w, z/w).

Using homogeneous coordinates, we can describe the class of 2-D planar transformations using matrix multiplication

$$\begin{bmatrix} x' \\ y' \\ w' \end{bmatrix} =
\begin{bmatrix} m_{00} & m_{01} & m_{02} \\ m_{10} & m_{11} & m_{12} \\ m_{20} & m_{21} & m_{22} \end{bmatrix}
\begin{bmatrix} x \\ y \\ w \end{bmatrix}
\quad \text{or} \quad \mathbf{x}' = \mathbf{M}_{2D}\,\mathbf{x}. \tag{1}$$

The simplest transformations in this general class are pure translations, followed by translations and rotations (rigid transforms), plus scaling (similarity transforms), affine transforms, and full projective transforms. Figure 1 shows a square and possible rigid, affine, and projective transformations.

[Figure 1: Rigid, affine, and projective transformations]

Rigid and affine transformations have the following forms for the M_2D matrix,

$$\mathbf{M}_{\text{rigid-2D}} =
\begin{bmatrix} \cos\theta & -\sin\theta & t_x \\ \sin\theta & \cos\theta & t_y \\ 0 & 0 & 1 \end{bmatrix},
\qquad
\mathbf{M}_{\text{affine-2D}} =
\begin{bmatrix} a_{00} & a_{01} & a_{02} \\ a_{10} & a_{11} & a_{12} \\ 0 & 0 & 1 \end{bmatrix},$$

with 3 and 6 degrees of freedom, respectively, while projective transformations have a general M_2D matrix with 8 degrees of freedom.⁴

In 3-D, we have the same hierarchy of transformations, with rigid (6 degrees of freedom), similarity (7 dof), affine (12 dof), and full projective (15 dof) transforms. The M_3D matrices in this case are 4 × 4.

⁴Two M_2D matrices are equivalent if they are scalar multiples of each other. We remove this redundancy by setting m_22 = 1.
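To make these conventions concrete, here is a minimal sketch (ours, in Python with NumPy; names are our own) that applies a 2-D planar transformation M_2D to image points. The only subtlety is the final division by w', which is what distinguishes a projective transform from an affine one:

    import numpy as np

    def apply_M2D(M, pts):
        """Apply a 3x3 2-D planar transformation M to an (N, 2) array of
        Cartesian points, returning the transformed Cartesian points."""
        pts_h = np.hstack([pts, np.ones((len(pts), 1))])  # lift to (x, y, 1)
        out_h = pts_h @ M.T                               # x' = M x for each point
        return out_h[:, :2] / out_h[:, 2:3]               # divide by w'

    # Example: a rigid transform (rotation by theta plus translation); a
    # general projective M would additionally have nonzero m20, m21.
    theta, tx, ty = np.radians(30), 5.0, -2.0
    M_rigid = np.array([[np.cos(theta), -np.sin(theta), tx],
                        [np.sin(theta),  np.cos(theta), ty],
                        [0.0,            0.0,           1.0]])
    square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
    print(apply_M2D(M_rigid, square))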
`

`

`4 Planarimagemosaicing
`
`5
`
`case are 4 4. Of particular interest are the rigid (Euclidean) transformation,
`
`!
`
`1
`is a 3 3 orthonormal rotation matrix and 
`where
`viewing matrix
`1 0
`0 1
`
`is a 3-D translation vector, and the 3 4
`
`0
`0
`
`0
`0
`0
`
` "   
` !
`ˆ  " 
` " 
`0 0 1
`
`which projects 3-D points through the origin onto a 2-D projection plane a distance
` along the
`axis [Foley et al., 1990]. The 3 3 ˆ matrix can be a general matrix in the case where the internal
`is always zero for central projection.
`camera parameters are unknown. The last column of 
`
"
   
` onto a 2-D screen
`The combined equations projecting a 3-D world coordinate
`location% "
    
` can thus be written as
`% " 
" &
`where &
`
`(2)
`
`(3)
`
`(4)
`
`cam
`
`

`
`This equation is valid even if the camera calibration
`cam is a 3 4 camera matrix.
`parameters and/or the camera orientation are unknown.
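As a quick illustration of (4) (a sketch of our own, with an assumed focal length and camera pose), the camera matrix M_cam = V E can be assembled and applied to a homogeneous 3-D point as follows:

    import numpy as np

    f = 500.0                                      # assumed focal length (pixels)
    V_hat = np.diag([1.0, 1.0, 1.0 / f])           # calibrated V-hat, as in eq. (3)
    P = np.hstack([np.eye(3), np.zeros((3, 1))])   # zero last column (central projection)
    V = V_hat @ P                                  # 3x4 viewing matrix

    R, t = np.eye(3), np.array([0.0, 0.0, 2.0])    # assumed pose: push scene 2 units in z
    E = np.vstack([np.hstack([R, t[:, None]]), [0, 0, 0, 1]])  # rigid transform, eq. (2)

    M_cam = V @ E                                  # 3x4 camera matrix, eq. (4)
    p = np.array([0.5, -0.5, 1.0, 1.0])            # homogeneous 3-D world point
    u = M_cam @ p
    print(u[:2] / u[2])                            # screen (x, y): here (f*0.5/3, -f*0.5/3)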
4 Planar image mosaicing

The simplest possible set of images to mosaic are pieces of a planar scene such as a document, whiteboard, or flat desktop. Imagine that we have a camera fixed directly over our desk. As we slide a document under the camera, different portions of the document are visible. Any two such pieces are related to each other by a translation and a rotation (2-D rigid transformation).

Now imagine that we are scanning a whiteboard with a hand-held video camera. The class of transformations relating two pieces of the board is more general in this case. It is easy to see that this class is the family of 2-D projective transformations (just imagine how a square or grid in one image can appear in another.) These transformations can be computed without any knowledge of the internal camera calibration parameters (such as focal length or optical center) or of the relative camera motion between frames. The fact that 2-D projective transformations capture all such possible mappings is a basic result of projective geometry (see Appendix A.)
Given this knowledge, how do we go about computing the transformations relating the various scene pieces so that we can paste them together? A variety of techniques are possible, some more automated than others. For example, we could manually identify four or more corresponding points between the two views, which is enough information to solve for the eight unknowns in the 2-D projective transformation (a linear solve of this form is sketched below). Or we could iteratively adjust the relative positions of input images using either a blink comparator or transparency [Carlbom et al., 1991]. These kinds of manual approaches are too tedious to be useful for large tele-reality applications.
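As an illustration of the four-point approach, the following minimal sketch (ours, in Python with NumPy) solves for the eight unknowns of the 2-D projective transformation (with m_22 = 1, as in Section 3). Each correspondence (x, y) → (x', y') contributes two equations that are linear in the unknown matrix entries:

    import numpy as np

    def homography_from_points(src, dst):
        """Solve for the 3x3 projective transform (m22 = 1) mapping four
        source points onto four destination points."""
        A, b = [], []
        for (x, y), (xp, yp) in zip(src, dst):
            # xp = (m00*x + m01*y + m02) / (m20*x + m21*y + 1), and similarly
            # for yp, rearranged into equations linear in the eight unknowns.
            A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
            A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
            b.extend([xp, yp])
        m = np.linalg.solve(np.array(A, float), np.array(b, float))
        return np.append(m, 1.0).reshape(3, 3)

    # Example: recover the warp mapping a unit square onto a quadrilateral.
    src = [(0, 0), (1, 0), (1, 1), (0, 1)]
    dst = [(0.1, 0.0), (0.9, 0.2), (1.0, 1.1), (0.0, 0.8)]
    M = homography_from_points(src, dst)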
4.1 Local image registration

The approach we use in this paper is to directly minimize the discrepancy in intensities between pairs of images after applying the transformation we are recovering. This has the advantage of not requiring any easily identifiable feature points, and of being statistically optimal once we are in the vicinity of the true solution [Szeliski and Coughlan, 1994]. More concretely, suppose we describe our 2-D transformations as

$$x' = \frac{m_0 x + m_1 y + m_2}{m_6 x + m_7 y + 1}, \qquad
y' = \frac{m_3 x + m_4 y + m_5}{m_6 x + m_7 y + 1}. \tag{5}$$

Our technique minimizes the sum of the squared intensity errors

$$E = \sum_i \left[ I'(x'_i, y'_i) - I(x_i, y_i) \right]^2 = \sum_i e_i^2 \tag{6}$$

over all corresponding pairs of pixels i which are inside both images I and I' (pixels which are mapped outside image boundaries do not contribute.) Once we have found the best M_2D, we can warp image I into the reference frame of I' using M_2D [Wolberg, 1990] and then blend the two images together. To reduce visible artifacts, we weight images being blended together more heavily towards the center, using a bilinear weighting function.

To perform the minimization, we use the Levenberg-Marquardt iterative non-linear minimization algorithm [Press et al., 1992]. This algorithm requires the computation of the partial derivatives of e_i with respect to the unknown motion parameters m_0, ..., m_7. These are straightforward to compute, i.e.,

$$\frac{\partial e_i}{\partial m_0} = \frac{x_i}{D_i} \frac{\partial I'}{\partial x'},
\quad \ldots, \quad
\frac{\partial e_i}{\partial m_7} = -\frac{y_i}{D_i} \left( x'_i \frac{\partial I'}{\partial x'} + y'_i \frac{\partial I'}{\partial y'} \right), \tag{7}$$

where D_i is the denominator in (5), and (∂I'/∂x', ∂I'/∂y') is the image intensity gradient of I' at (x'_i, y'_i). From these partials, the Levenberg-Marquardt algorithm computes an approximate Hessian matrix A and the weighted gradient vector b, with components

$$a_{kl} = \sum_i \frac{\partial e_i}{\partial m_k} \frac{\partial e_i}{\partial m_l},
\qquad
b_k = -\sum_i e_i \frac{\partial e_i}{\partial m_k}, \tag{8}$$

and then updates the motion parameter estimate m by an amount Δm = (A + λI)⁻¹ b, where λ is a time-varying stabilization parameter [Press et al., 1992]. The advantage of using Levenberg-Marquardt over straightforward gradient descent is that it converges in fewer iterations. The complete registration algorithm thus consists of the following steps:

1. For each pixel i at location (x_i, y_i),
   (a) compute its corresponding position in the other image (x'_i, y'_i) using (5);
   (b) compute the error in intensity between the corresponding pixels e_i = I'(x'_i, y'_i) - I(x_i, y_i) and the intensity gradient (∂I'/∂x', ∂I'/∂y');
   (c) compute the partial derivatives of e_i w.r.t. the m_k using (7);
   (d) add the pixel's contribution to A and b as in (8).

2. Solve the system of equations (A + λI) Δm = b and update the motion estimate m ← m + Δm.

3. Check that the error in (6) has decreased; if not, increment λ (as described in [Press et al., 1992]) and compute a new Δm.

4. Continue iterating until the error is below a threshold or a fixed number of steps has been completed.

The steps in this algorithm are similar to the operations performed when warping images [Wolberg, 1990; Beier and Neely, 1992], with additional operations for correcting the current warping parameters based on the local intensity error and its gradients (similar to the process in [Gleicher and Witkin, 1992]). For more details on the exact implementation, see [Szeliski and Coughlan, 1994].
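Concretely, one step of this loop can be sketched as follows (a minimal Python/NumPy illustration of our own, using nearest-neighbour resampling and omitting the blending and coarse-to-fine machinery; function and variable names are ours):

    import numpy as np

    def lm_step(I, Ip, m, lam):
        """One Levenberg-Marquardt update of the 8 motion parameters m
        (eq. 5), registering image I to image Ip. Returns (m_new, error)."""
        h, w = I.shape
        ys, xs = np.mgrid[0:h, 0:w]
        xs, ys = xs.ravel().astype(float), ys.ravel().astype(float)

        D = m[6] * xs + m[7] * ys + 1.0                  # denominator in (5)
        xp = (m[0] * xs + m[1] * ys + m[2]) / D
        yp = (m[3] * xs + m[4] * ys + m[5]) / D

        # Pixels mapped outside the image boundaries do not contribute.
        ok = (xp >= 0) & (xp <= w - 1) & (yp >= 0) & (yp <= h - 1)
        xs, ys, xp, yp, D = xs[ok], ys[ok], xp[ok], yp[ok], D[ok]
        xi, yi = np.round(xp).astype(int), np.round(yp).astype(int)

        gy, gx = np.gradient(Ip)                         # intensity gradient of I'
        e = Ip[yi, xi] - I[ys.astype(int), xs.astype(int)]   # errors e_i
        Ix, Iy = gx[yi, xi], gy[yi, xi]

        # Partials of e_i w.r.t. m_0..m_7, following eq. (7).
        J = np.stack([xs * Ix / D, ys * Ix / D, Ix / D,
                      xs * Iy / D, ys * Iy / D, Iy / D,
                      -xs * (xp * Ix + yp * Iy) / D,
                      -ys * (xp * Ix + yp * Iy) / D], axis=1)

        A = J.T @ J                                      # approximate Hessian, eq. (8)
        b = -J.T @ e                                     # weighted gradient, eq. (8)
        dm = np.linalg.solve(A + lam * np.eye(8), b)     # (A + lam I) dm = b
        return m + dm, float(e @ e)

In practice this step is iterated, with λ increased whenever the error fails to decrease (step 3), and is initialized coarse-to-fine as described in Section 4.2.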

4.2 Global image registration

Unfortunately, both gradient descent and Levenberg-Marquardt only find locally optimal solutions. If the motion between successive frames is large, we must use a different strategy to find the best registration. We have implemented two different techniques for dealing with this problem. The first technique, which is commonly used in computer vision, is hierarchical matching, which first registers smaller, subsampled versions of the images where the apparent motion is smaller [Quam, 1984; Witkin et al., 1987; Bergen et al., 1992]. Motion estimates from these coarser levels are then used to initialize motion estimates at finer levels, thereby avoiding the local minimum problem (see [Szeliski and Coughlan, 1994] for details.) While this technique is not guaranteed to find the correct registration, we have observed empirically that it works well when the initial misregistration is only a few pixels (the exact domain of convergence depends on the intensity pattern in the image.)

For larger displacements, we use phase correlation [Kuglin and Hines, 1975; Brown, 1992]. This technique estimates the 2-D translation between a pair of images by taking 2-D Fourier transforms of each image, computing the phase difference at each frequency, performing an inverse Fourier transform, and searching for a peak in the magnitude image. In our experiments, this technique has proven to work remarkably well, providing good initial guesses for image pairs which overlap by as little as 50%. The technique will not work, however, if the inter-frame motion is not mostly translational (large rotations or zooms), but this does not occur in practice with our current image sequences.
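The whole phase-correlation pipeline is only a few lines with an FFT library; the sketch below (ours, in Python with NumPy) recovers an integer 2-D translation under the stated assumption of mostly translational motion:

    import numpy as np

    def phase_correlate(I0, I1):
        """Estimate the integer translation (dx, dy) such that
        I1(y, x) ~= I0(y - dy, x - dx), by phase correlation."""
        F0, F1 = np.fft.fft2(I0), np.fft.fft2(I1)
        cross = np.conj(F0) * F1
        cross /= np.abs(cross) + 1e-12           # keep only the phase difference
        corr = np.abs(np.fft.ifft2(cross))       # impulse at the displacement
        dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
        # Peaks past the midpoint are negative displacements (FFT wrap-around).
        if dy > I0.shape[0] // 2: dy -= I0.shape[0]
        if dx > I0.shape[1] // 2: dx -= I0.shape[1]
        return dx, dy

    # Example: a synthetic 10-pixel horizontal, 4-pixel vertical shift.
    rng = np.random.default_rng(0)
    I0 = rng.random((128, 128))
    I1 = np.roll(I0, shift=(4, 10), axis=(0, 1))
    print(phase_correlate(I0, I1))               # -> (10, 4)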
4.3 Results

To demonstrate the performance of our algorithm, we digitized an image sequence with a camera panning over a whiteboard. Figure 2a shows the final mosaic of the whiteboard, with the locations of the constituent images in the mosaic shown as black outlines. Figure 2b is a portion of the final mosaic without these outlines. This mosaic is 1300 × 2046 pixels, based on compositing 39 NTSC (640 × 480) resolution images.

To compute this mosaic, we developed an interactive image mosaicing tool which lets the user coarsely position successive frames relative to each other. The tool also has an automatic mosaicing option which computes the initial rough placement of each image with respect to the previous one
using phase correlation. Our algorithm then refines the location of each image by minimizing (6), using the current mosaic thus far as I' and the input frame being adjusted as I. The images in Figure 2 were automatically composited without user intervention using the middle frame (center of the image) as the anchor image (no deformations.) As we can see, our technique works well on this example. Section 9 discusses some imaging artifacts which can cause difficulties.

[Figure 2: Whiteboard image mosaic example. (a) mosaic with component locations shown as black outlines, (b) portion of mosaic in greater detail.]
5 Panoramic image mosaicing

Another image mosaicing problem which turns out to be equally simple to solve is panoramic image mosaicing. In this scenario, we rotate a camera around its optical center in order to create a full "viewing sphere" of images (in computer graphics, this representation is sometimes called an environment map [Greene, 1986].) This is similar to the action of panoramic still photographic cameras, where the rotation of a camera on top of a tripod is mechanically coordinated with the film transport, to the anamorphic lens used in [Lippman, 1980], and to "cinema in the round" movie theaters [Greene and Heckbert, 1986]. In our case, however, we can mosaic multiple 2-D images of arbitrary detail and resolution, and we need not know the camera motion. Examples of applications include constructing true scenic panoramas (say of the view at the rim of the Grand Canyon), or limited virtual environments (recreating a meeting room or office as seen from one location.)

It turns out that images taken from the same viewpoint (stationary optical center) are related by 2-D projective transformations, just as in the planar scene case (see Appendix A for a quick proof.) Intuitively, we cannot tell the relative depth of points in the scene as we rotate (there is no motion parallax), so they may as well be located on any plane (in projective geometry, we could say they lie on the plane at infinity, w = 0.) For some applications, this property is usually viewed as a deficiency, i.e., we cannot recover scene depth from rotational camera motion. For image mosaicing, this is actually an advantage, since we just want to composite a large scene and be able to "look around" from a static viewpoint.

More formally, the 2-D transformation denoted by M_2D is related to the viewing matrices V̂ and V̂' and the inter-view rotation R by

$$\mathbf{M}_{2D} = \hat{\mathbf{V}}'\,\mathbf{R}\,\hat{\mathbf{V}}^{-1} \tag{9}$$

(see Appendix A.)
In the case of a calibrated camera (known center of projection), this has a particularly simple form,

$$\mathbf{M}_{2D} =
\begin{bmatrix} r_{00} & r_{01} & f\,r_{02} \\ r_{10} & r_{11} & f\,r_{12} \\ r_{20}/f' & r_{21}/f' & (f/f')\,r_{22} \end{bmatrix}, \tag{10}$$

where f and f' are the scaled focal lengths in the two views, and the r_ij are the entries in the rotation matrix R.⁵ In this case, we only have to recover five independent parameters (or three if the f values are known) instead of the usual eight.

⁵At large focal lengths and small rotations, M_2D resembles a pure translation matrix.
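Equation (10) is easy to check numerically; the following sketch (ours, assuming the calibrated V̂ of (3)) composes M_2D = V̂' R V̂⁻¹ for a small pan and compares it with the closed form:

    import numpy as np

    f0, f1 = 600.0, 600.0                      # assumed focal lengths of the two views
    theta = np.radians(5.0)                    # small pan about the y axis
    R = np.array([[ np.cos(theta), 0, np.sin(theta)],
                  [ 0,             1, 0            ],
                  [-np.sin(theta), 0, np.cos(theta)]])

    V0 = np.diag([1.0, 1.0, 1.0 / f0])         # V-hat for each view, as in eq. (3)
    V1 = np.diag([1.0, 1.0, 1.0 / f1])
    M2D = V1 @ R @ np.linalg.inv(V0)           # eq. (9)

    # Closed form of eq. (10): last column scaled by f, bottom row divided by f'.
    M_closed = np.array([[R[0, 0],      R[0, 1],      f0 * R[0, 2]],
                         [R[1, 0],      R[1, 1],      f0 * R[1, 2]],
                         [R[2, 0] / f1, R[2, 1] / f1, (f0 / f1) * R[2, 2]]])
    assert np.allclose(M2D, M_closed)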
How do we represent a panoramic scene composited using our techniques? One approach is to divide the viewing sphere into several large, potentially overlapping regions, and to represent each region with a plane onto which we paste the images [Greene, 1986]. Another approach is to compute the position of each frame relative to some base frame, and to then re-compute an arbitrary view on the fly from all visible pieces, given a particular view direction and zoom factor. In our current implementation, we adopt the former strategy, since it is a simple extension of the planar case.

5.1 Results

Figure 3 shows a mosaic of a bookshelf and cluttered desk, which is obviously a non-planar scene. The images were obtained by tilting and panning a video camera mounted on a tripod, without any special steps taken to ensure that the rotation was around the true center of projection. The complete scene is registered quite well.

[Figure 3: Panoramic image mosaic example (bookshelf and cluttered desk). (a) mosaic with component locations shown as black outlines, (b) portion of mosaic in greater detail.]

6 Scenes with arbitrary depth

Mosaicing flat scenes or panoramic scenes may be interesting for certain tele-reality applications, e.g., transmitting whiteboard or document contents, or viewing an outdoor panorama. However, for most applications, we must recover the depth associated with the scene to give the illusion of a 3-D environment, e.g., through nearby view synthesis [Chen and Williams, 1993]. Two possible approaches are to model the scene as piecewise-planar or to recover dense 3-D depth maps.
The first approach is to assume that the scene is piecewise-planar, as is the case with many man-made environments such as building exteriors and office interiors. The image mosaicing technique developed in Section 4 can then be applied to each of the planar regions in the image. The segmentation of each image into its planar components can either be done interactively (e.g., by drawing the polygonal outline of each region to be registered), or automatically by associating each pixel with one of several global motion hypotheses [Jepson and Black, 1993; Wang and Adelson, 1993]. Once the independent planar pieces have been composited, we could, in principle, recover the relative geometry of the various planes and the camera motion (see Appendix A.) However, rather than pursuing this approach in this paper, we will pursue the second, more general solution, which is to recover a full depth map, i.e., to infer the missing depth component associated with each pixel in a given image sequence.

When the camera motion is known, the problem of depth map recovery is called stereo reconstruction (or multi-frame stereo if more than two views are used [Matthies et al., 1989; Okutomi and Kanade, 1993].) This problem has been extensively studied in photogrammetry and computer vision [Moffitt and Mikhail, 1980; Barnard and Fischler, 1982; Horn, 1986; Dhond and Aggarwal, 1989]. When the camera motion is unknown, we have the more difficult structure from motion problem [Horn, 1986; Weng et al., 1993; Azarbayejani et al., 1993; Szeliski and Kang, 1994]. In this section, we present our solution to this latter problem based on recovering projective depth, which is particularly simple and robust and fits in well with the material already developed in this paper. Readers interested in alternative techniques should consult the above references.

6.1 Formulation

To formulate the projective structure from motion recovery problem, we note that the coordinates corresponding to a pixel u with projective depth d in some other frame can be written as

$$\mathbf{u}' = \mathbf{M}_{2D}\,\mathbf{u} + d\,\hat{\mathbf{e}}, \tag{11}$$

where M_2D and ê are the computed planar projection matrix and epipole direction (see (25) in Appendix A.) To recover the parameters in M_2D and ê for each frame along with the depth values d (which are the same for all frames), we use the same Levenberg-Marquardt algorithm as before.
In more detail, we write the projection equation as

$$x' = \frac{m_0 x + m_1 y + m_2 + d\,e_0}{m_6 x + m_7 y + 1 + d\,e_2}, \qquad
y' = \frac{m_3 x + m_4 y + m_5 + d\,e_1}{m_6 x + m_7 y + 1 + d\,e_2}. \tag{12}$$

We compute the partial derivatives of x' and y' w.r.t. the m_k and e_k (which we concatenate into the motion vector m) as before in (7.) We similarly compute the partials of x' and y' with respect to d, i.e.,

$$\frac{\partial x'}{\partial d} = \frac{e_0 - e_2\,x'}{D}, \qquad
\frac{\partial y'}{\partial d} = \frac{e_1 - e_2\,y'}{D}, \tag{13}$$

where D is the denominator in (12).

To estimate the unknown parameters, we alternate iterations of the Levenberg-Marquardt algorithm over the motion parameters m and the depth parameters d, using the partial derivatives defined above to compute the approximate Hessian matrices A and the weighted error vectors b as in (8). In our current implementation, in order to reduce the total number of parameters being estimated, we represent the depth map using a tensor-product spline, and only recover the depth estimates at the spline control vertices (the complete depth map is available by interpolation) [Szeliski and Coughlan, 1994].
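The forward mapping of (11)–(12) is a one-liner; the sketch below (ours) maps a pixel into another frame given a planar projection matrix M, an epipole direction e, and a projective depth d. With d = 0 it reduces to the planar mapping (5):

    import numpy as np

    def project_with_depth(M, e, x, y, d):
        """Map pixel (x, y) with projective depth d into another frame:
        u' = M u + d*e in homogeneous coordinates, eqs. (11)-(12)."""
        u = np.array([x, y, 1.0])
        up = M @ u + d * e          # homogeneous result
        return up[0] / up[2], up[1] / up[2]

    # With d = 0 this is exactly the planar transform (5); a nonzero d moves
    # the point along its epipolar line, in the direction of the epipole e.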
6.2 Results

Figure 4 shows an example of using our project
