Image Mosaicing for Tele-Reality Applications

Richard Szeliski

Digital Equipment Corporation
Cambridge Research Lab

CRL 94/2

May, 1994
Digital Equipment Corporation has four research facilities: the Systems Research Center and the Western Research Laboratory, both in Palo Alto, California; the Paris Research Laboratory, in Paris; and the Cambridge Research Laboratory, in Cambridge, Massachusetts.

The Cambridge laboratory became operational in 1988 and is located at One Kendall Square, near MIT. CRL engages in computing research to extend the state of the computing art in areas likely to be important to Digital and its customers in future years. CRL's main focus is applications technology; that is, the creation of knowledge and tools useful for the preparation of important classes of applications.

CRL Technical Reports can be ordered by electronic mail. To receive instructions, send a message to one of the following addresses, with the word "help" in the Subject line:

On Digital's EASYnet: CRL::TECHREPORTS
On the Internet: techreports@crl.dec.com

This work may not be copied or reproduced for any commercial purpose. Permission to copy without payment is granted for non-profit educational and research purposes provided all such copies include a notice that such copying is by permission of the Cambridge Research Lab of Digital Equipment Corporation, an acknowledgment of the authors to the work, and all applicable portions of the copyright notice.

The Digital logo is a trademark of Digital Equipment Corporation.

Cambridge Research Laboratory
One Kendall Square
Cambridge, Massachusetts 02139
Abstract

While a large number of virtual reality applications, such as fluid flow analysis and molecular modeling, deal with simulated data, many newer applications attempt to recreate true reality as convincingly as possible. Building detailed models for such applications, which we call tele-reality, is a major bottleneck holding back their deployment. In this paper, we present techniques for automatically deriving realistic 2-D scenes and 3-D texture-mapped models from video sequences, which can help overcome this bottleneck. The fundamental technique we use is image mosaicing, i.e., the automatic alignment of multiple images into larger aggregates which are then used to represent portions of a 3-D scene. We begin with the easiest problems, those of flat scene and panoramic scene mosaicing, and progress to more complicated scenes, culminating in full 3-D models. We also present a number of novel applications based on tele-reality technology.

Keywords: image mosaics, image registration, image compositing, planar scenes, panoramic scenes, 3-D scene recovery, motion estimation, structure from motion, virtual reality, tele-reality.

© Digital Equipment Corporation 1994. All rights reserved.
Contents

1 Introduction
2 Related work
3 Basic imaging equations
4 Planar image mosaicing
4.1 Local image registration
4.2 Global image registration
4.3 Results
5 Panoramic image mosaicing
5.1 Results
6 Scenes with arbitrary depth
6.1 Formulation
6.2 Results
7 Full 3-D model recovery
8 Applications
8.1 Planar mosaicing applications
8.2 3-D model building applications
8.3 End-user applications
9 Discussion
10 Conclusions
A 2-D projective transformations
List of Figures

1 Rigid, affine, and projective transformations
2 Whiteboard image mosaic example
3 Panoramic image mosaic example (bookshelf and cluttered desk)
4 Depth recovery example: table with stacks of papers
5 Depth recovery example: outdoor scene with trees
6 3-D model recovery example
1 Introduction
Virtual reality is currently creating a lot of excitement and interest in the computer graphics community. Typical virtual reality systems use immersive technologies such as head-mounted or stereo displays and data gloves. The intent is to convince users that they are interacting with an alternate physical world, and also often to give insight into computer simulations, e.g., for fluid flow analysis or molecular modeling [Earnshaw et al., 1993]. Realism and speed of rendering are therefore important issues.

Another class of virtual reality applications attempts to recreate true reality as convincingly as possible. Examples of such applications include flight simulators (which were among the earliest uses of virtual reality), various interactive multi-player games, medical simulators, and visualization tools. Since the reality being recreated is usually distant, we use the term tele-reality in this paper to refer to such virtual reality scenarios based on real imagery.[1] Tele-reality applications which could be built using the techniques described in this paper include scanning a whiteboard in your office to obtain a high-resolution image, walkthroughs of existing buildings for re-modeling or selling, participating in a virtual classroom, and browsing through your local supermarket aisles from home (see Section 8).

[1] The term telepresence is also often used, especially for applications such as tele-medicine where two-way interactivity (remote operation) is important [Earnshaw et al., 1993].

While most focus in virtual reality today is on input/output devices and the quality and speed of rendering, perhaps the biggest bottleneck standing in the way of widespread tele-reality applications is the slow and tedious model-building process [Adam, 1993]. This also affects many other areas of computer graphics, such as computer animation, special effects, and CAD. For small objects, say 0.2–2 m, laser-based scanners are a good solution. These scanners provide registered depth and colored texture maps, usually in either raster or cylindrical coordinates [Cyberware Laboratory Inc, 1990; Rioux and Bird, 1993]. They have been used extensively in the entertainment industry for special effects and computer animation. Unfortunately, laser-based scanners are fairly expensive, limited in resolution (typically 512 × 256 pixels), and most importantly, limited in range (e.g., they cannot be used to scan in a building).

The image-based ranging techniques which we develop in this paper have the potential to overcome these limitations. Imagine walking through an environment such as a building interior and filming a video sequence of what you see. By registering and compositing the images in the
video together into large mosaics of the scene, image-based ranging can achieve an essentially unlimited resolution.[2] Since the images can be acquired using any optical technology (from microscopy, through hand-held videocams, to satellite photography), the range or scale of the scene being reconstructed is not an issue. Finally, as desktop video becomes ubiquitous in our computing environments, initially for videoconferencing and later as an advanced user interface tool [Gold, 1993; Rehg and Kanade, 1994; Blake and Isard, 1994], image-based scene and model acquisition will become accessible to all users.

[2] The traditional use of the term image compositing [Porter and Duff, 1984; Foley et al., 1990] is for blending images which are already registered. We use the term image mosaicing in this paper for our work to avoid confusion with this previous work, although compositing (as in composite sketches) also seems appropriate.

In this paper, we develop novel techniques for extracting large 2-D textures and 3-D models from image sequences based on image registration and compositing techniques, and also present some potential applications. After a review of related work (Section 2) and of the basic image formation equations (Section 3), we present our technique for registering pieces of a flat (planar) scene, which is the simplest interesting image mosaicing problem (Section 4). We then show how the same technique can be used to mosaic panoramic scenes obtained by rotating the camera around its center of projection (Section 5). Section 6 discusses how to recover depth in scenes. Section 7 discusses the most general and difficult problem, that of building full 3-D models from multiple images. Section 8 presents some novel applications of the tele-reality technology developed in this paper. Finally, we close with a comparison of our work with previous approaches, and a discussion of the significance of our results.
2 Related work

While 3-D models acquired with laser-based scanners are now commonly used in computer animations and for special effects, 3-D models derived directly from still or video images are still a rarity. One example of physically-based models derived from images is the dancing vegetables in the "Cooking with Kurt" video [Terzopoulos and Witkin, 1988]. Another, more recent example is the 3-D model of a building extracted from a video sequence in [Azarbayejani et al., 1993], which was then composited with a 3-D teapot. Somewhat related are graphics techniques which extract camera position information from images, such as [Gleicher and Witkin, 1992].
The extraction of geometric information from multiple images has long been one of the central problems in computer vision [Ballard and Brown, 1982; Horn, 1986] and photogrammetry [Moffitt and Mikhail, 1980]. However, many of the techniques used are based on feature extraction, produce sparse descriptions of shape, and often require manual assistance. Certain techniques, such as stereo, produce depth or elevation maps which are inadequate to model true 3-D objects. In computer vision, the recovered geometry is normally used either for object recognition or for grasping and navigation (robotics). Relatively little attention has been paid to building 3-D models registered with intensity (texture maps), which are necessary for computer graphics and virtual reality applications (but see [Mann, 1993] for recent work which addresses some of the same problems as this paper).

In computer graphics, compositing multiple image streams together to create larger format (Omnimax) images is discussed in [Greene and Heckbert, 1986]. However, in this application, the relative position of the cameras was known in advance. The registration techniques developed in this paper are related to image warping [Wolberg, 1990], since once the images are registered, they can be warped into a common reference frame before being composited.[3] While most current techniques require the manual specification of feature correspondences, some recent techniques [Beymer et al., 1993] as well as the techniques developed in this paper can be used to automate this process. Combinations of local image warping and compositing are now commonly used for special effects under the general rubric of morphing [Beier and Neely, 1992]. Recently, morphing based on z-buffer depths and camera motion has been applied to view interpolation as a quick alternative to full 3-D rendering [Chen and Williams, 1993]. The techniques we develop in Section 6 can be used to compute the depth maps necessary for this approach directly from real-world image sequences.

[3] One can view the image registration task as an inverse warping problem, since we are given two images and asked to recover the unknown warping function.

3 Basic imaging equations

Many of the ideas in this paper have their simplest expression using projective geometry [Semple and Kneebone, 1952]. However, rather than relying on these results, we will use more standard methods and notations from computer graphics [Foley et al., 1990], and prove whatever simple
results we require in Appendix A. Throughout, we will use homogeneous coordinates to represent points, i.e., we denote 2-D points in the image plane as $(x, y, w)$, with $(x/w, y/w)$ being the corresponding Cartesian coordinates [Foley et al., 1990]. Similarly, 3-D points with homogeneous coordinates $(x, y, z, w)$ have Cartesian coordinates $(x/w, y/w, z/w)$.

Using homogeneous coordinates, we can describe the class of 2-D planar transformations using matrix multiplication

$$\begin{bmatrix} x' \\ y' \\ w' \end{bmatrix} = \begin{bmatrix} m_{00} & m_{01} & m_{02} \\ m_{10} & m_{11} & m_{12} \\ m_{20} & m_{21} & m_{22} \end{bmatrix} \begin{bmatrix} x \\ y \\ w \end{bmatrix} \quad \text{or} \quad \mathbf{u}' = \mathbf{M}_{2D}\,\mathbf{u}. \tag{1}$$

The simplest transformations in this general class are pure translations, followed by translations and rotations (rigid transforms), plus scaling (similarity transforms), affine transforms, and full projective transforms. Figure 1 shows a square and possible rigid, affine, and projective transformations.

[Figure 1: Rigid, affine, and projective transformations]

Rigid and affine transformations have the following forms for the $\mathbf{M}_{2D}$ matrix,

$$\mathbf{M}_{\text{rigid-2D}} = \begin{bmatrix} \cos\theta & -\sin\theta & t_x \\ \sin\theta & \cos\theta & t_y \\ 0 & 0 & 1 \end{bmatrix}, \qquad \mathbf{M}_{\text{affine-2D}} = \begin{bmatrix} a_{00} & a_{01} & a_{02} \\ a_{10} & a_{11} & a_{12} \\ 0 & 0 & 1 \end{bmatrix},$$

with 3 and 6 degrees of freedom, respectively, while projective transformations have a general $\mathbf{M}_{2D}$ matrix with 8 degrees of freedom.[4]

[4] Two $\mathbf{M}_{2D}$ matrices are equivalent if they are scalar multiples of each other. We remove this redundancy by setting $m_{22} = 1$.
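As a concrete illustration of eq. (1), the following numpy sketch (my own, not from the report; the matrix entries are invented examples of the rigid and projective forms above) applies an $\mathbf{M}_{2D}$ to homogeneous points and divides through by $w'$:

```python
import numpy as np

def transform_points_2d(M, pts):
    """Apply a 3x3 planar transform M to an (N, 2) array of 2-D points.

    Points are lifted to homogeneous coordinates (x, y, 1), multiplied by M
    as in eq. (1), and divided by w' to return Cartesian (x'/w', y'/w')."""
    pts = np.asarray(pts, dtype=float)
    homo = np.hstack([pts, np.ones((len(pts), 1))])  # (x, y, 1) rows
    out = homo @ M.T                                 # u' = M u, row-wise
    return out[:, :2] / out[:, 2:3]                  # divide by w'

theta, tx, ty = np.deg2rad(10.0), 5.0, -3.0
M_rigid = np.array([[np.cos(theta), -np.sin(theta), tx],
                    [np.sin(theta),  np.cos(theta), ty],
                    [0.0,            0.0,           1.0]])  # 3 dof
M_proj = np.array([[1.0,  0.02,  2.0],
                   [0.01, 1.0,  -1.0],
                   [1e-4, 2e-4,  1.0]])                     # 8 dof, m22 = 1

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(transform_points_2d(M_rigid, square))  # rotated + translated square
print(transform_points_2d(M_proj, square))   # projectively distorted square
```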
In 3-D, we have the same hierarchy of transformations, with rigid (6 degrees of freedom), similarity (7 dof), affine (12 dof), and full projective (15 dof) transforms. The $\mathbf{M}_{3D}$ matrices in this case are $4 \times 4$. Of particular interest are the rigid (Euclidean) transformation,

$$\mathbf{E} = \begin{bmatrix} \mathbf{R} & \mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix}, \tag{2}$$

where $\mathbf{R}$ is a $3 \times 3$ orthonormal rotation matrix and $\mathbf{t}$ is a 3-D translation vector, and the $3 \times 4$ viewing matrix

$$\mathbf{V} = \begin{bmatrix} \hat{\mathbf{V}} \mid \mathbf{0} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1/f & 0 \end{bmatrix}, \tag{3}$$

which projects 3-D points through the origin onto a 2-D projection plane a distance $f$ along the $z$ axis [Foley et al., 1990]. The $3 \times 3$ matrix $\hat{\mathbf{V}}$ can be a general matrix in the case where the internal camera parameters are unknown. The last column of $\mathbf{V}$ is always zero for central projection.

The combined equations projecting a 3-D world coordinate $\mathbf{p} = (x, y, z, w)$ onto a 2-D screen location $\mathbf{u} = (x', y', w')$ can thus be written as

$$\mathbf{u} = \mathbf{V}\,\mathbf{E}\,\mathbf{p} = \mathbf{M}_{\text{cam}}\,\mathbf{p}, \tag{4}$$

where $\mathbf{M}_{\text{cam}}$ is a $3 \times 4$ camera matrix. This equation is valid even if the camera calibration parameters and/or the camera orientation are unknown.
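To make eqs. (2)-(4) concrete, here is a minimal numpy sketch (my own illustration; the rotation, translation, and focal length are arbitrary example values) that composes $\mathbf{M}_{\text{cam}} = \mathbf{V}\,\mathbf{E}$ and projects homogeneous 3-D points to screen coordinates:

```python
import numpy as np

def camera_matrix(R, t, f):
    """Compose M_cam = V E from rotation R, translation t, focal length f.

    E is the 4x4 rigid transform of eq. (2); V is the 3x4 viewing matrix of
    eq. (3), whose last column is zero for central projection."""
    E = np.eye(4)
    E[:3, :3], E[:3, 3] = R, t
    V = np.array([[1.0, 0.0, 0.0,     0.0],
                  [0.0, 1.0, 0.0,     0.0],
                  [0.0, 0.0, 1.0 / f, 0.0]])
    return V @ E                                   # 3x4 camera matrix

def project(M_cam, p_world):
    """Project homogeneous world points (N, 4) as in eq. (4)."""
    u = np.asarray(p_world, dtype=float) @ M_cam.T
    return u[:, :2] / u[:, 2:3]                    # Cartesian screen coords

R = np.eye(3)                                      # example orientation
t = np.array([0.0, 0.0, 10.0])                     # scene 10 units ahead
M_cam = camera_matrix(R, t, f=1.0)
print(project(M_cam, [[1.0, 2.0, 5.0, 1.0]]))      # -> [[0.0667, 0.1333]]
```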
`
4 Planar image mosaicing

The simplest possible set of images to mosaic are pieces of a planar scene such as a document, whiteboard, or flat desktop. Imagine that we have a camera fixed directly over our desk. As we slide a document under the camera, different portions of the document are visible. Any two such pieces are related to each other by a translation and a rotation (2-D rigid transformation).

Now imagine that we are scanning a whiteboard with a hand-held video camera. The class of transformations relating two pieces of the board is more general in this case. It is easy to see that this class is the family of 2-D projective transformations (just imagine how a square or grid in one image can appear in another). These transformations can be computed without any knowledge of the internal camera calibration parameters (such as focal length or optical center) or of the relative camera motion between frames. The fact that 2-D projective transformations capture all such possible mappings is a basic result of projective geometry (see Appendix A).
Given this knowledge, how do we go about computing the transformations relating the various scene pieces so that we can paste them together? A variety of techniques are possible, some more automated than others. For example, we could manually identify four or more corresponding points between the two views, which is enough information to solve for the eight unknowns in the 2-D projective transformation (a sketch of this linear solution is given below). Or we could iteratively adjust the relative positions of input images using either a blink comparator or transparency [Carlbom et al., 1991]. These kinds of manual approaches are too tedious to be useful for large tele-reality applications.
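To make the four-point solution concrete, here is a small sketch (my own; the correspondences are invented). Each pair $(x, y) \mapsto (x', y')$ contributes two linear equations in the eight unknowns $m_0, \ldots, m_7$ of the parameterization introduced in Section 4.1, so four points determine the transform exactly and more points give a least-squares fit:

```python
import numpy as np

def solve_homography(src, dst):
    """Solve for the eight unknowns m0..m7 of a 2-D projective transform
    from four or more point correspondences (least squares beyond four).

    Each pair (x, y) -> (x', y') yields the two linear equations
        m0*x + m1*y + m2 - m6*x*x' - m7*y*x' = x'
        m3*x + m4*y + m5 - m6*x*y' - m7*y*y' = y'
    """
    A, b = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp]); b.append(xp)
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp]); b.append(yp)
    m = np.linalg.lstsq(np.array(A, float), np.array(b, float), rcond=None)[0]
    return np.append(m, 1.0).reshape(3, 3)   # fix m22 = 1 (cf. footnote 4)

src = [(0, 0), (1, 0), (1, 1), (0, 1)]                   # a unit square
dst = [(0.1, 0.0), (1.2, 0.1), (1.0, 1.3), (-0.1, 1.1)]  # its image
print(solve_homography(src, dst))
```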
4.1 Local image registration

The approach we use in this paper is to directly minimize the discrepancy in intensities between pairs of images after applying the transformation we are recovering. This has the advantage of not requiring any easily identifiable feature points, and of being statistically optimal once we are in the vicinity of the true solution [Szeliski and Coughlan, 1994]. More concretely, suppose we describe our 2-D transformations as
$$x' = \frac{m_0 x + m_1 y + m_2}{m_6 x + m_7 y + 1}, \qquad y' = \frac{m_3 x + m_4 y + m_5}{m_6 x + m_7 y + 1}. \tag{5}$$

Our technique minimizes the sum of the squared intensity errors

$$E = \sum_i \left[ I'(x'_i, y'_i) - I(x_i, y_i) \right]^2 = \sum_i e_i^2 \tag{6}$$

over all corresponding pairs of pixels $i$ which are inside both images $I(x, y)$ and $I'(x', y')$. (Pixels which are mapped outside image boundaries do not contribute.) Once we have found the best transformation $\mathbf{M}_{2D}$, we can warp image $I'$ into the reference frame of $I$ using $\mathbf{M}_{2D}^{-1}$ [Wolberg, 1990] and then blend the two images together. To reduce visible artifacts, we weight images being blended together more heavily towards the center, using a bilinear weighting function.
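The bilinear weighting is straightforward to implement. The sketch below (my own; single-channel synthetic frames with integer placements standing in for the full projective warp) gives each frame a "hat" weight of 1 at its center falling to 0 at its borders, and forms the mosaic as the weight-normalized sum:

```python
import numpy as np

def hat_weights(h, w):
    """Bilinear ('hat') weights: 1 at the image center, 0 at the borders."""
    wy = 1.0 - np.abs(np.linspace(-1.0, 1.0, h))
    wx = 1.0 - np.abs(np.linspace(-1.0, 1.0, w))
    return np.outer(wy, wx)

def composite(frames, offsets, mosaic_shape):
    """Blend already-registered frames by center-weighted averaging.

    `offsets` are integer (row, col) placements; in the real system each
    frame (and its weight map) is first warped by the recovered M_2D."""
    accum = np.zeros(mosaic_shape)
    weight = np.zeros(mosaic_shape)
    for frame, (r, c) in zip(frames, offsets):
        h, w = frame.shape
        wmap = hat_weights(h, w)
        accum[r:r + h, c:c + w] += frame * wmap
        weight[r:r + h, c:c + w] += wmap
    return accum / np.maximum(weight, 1e-8)   # avoid divide-by-zero gaps

a = np.full((4, 6), 100.0)                    # two overlapping test frames
b = np.full((4, 6), 140.0)
print(composite([a, b], [(0, 0), (0, 3)], (4, 9)))
```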
To perform the minimization, we use the Levenberg-Marquardt iterative non-linear minimization algorithm [Press et al., 1992]. This algorithm requires the computation of the partial derivatives of $e_i$ with respect to the unknown motion parameters $\{m_0, \ldots, m_7\}$. These are straightforward to compute, i.e.,

$$\frac{\partial e_i}{\partial m_0} = \frac{x_i}{D_i}\frac{\partial I'}{\partial x'}, \quad \ldots, \quad \frac{\partial e_i}{\partial m_7} = -\frac{y_i}{D_i}\left( x'_i \frac{\partial I'}{\partial x'} + y'_i \frac{\partial I'}{\partial y'} \right), \tag{7}$$

where $D_i$ is the denominator in (5), and $(\partial I'/\partial x', \partial I'/\partial y')$ is the image intensity gradient of $I'$ at $(x'_i, y'_i)$.
From these partials, the Levenberg-Marquardt algorithm computes an approximate Hessian matrix $\mathbf{A}$ and the weighted gradient vector $\mathbf{b}$, with components

$$a_{kl} = \sum_i \frac{\partial e_i}{\partial m_k} \frac{\partial e_i}{\partial m_l} \qquad \text{and} \qquad b_k = -2 \sum_i e_i \frac{\partial e_i}{\partial m_k}, \tag{8}$$

and then updates the motion parameter estimate by an amount $\Delta\mathbf{m} = (\mathbf{A} + \lambda\mathbf{I})^{-1}\mathbf{b}$, where $\lambda$ is a time-varying stabilization parameter [Press et al., 1992]. The advantage of using Levenberg-Marquardt over straightforward gradient descent is that it converges in fewer iterations.

The complete registration algorithm thus consists of the following steps:

1. For each pixel $i$ at location $(x_i, y_i)$,

   (a) compute its corresponding position in the other image $(x'_i, y'_i)$ using (5);

   (b) compute the error in intensity between the corresponding pixels, $e_i = I'(x'_i, y'_i) - I(x_i, y_i)$, and the intensity gradient $(\partial I'/\partial x', \partial I'/\partial y')$;

   (c) compute the partial derivatives of $e_i$ w.r.t. the $m_k$ using

   $$\frac{\partial e_i}{\partial m_k} = \frac{\partial I'}{\partial x'}\frac{\partial x'}{\partial m_k} + \frac{\partial I'}{\partial y'}\frac{\partial y'}{\partial m_k}$$

   as in (7);

   (d) add the pixel's contribution to $\mathbf{A}$ and $\mathbf{b}$ as in (8).

2. Solve the system of equations $(\mathbf{A} + \lambda\mathbf{I})\,\Delta\mathbf{m} = \mathbf{b}$ and update the motion estimate $\mathbf{m}^{(t+1)} = \mathbf{m}^{(t)} + \Delta\mathbf{m}$.

3. Check that the error in (6) has decreased; if not, increment $\lambda$ (as described in [Press et al., 1992]) and compute a new $\Delta\mathbf{m}$.

4. Continue iterating until the error is below a threshold or a fixed number of steps has been completed.

The steps in this algorithm are similar to the operations performed when warping images [Wolberg, 1990; Beier and Neely, 1992], with additional operations for correcting the current warping parameters based on the local intensity error and its gradients (similar to the process in [Gleicher and Witkin, 1992]). For more details on the exact implementation, see [Szeliski and Coughlan, 1994].
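The loop above translates almost directly into code. The following sketch (my own numpy rendering of steps 1 and 2; it uses nearest-neighbor sampling and finite-difference gradients for brevity, where a careful implementation would interpolate) computes one candidate update $\Delta\mathbf{m}$; the caller repeats it, adjusting $\lambda$ as in step 3:

```python
import numpy as np

def lm_step(I, Ip, m, lam=1e-3):
    """One Levenberg-Marquardt update of the motion parameters m0..m7.

    I, Ip are 2-D float images (reference and target). Returns the updated
    parameter vector and the current squared error, so the caller can
    check for a decrease and adjust lam (step 3)."""
    gy, gx = np.gradient(Ip)                    # dI'/dy' and dI'/dx'
    A, b, err = np.zeros((8, 8)), np.zeros(8), 0.0
    for y in range(I.shape[0]):
        for x in range(I.shape[1]):
            D = m[6] * x + m[7] * y + 1.0       # denominator of eq. (5)
            xp = (m[0] * x + m[1] * y + m[2]) / D
            yp = (m[3] * x + m[4] * y + m[5]) / D
            xi, yi = int(round(xp)), int(round(yp))
            if not (0 <= yi < Ip.shape[0] and 0 <= xi < Ip.shape[1]):
                continue                        # outside pixels don't contribute
            e = Ip[yi, xi] - I[y, x]            # step 1(b): intensity error
            Ix, Iy = gx[yi, xi], gy[yi, xi]     # gradient of I' at (x', y')
            g = Ix * xp + Iy * yp               # step 1(c): de/dm_k, eq. (7)
            J = np.array([Ix * x, Ix * y, Ix,
                          Iy * x, Iy * y, Iy,
                          -g * x, -g * y]) / D
            A += np.outer(J, J)                 # step 1(d): Hessian terms
            b += -e * J                         # weighted gradient vector
            err += e * e
    dm = np.linalg.solve(A + lam * np.eye(8), b)  # step 2
    return m + dm, err

m0 = np.array([1, 0, 0, 0, 1, 0, 0, 0], dtype=float)  # identity start
```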

4.2 Global image registration

Unfortunately, both gradient descent and Levenberg-Marquardt only find locally optimal solutions. If the motion between successive frames is large, we must use a different strategy to find the best registration. We have implemented two different techniques for dealing with this problem. The first technique, which is commonly used in computer vision, is hierarchical matching, which first registers smaller, subsampled versions of the images where the apparent motion is smaller [Quam, 1984; Witkin et al., 1987; Bergen et al., 1992]. Motion estimates from these smaller, coarser levels are then used to initialize motion estimates at finer levels, thereby avoiding the local minimum problem (see [Szeliski and Coughlan, 1994] for details). While this technique is not guaranteed to find the correct registration, we have observed empirically that it works well when the initial misregistration is only a few pixels (the exact domain of convergence depends on the intensity pattern in the image).
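A coarse-to-fine driver is only a few lines (a sketch of the general idea, not the report's code; `register` stands for any local refinement routine, such as the Levenberg-Marquardt loop of Section 4.1). The one subtlety is that when moving from a coarse level to the next finer one, the estimated matrix must be conjugated by the coordinate scaling $\mathbf{S}$:

```python
import numpy as np

def downsample(img):
    """Halve resolution by 2x2 block averaging (one crude pyramid level)."""
    h, w = (img.shape[0] // 2) * 2, (img.shape[1] // 2) * 2
    img = img[:h, :w]
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2] +
                   img[0::2, 1::2] + img[1::2, 1::2])

def coarse_to_fine(I, Ip, levels, register):
    """Hierarchical registration: estimate M_2D on subsampled images first.

    register(I, Ip, M0) is any routine refining an initial 3x3 matrix M0."""
    pyr = [(I, Ip)]
    for _ in range(levels - 1):
        pyr.append((downsample(pyr[-1][0]), downsample(pyr[-1][1])))
    S = np.diag([0.5, 0.5, 1.0])           # fine -> coarse coordinate scaling
    M = np.eye(3)                          # start from zero motion
    for lvl in range(levels - 1, -1, -1):  # coarsest level first
        Ic, Ipc = pyr[lvl]
        M = register(Ic, Ipc, M)
        M /= M[2, 2]                       # keep m22 = 1
        if lvl > 0:
            M = np.linalg.inv(S) @ M @ S   # rescale for the finer level
    return M
```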
For larger displacements, we use phase correlation [Kuglin and Hines, 1975; Brown, 1992]. This technique estimates the 2-D translation between a pair of images by taking 2-D Fourier transforms of each image, computing the phase difference at each frequency, performing an inverse Fourier transform, and searching for a peak in the magnitude image. In our experiments, this technique has proven to work remarkably well, providing good initial guesses for image pairs which overlap by as little as 50%. The technique will not work, however, if the inter-frame motion is not mostly translational (large rotations or zooms), but this does not occur in practice with our current image sequences.
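The whole procedure is a few lines of numpy (a sketch of the standard method, not the report's implementation; the test images are synthetic):

```python
import numpy as np

def phase_correlation(I0, I1):
    """Estimate the integer 2-D translation of I1 relative to I0.

    Takes 2-D Fourier transforms, keeps only the phase difference at each
    frequency, inverse-transforms, and finds the peak in the magnitude."""
    F0, F1 = np.fft.fft2(I0), np.fft.fft2(I1)
    cross = F1 * np.conj(F0)
    cross /= np.maximum(np.abs(cross), 1e-12)    # whiten: phase only
    corr = np.abs(np.fft.ifft2(cross))
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    if dy > I0.shape[0] // 2:                    # wrap to signed shifts
        dy -= I0.shape[0]
    if dx > I0.shape[1] // 2:
        dx -= I0.shape[1]
    return int(dy), int(dx)

rng = np.random.default_rng(0)
I0 = rng.random((64, 64))
I1 = np.roll(I0, shift=(5, -9), axis=(0, 1))     # synthetic translation
print(phase_correlation(I0, I1))                 # -> (5, -9)
```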
4.3 Results

To demonstrate the performance of our algorithm, we digitized an image sequence with a camera panning over a whiteboard. Figure 2a shows the final mosaic of the whiteboard, with the locations of the constituent images in the mosaic shown as black outlines. Figure 2b is a portion of the final mosaic without these outlines. This mosaic is 1300 × 2046 pixels, based on compositing 39 NTSC (640 × 480) resolution images.

To compute this mosaic, we developed an interactive image mosaicing tool which lets the user coarsely position successive frames relative to each other. The tool also has an automatic mosaicing option which computes the initial rough placement of each image with respect to the previous one
using phase correlation. Our algorithm then refines the location of each image by minimizing (6), using the current mosaic thus far as $I(x, y)$ and the input frame being adjusted as $I'(x', y')$. The images in Figure 2 were automatically composited without user intervention using the middle frame (center of the image) as the anchor image (no deformations). As we can see, our technique works well on this example. Section 9 discusses some imaging artifacts which can cause difficulties.
5 Panoramic image mosaicing

Another image mosaicing problem which turns out to be equally simple to solve is panoramic image mosaicing. In this scenario, we rotate a camera around its optical center in order to create a full "viewing sphere" of images (in computer graphics, this representation is sometimes called an environment map [Greene, 1986]). This is similar to the action of panoramic still photographic cameras, where the rotation of a camera on top of a tripod is mechanically coordinated with the film transport, to the anamorphic lens used in [Lippman, 1980], and to "cinema in the round" movie theaters [Greene and Heckbert, 1986]. In our case, however, we can mosaic multiple 2-D images of arbitrary detail and resolution, and we need not know the camera motion. Examples of applications include constructing true scenic panoramas (say of the view at the rim of the Grand Canyon), or limited virtual environments (recreating a meeting room or office as seen from one location).
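The next paragraph states, and Appendix A proves, that views taken from the same viewpoint are related by 2-D projective transformations. A minimal numpy sketch of that fact (my own illustration, assuming a pinhole camera with known focal length $f$ and $\hat{\mathbf{V}} = \mathrm{diag}(1, 1, 1/f)$ as in Section 3) is:

```python
import numpy as np

def rotation_homography(R, f):
    """2-D projective transform relating two views that share an optical
    center: u1 ~ V_hat R V_hat^{-1} u0. No depth term appears, which is
    the algebraic face of the missing motion parallax."""
    V_hat = np.diag([1.0, 1.0, 1.0 / f])
    M = V_hat @ R @ np.linalg.inv(V_hat)
    return M / M[2, 2]                      # normalize so m22 = 1

theta = np.deg2rad(15.0)                    # example pan about the y axis
R = np.array([[ np.cos(theta), 0.0, np.sin(theta)],
              [ 0.0,           1.0, 0.0          ],
              [-np.sin(theta), 0.0, np.cos(theta)]])
print(rotation_homography(R, f=500.0))      # a full 8-dof M_2D
```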
It turns out that images taken from the same viewpoint (stationary optical center) are related by 2-D projective transformations, just as in the planar scene case (see Appendix A for a quick proof). Intuitively, we cannot tell the relative depth of points in the scene as we rotate (there is no motion parallax), so they may as well be located on any plane (in projective geometry, we could say they lie on the plane at infinity,