`
`(12) United States Patent
`Snavely et al.
`
(10) Patent No.: US 8,160,400 B2
(45) Date of Patent: Apr. 17, 2012
`
(54) NAVIGATING IMAGES USING IMAGE BASED
     GEOMETRIC ALIGNMENT AND OBJECT
     BASED CONTROLS
`
`2002/01 13872 A1* 8/2002 Kinjo ............................ 348,116
`2008. O150890 A1* 6, 2008 Bell et al. ....
`... 345,156
`2008. O150913 A1* 6/2008 Bell et al. ...................... 34.5/175
`
(75) Inventors: Keith Noah Snavely, Seattle, WA (US);
     Steven M. Seitz, Seattle, WA (US);
     Richard Szeliski, Bellevue, WA (US)

(73) Assignee: Microsoft Corporation, Redmond, WA (US)
`
(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 1245 days.
`
(21) Appl. No.: 11/493,436

(22) Filed: Jul. 25, 2006

(65) Prior Publication Data
     US 2007/0110338 A1    May 17, 2007
`
`Related U.S. Application Data
`(60) Provisional application No. 60/737,908, filed on Nov.
`17, 2005.
(51) Int. Cl.
     G06K 9/54    (2006.01)
     G06K 9/46    (2006.01)
(52) U.S. Cl. ........ 382/305; 382/100; 382/154; 382/190; 382/201; 382/206; 707/E17.029
(58) Field of Classification Search .................. 382/100, 382/154, 190, 201, 206, 214, 216, 305, 325; 707/E17.029
`See application file for complete search history.
`
`56
`
`References Cited
`
`U.S. PATENT DOCUMENTS
`7.263,230 B2 * 8/2007 Tashman ....................... 382,232
`7,353,114 B1 * 4/2008 Rohlfetal. ....................... 7O2/5
`7.693,702 B1 * 4/2010 Kerner et al. ...
`TO3/22
`
OTHER PUBLICATIONS

TED Talk "Blaise Aguera y Arcas demos Photosynth," filmed Mar. 2007 at TED conference in Monterey, California, available to view at: http://www.ted.com/talks/lang/eng/blaise_aguera_y_arcas_demos_photosynth.html.*
Arya, S. et al., "An optimal algorithm for approximate nearest neighbor searching fixed dimensions," Journal of the ACM, 1998, 45(6), 891-923.
Brown, M. et al., "Unsupervised 3D object recognition and reconstruction in unordered datasets," International Conference on 3D Imaging and Modeling, Ontario, Canada, Jun. 13-16, 2005, 56-63.
Canny, J., "A computational approach to edge detection," IEEE Trans. Pattern Anal. Mach. Intell., 1986, 8(6), 679-698.
`
`Primary Examiner — Stephen Koziol
`(74) Attorney, Agent, or Firm — Woodcock Washburn LLP
`
(57) ABSTRACT

Over the past few years there has been a dramatic proliferation of digital cameras, and it has become increasingly easy to share large numbers of photographs with many other people. These trends have contributed to the availability of large databases of photographs. Effectively organizing, browsing, and visualizing such seas of images, as well as finding a particular image, can be difficult tasks. In this paper, we demonstrate that knowledge of where images were taken and where they were pointed makes it possible to visualize large sets of photographs in powerful, intuitive new ways. We present and evaluate a set of novel tools that use location and orientation information, derived semi-automatically using structure from motion, to enhance the experience of exploring such large collections of images.
`
`9 Claims, 10 Drawing Sheets
`
[Front-page figure: Map 501; Information and Search Tools 502; Selectable Digital Photographs 503]
`
`Petitioner Apple Inc. - Ex. 1008, p. 1
`
`
`
Debevec, P. E. et al., "Modeling and rendering architecture from photographs: a hybrid geometry- and image-based approach," SIGGRAPH '96: Proceedings of the 23rd annual conference on Computer graphics and interactive techniques, ACM Press, New York, NY, USA, 1996, 11-20.
Yahoo, Inc., "Popular Tags on Flickr Photo Sharing," Flickr, http://www.flickr.com/photos/tags, 2006, 2 pages.
Hartley, R. I. et al., Multiple View Geometry in Computer Vision, second ed., Cambridge University Press, 2004.
Johansson, B. et al., "A system for automatic pose-estimation from a single image in a city scene," IASTED Int. Conf. Signal Processing, Pattern Recognition and Applications, Crete, Greece, Jun. 25-28, 2002, 68-73.
Lourakis, M. I. et al., "The design and implementation of a generic sparse bundle adjustment software package based on the levenberg-marquardt algorithm," Tech. Rep. 340, Institute of Computer Science-FORTH, Heraklion, Crete, Greece, Aug. 2004.
Mikolajczyk, K. et al., "A performance evaluation of local descriptors," IEEE Transactions on Pattern Analysis & Machine Intelligence, 2005, 27(10), 1615-1630.
Rubner, Y. et al., "A metric for distributions with applications to image databases," Int'l Conf. on Computer Vision (ICCV), 1998, 59-66.
Schaffalitzky, F. et al., "Multi-view matching for unordered image sets, or 'How do I organize my holiday snaps?'" Proceedings of the 7th European Conference on Computer Vision, Copenhagen, Denmark, May 28-31, 2002, 1, 414-431.
Sutherland, I. E., "Sketchpad: a man-machine graphical communication system," Proceedings Spring Joint Computer Conference, 1963, 329-346.
Szeliski, R., "Image alignment and stitching: A tutorial," Tech. Rep. MSR-TR-2004-92, Microsoft Research, 2004, 1-57.
Werner, T. et al., "New techniques for automated architecture reconstruction from photographs," Proceedings of the 7th European Conference on Computer Vision, Copenhagen, Denmark, May 28-31, 2002, 2, 541-555.
Microsoft Co., "What can you do with a gazillion photos on a single database indexed by their locations?" World-Wide Media eXchange, WWMX, http://www.wwmx.org, Apr. 7, 2005, downloaded Sep. 27, 2006, 2 pages.
Yeh, T. et al., "Searching the web with mobile images for location recognition," CVPR (2), 2004, 76-81.
`* cited by examiner
`
`
`
`
[Sheet 1 of 10, FIG. 1: Server 100 (Database 101, Image Processing 102); Computer 110 (Database 111, Image Processing 112); Display 120; Selection Device]
`
`
`
`
[Sheet 2 of 10, FIG. 2: flowchart for registration to map]
`
`
`
`
[Sheet 3 of 10, FIG. 3: Frusta Showing Camera Positions and Orientations; Map View of 3D Geometry]
`
`
`
`
[Sheet 4 of 10, FIGS. 4-5: Map 401; Frustum 401; Information and Search Tools 502; Selectable Digital Photographs 503]
`
`
`
`
[Sheet 5 of 10, FIGS. 6-7: translucent projection of a digital photograph 601; Selectable Digital Photograph 703]
`
`
`
`
[Sheet 6 of 10, FIG. 8]
`
`
`
[Sheet 7 of 10, FIG. 9: Search Tools 900; First Row 91]
`
`
`
[Sheet 8 of 10, FIGS. 10-11: FIG. 10: Object 1000; Search Tools 1001; Main Location 1002; Photograph Information 1010; Search Object 1011; Zoom In, Zoom Out, Full Size 1012; Step Left, Step Right, Step Back 1013. FIG. 11: Search Tools 501; Main Location 1100; Sequentially Ranked Alternate Images 1102]
`
`
`
`
[Sheet 9 of 10, FIG. 12: Photo 1200; Portion of Photo 1201; Tags 1202-1204; Attributes 1205-1208; Annotations 1210, 1211]
`
`
`
`
[Sheet 10 of 10, FIG. 13: Photos 1301-1304, taken at times t1-t4; Annotated Portion 1310; Transfer Location 1320]
`
`
`
NAVIGATING IMAGES USING IMAGE BASED GEOMETRIC ALIGNMENT AND OBJECT BASED CONTROLS
`
`CROSS-REFERENCE TO RELATED
`APPLICATIONS
`
This application claims priority to U.S. Provisional Application 60/737,908, filed Nov. 17, 2005.
`
`GOVERNMENT RIGHTS
`
This invention was funded in part with grants (No. IIS-0413198 and DGE-0203031) by the National Science Foundation. The University of Washington has granted a royalty-free non-exclusive license to the U.S. government pursuant to 35 U.S.C. Section 202(c)(4) for any patent claiming an invention subject to 35 U.S.C. Section 201.
`
`10
`
`15
`
BACKGROUND

Digital cameras have become commonplace, and advances in technology have made it easy for a single person to take thousands of photographs and store all of them on a hard drive. At the same time, it has become much easier to share photographs with others, whether by posting them on a personal web site, or making them available to a community of enthusiasts using a photo-sharing service. As a result, anyone can have access to millions of photographs through the Internet. Sorting through and browsing such huge numbers of photographs, however, is a challenge. At the same time, large collections of photographs, whether belonging to a single person, or contributed by thousands of people, create exciting opportunities for enhancing the browsing experience by gathering information across multiple photographs. Some photo-sharing services, such as FLICKR®, available at www.flickr.com, allow users to tag photos with keywords, and provide a text search interface for finding photos. However, tags alone often lack the level of specificity required for fine-grained searches, and can rarely be used to organize the results of a search effectively. For example, searching for "Notre Dame" in FLICKR® results in a list of thousands of photographs, sorted either by date or by other users' interest in each photo. Within this list, photographs of both the inside and the outside of Notre Dame cathedral in Paris are interspersed with photographs taken in and around the University of Notre Dame. Finding a photograph showing a particular object, for instance, the door of the cathedral, amounts to inspecting each image in the list. Searching for both "Notre Dame" and "door" limits the number of images to a manageable number, but almost certainly excludes relevant images whose owners simply omitted the tag "door."

The computer vision community has conducted work on recovering camera parameters and scene geometry from sets of images. The work of Brown and Lowe 2005 and of Schaffalitzky and Zisserman 2002 involves application of automatic structure from motion to unordered data sets. A more specific line of research focuses on reconstructing architecture from multiple photographs, using semi-automatic or fully automatic methods. The semi-automatic Facade system of Debevec, et al. 1996 has been used to create compelling fly-throughs of architectural scenes from photographs. Werner and Zisserman 2002 developed an automatic system for reconstructing architecture, but it was only demonstrated on small sets of photographs.

Techniques have been developed for visualizing or searching through large sets of images based on a measure of image similarity (histogram distances such as the Earth Mover's Distance [Rubner et al. 1998] are often used). A similarity score gives a basis for performing tasks such as creating spatial layouts of sets of images or finding images that are similar to a given image, but often the score is computed in a way that is agnostic to the objects in the scene (for instance, the score might just compare the distributions of colors in two objects). Therefore, these methods are most suitable for organizing images of classes of objects, such as mountains or sunsets.

Finally, several tools have been developed for organizing large sets of images contributed by a community of photographers. For example, the World-Wide Media eXchange (WWMX) is one such tool. WWMX allows users to contribute photographs and provide geo-location information by using a GPS receiver or dragging and dropping photos onto a map. However, the location information may not be extremely accurate, and the browsing interface of WWMX is limited to an overhead map view. Other photo-sharing tools, such as FLICKR®, do not explicitly use location information to organize users' photographs, although FLICKR® supports tools such as "Mappr" for annotating photos with location, and it is possible to link images in FLICKR® to external mapping tools such as GOOGLE® Earth.

Finally, the following references are relevant to the description of the invention.

ARYA, S., MOUNT, D. M., NETANYAHU, N. S., SILVERMAN, R., AND WU, A. Y. 1998. An optimal algorithm for approximate nearest neighbor searching fixed dimensions. Journal of the ACM 45, 6, 891-923.
BROWN, M., AND LOWE, D. G. 2005. Unsupervised 3D object recognition and reconstruction in unordered datasets. In International Conference on 3D Imaging and Modeling.
CANNY, J. 1986. A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach. Intell. 8, 6, 679-698.
DEBEVEC, P. E., TAYLOR, C. J., AND MALIK, J. 1996. Modeling and rendering architecture from photographs: a hybrid geometry- and image-based approach. In SIGGRAPH '96: Proceedings of the 23rd annual conference on Computer graphics and interactive techniques, ACM Press, New York, N.Y., USA, 11-20.
Flickr. http://www.flickr.com.
HARTLEY, R. I., AND ZISSERMAN, A. 2004. Multiple View Geometry in Computer Vision, second ed. Cambridge University Press, ISBN: 0521540518.
JOHANSSON, B., AND CIPOLLA, R. 2002. A system for automatic pose-estimation from a single image in a city scene. In IASTED Int. Conf. Signal Processing, Pattern Recognition and Applications.
LOURAKIS, M. I., AND ARGYROS, A. A. 2004. The design and implementation of a generic sparse bundle adjustment software package based on the levenberg-marquardt algorithm. Tech. Rep. 340, Institute of Computer Science-FORTH, Heraklion, Crete, Greece, August.
MIKOLAJCZYK, K., AND SCHMID, C. 2005. A performance evaluation of local descriptors. IEEE Transactions on Pattern Analysis & Machine Intelligence 27, 10, 1615-1630.
RUBNER, Y., TOMASI, C., AND GUIBAS, L. J. 1998. A metric for distributions with applications to image databases. In Int'l Conf. on Computer Vision (ICCV), 59-66.
SCHAFFALITZKY, F., AND ZISSERMAN, A. 2002. Multi-view matching for unordered image sets, or "How do I organize my holiday snaps?" In Proceedings of the 7th European Conference on Computer Vision, Copenhagen, Denmark, Vol. 1, 414-431.
SUTHERLAND, I. E. 1964. Sketchpad: a man-machine graphical communication system. In DAC 64: Proceedings
`
`
`
of the SHARE design automation workshop, ACM Press, New York, N.Y., USA, 6.329-6.346.
SZELISKI, R. 2005. Image alignment and stitching: A tutorial. Tech. Rep. MSR-TR-2004-92, Microsoft Research.
WERNER, T., AND ZISSERMAN, A. 2002. New techniques for automated architecture reconstruction from photographs. In Proceedings of the 7th European Conference on Computer Vision, Copenhagen, Denmark, vol. 2, 541-555.
WWMX. World-Wide Media eXchange.
YEH, T., TOLLMAR, K., AND DARRELL, T. 2004. Searching the web with mobile images for location recognition. In CVPR (2), 76-81.
`
SUMMARY

Many collections of photos can be organized, browsed, and visualized more effectively using more fine-grained knowledge of location and orientation. As a simple example, if in addition to knowing simply that a photograph was taken at a place called "Notre Dame" we know the latitude and longitude where the photographer was standing along with the precise direction he was facing, then an image of the door to Notre Dame cathedral can be found more easily by displaying search hits on a map interface, and searching only among the images that appear in front of the cathedral door.

As well as improving existing search tools, knowing where a photo was taken makes many other browsing modes possible. For instance, relating images by proximity makes it possible to find images that were taken nearby, or to the left of or north of a selected image, or to find images that contain a close-up of a part of another image. With knowledge of location and orientation, it is easier to generate morphs between similar photographs, which can make the relationship between different images more explicit, and a browsing experience more compelling. Location and orientation information can be combined with other metadata, such as date, time, photographer, and knowledge of correspondence between images, to create other interesting visualizations, such as an animation of a building through time. With additional knowledge of the geometry of the scene, location information also allows tags associated with parts of one photograph to be transferred to other similar photographs. This ability can improve text searches, and the access to additional information for each photo can further enhance the browsing experience.

These browsing tools can be applied to a single user's photo collection, a collection of photos taken for a special purpose (such as creating a virtual tour of a museum), or a database containing photos taken by many different people.

We also describe herein new tools and interfaces for visualizing and exploring sets of images based on knowledge of three-dimensional (3D) location and orientation information, and image correspondence. We present semi-automatic techniques for determining the relative and absolute locations and orientations of the photos in a large collection. We present an interactive image exploration system. These and other aspects and embodiments of the invention are described in detail below.

BRIEF DESCRIPTION OF THE DRAWINGS

The systems and methods for navigating images using image based geometric alignment and object based controls in accordance with the present invention are further described with reference to the accompanying drawings in which:

FIG. 1 illustrates a general operating environment for the invention.

FIG. 2 illustrates an exemplary method for determining relative and absolute location information for a plurality of digital photographs.

FIG. 3 illustrates an exemplary overhead map interface that may be used, in one embodiment, for registering new photographs in a photo set, and in another embodiment, for browsing photos by selecting from the map a camera location that is desired for viewing.

FIG. 4 illustrates a plurality of user interface features in an exemplary "free-flight" browsing mode, in which a user can move a virtual camera in a representation of a 3D geometry and select desired camera positions for viewing a corresponding digital photo.

FIG. 5 illustrates a plurality of user interface features in an exemplary "image-based" browsing mode, in which a user may see a first photograph in a main location in the interface and also have access to a plurality of selectable alternate images that may have image content related to the first photograph.

FIG. 6 illustrates another exemplary embodiment of a "free-flight" browsing mode such as presented in FIG. 4.

FIG. 7 illustrates another exemplary embodiment of an "image-based" browsing mode such as presented in FIG. 5.

FIG. 8 illustrates a sample triangulation of a set of sparse 3D points and line segments, used for morphing. The triangulation is superimposed on the image that observed the 3D features.

FIG. 9 illustrates an exemplary information and search pane comprising a plurality of search tools that may be incorporated into embodiments of the invention.

FIG. 10 illustrates a plurality of user interface features in an exemplary "object-based" browsing mode, in which a user can select an object and find other images also containing the object, and moreover may sort images by which have "best" views of the selected object.

FIG. 11 illustrates a plurality of user interface features in an exemplary "object-based" browsing mode, in which a user selected an object in FIG. 10, and was presented with a best view of the object in FIG. 11 along with a plurality of other views of the object in 1102, which may be ordered according to which have best views of object 1000.

FIG. 12 illustrates an exemplary digital photograph 1200 which may be presented in various user interfaces presented herein, and metadata relating to image attributes, tags, and annotations to portions of the photograph which may also be presented along with the photograph 1200.

FIG. 13 illustrates images from a Notre Dame data set showing the cathedral from approximately the same viewpoint, but at different times. The various images 1301-1304 may be presented in a stabilized slide show. The annotation of the rose window 1310 has been transferred from image 1301 to the other three images 1302-1304.

DETAILED DESCRIPTION

Certain specific details are set forth in the following description and figures to provide a thorough understanding of various embodiments of the invention. Certain well-known details often associated with computing and software technology are not set forth in the following disclosure, however, to avoid unnecessarily obscuring the various embodiments of the invention. Further, those of ordinary skill in the relevant art will understand that they can practice other embodiments of the invention without one or more of the details described below. Finally, while various methods are described with reference to steps and sequences in the following disclosure, the description as such is for providing a clear implementation of embodiments of the invention, and the steps and sequences of steps should not be taken as required to practice this invention.
The systems and methods for navigating images using image based geometric alignment and object based controls described herein have been applied to a variety of data sets comprising images from various locations. Thus, the various techniques and figures discussed may make occasional reference to the tested data sets. Tested data sets include, for example, a set of photographs of the Old Town Square in Prague, Czech Republic, a set of photographs taken along the Great Wall of China, a set of photos resulting from an internet search for "notredame AND paris," a set of photos resulting from an internet search for "halfdome AND yosemite," a set of photos resulting from an internet search for "trevi AND rome," and a set of photos resulting from an internet search for "trafalgarsquare."
`General Operating Environment
FIG. 1 presents a general operating environment for aspects of the invention. In general, computer hardware and software such as that depicted in FIG. 1 may be arranged in any configuration and using the full extent of presently available or later developed computing and networking technologies. In one configuration, a server 100 may be connected to a network 105 such as the internet. The server 100 may receive and respond to requests from client computers such as 110 that are also coupled to the network 105. Server 100 may be equipped with or otherwise coupled to a database or data store 101 containing images such as digital photographs, as well as metadata or other useful information that can be used to categorize and process the images. Server 100 may also be equipped with or otherwise coupled to image processing logic 102 for carrying out various processing tasks as discussed herein.
Thus, in one arrangement, a client 110 may request data from a server 100 via network 105. The request may be in the form of a browser request for a web page, or by other means as will be appreciated by those of skill in the art. The server 100 may provide the requested information, which can be used by the client 110 to present a user interface on display 120. A user can interact with the user interface by activating selectable objects, areas, icons, tools and the like using a selection device 130 such as a mouse, trackball or touchpad. In connection with providing such information, the server 100 may access database 101 for appropriate images and may apply image processing logic 102 as necessary. Certain image processing logic in 102 may also be applied before and after the client request to properly prepare for and if necessary recover from satisfaction of the client 110 request. In connection with displaying the requested information, client 110 may, in some embodiments, access its own database 111 and image processing 112, for example when the client 110 and server each contain information to be presented in a particular user interface on electronic display 120. In other embodiments, the client 110 may simply rely on the server 100 to provide substantially all of the image processing functions associated with carrying out the invention.

In another arrangement, the client 110 may implement the systems and methods of the invention without relying on server 100 or network 105. For example, client 110 may contain images in database 111, and may apply image processing logic 112 to the images to produce a user interface that can be presented to a user via display 120. Thus, while the invention can be performed over a network using client/server or distributed architectures as are known in the art, it is not limited to such configurations and may also be implemented on a stand-alone computing device.
`
`55
`
`6
The description and figures presented herein can be understood as generally directed to hardware and software aspects of carrying out image processing logic such as 102 and 112 that produces at least in part a user interface that may be presented on an electronic display 120. Many of the remaining figures, as will be appreciated, are directed to exemplary aspects of a user interface that may be presented on a display 120. Aspects of the invention comprise novel features of such user interfaces, as will be appreciated, and optionally also supporting logic 112, 102 that produces such aspects of user interfaces or that processes images such that they may be presented in a user interface as disclosed herein.
`Determining Geo-Location
In order to effectively use our browsing tools on a particular set of images, we need fairly accurate information about the location and orientation of the camera used to take each photograph in the set. In addition to these extrinsic parameters, it is useful to know the intrinsic parameters, such as the focal length, of each camera. How can this information be derived? GPS is one way of determining position, and while it is not yet common for people to carry around GPS units, nor do all current GPS units have the accuracy we desire, a first solution is to equip digital cameras with GPS units so that location and orientation information can be gathered when a photograph is taken. As for the intrinsic parameters, many digital camera models embed the focal length with which a photo was taken (as well as other information, such as exposure, date, and time) in the Exchangeable Image File Format (EXIF) tags of the image files. EXIF is the present standard for image metadata, but any image metadata may also be used. However, EXIF and/or other metadata values are not always accurate.
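Where a usable focal length is present in the EXIF tags, it is recorded in millimeters and must be converted to pixel units before use as an intrinsic parameter. The conversion below is standard pinhole-camera practice rather than something specified in this patent, and the camera values in the example are hypothetical:

```python
def focal_length_pixels(focal_mm: float, image_width_px: int,
                        sensor_width_mm: float) -> float:
    """Convert an EXIF focal length (millimeters) to pixel units.

    A pinhole-camera intrinsic needs the focal length in pixels,
    so we scale by the sensor's pixels-per-millimeter.
    """
    return focal_mm * image_width_px / sensor_width_mm

# Hypothetical camera: 7.1 mm focal length, 3072 px wide image, 7.18 mm sensor.
f_px = focal_length_pixels(7.1, 3072, 7.18)
```

If the sensor width is absent from the metadata, it must be looked up from camera specifications, and either value may be wrong, which is consistent with the caution above that EXIF metadata is not always accurate.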
A second solution does not rely on the camera to provide accurate location information; instead, we can derive location using computer vision techniques. Brown and Lowe 2005 provides useful background for this discussion. We first detect feature points in each image, then match feature points between pairs of images, keeping only geometrically consistent matches, and run an iterative, robust structure from motion procedure to recover the intrinsic and extrinsic camera parameters. Because structure from motion only estimates the relative position of each camera, and we are also interested in absolute coordinates (e.g., latitude and longitude), we use a novel interactive technique to register the recovered cameras to an overhead map. A flowchart of the overall process is shown in FIG. 2.
As can be observed in FIG. 2, a set of input images 200 can be processed through a variety of steps, as may be carried out by one or more computer software and hardware components, to ultimately produce information regarding the absolute location of the input images (photographs) and the 3D points within such images 212. Exemplary steps can include keypoint detection 201, keypoint matching 202, estimating epipolar geometry and removing outliers 203, applying a structure from motion procedure 204 that produces an output comprising the relative locations of photographs and 3D points 210, and map registration 211.

The exemplary structure from motion procedure 204 may comprise choosing a pair of images I1 and I2 with a large number of matches and a wide baseline 205, running bundle adjustment 206, choosing a remaining image I with the most matches to existing points in the scene and adding image I to the optimization 207, again running bundle adjustment as necessary 208, and adding well-conditioned points to the optimization 209. Additional images can be processed as necessary by returning to step 206. After all images are processed, output 210 can be used in map registration 211 as described
`
`25
`
`30
`
`35
`
`40
`
`50
`
`60
`
`65
`
`Petitioner Apple Inc. - Ex. 1008, p. 15
`
`
`
`7
above. Various exemplary aspects of a system such as that of FIG. 2 are discussed in greater detail in the sections below, entitled "Keypoint Detection and Matching," "Structure from Motion," "Interactive Registration to Overhead Map," "Registering New Photographs," and "Line Segment Reconstruction."
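Before turning to those sections, the incremental loop of steps 205 through 209 can be sketched in simplified form. The sketch below models only the greedy ordering (seed with the best-matched pair, then repeatedly add the image with the most matches to the current reconstruction); the wide-baseline test, the geometry itself, and bundle adjustment 206/208 are stubbed out, and all names and match counts are illustrative:

```python
def reconstruction_order(match_counts: dict) -> list:
    """Greedy image ordering used by incremental structure from motion.

    match_counts maps a pair of image ids (as a frozenset) to the number
    of keypoint matches between them. Only the ordering of steps 205 and
    207 is modeled here.
    """
    # Step 205: choose the initial pair with the largest number of matches
    # (a real system would also require a wide baseline).
    seed = max(match_counts, key=match_counts.get)
    order = sorted(seed)
    images = {img for pair in match_counts for img in pair}
    remaining = images - set(order)
    while remaining:
        # Step 207: add the remaining image with the most matches
        # to images already in the reconstruction.
        def score(img):
            return sum(match_counts.get(frozenset((img, r)), 0) for r in order)
        best = max(remaining, key=score)
        order.append(best)
        remaining.remove(best)
        # Steps 206/208: bundle adjustment would re-optimize here (omitted).
    return order

counts = {frozenset(('A', 'B')): 120, frozenset(('B', 'C')): 80,
          frozenset(('A', 'C')): 10, frozenset(('C', 'D')): 60}
print(reconstruction_order(counts))  # → ['A', 'B', 'C', 'D']
```

The point of the greedy order is conditioning: each new image is tied to the existing reconstruction by as many matches as possible before the next bundle adjustment.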
`Keypoint Detection and Matching
Detecting feature points in a plurality of images and matching feature points between two or more of said plurality of images may comprise the following procedures for estimating image location. The first step is to use a keypoint detector, such as any of the various keypoint detectors described in Mikolajczyk and Schmid 2005. A keypoint detector detects keypoints for each image. We then match keypoint descriptors between each pair of images. This can be done, for example, using the approximate nearest neighbors technique of Arya et al. 1998. Any other acceleration technique could also be used, including but not limited to hashing or context-sensitive hashing. For each image pair with a large enough number of matches, we estimate a fundamental matrix using, for example, Random Sampling Consensus (RANSAC), or any other robust estimation technique, and remove the matches that are outliers to the recovered fundamental matrix. After finding a set of putative, geometrically consistent matches, we organize the matches into a set of tracks, where a track is simply a set of mutually matching keypoints; each track ideally contains projections of the same 3D point.
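The RANSAC step can be illustrated with a deliberately simplified model. A real implementation estimates a fundamental matrix from a minimal sample of correspondences; the sketch below keeps the same sample-fit-score loop but fits a pure 2D translation, which needs only one correspondence per sample, so that it stays self-contained:

```python
import random

def ransac_translation(matches, iters=200, tol=2.0, seed=0):
    """Robustly estimate a 2D translation between matched keypoints.

    matches: list of ((x1, y1), (x2, y2)) correspondences.
    Returns (translation, inlier_matches). A fundamental matrix would be
    estimated the same way, with a larger minimal sample and a linear solver.
    """
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iters):
        # Minimal sample: a single correspondence fixes a translation.
        (x1, y1), (x2, y2) = rng.choice(matches)
        dx, dy = x2 - x1, y2 - y1
        inliers = [m for m in matches
                   if abs((m[1][0] - m[0][0]) - dx) <= tol
                   and abs((m[1][1] - m[0][1]) - dy) <= tol]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (dx, dy), inliers
    return best_model, best_inliers
```

Matches that disagree with the best model are the outliers removed before track building; with a fundamental matrix the residual is a point-to-epipolar-line distance instead of a translation error.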
If the keypoints in every image form the vertex set of a graph, and there is an edge in the graph between each pair of matching keypoints, then every connected component of this graph comprises a track. However, the tracks associated with some connected components might be inconsistent; in particular, a track is inconsistent if it contains more than one keypoint for the same image. We keep only the consistent tracks containing at least two keypoints for the next phase of the location estimation procedure. Note that this simple rejection of nominally inconsistent tracks will not reject all physically inconsistent tracks (i.e., tracks that contain keypoints that are projections of different 3D points).
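The track-building rule just described (connected components of the match graph, discarding any component with two keypoints in the same image, and keeping components with at least two keypoints) can be sketched with a union-find; the data layout here is an assumption for illustration:

```python
def build_tracks(matches):
    """Group matching keypoints into consistent tracks.

    matches: list of pairs of keypoints, each keypoint an
    (image_id, keypoint_id) tuple. Returns components that span at least
    two keypoints with at most one keypoint per image.
    """
    parent = {}

    def find(k):
        parent.setdefault(k, k)
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path halving
            k = parent[k]
        return k

    # Union the endpoints of every match edge.
    for a, b in matches:
        parent[find(a)] = find(b)

    components = {}
    for k in parent:
        components.setdefault(find(k), []).append(k)

    tracks = []
    for comp in components.values():
        images = [img for img, _ in comp]
        # Reject inconsistent tracks: two keypoints from the same image.
        if len(images) == len(set(images)) and len(comp) >= 2:
            tracks.append(sorted(comp))
    return sorted(tracks)
```

For example, matches A1-B2 and B2-C3 merge into one three-image track, while a component containing two keypoints of image D is rejected as inconsistent.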
`Structure from Motion
Next, we wish to determine a plurality of relative locations of said images. This step can comprise recovering a set of camera parameters and a 3D location for each track. We make the common assumption that the intrinsic parameters of the
`camera have a single degree of freedom, the fo