IMAGE-BASED 3D MODELLING: A REVIEW

Fabio Remondino (fabio@geod.baug.ethz.ch)
Swiss Federal Institute of Technology (ETH), Zurich

Sabry El-Hakim (sabry.el-hakim@nrc-cnrc.gc.ca)
National Research Council, Ottawa, Canada

Abstract

This paper addresses the main problems and the available solutions for the generation of
3D models from terrestrial images. Close range photogrammetry has dealt for many years
with manual or automatic image measurements for precise 3D modelling. Nowadays 3D
scanners are also becoming a standard source of input data in many application areas, but
image-based modelling still remains the most complete, economical, portable, flexible and
widely used approach. The paper presents the full pipeline for 3D modelling from terrestrial
image data, considering the different approaches and analysing all the steps involved.

Keywords: calibration, orientation, visualisation, 3D reconstruction

Introduction

Three-dimensional (3D) modelling of an object can be seen as the complete process that
starts from data acquisition and ends with a 3D virtual model visually interactive on a
computer. Often 3D modelling is taken to mean only the process of converting a measured
point cloud into a triangulated network ("mesh") or textured surface, while it should describe
a more complete and general process of object reconstruction. Three-dimensional modelling
of objects and scenes is an intensive and long-standing research problem in the graphics,
vision and photogrammetric communities. Three-dimensional digital models are required in
many applications such as inspection, navigation, object identification, visualisation and
animation. Recently it has become a fundamental step for cultural heritage digital archiving
in particular. The motivations are varied: documentation in case of loss or damage, virtual
tourism and museums, education resources, interaction without risk of damage, and so forth.
The requirements specified for many applications, including digital archiving and mapping,
involve high geometric accuracy, photo-realism of the results and the modelling of the
complete details, as well as the automation, low cost, portability and flexibility of the
modelling technique. Therefore, selecting the most appropriate 3D modelling technique to
satisfy all requirements for a given application is not always an easy task.

Digital models are nowadays present everywhere: their use and diffusion are becoming
very popular through the Internet and they can be displayed on low-cost computers. Although
it seems easy to create a simple 3D model, the generation of a precise and photo-realistic
computer model of a complex object still requires considerable effort.

The most general classification of 3D object measurement and reconstruction techniques
divides them into contact methods (for example, using coordinate measuring machines,

© 2006 The Authors. Journal Compilation © 2006 The Remote Sensing and Photogrammetry Society and Blackwell Publishing Ltd.
Blackwell Publishing Ltd. 9600 Garsington Road, Oxford OX4 2DQ, UK and 350 Main Street Malden, MA 02148, USA.
callipers, rulers and/or bearings) and non-contact methods (X-ray, SAR, photogrammetry,
laser scanning). This paper will focus on modelling from reality (Ikeuchi and Sato, 2001)
rather than the computer graphics creation of artificial world models using graphics and
animation software such as 3DMax, Lightwave or Maya. Here and throughout this paper, all
proprietary names and trade marks are acknowledged; a list of websites providing details of
many of these products is given at the end of the paper. Starting from simple elements such
as polygonal boxes, such packages can subdivide and smooth the geometric elements by using
splines and thus provide realistic results. This kind of software is mainly used for movie
production, games, and architectural and object design.

Nowadays the generation of a 3D model is mainly achieved using non-contact systems
based on light waves, in particular using active or passive sensors (Fig. 1).

In some applications, other information derived from CAD models, measured surveys or
GPS may also be used and integrated with the sensor data. Active sensors directly provide
range data containing the 3D coordinates necessary for the network (mesh) generation phase.
Passive sensors provide images that need further processing to derive the 3D object
coordinates. After the measurements, the data must be structured and a consistent polygonal
surface created to build a realistic representation of the modelled scene. A photo-realistic
visualisation can afterwards be generated by texturing the virtual model with image
information.
[Fig. 1. Three-dimensional acquisition systems for object measurement using non-contact
methods based on light waves. The diagram divides active sensors into triangulation
techniques (projection of a single spot, sheet of light or bundle of rays: Moiré, colour-coded
projection, fringe projection, phase shifting) and time-delay techniques (time of flight with
lidar or continuous modulation, and interferometry), and passive sensors into shape from
shading, shape from silhouette, shape from edges, shape from texture, focus/defocus and
photogrammetry, all leading to a 3D model.]

Considering active and passive sensors, four alternative methods for object and scene
modelling can currently be distinguished:

(1) Image-based rendering (IBR). This does not include the generation of a geometric 3D
model but, for particular objects and under specific camera motions and scene
conditions, it might be considered a good technique for the generation of virtual views
(Shum and Kang, 2000). IBR creates novel views of 3D environments directly from
input images. The technique relies on either accurately knowing the camera positions
or performing automatic stereo matching which, in the absence of geometric data,
requires a large number of closely spaced images to succeed. Object occlusions and
discontinuities, particularly in large-scale and geometrically complex environments,
will affect the output. The ability to move freely into the scene and view objects from
any position may be limited depending on the method used. Therefore, the IBR
method is generally only used for applications requiring limited visualisation.
(2) Image-based modelling (IBM). This is a widely used method for the geometric surfaces
of architectural objects (Streilein, 1994; Debevec et al., 1996; van den Heuvel, 1999;
Liebowitz et al., 1999; El-Hakim, 2002) and for precise terrain and city modelling
(Grün, 2000). In most cases, the most impressive and accurate results still remain
those achieved with interactive approaches. IBM methods (including photogrammetry)
use 2D image measurements (correspondences) to recover 3D object information
through a mathematical model, or they obtain 3D data using methods such as
shape from shading (Horn and Brooks, 1989), shape from texture (Kender, 1981),
shape from specularity (Healey and Binford, 1987), shape from contour (medical
applications) (Asada, 1987; Ulupinar and Nevatia, 1995) and shape from 2D edge
gradients (Winkelbach and Wahl, 2001). Passive image-based methods acquire 3D
measurements from multiple views, although techniques to acquire three dimensions
from single images (van den Heuvel, 1998; El-Hakim, 2001; Al Khalil and
Grussenmeyer, 2002; Zhang et al., 2002; Remondino and Roditakis, 2003) are also
necessary. IBM methods use projective geometry (Nister, 2004; Pollefeys et al., 2004)
or a perspective camera model. They are very portable and the sensors are often
low cost.
(3) Range-based modelling. This method directly captures the 3D geometric information
of an object. It is based on costly (at least for now) active sensors and can provide a
highly detailed and accurate representation of most shapes. The sensors rely on
artificial lights or pattern projection (Rioux et al., 1987; Besl, 1988). Over many years,
structured light (Maas, 1992; Gaertner et al., 1996; Sablatnig and Menard, 1997),
coded light (Wahl, 1984) and laser light (Sequeira et al., 1999) have been used for the
measurement of objects. In the past 25 years many advances have been made in the
field of solid-state electronics and photonics and many active 3D sensors have been
developed (Blais, 2004). Nowadays many commercial solutions are available
(including Breuckmann, Cyberware, Cyrax, Leica, Optech, ShapeGrabber, Riegl and
Z + F), based on triangulation (with laser light or stripe projection), time-of-flight,
continuous wave, interferometry or reflectivity measurement principles. They are
becoming a very common tool not only for the scientific community but also for
non-expert users such as cultural heritage professionals. These sensors are still
expensive, designed for specific ranges or applications, and they are affected by the
reflective characteristics of the surface. They require some expertise based on
knowledge of the capability of each technology at the desired range, and the resulting
data must be filtered and edited. Most of the systems focus only on the acquisition of
the 3D geometry, providing only a monochrome intensity value for each range value.
Some systems directly acquire colour information for each pixel (Blais, 2004) while
others have a colour camera attached to the instrument, in a known configuration, so
that the acquired texture is always registered with the geometry. However, this
approach may not provide the best results, since the ideal conditions for taking the
images may not coincide with those for scanning. Therefore, the generation of realistic
3D models is often supported by textures obtained from separate high-resolution
colour digital cameras (Beraldin et al., 2002; Guidi et al., 2003). The accuracy at a
given range varies significantly from one scanner to another. Also, due to object size,
shape and occlusions, it is usually necessary to perform multiple scans from different
locations to cover every part of the object: the alignment and integration of the
different scans can affect the final accuracy of the 3D model. Furthermore, long-range
sensors often have problems with edges, resulting in blunders or smoothing effects.
On the other hand, for small and medium-sized objects (up to the size of a human or a
statue) range-based methods can provide accurate and complete details with a high
degree of automation (Beraldin et al., 1999).
(4) Combination of image- and range-based modelling. In many applications, a single
modelling method that satisfies all the project requirements is still not available.
Investigations of sensor integration are reported in El-Hakim and Beraldin
(1994, 1995). Photogrammetry and laser scanning have been combined in particular
for complex or large architectural objects, where no technique by itself can
efficiently and quickly provide a complete and detailed model. Usually the basic
shapes such as planar surfaces are determined by image-based methods, while the fine
details such as reliefs are captured with range sensors (Flack et al., 2001; Sequeira
et al., 2001; Bernardini et al., 2002; Borg and Cannataci, 2002; El-Hakim et al., 2004;
Beraldin et al., 2005).

Comparisons between range-based and image-based modelling are reported in Böhler and
Marbs (2004), Kadobayashi et al. (2004), Böhler (2005) and Remondino et al. (2005). At the
moment it can safely be said that, for all types of objects and sites, there is no single modelling
technique able to satisfy all requirements of high geometric accuracy, portability, full
automation, photo-realism and low cost as well as flexibility and efficiency.

In the next sections, only the terrestrial image-based 3D modelling problem for close
range applications will be discussed in detail.

Terrestrial Image-Based 3D Modelling

Recovering a complete, detailed, accurate and realistic 3D model from images is still a
difficult task, in particular for large and complex sites and if uncalibrated or widely separated
images are used. This is firstly because incorrect recovery of the camera parameters can lead
to inaccurate and deformed results, and secondly because a wide baseline between the images
always requires user interaction in the point measurements.

For many years photogrammetry has dealt with the precise 3D reconstruction of objects
from images. Although precise calibration and orientation procedures are required, suitable
commercial packages are now available. They are all based on manual or semi-automated
measurements (Australis, Canoma, ImageModeler, iWitness, PhotoGenesis, PhotoModeler,
ShapeCapture). After the tie point measurement and bundle adjustment phases, they allow
sensor calibration and orientation data and 3D object point coordinates, as well as wire-frame
or textured 3D models, to be obtained from multi-image networks.

The overall image-based 3D modelling process consists of several well-known steps:
design (sensor and network geometry); 3D measurements (point clouds, lines, etc.);
structuring and modelling (segmentation, network/mesh generation, etc.); texturing and
visualisation. In the remainder of the paper, attention is focused on the details of 3D
modelling from multiple images.

The research activities in terrestrial image-based modelling can be classified as follows:

(1) Approaches that try to obtain a 3D model of the scene from uncalibrated images
automatically (also called "shape from video", "VHS to VRML" or "Video-To-3D").
Many efforts have been made to completely automate the process of taking images,
calibrating and orienting them, recovering the 3D coordinates of the imaged scene and
modelling them, but while promising, the methods are thus far not always successful
or proven in practical applications. The fully automated procedure, widely reported in
the computer vision community (Fitzgibbon and Zisserman, 1998; Nister, 2004;
Pollefeys et al., 2004), starts with a sequence of closely separated images taken with
an uncalibrated camera. The system automatically extracts points of interest (such as
corners), sequentially matches them across views and then computes camera
parameters and 3D coordinates of the matched points using robust techniques. The
key to the success of this fully automatic procedure is that successive images must not
vary significantly, thus the images must be taken at short intervals. The first two
images are generally used to initialise the sequence. This is done on a projective
geometry basis and is usually followed by a bundle adjustment. A "self-calibration"
(or auto-calibration) to compute the intrinsic camera parameters (usually only the
focal length) is generally used in order to obtain a metric reconstruction (up to a scale)
from the projective one. The 3D surface model is then automatically generated. In the
case of complex objects, further matching procedures are applied in order to obtain
dense depth maps and a complete 3D model. See Scharstein and Szeliski (2002) for a
recent overview of dense stereo-correspondence algorithms. Some approaches have
also been presented for the automated extraction of image correspondences between
wide baseline images (Pritchett and Zisserman, 1998; Matas et al., 2002; Ferrari et al.,
2003; Xiao and Shah, 2003; Lowe, 2004), but their reliability and applicability for
automated image-based modelling of complex objects is still not satisfactory, as they
yield mainly a sparse set of matched feature points. However, dense matching results
under wide baseline conditions were reported in Strecha et al. (2003) and Megyesi and
Chetverikov (2004). Automated image-based modelling methods rely on features that
can be extracted from the scene and automatically matched; therefore occlusions,
illumination changes, limited locations for the image acquisition and untextured
surfaces are problematic. However, recent invariant point detector and descriptor
operators, such as the SIFT operator (Lowe, 2004), have proved to be more robust
under large image variations. Another problem is that an automated process very
commonly ends up with some areas containing many more features than are required
for modelling, while other areas have no features, or too few to produce a complete
3D model. Automated processes require highly structured images with good texture,
high frame rate and uniform camera motion, otherwise they will inevitably fail. Image
configurations that lead to ambiguous projective reconstructions have been identified
in Hartley (2000) and Kahl et al. (2001), while self-calibration-critical motions have
been studied in Sturm (1997) and Kahl et al. (2000). The level of automation is also
strictly related to the quality (precision) of the required 3D model. Automated
reconstruction methods, even when able to recover the complete 3D geometry of an
object, have reported errors of up to 5% (accuracy of c. 1:20) (Pollefeys et al., 2004),
limiting their use to applications that require only "nice-looking" partial 3D models.
Furthermore, post-processing operations are often required, which means that user
interaction is still needed. Therefore, fully automated procedures are generally
reliable only for finding point correspondences and camera poses (Remondino, 2004;
Roth, 2004; Forlani et al., 2005). "Nice-looking" 3D models can be used for
visualisation purposes, while for documentation, high accuracy and photo-realism,
user interaction is mandatory. For all these reasons, more emphasis has always been
put on semi-automated or interactive procedures, combining the human ability of
image understanding with the power and speed of computers. This has led to a
number of promising approaches for the semi-automated modelling of architecture
and other complex objects.
(2) Approaches that perform a semi-automated 3D reconstruction of the scene from
oriented images. These approaches interactively or automatically orient and calibrate
the images and afterwards perform the semi-automated modelling relying on the
human operator (Streilein, 1994; Debevec et al., 1996; El-Hakim, 2002; Gibson et al.,
2003; Guarnieri et al., 2004). Semi-automated approaches are much more common, in
particular in the case of geometrically complex objects. The interactive work consists
of the definition of the topology, followed by editing and post-processing of the 3D
data. The output model, based only on the measured points, usually consists of
surface boundaries that are irregular and overlapping, and needs some assumptions in
order to generate a correct surface model. The degree of automation of modelling
increases when certain assumptions about the object, such as perpendicular or parallel
surfaces, can be introduced. Debevec et al. (1996) developed a hybrid easy-to-use
system to create 3D models of architectural features from a small number of
photographs: the well-known Façade program, afterwards included to some extent in
the commercial software Canoma. The basic geometric shape of a structure is first
recovered using models of polyhedral elements. In this interactive step, the actual size
of the elements and the camera pose are captured assuming that the intrinsic camera
parameters are known. The second step is an automated matching procedure to add
geometric details, constrained by the now-known basic model. The approach proved
to be effective in creating geometrically accurate and realistic 3D models. The
drawback is the high level of interaction. Since the assumed shapes determine the
camera poses and all 3D points, the results are only as accurate as the assumption that
the structure elements match those shapes. Liebowitz et al. (1999) presented a method
for creating 3D graphical models of scenes from a limited number of images, in
particular in situations where no scene coordinate measurements are available (due to
occlusions). After manual point measurements, the method employs constraints
available from geometric relationships that are common in architectural scenes, such
as parallelism and orthogonality, together with constraints available from the camera.
Van den Heuvel (1999) uses a line-photogrammetric mathematical model and
geometric constraints to recover the 3D shapes of polyhedral objects. Using lines,
occluded object points can also be reconstructed, and parts of occluded objects can be
modelled by means of the introduction of coplanarity constraints. El-Hakim (2002)
developed a semi-automatic technique (partially implemented in ShapeCapture) able
to recover a 3D model of simple as well as complex objects. The images are calibrated
and oriented without any assumption about the object shapes, using instead a
photogrammetric bundle adjustment, with or without self-calibration, depending on
the given configuration. This achieves higher geometric accuracy independent of the
shape of the object. The modelling of complex object parts, such as groin vault
ceilings or columns, is achieved by manually measuring a number of seed points in
multiple images and fitting a quadric or cylindrical surface. Using the recovered
parameters of the fitted surface and the known internal and external camera
parameters for a given image, any number of 3D points can be added automatically
within the boundary of the section. Lee and Nevatia (2003) developed a semi-
automatic technique to model architecture where the camera is calibrated using the
known shapes of the buildings being modelled. The models are created in a
hierarchical manner by dividing the structure into basic shapes, façade textures and
detailed geometry such as columns and windows. The detailed modelling of the
geometry is an interactive procedure that requires the user to provide shape
information such as width, height and radius; the shape is then completed
automatically.
(3) Approaches that perform a fully automated 3D reconstruction of the scene from
oriented images. The orientation and calibration are performed separately,
interactively or automatically, while the 3D object reconstruction, based on object
constraints, is fully automated. Most of the approaches explicitly make use of strong
geometric constraints such as perpendicularity and verticality, which are likely to be
found in architecture. Dick et al. (2001) employ a model-based recognition technique
to extract high-level models in a single image and then use their projection onto other
images for verification. The method requires parameterised building blocks with a
priori distributions defined by the building style. The scene is modelled as a set of
base planes corresponding to walls or roofs, each of which may contain offset 3D
shapes that model common architectural elements such as windows and columns.
Again, the full automation necessitates feature detection and a projective geometry
approach; however, the technique also employs constraints, such as perpendicularity
between planes, to improve the matching process. In Grün et al. (2001), after a
semi-automated image orientation step, a multi-photo geometrically constrained
automated matching process is used to recover a dense point cloud of a complex
object. The surface is measured fully automatically using multiple images,
simultaneously deriving the 3D object coordinates. Werner and Zisserman (2002)
proposed a fully automated Façade-like approach: instead of the basic shapes, the
principal planes of the scene are created automatically to assemble a coarse model. A
similar approach was presented in Schindler and Bauer (2003) and Wilczkowiak et al.
(2003). The latter method searches for three dominant directions that are assumed to
be perpendicular to each other: the coarse model guides a more refined polyhedral
model of details such as windows, doors and wedge blocks. Since this is a fully
automated approach, it requires feature detection and closely spaced images for the
automatic matching and camera pose estimation using projective geometry.
D'Apuzzo (2003) developed an automated surface measurement procedure that,
starting from a few seed points measured in multiple images, is able to match the
homologous points within the Voronoi regions defined by the seed points.
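The projective initialisation used by the automated approaches above starts from two-view point correspondences. As an illustration only, the sketch below estimates a fundamental matrix with the normalised eight-point algorithm on noise-free synthetic data; the camera parameters and point set are invented for the example, and a practical system would wrap this in a robust (for example RANSAC) loop as the cited methods do:

```python
import numpy as np

def normalise(pts):
    """Hartley normalisation: centre the points and scale them so the
    mean distance from the origin is sqrt(2)."""
    ctr = pts.mean(axis=0)
    scale = np.sqrt(2) / np.sqrt(((pts - ctr) ** 2).sum(axis=1)).mean()
    T = np.array([[scale, 0.0, -scale * ctr[0]],
                  [0.0, scale, -scale * ctr[1]],
                  [0.0, 0.0, 1.0]])
    homog = np.column_stack([pts, np.ones(len(pts))])
    return homog @ T.T, T

def fundamental_8pt(x1, x2):
    """Normalised eight-point estimate of F from N >= 8 correspondences
    (N x 2 pixel coordinates), satisfying x2^T F x1 = 0."""
    p1, T1 = normalise(x1)
    p2, T2 = normalise(x2)
    # Each correspondence contributes one row of the linear system A f = 0.
    A = np.column_stack([
        p2[:, 0] * p1[:, 0], p2[:, 0] * p1[:, 1], p2[:, 0],
        p2[:, 1] * p1[:, 0], p2[:, 1] * p1[:, 1], p2[:, 1],
        p1[:, 0], p1[:, 1], np.ones(len(p1))])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    U, S, Vt = np.linalg.svd(F)            # enforce the rank-2 constraint
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt
    F = T2.T @ F @ T1                      # undo the normalisation
    return F / np.linalg.norm(F)

# Hypothetical two-view geometry to exercise the estimator.
rng = np.random.default_rng(0)
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
c, s = np.cos(0.1), np.sin(0.1)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
t = np.array([0.5, 0.05, 0.0])
X = rng.uniform(-1.0, 1.0, (20, 3)) + [0.0, 0.0, 5.0]
x1 = (X @ K.T)[:, :2] / (X @ K.T)[:, 2:3]
Xc2 = X @ R.T + t
x2 = (Xc2 @ K.T)[:, :2] / (Xc2 @ K.T)[:, 2:3]
F = fundamental_8pt(x1, x2)
h1 = np.column_stack([x1, np.ones(len(x1))])
h2 = np.column_stack([x2, np.ones(len(x2))])
residual = np.abs(np.einsum('ij,jk,ik->i', h2, F, h1)).max()
```

With perfect correspondences the algebraic epipolar residual is near machine precision; with real, mismatched points it is the robust loop, not the linear solver, that carries the method.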

In the next sections, the image-based modelling pipeline is analysed in detail.

Design and Recovery of the Network Geometry

The authors' experience and different studies in close range photogrammetry (including
Clarke et al., 1998; Fraser, 2001; Grün and Beyer, 2001; El-Hakim et al., 2003) confirm that:

(a) the accuracy of a network increases with the increase of the base-to-depth (B:D) ratio
and with the use of convergent images rather than images with parallel optical axes;
(b) the accuracy improves significantly with the number of images in which a point
appears, although measuring the point in more than four images gives a less
significant improvement;
(c) the accuracy increases with the number of measured points per image; however, the
increase is not significant if the geometric configuration is strong and the measured
points are well defined (like targets) and well distributed in the image;
(d) the image resolution (number of pixels) influences the accuracy of the computed
object coordinates: on natural features, the accuracy improves significantly with
image resolution, while the improvement is less significant on well-defined, large,
resolved targets.
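The influence of the base-to-depth ratio in (a) can be illustrated with the standard normal-case (parallel-axis) stereo formula, sigma_Z = Z^2 * sigma_p / (c * B). This is a simplification of the convergent multi-image networks discussed here, and the numbers below are purely hypothetical:

```python
def depth_precision(Z, B, c, sigma_p):
    """Expected depth precision of a normal-case stereo pair:
    sigma_Z = Z**2 / (c * B) * sigma_p, with object distance Z and
    base B in metres, principal distance c in metres and parallax
    measurement precision sigma_p in metres."""
    return Z ** 2 / (c * B) * sigma_p

# Hypothetical example: Z = 10 m, c = 20 mm, half-pixel parallax
# precision of 3 micrometres (6 micrometre pixels).
weak = depth_precision(10.0, 2.0, 0.020, 3e-6)    # B:D = 1:5  -> 7.5 mm
strong = depth_precision(10.0, 4.0, 0.020, 3e-6)  # B:D = 1:2.5 -> 3.75 mm
```

Doubling the base halves the expected depth error, which is the sense in which the accuracy of the network increases with the B:D ratio.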

Factors concerning the camera calibration are:

(a) self-calibration (with or without known control points) is reliable only when the
geometric configuration is favourable, mainly highly convergent images of a large
number of (3D) targets spatially well distributed;
(b) a flat (2D) testfield could be employed for camera calibration if the images are
acquired at many different distances, to allow the recovery of the correct focal length;
(c) at least two or three images should be rotated by 90 degrees to allow the recovery of
the principal point, that is, to break any projective coupling between the principal
point offset and the camera station coordinates, and to provide a different variation of
scale within the image;
(d) a complete camera calibration should be performed, in particular for the lens
distortions. In most cases, particularly with modern digital cameras and for unedited
images, the camera focal length can be found, albeit with less accuracy, in the header
of the digital images. This can be used for uncalibrated cameras if self-calibration is
not possible or unreliable.
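The lens distortions in (d) are commonly modelled with a polynomial of radial and decentring terms; the sketch below applies one common parameterisation of the Brown model to image coordinates referred to the principal point (the coefficient values in the comments are purely illustrative):

```python
def brown_distortion(x, y, k1=0.0, k2=0.0, k3=0.0, p1=0.0, p2=0.0):
    """Shift the image coordinates (x, y), given relative to the principal
    point, by radial (k1, k2, k3) and decentring (p1, p2) lens distortion."""
    r2 = x * x + y * y
    radial = k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    dx = x * radial + p1 * (r2 + 2 * x * x) + 2 * p2 * x * y
    dy = y * radial + p2 * (r2 + 2 * y * y) + 2 * p1 * x * y
    return x + dx, y + dy

# With all coefficients zero the mapping is the identity; a positive k1
# (e.g. 0.01 in these normalised units) pushes points radially outwards.
xd, yd = brown_distortion(1.0, 0.0, k1=0.01)
```

In a self-calibrating bundle adjustment these coefficients are estimated together with the focal length and principal point, which is why the rotated images in (c) matter.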

In the light of all the above, in order to optimise the accuracy and reliability of 3D point
measurement, particular attention must be given to the design of the network. Designing a
network includes deciding on a suitable sensor and image measurement scheme, which
camera to use, and the imaging locations and orientations needed to achieve good imaging
geometry, among many other considerations. The network configuration determines the
quality of the calibration and defines the imaging geometry. Unfortunately, in many
applications, the network design phase is not considered or is impossible to apply in the
actual object setting, or the images are already available from archives, leading to less than
ideal imaging geometry. Therefore, in practical cases, rather than simultaneously calibrating
the internal camera parameters and reconstructing the object, it may be better first to calibrate
the camera at a given setting using the most appropriate network design and afterwards
recover the object geometry using the calibration parameters at the same camera setting.
Advanced digital cameras can reliably save several settings.

The network geometry is optimally recovered by means of bundle adjustment (Brown,
1976; Triggs et al., 2000), with or without self-calibration, depending on the given network
configuration.
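At its core, a bundle adjustment minimises the reprojection error of the image measurements over the network parameters. The sketch below is a deliberately reduced illustration, not a full adjustment: it refines only the translation of a single camera by Gauss-Newton with a numeric Jacobian, on invented synthetic data; a real bundle adjustment would also carry the rotations, the object points and, with self-calibration, the interior orientation:

```python
import numpy as np

def project(K, R, t, X):
    """Pinhole projection of the N x 3 points X into pixel coordinates."""
    Xc = X @ R.T + t
    xh = Xc @ K.T
    return xh[:, :2] / xh[:, 2:3]

def refine_translation(K, R, t0, X, obs, iterations=10):
    """Gauss-Newton on the reprojection residuals over the camera
    translation only -- a toy slice of a full bundle adjustment."""
    t = np.asarray(t0, dtype=float).copy()
    for _ in range(iterations):
        r = (project(K, R, t, X) - obs).ravel()
        J = np.empty((r.size, 3))          # numeric Jacobian dr/dt
        for j in range(3):
            tp = t.copy()
            tp[j] += 1e-6
            J[:, j] = ((project(K, R, tp, X) - obs).ravel() - r) / 1e-6
        t -= np.linalg.solve(J.T @ J, J.T @ r)
    return t

# Hypothetical check: recover a known translation from noise-free observations.
rng = np.random.default_rng(0)
K = np.array([[1000.0, 0.0, 500.0], [0.0, 1000.0, 400.0], [0.0, 0.0, 1.0]])
R = np.eye(3)
t_true = np.array([0.2, -0.1, 5.0])
X = rng.uniform(-1.0, 1.0, (15, 3))
obs = project(K, R, t_true, X)
t_hat = refine_translation(K, R, t_true + [0.1, -0.1, 0.2], X, obs)
```

Production systems exploit the sparse block structure of the normal equations rather than forming them densely as done here.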

Surface Measurements

Once the images are oriented, the surface measurement step can be performed with
manual or automated procedures. Automated photogrammetric matching algorithms
developed for terrestrial images (Grün et al., 2001, 2004; D'Apuzzo, 2003; Santel et al.,
2003; Ohdake and Chikatsu, 2005) are usually area-based techniques that rely on the least
squares matching algorithm (Grün, 1985), which can be used on stereo or multiple images
(Baltsavias, 1991). These methods can produce very dense point clouds but often do not take
into consideration the geometric conditions of the object surface and may work with
smoothing constraints (Grün et al., 2004; Remondino et al., 2005). Therefore, it is often quite
difficult to correctly turn randomly generated point clouds into polygonal structures of high
quality without losing important information and details. The smoothing effects of automated
least-squares-based matching algorithms mainly result from the following:

(a) the image patches of the matching algorithm are assumed to correspond to planar
object surface patches (for this reason an affine transformation is generally applied
for the purpose of matching); along small objects or corners this assumption is no
longer valid, and the features are therefore smoothed out (Fig. 2);
(b) smaller image patches could theoretically avoid or reduce the smoothing effects, but
may not be suitable for the correct determination of the matching reshaping
parameters, because a small patch may not include enough image signal content.

The use of high-resolution images (say 10 megapixels) in combination with advanced
matching techniques (see, for example, Zhang, 2005) would enable the recovery of the fine
details of an object and would also avoid smoothing effects.
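Area-based matching of the kind described above can be sketched as follows. The fragment below performs only the integer-pixel search with normalised cross-correlation; least squares matching would then refine this position by estimating an affine reshaping of the patch to reach sub-pixel accuracy. All sizes and data are invented for the example:

```python
import numpy as np

def ncc(a, b):
    """Normalised cross-correlation between two equal-sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum()))

def match_patch(template, image):
    """Exhaustive integer-pixel search: slide the template over the image
    and return the top-left offset with the highest correlation score."""
    h, w = template.shape
    best_score, best_pos = -2.0, None
    for r in range(image.shape[0] - h + 1):
        for c in range(image.shape[1] - w + 1):
            s = ncc(template, image[r:r + h, c:c + w])
            if s > best_score:
                best_score, best_pos = s, (r, c)
    return best_pos, best_score

# Synthetic check: the template is cut from the search image itself,
# so the best score should be found at the cutting position.
rng = np.random.default_rng(1)
image = rng.random((40, 40))
template = image[12:20, 15:23].copy()
pos, score = match_patch(template, image)
```

On real imagery, textureless patches make the correlation peak flat or ambiguous, which is exactly the failure mode the text attributes to untextured surfaces.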
After the image measurements, the matched 2D coordinates are transformed into 3D
object coordinates using the previously recovered camera parameters (forward intersection).
In the case of multi-photo geometrically constrained matching (Baltsavias, 1991), the 3D
object coordinates are derived simultaneously with the image points.
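A minimal sketch of forward intersection, assuming two already-oriented cameras given as 3 x 4 projection matrices, is the linear (DLT) triangulation below; a rigorous photogrammetric solution would instead minimise the reprojection error, possibly over more than two rays. The camera values are invented for the example:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) forward intersection of two image rays: each pixel
    measurement contributes two rows of a homogeneous system A X = 0."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]    # back from homogeneous coordinates

# Hypothetical check with two convergent cameras and a known object point.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
c, s = np.cos(0.1), np.sin(0.1)
R2 = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R2, np.array([[-0.5], [0.0], [0.0]])])
X_true = np.array([0.3, -0.2, 4.0, 1.0])
x1 = (P1 @ X_true)[:2] / (P1 @ X_true)[2]
x2 = (P2 @ X_true)[:2] / (P2 @ X_true)[2]
X_hat = triangulate(P1, P2, x1, x2)
```

The same system extends row by row to any number of images, which is how multi-image intersection improves the accuracy noted earlier for points measured in up to four views.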
In the vision community, two-frame stereo-correspondence algorithms are predominantly
used (Dhond and Aggarwal, 1989; Brown, 1992; Scharstein and Szeliski, 2002), producing a
dense disparity map consisting of a parallax estimate at each pixel. Often the second image is
resampled in accordance with the epipolar line, so as to have a parallax value in only one
direction. A large number of algorithms have been developed and the dense output is
generally used for view synthesis, image-based rendering or the modelling of complete
regions. Feature-based matching techniques differ from area-based methods by performing
the matching on automatically extracted features or corners using operators such as SIFT
(Lowe, 2004), which produce points that are invariant under large geometric transformations.
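For a rectified pair, the dense disparity map described above converts directly to depth through the standard relation Z = f * B / d, with the focal length f in pixels, the base B in metres and the horizontal parallax d in pixels; the values below are purely illustrative:

```python
def disparity_to_depth(d_px, focal_px, base_m):
    """Depth of a point in a rectified stereo pair: Z = f * B / d."""
    return focal_px * base_m / d_px

# Illustrative numbers: f = 800 px, B = 0.5 m, disparity = 40 px.
Z = disparity_to_depth(40.0, 800.0, 0.5)  # 10.0 m
```

The inverse relationship also shows why small disparities (distant points) carry large depth uncertainty, consistent with the base-to-depth considerations of the previous section.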
In theory, automated measurements should produce more accurate results than manual
procedures: for example, on single points such as artificial targets, least squares template
matching can achieve an accuracy better than 1/25 of a pixel. But within an automated
procedure, mismatches, irrelevant points and missing parts (due to lack of texture) are usually
present in the results, requiring a post-processing check and editing of the data.

[Fig. 2. Patch definition in least squares matching measurement (left): the sketch links the
perspective centre, the image patch and the corresponding object surface patch. Triplet of
images where patches are assumed to correspond to planar object surfaces and the
assumption is not valid (right).]