Vol. 1, No. 2, June 2010, 137–158
`
`A critical review of image registration methods
`
`Zhen Xiong and Yun Zhang*
`
`Department of Geodesy and Geomatics Engineering, University of New Brunswick,
`15 Dineen Drive, PO Box 4400, Fredericton, NB E3B 5A3, Canada
`
`(Received 2 September 2009; final version received 10 November 2009)
`
`Image registration is the process of precisely overlaying two (or more) images
`of the same area through geometrically aligning common features (or control
`points) identified in the images. It mainly consists of four steps: feature detection,
`feature matching, transformation function estimation and image resampling.
`Image registration is usually applied in photogrammetry, remote sensing,
`computer vision, pattern recognition and medical image registration. This article
presents a review of image registration techniques. We emphasise feature point
detection and matching. The goal of this article is to provide readers with
an overview of such techniques, a perspective on the technical advances and
`a reference to relevant research.
`
Keywords: image registration; feature detection; feature matching; transformation function; image resampling
`
`1. Introduction
`
`Image registration is the process of precisely overlaying two (or more) images of the same
`area through geometrically aligning common features (or control points) identified in
`the images (Habib and Ai-Ruzouq 2005, Xiong and Zhang 2009a). Image registration can
`be more generalised as a mapping between two images both spatially and with respect
`to intensity (Brown 1992). The images can be taken at different times, from different
`viewpoints or by different sensors. Therefore, image registration techniques normally can
`be grouped into four categories: multi-modal registration, template registration, multi-
`viewpoints registration and multi-temporal registration (Brown 1992, Zitova and Flusser
`2003).
`The registered images can be used for different purposes, such as (1) integrating or
`fusing information taken from different sensors, (2) finding changes in the images taken
`at different times or under different conditions, (3) inferring three-dimensional (3-D)
`information from images in which either the camera or the objects in the scene have moved
`and (4) for model-based object recognition (Brown 1992).
Normally, image registration consists of four steps: (1) feature detection and
extraction, (2) feature matching, (3) transformation function fitting and (4) image
transformation and image resampling (Zitova and Flusser 2003, Xiong and Zhang 2009a).
`
`*Corresponding author. Email: yunzhang@unb.ca
`
ISSN 1947-9832 print/ISSN 1947-9824 online
© 2010 Taylor & Francis
`DOI: 10.1080/19479831003802790
`http://www.informaworld.com
`
To date, feature detection, feature extraction and feature matching still pose
difficult technical problems, whereas steps 3 and 4 are comparatively easy. Feature
detection, feature extraction and feature matching are therefore hot research topics
in the communities of photogrammetry, remote sensing, computer vision, pattern
recognition and image processing. This article reviews the techniques applied in the
above four steps, with emphasis on the techniques of feature point extraction and
matching.
`
`2. Feature detection
`
For image registration, a sufficient number of control points (common features) is required
`in order to estimate an optimal geometric transformation between two images. The control
`points can be selected manually or extracted automatically. They are, normally, any of the
`following features (Xiong and Zhang 2009a):
`
`. line intersections;
`. points of locally maximum curvature (such as building corners);
`. gravity centres of closed boundary regions (such as centres of building roofs or
`traffic islands) and
`. centres of windows having locally maximum variance.
`
`Besides point features, linear features and areal features can also be used for image
`registration, especially for multi-modal image registration (e.g. registration of optical
`images and laser scanner images) (Brown 1992, Zitova and Flusser 2003, Habib and
`Ai-Ruzouq 2005).
`
`2.1 Point feature
`
`Point features can be extracted in the space domain and frequency domain of an image.
`A wide variety of point feature detectors in the space domain exist in the literature. They
can be categorised into three classes: intensity-based, parametric model-based and
contour-based methods (Schmid et al. 2000).
`
`2.1.1 Intensity-based methods
`
`Feature points can be extracted by using the first or second derivatives of the intensity
`surface. The derivatives are used for feature detection by Moravec (1977), Beaudet (1978),
`Dreschler and Nagel (1982), Heitger et al. (1992) and Reisfeld et al. (1995). Sun and
Kweon (1997) developed an algorithm called crosses as oriented pair (COP), which uses two
`oriented cross-operators. Compared with other conventional corner detectors, COP is a
`very fast, accurate and robust corner detector (based on univalue segment assimilating
`nucleus, USAN, and gradient). Colour information can make a significant contribution to
`feature detection. For multi-spectral images, Nicu et al. (2006) developed a colour-based
`corner detection algorithm to detect the most distinctive features.
Many algorithms use the auto-correlation function for feature detection (Förstner
and Gülch 1987, Harris and Stephens 1988, Förstner 1994). A matrix related to the
auto-correlation function is widely used in feature detection. This matrix averages
derivatives of the signal in a window W around a point (x, y):

\begin{bmatrix} \sum_{(x_k, y_k) \in W} (I_x(x_k, y_k))^2 & \sum_{(x_k, y_k) \in W} I_x(x_k, y_k) I_y(x_k, y_k) \\ \sum_{(x_k, y_k) \in W} I_x(x_k, y_k) I_y(x_k, y_k) & \sum_{(x_k, y_k) \in W} (I_y(x_k, y_k))^2 \end{bmatrix}   (1)
`where I(x, y) is the image function and (xk, yk) are the points in the window around (x, y).
The Harris detector is a typical representative of algorithms using the above matrix for
feature detection. For a grey-scale, two-dimensional (2-D) image I, taking an image
patch over the area (u, v) and shifting it by (x, y), the sum of squared differences (SSD)
between these two patches, S, is given by

S(x, y) = \sum_u \sum_v \left( I(u, v) - I(u - x, v - y) \right)^2   (2)

The Harris matrix (denoted A) is found by taking the second derivative (the Hessian) of
S around (x, y) = (0, 0). A is given by

A = \begin{bmatrix} \langle I_x^2 \rangle & \langle I_x I_y \rangle \\ \langle I_x I_y \rangle & \langle I_y^2 \rangle \end{bmatrix}   (3)
`where the angle brackets denote averaging (summation over (u, v)), and the typical
`notation for partial derivatives is used. If a circular window (or circularly weighted
`window, such as a Gaussian) is used, then the response will be isotropic (Harris and
`Stephens 1988).
The strength of the corner is determined by the magnitude of the second derivative. This
is done by considering the eigenvalues (λ1 and λ2) of A. Based on the magnitudes of the
eigenvalues, the following inferences can be made (Harris and Stephens 1988):
(1) If λ1 ≈ 0 and λ2 ≈ 0, then there are no features of interest at this pixel (x, y).
(2) If λ1 ≈ 0 and λ2 is some large positive value, then an edge is found.
(3) If both λ1 and λ2 are large, distinct positive values, then a corner is found.
`
Harris and Stephens (1988) note that exact computation of the eigenvalues is
computationally expensive (since it requires a square root) and instead suggest the
following function M_c, where κ is a tunable parameter which determines how
'edge-phobic' the algorithm is:

M_c = \lambda_1 \lambda_2 - \kappa (\lambda_1 + \lambda_2)^2   (4)

M_c = \det(A) - \kappa \, \mathrm{trace}^2(A)   (5)

Therefore, the algorithm does not actually have to compute the eigenvalue
decomposition of the matrix A; it is sufficient to evaluate the determinant and trace
to find corners, or rather interest points in general. The value of κ has to be determined
empirically, and in the literature values in the range 0.04–0.06 have been reported as
feasible. If M_c > 0, it is a corner; otherwise, it is not (Harris and Stephens 1988).
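As an illustration, the response of Equations (1), (3) and (5) can be computed with a few array operations. The following is a minimal Python/NumPy sketch, not the authors' implementation; the Gaussian window width sigma and k = 0.05 (a value within the 0.04–0.06 range quoted above) are assumed parameters.

import numpy as np
from scipy import ndimage

def harris_response(image, sigma=1.0, k=0.05):
    """Per-pixel Harris response M_c = det(A) - k * trace(A)^2, Equation (5)."""
    img = image.astype(np.float64)
    Ix = ndimage.sobel(img, axis=1)   # first derivative I_x
    Iy = ndimage.sobel(img, axis=0)   # first derivative I_y
    # Windowed averages <Ix^2>, <IxIy>, <Iy^2> of Equation (3), using a
    # Gaussian window so the response is isotropic.
    Ixx = ndimage.gaussian_filter(Ix * Ix, sigma)
    Ixy = ndimage.gaussian_filter(Ix * Iy, sigma)
    Iyy = ndimage.gaussian_filter(Iy * Iy, sigma)
    det = Ixx * Iyy - Ixy * Ixy       # det(A)
    trace = Ixx + Iyy                 # trace(A)
    return det - k * trace ** 2

Pixels with a positive response (in practice, local maxima above a threshold) are reported as corners.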
[Figure 1. SIFT keypoint descriptor: a 2 × 2 descriptor array (right) computed from an
8 × 8 set of samples (left) (Lowe 2004). The gradient magnitudes and orientations at
individual image sample points (pixels), indicated by the overlaid circle, are weighted
by a Gaussian window. Four orientation histograms are shown on the right, with the
length of each arrow corresponding to the sum of the gradient magnitudes near that
direction within the corresponding 4 × 4 sub-region of sample points on the left.]

Lowe (2004) developed a scale invariant feature transform (SIFT) method for
extracting distinctive invariant features from images that can be used to perform reliable
`matching between different views of an object or scene. SIFT is one of the best algorithms
`for the extraction of the feature points. The features are highly distinctive, in the sense that
`a single feature can be correctly matched with high probability against a large database
of features from many images. The following are the major steps of computation used to
generate the image features (Figure 1; Lowe 2004):
`
(1) Scale-space extrema detection: the first stage of computation searches over all
scales and image locations. It is implemented efficiently by using a difference-
of-Gaussian (DOG) function to identify potential interest points that are invariant
to scale and orientation.
`
(2) Keypoint localisation: at each candidate location, a detailed model is fit to
determine the location and scale. Keypoints are selected based on measures of
their stability; keypoints with low contrast (<0.03), or for which the ratio between
the largest-magnitude eigenvalue and the smallest one is very large (e.g. >10), are
eliminated.
`
(3) Orientation assignment: one or more orientations are assigned to each keypoint
location based on the local image gradient directions. All future operations are
performed on image data that has been transformed relative to the assigned
orientation, scale and location for each feature, thereby providing invariance to
these transformations.
`
`(4) Keypoint descriptor: the local image gradients are measured at the selected scale in
`the region around each keypoint. These are transformed into a representation that
`allows for significant levels of local shape distortion and change in illumination
(Figure 1).
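For readers who wish to experiment with these four steps, the sketch below uses the SIFT implementation shipped with OpenCV (the opencv-python package, version 4.4 or later); the file names and the 0.75 ratio threshold of the nearest-neighbour test are assumptions for illustration, not part of the review.

import cv2

# Hypothetical input images; any two overlapping grey-scale views will do.
img1 = cv2.imread('view1.png', cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread('view2.png', cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()                       # steps (1)-(4) run internally
kp1, des1 = sift.detectAndCompute(img1, None)  # keypoints + 128-D descriptors
kp2, des2 = sift.detectAndCompute(img2, None)

# Match descriptors by Euclidean distance; keep a match only when the nearest
# neighbour is clearly better than the second nearest (Lowe's ratio test).
matcher = cv2.BFMatcher(cv2.NORM_L2)
good = []
for pair in matcher.knnMatch(des1, des2, k=2):
    if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
        good.append(pair[0])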
`
`The SIFT keypoints are particularly useful due to their distinctiveness, which enables
`the correct match for a keypoint to be selected from a large database of other keypoints.
`This distinctiveness is achieved by assembling a high-dimensional vector representing
`the image gradients within a local region of the image. The keypoints have been shown to
`be invariant to image rotation and scale, and robust across a substantial range of affine
`distortion, addition of noise and change in illumination (Lowe 2004).
The features described by the SIFT descriptor use only a monochrome intensity image;
`therefore, further distinctiveness could be derived from including illumination-invariant
`colour descriptors (Funt and Finlayson 1995, Brown and Lowe 2002, Lowe 2004).
`
`Local texture measures appear to play an important role in human vision and could be
`incorporated into feature descriptors in a more general form than the single spatial
`frequency used by the SIFT.
The SIFT operator has two further drawbacks in the case of stereo matching
in photogrammetry. First, the DOG detects mainly blob-like interest points, while
significant points, such as the corners of buildings and saddle points near the edges
of roads, cannot be successfully extracted; this disadvantage is critical for 3-D
reconstruction. Second, the interest points that DOG detects may not be dense enough
to support the generation of a digital surface model through image matching and
exterior orientation (Zhu et al. 2007).
`Mikolajczyk and Schmid (2004) developed a scale and affine invariant interest point
`detector. This scale and affine invariant detector is based on the following results:
`
`(1) Interest points extracted with the Harris detector can be adapted to affine
`transformations and give repeatable results (geometrically stable).
(2) The characteristic scale of a local structure is indicated by a local extremum over
scales of normalised derivatives (the Laplacian).
`(3) The affine shape of a point neighbourhood is estimated based on the second
`moment matrix.
`
`This detector first computes a multi-scale representation for the Harris interest point
`detector, then selects the local maximal points over scales. The scale invariant detector
`is extended to affine invariant by estimating the affine shape of a point neighbourhood.
`An iterative algorithm modifies location, scale and neighbourhood of each point and
`converges to affine invariant points. This method can deal with significant affine
`transformations including large-scale changes. The characteristic scale and the affine
`shape of neighbourhood determine an affine invariant region for each point (Mikolajczyk
`and Schmid 2004).
`The scale invariant detector can deal with larger scale changes better than the affine
`invariant detector, but it fails for images with large affine transformations. The affine
`invariant points provide reliable matching even for the images with significant perspective
`deformations. However, the stability and convergence of affine regions is the subject
`of further investigation as well as their robustness to occlusions (Mikolajczyk and
`Schmid 2004).
`
`2.1.2 Parametric model-based methods
`
`Parametric model-based methods can extract features with high accuracy. Rohr (1992)
`developed a template parametric model, where the parameters of the model are adjusted
`by a least squares method. For the ‘L’ corners, the parameters include the angle of
`L-corner, the angle between the symmetry axis of the L-corner and the x-axis, the grey
`values, the position of the point and the amount of blur (Schmid et al. 2000). Deriche and
`Blaszka (1993) improved Rohr’s method by substituting an exponential for the Gaussian
`smoothing function. Baker et al. (1998) and Parida et al. (1998) also developed parametric
`methods for feature extraction (Schmid et al. 2000).
`
`2.1.3 Contour-based methods
`
To date, many contour-based algorithms have been developed (Freeman and Davis
`1977, Beus and Tiu 1987, Liu and Srinath 1990). A number of frequently cited approaches
`
`are discussed in the survey by Liu and Srinath (1990), where comparative experimental
results are also given. These algorithms are those of Rosenfeld and Johnston (1973),
Rosenfeld and Weszka (1975), Freeman and Davis (1977) and Beus and Tiu (1987).
However, these algorithms cannot extract accurate corner positions and usually miss
many corners (Dmitry and Zsolt 1999). Building on these algorithms, Dmitry and Zsolt
(1999) developed a new algorithm, IPAN99, a fast and efficient algorithm for the
detection of high-curvature points. This algorithm has a fatal shortcoming, however:
it can only detect high-curvature corners. It cannot detect low-curvature corners and
usually extracts many false corners.
`The changes in the curvatures of contours are normally used to detect feature
`points (Asada and Brady 1986, Mokhtarian and Mackworth 1986, Pikaz and Dinstein
`1994, Shilat et al. 1997, Mokhtarian and Suomela 1998). B-splines are usually used
`to approximate the contours and then the feature points are detected (Medioni and
`Yasumoto 1987). Some algorithms search the segments along the contours and then find
`line intersections (Horaud et al. 1990, Mokhtarian and Suomela 1998).
Eduard (2002) developed an effective corner detector based on Bayesian estimation.
The probability that a point belongs to the approximation of the straight segment of
the isoline containing the corner candidate is determined using Bayesian estimation,
and then used for computing the angle between the segments. In terms of detection
performance, this algorithm has several merits: (1) the detection of corners is highly
reliable; (2) the localisation errors are small; (3) the sensitivity to noise is low and
(4) the implementation of the algorithm is not complicated.
`Hough transform (Sung et al. 2005) and Radon transform (Seung et al. 2004) can
`also be used to detect corners. The basic idea is to find the straight lines in the images and
`then search for their intersections, which are the corner points of the objects in the images.
Muhammad et al. (2006) developed a contour-based method based on the calculation
of distances from the straight line joining the two contour points on the two sides of
a corner. It is simple to implement, efficient and robust to noise.
Normally, contour-based methods are used in images that have plenty of linear
features, to extract object corners such as building corners (Schmid et al. 2000).
`
`2.1.4 Feature extraction in frequency domain
`
`Fourier transform and wavelet transform can be used for feature extraction in the
`frequency domain. Hong and Zhang (2008) developed a wavelet-based feature extraction
`technique and a relaxation-based image-matching technique to find tie points for image
registration. Chen et al. (1995) also use the wavelet transform to detect corners. The
algorithms extracting feature points in the frequency domain are essentially gradient
based. These algorithms are not as robust as their space-domain counterparts, because
they are sensitive to scale, rotation and noise.
`
`2.1.4.1 Summary.
`
. Schmid et al. (2000) compared several algorithms: Harris and Stephens (1988)
(intensity based), an improved version of Horaud et al. (1990) (contour based),
Heitger et al. (1992) (intensity based) and Förstner (1994) (intensity based). The
`Harris-based detector obtains the best results under image rotation, illumination
`
`variation and viewpoint change. But Harris is sensitive to scale change and
`rotation (Zhu et al. 2007).
. Each SIFT point is characterised by 128 unsigned eight-bit numbers, which
define a multi-scale gradient orientation histogram. This descriptor is more
distinctive than Harris and other descriptors for describing features. But SIFT
is not very sensitive to corners and sometimes cannot detect sufficiently dense
interest points (Zhu et al. 2007).
. On the other hand, most of these algorithms were developed initially for computer
vision and did not consider the needs and differences of photogrammetry. For
computer vision, atmospheric refraction can be ignored, but its effect on satellite
images is very severe. Normally in computer vision, a series of images is taken at
nearly the same time, but satellite images may be taken with different orientations,
positions, dates and atmospheric conditions. For computer vision, Lowe (2004)
used a 2 × 2 descriptor in his experiments and achieved satisfactory results, but
this even-sized window descriptor is obviously not suitable for accurate point
location in photogrammetry.
`
`2.2 Linear feature
`
Linear features can be extracted by derivative-based edge detectors, such as the Canny
detector, or line extraction algorithms, such as the Hough transform (Habib and
Ai-Ruzouq 2005). Linear feature detection has always been a hot research topic in the
photogrammetry and remote sensing community, and a large number of linear feature
detection algorithms have been developed. Here only several well-known operators are
presented. For more details of the advances in, and evaluation of, linear feature
extraction techniques, please refer to the review papers published by Davis (1975),
Pinho and Almeida (1997) and Quackenbush (2004).
`
`264
`
`Ly ¼
`
`375
`
`0 1
`
`0 2
`
`0 1
`
`(1) Sobel operator is based on the following filters:
` 1
`þ1 þ2 þ1
` 2
`0
`0
`0
` 1
` 1 2 1
`
`264
`
`Lx ¼
`
`ð6Þ
`
`ð7Þ
`
`ð8Þ
`
`ð9Þ
`
`375
`
`375
`
`q
`The gradient magnitude is computed as follows:
`
`rLj j ¼ L2
`x þ L2
`
`y
`
`264
`
`Ly ¼
`
`
`
` 1 1 1
`0
`0
`0
`þ1 þ1 þ1
`
`
`Ly ¼ 0 þ1
` 1
`
`0
`
`375
`
`0 1
`
`0 1
`
`0 1
`
`264
`
`(2) Prewitt filters:
`
`Lx ¼
`
`(3) Roberts filters:
`
` 1
` 1
` 1
`
`Lx ¼ þ1
`0
`0 1
`
(4) Laplacian operator convolution mask:

\begin{bmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{bmatrix}   (10)

(5) LOG operator: in order to reduce the image noise, the LOG operator applies the
Gaussian smoothing process first, and then uses the Laplacian filter to detect edge
points. The Gaussian distribution function in two variables, g(x, y), is defined by

g(x, y) = \frac{1}{2\pi\sigma^2} e^{-(x^2 + y^2)/(2\sigma^2)}   (11)

where σ is the standard deviation representing the width of the Gaussian
distribution.

(6) Canny detector: the Canny edge detection operator was developed by John
F. Canny in 1986. It basically includes four steps.
`
`2.2.1 Step 1: Image smoothing
`
It is necessary to reduce the effect of image noise on edge extraction. The general
method is image smoothing by Gaussian convolution (Dmitry and Zsolt 1999).
`
`2.2.2 Step 2: Generate gradient image
`
`Then, a simple 2-D first derivative operator is applied to the smoothed image to highlight
`the regions of the image with high first spatial derivatives (Figure 2):
`
T_1 = \left( g(i, j+1) - g(i, j) + g(i+1, j+1) - g(i+1, j) \right) / 2.0   (12)

T_2 = \left( g(i, j) - g(i+1, j) + g(i, j+1) - g(i+1, j+1) \right) / 2.0   (13)

T_3 = \sqrt{T_1^2 + T_2^2}   (14)

\mathrm{Angle} = \tan^{-1}(T_2 / T_1)   (15)

where T_1 is the first derivative along the x direction, T_2 is the first derivative along
the y direction, T_3 is the final gradient magnitude and 'Angle' represents the gradient
direction.

[Figure 2. 2-D first derivative operator, computed over the 2 × 2 pixel neighbourhood
(i, j), (i, j+1), (i+1, j), (i+1, j+1).]
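A direct transcription of Equations (12)–(15), vectorised over every 2 × 2 neighbourhood of a smoothed image g, might look as follows (a sketch, not a full Canny implementation):

import numpy as np

def gradient_2x2(g):
    """Gradient magnitude T3 and direction over 2 x 2 neighbourhoods.

    Each 2 x 2 block (i, j)..(i+1, j+1) yields one value, so the outputs
    have shape (rows - 1, cols - 1).
    """
    g = g.astype(np.float64)
    t1 = (g[:-1, 1:] - g[:-1, :-1] + g[1:, 1:] - g[1:, :-1]) / 2.0  # Equation (12)
    t2 = (g[:-1, :-1] - g[1:, :-1] + g[:-1, 1:] - g[1:, 1:]) / 2.0  # Equation (13)
    t3 = np.sqrt(t1 ** 2 + t2 ** 2)                                 # Equation (14)
    angle = np.arctan2(t2, t1)                                      # Equation (15)
    return t3, angle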
`
`2.2.3 Step 3: Non-maximal suppression
`
Edges give rise to ridges in the gradient magnitude image. The algorithm then tracks
along the top of these ridges and sets to zero all pixels that are not actually on the ridge
top, so as to give a thin line in the output, a process known as non-maximal suppression.
Figure 3 shows a ridge running in the north–south direction. The central point has a
gradient magnitude of 23, which is greater than that of its left and right neighbours, so
it is a ridge point.
`
`2.2.4 Step 4: Edge point tracking
`
The tracking process is controlled by two thresholds, threshold1 and threshold2, with
threshold1 > threshold2. Tracking can only begin at a ridge point which is higher than
threshold1. Tracking then continues in both directions out from that point until the
height of the ridge falls below threshold2. This process helps to ensure that noisy edges
are not broken up into multiple edge fragments. For example, if threshold1 is set to 22
and threshold2 to 20, then the tracking begins from 23 (Figure 3), and the ridge points
20 and 21 are subsequently found.
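This two-threshold tracking (often called hysteresis thresholding) can be sketched compactly with connected-component labelling: keep every ridge segment that reaches threshold2 provided it contains at least one pixel above threshold1. The threshold values mirror the example above; this is an illustrative sketch rather than the original edge tracker.

import numpy as np
from scipy import ndimage

def hysteresis(ridge_magnitude, threshold1=22.0, threshold2=20.0):
    """Keep ridge pixels >= threshold2 connected to a seed >= threshold1."""
    strong = ridge_magnitude >= threshold1      # tracking may start here
    weak = ridge_magnitude >= threshold2        # tracking may continue here
    labels, n = ndimage.label(weak)             # connected ridge segments
    keep = np.zeros(n + 1, dtype=bool)
    keep[np.unique(labels[strong])] = True      # segments holding a seed
    keep[0] = False                             # background label
    return keep[labels]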
`
`2.3 Areal feature
`
Image segmentation and image classification can be used to extract areal features
(Habib and Ai-Ruzouq 2005). Like linear feature detection, image classification and
segmentation are another hot topic in photogrammetry and remote sensing. Basically,
classification algorithms can be grouped into two broad categories: supervised
classification and unsupervised classification. Supervised classification algorithms
include minimum distance classification, parallelepiped classification, maximum
likelihood classification, hybrid classification, etc. A supervised classification
algorithm includes three main steps:
`
`Step 1: Select a set of training pixels for each spectral class.
`
`Step 2: Determine the mean and standard deviation of each class.
`
`Step 3: Compute relative likelihoods or distance for each pixel in the image and label the
`pixel according to the highest likelihood or shortest distance.
`
`Unsupervised classification includes clustering, ISODATA and so on.
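As an illustration of the three steps, a minimum distance classifier reduces to a few lines: class means are estimated from the training pixels (step 2) and each pixel is labelled with the nearest mean (step 3). The sketch below assumes the image is an array of shape (rows, cols, bands) and the training data a mapping from class label to an (n, bands) array of training pixels; both are illustrative assumptions.

import numpy as np

def minimum_distance_classify(image, training):
    """Label every pixel with the class whose mean spectrum is nearest."""
    classes = sorted(training)
    # Step 2: mean spectral vector of each class from its training pixels
    means = np.stack([np.asarray(training[c], dtype=np.float64).mean(axis=0)
                      for c in classes])
    pixels = image.reshape(-1, image.shape[-1]).astype(np.float64)
    # Step 3: Euclidean distance from every pixel to every class mean
    dist = np.linalg.norm(pixels[:, None, :] - means[None, :, :], axis=2)
    return np.array(classes)[dist.argmin(axis=1)].reshape(image.shape[:2])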
`
`10
`
`11
`
`14
`
`20
`
`23
`
`21
`
`13
`
`10
`
`12
`
`Figure 3. Non maximal suppression.
`
For details of the classical classification algorithms, please refer to the book Remote
Sensing Digital Image Analysis: An Introduction by Richards and Jia (2006). For more
details of the advances in, and evaluation of, classification techniques, please refer to
the papers published by Lu and Weng (2007), Kotsiantis (2007) and Thair (2009).
`
`3. Feature matching
`
After automated feature detection and extraction, numerous features are obtained.
`However, many feature points may be extracted from one image but not from the other.
`Therefore, feature matching is necessary to find corresponding points (common features)
`in both images. The feature matching usually starts from one feature point on one image,
`and then searches for the corresponding point on the other image.
Basically, there are two broad categories of feature matching: area-based methods
and feature-based methods. In area-based methods, cross-correlation is often used as a
similarity measure to find the corresponding point. In feature-based methods, the SSD
is generally used to identify the corresponding point. The points with the greatest
similarity value are identified as matching or corresponding points.
`
`3.1 Area-based methods
`
`3.1.1 Correlation-like methods
`
Cross-correlation is a basic statistical method for image matching. Images in
photogrammetry and remote sensing contain local distortions caused by ground relief
variations and differing imaging viewpoints. Because of their stability and reliability,
cross-correlation methods are generally used in remote sensing for interest point
matching.
`
The correlation coefficient between two image windows, each containing N pixels with
grey values x_i and y_i, is

\rho = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2 \sum_{i=1}^{N} (y_i - \bar{y})^2}}   (16)
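A sketch of Equation (16) for two equal-sized image windows follows; in area-based matching, the template window is slid over the search image and the position with the largest ρ is taken as the match.

import numpy as np

def correlation_coefficient(window1, window2):
    """Normalised cross-correlation rho of Equation (16)."""
    x = window1.astype(np.float64).ravel()
    y = window2.astype(np.float64).ravel()
    x -= x.mean()                      # x_i - x_bar
    y -= y.mean()                      # y_i - y_bar
    denom = np.sqrt((x ** 2).sum() * (y ** 2).sum())
    return (x * y).sum() / denom if denom else 0.0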
`
`Photogrammetric scientists are always attempting to improve the stability and
`reliability of interest point matching techniques. Hierarchical matching, least squares
`and relaxation algorithms are typical examples of such attempts (Gruen 1985, Gruen and
`Baltsavias 1987, Gruen and Zhang 2002, Poli et al. 2004, Zheng et al. 2004, Gruen and
`Akca 2005). At the same time, great efforts are also being made to reduce the search area
`and increase the matching speed. The use of epipolar geometry is one of the most
`important achievements of such work (Kim 2000).
`
`3.1.2 Fourier methods
`
These methods search for the optimal match according to information in the frequency
domain. If the images were acquired under varying conditions, or are corrupted by
frequency-dependent noise, then Fourier methods are preferred to correlation-like
methods (Zitova and Flusser 2003). However, this approach is feasible only for images
which have been at most rigidly misaligned. If the images have significant white noise,
noise which is spread across all frequencies, then the location of the peak will be
inaccurate, since the phase difference at each frequency is corrupted (Brown 1992).
`
`3.1.3 Mutual information methods
`
The mutual information method stems from information theory and is suitable for
multi-modal registration (Zitova and Flusser 2003). The first mutual information
methods were proposed by Viola and Wells (1997) and Thevenaz and Unser (1996,
1997, 1998). The algorithms proposed by Maes et al. (1997) and Thevenaz and Unser
(1997) were successfully applied to multi-modal image registration.
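For illustration, the mutual information of two overlapping image windows can be estimated from their joint grey-level histogram, as in the sketch below; the number of histogram bins is an assumed parameter, and practical registration maximises this value over candidate transformations.

import numpy as np

def mutual_information(a, b, bins=32):
    """Mutual information I(A; B) estimated from the joint histogram."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()               # joint probabilities
    px = pxy.sum(axis=1, keepdims=True)     # marginal of a
    py = pxy.sum(axis=0, keepdims=True)     # marginal of b
    nz = pxy > 0                            # avoid log(0)
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())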
`
`3.1.3.1 Summary. Area-based methods still have many drawbacks. The main limitations
`can be summarised as follows: (1) the rectangular image window used for finding matching
`areas is only suitable for the images with distortions caused by translation (in theory);
`(2) these methods cannot find matches in homogeneous areas (areas without prominent
`texture); (3) the methods are sensitive to the image intensity changes caused by noise,
`varying illumination and the use of different sensors and (4) images must have similar
`intensity function (Zitova and Flusser 2003).
`
`3.2 Feature-based methods
`
`Because of the large number of feature-based algorithms used in interest point matching,
`there are many classification methods for categorising these algorithms. Normally,
`feature-based algorithms can be categorised into rigid and non-rigid (according to the
`transformation between images), global and local (according to the image distortions),
`or corrected and uncorrected (according to the image variations). In addition, most of the
`feature-based algorithms search for correspondences and also address the refinement of a
`transformation function. Therefore, feature-based algorithms can also be grouped into
`three additional categories (Chui and Rangarajan 2003). They either solve the correspon-
`dence only, solve the transformation only, or solve both the correspondence and the
`transformation.
`
`3.2.1 Global algorithm
`
`Although numerous feature-based algorithms have been developed, there is no general
`algorithm which is suitable for a variety of different applications. Every method must take
`into account the specific geometric image deformation (Zitova and Flusser 2003). The first
`category of algorithms processes the global distortions. The iterative closest point (ICP)
algorithm is a classical global algorithm (Besl and McKay 1992, Yang et al. 2007). Because
`this algorithm assumes that one surface is a subset of the other, it is only suitable
`for registration of images with global distortions (Williams and Bennamoun 2001). For
`medical image registration and pattern recognition, many rigid global transformations
are used (Mount et al. 1997, Besl and McKay 2002, Tu et al. 2008). The B-spline and thin
plate spline (TPS) deformation models are commonly used for global distortion in medical
image registration (Bookstein 1989, Kybic and Unser 2003).
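For illustration, one common form of rigid 2-D ICP alternates nearest-neighbour correspondence with a least-squares rigid fit, as sketched below; the iteration count and convergence tolerance are assumed values, and real implementations add outlier rejection.

import numpy as np
from scipy.spatial import cKDTree

def icp(source, target, iterations=50, tol=1e-8):
    """Rigid 2-D ICP sketch: iteratively aligns source (n, 2) to target (m, 2)."""
    src = source.astype(np.float64).copy()
    tree = cKDTree(target)
    prev_err = np.inf
    for _ in range(iterations):
        dist, idx = tree.query(src)            # closest-point correspondences
        matched = target[idx]
        # Least-squares rigid transform (Procrustes via SVD)
        mu_s, mu_t = src.mean(axis=0), matched.mean(axis=0)
        u, _, vt = np.linalg.svd((src - mu_s).T @ (matched - mu_t))
        if np.linalg.det(u @ vt) < 0:          # guard against reflection
            vt[-1] *= -1
        r = (u @ vt).T
        src = (src - mu_s) @ r.T + mu_t        # apply rotation + translation
        err = (dist ** 2).mean()
        if abs(prev_err - err) < tol:          # assumed convergence test
            break
        prev_err = err
    return src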
`
`3.2.2 Local algorithm
`
`The second category of algorithms deals with the local distortions. For non-rigid local
`distortions, more complicated transformations are developed. TPS was proposed initially
`for global transformations, but it was improved for smooth local distortions for medical
`
`APPL-1016 / Page 11 of 22
`
`
`
`148
`
`Z. Xiong and Y. Zhang
`
`image registration (Gold et al. 1997, Chui and Rangarajan 2003, Auer et al. 2005).
`Another common local distortion model is the elastic deformation model (Rexilius et al.
2001, Auer et al. 2005). Hierarchical matching strategies, least squares and relaxation
techniques are widely used to deal with local distortions (Pla and Marchant 1997,
`Rueckert et al. 1999, Jiang et al. 2007, Wang et al. 2007, Lv et al. 2008, Smith et al. 2008).
`Multi-feature matching is also used to process the images with local distortions. Guo et al.
`(2007) used linear feature, areal feature and point feature together for object matching.
`Roger et al. (2007) used edge and point for the registration of cadastre graphs and images.
`Wei et al. (1998) used intensity and gradient matching for 3-D reconstruction.
`
`3.2.3 Invariant descriptor
`
`Many algorithms do not need a transformation function. In computer vision systems
`and pattern recognition, invariant feature descriptors are usually used for feature matching
`(Belongie et al. 2002, Kaplan et al. 2004, Lepetit et al. 2005, Terasawa et al. 2005, Zhao
`et al. 2006, Lv et al. 2008). SIFT is one of the best descriptors for interest point matching
`(Lowe 2004).
`
`3.2.4 Spatial relations
`
`Spatial relations are of great significance to feature matching (Grauman and Darrell 2004,
`Ling and Jacobs 2009). In graph matching algorithms, topological relationship is the key
`feature and is widely used in pattern recognition (Gold and Rangarajan 1996, Cross and
`Hancock 1998, Caetano et al. 2004, Demirci et al. 2004, Shokoufandeh et al. 2006). Spatial
relations with neighbouring interest points are used for feature point matching (Cheng and
Huang 1984). Xiong and Zhang (2009b) construct a control network based on ‘super
`points’ to provide spatial information for interest point matching.
`
`3.2.5 Classification
`
Another idea for interest point matching is to treat the issue as a classification
problem. The features from the reference image are used to train the classifier. This
method is applied mainly in pattern recognition (Boffy et al. 2009, Lepetit et al. 2009).
`
`3.2.5.1 Summary. Although many of
`the feature-based algorithms described above
`are useful
`in solving problems for specific applications,
`they have four common
`drawbacks: (1) the features cannot be exactly matched, because of the variation