throbber
International Journal of Computer Vision 60(2), 91–110, 2004
`c(cid:1) 2004 Kluwer Academic Publishers. Manufactured in The Netherlands.
`
`Distinctive Image Features from Scale-Invariant Keypoints
`
`DAVID G. LOWE
`Computer Science Department, University of British Columbia, Vancouver, B.C., Canada
`Lowe@cs.ubc.ca
`
`Received January 10, 2003; Revised January 7, 2004; Accepted January 22, 2004
`
`Abstract. This paper presents a method for extracting distinctive invariant features from images that can be used
`to perform reliable matching between different views of an object or scene. The features are invariant to image scale
`and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in
`3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a
`single feature can be correctly matched with high probability against a large database of features from many images.
`This paper also describes an approach to using these features for object recognition. The recognition proceeds by
`matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm,
`followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification
`through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify
`objects among clutter and occlusion while achieving near real-time performance.
`
`Keywords:
`
`invariant features, object recognition, scale invariance, image matching
`
`1.
`
`Introduction
`
`Image matching is a fundamental aspect of many prob-
`lems in computer vision, including object or scene
`recognition, solving for 3D structure from multiple im-
`ages, stereo correspondence, and motion tracking. This
`paper describes image features that have many prop-
`erties that make them suitable for matching differing
`images of an object or scene. The features are invariant
`to image scaling and rotation, and partially invariant to
`change in illumination and 3D camera viewpoint. They
`are well localized in both the spatial and frequency do-
`mains, reducing the probability of disruption by occlu-
`sion, clutter, or noise. Large numbers of features can be
`extracted from typical images with efficient algorithms.
`In addition, the features are highly distinctive, which
`allows a single feature to be correctly matched with
`high probability against a large database of features,
`providing a basis for object and scene recognition.
`The cost of extracting these features is minimized by
`taking a cascade filtering approach, in which the more
`
`expensive operations are applied only at locations that
`pass an initial test. Following are the major stages of
`computation used to generate the set of image features:
`
`1. Scale-space extrema detection: The first stage of
`computation searches over all scales and image lo-
`cations. It is implemented efficiently by using a
`difference-of-Gaussian function to identify poten-
`tial interest points that are invariant to scale and
`orientation.
`2. Keypoint localization: Ateach candidate location, a
`detailed model is fit to determine location and scale.
`Keypoints are selected based on measures of their
`stability.
`3. Orientation assignment: One or more orientations
`are assigned to each keypoint location based on local
`image gradient directions. All future operations are
`performed on image data that has been transformed
`relative to the assigned orientation, scale, and loca-
`tion for each feature, thereby providing invariance
`to these transformations.
`
`AVIGILON EX. 2021
`IPR2019-00311
`Page 1 of 20
`
`

`

`92
`
`Lowe
`
`4. Keypoint descriptor: The local image gradients are
`measured at the selected scale in the region around
`each keypoint. These are transformed into a repre-
`sentation that allows for significant levels of local
`shape distortion and change in illumination.
`
`This approach has been named the Scale Invariant Fea-
`ture Transform (SIFT), as it transforms image data into
`scale-invariant coordinates relative to local features.
`An important aspect of this approach is that it gen-
`erates large numbers of features that densely cover the
`image over the full range of scales and locations. A typ-
`ical image of size 500×500 pixels will give rise to about
`2000 stable features (although this number depends on
`both image content and choices for various parame-
`ters). The quantity of features is particularly important
`for object recognition, where the ability to detect small
`objects in cluttered backgrounds requires that at least
`3 features be correctly matched from each object for
`reliable identification.
`For image matching and recognition, SIFT features
`are first extracted from a set of reference images and
`stored in a database. A new image is matched by indi-
`vidually comparing each feature from the new image to
`this previous database and finding candidate matching
`features based on Euclidean distance of their feature
`vectors. This paper will discuss fast nearest-neighbor
`algorithms that can perform this computation rapidly
`against large databases.
`The keypoint descriptors are highly distinctive,
`which allows a single feature to find its correct match
`with good probability in a large database of features.
`However, in a cluttered image, many features from
`the background will not have any correct match in
`the database, giving rise to many false matches in ad-
`dition to the correct ones. The correct matches can
`be filtered from the full set of matches by identify-
`ing subsets of keypoints that agree on the object and
`its location, scale, and orientation in the new image.
`The probability that several features will agree on
`these parameters by chance is much lower than the
`probability that any individual feature match will be
`in error. The determination of these consistent clus-
`ters can be performed rapidly by using an efficient
`hash table implementation of the generalized Hough
`transform.
`Each cluster of 3 or more features that agree on an
`object and its pose is then subject to further detailed
`verification. First, a least-squared estimate is made for
`an affine approximation to the object pose. Any other
`
`image features consistent with this pose are identified,
`and outliers are discarded. Finally, a detailed compu-
`tation is made of the probability that a particular set of
`features indicates the presence of an object, given the
`accuracy of fit and number of probable false matches.
`Object matches that pass all these tests can be identified
`as correct with high confidence.
`
`2. Related Research
`
`The development of image matching by using a set of
`local interest points can be traced back to the work of
`Moravec (1981) on stereo matching using a corner de-
`tector. The Moravec detector was improved by Harris
`and Stephens (1988) to make it more repeatable un-
`der small image variations and near edges. Harris also
`showed its value for efficient motion tracking and 3D
`structure from motion recovery (Harris, 1992), and the
`Harris corner detector has since been widely used for
`many other image matching tasks. While these feature
`detectors are usually called corner detectors, they are
`not selecting just corners, but rather any image location
`that has large gradients in all directions at a predeter-
`mined scale.
`The initial applications were to stereo and short-
`range motion tracking, but the approach was later ex-
`tended to more difficult problems. Zhang et al. (1995)
`showed that it was possible to match Harris corners
`over alarge image range by using a correlation window
`around each corner to select likely matches. Outliers
`were then removed by solving for a fundamental ma-
`trix describing the geometric constraints between the
`two views of rigid scene and removing matches that did
`not agree with the majority solution. At the same time,
`a similar approach was developed by Torr (1995) for
`long-range motion matching, in which geometric con-
`straints were used to remove outliers for rigid objects
`moving within an image.
`The ground-breaking work of Schmid and Mohr
`(1997) showed that invariant local feature matching
`could be extended to general image recognition prob-
`lems in which a feature was matched against a large
`database of images. They also used Harris corners to
`select interest points, but rather than matching with
`a correlation window, they used a rotationally in-
`variant descriptor of the local image region. This al-
`lowed features to be matched under arbitrary orien-
`tation change between the two images. Furthermore,
`they demonstrated that multiple feature matches could
`accomplish general recognition under occlusion and
`
`AVIGILON EX. 2021
`IPR2019-00311
`Page 2 of 20
`
`

`

`Distinctive Image Features from Scale-Invariant Keypoints
`
`93
`
`clutter by identifying consistent clusters of matched
`features.
`The Harris corner detector is very sensitive to
`changes in image scale, so it does not provide a good ba-
`sis for matching images of different sizes. Earlier work
`by the author (Lowe, 1999) extended the local feature
`approach to achieve scale invariance. This work also
`described a new local descriptor that provided more
`distinctive features while being less sensitive to local
`image distortions such as 3D viewpoint change. This
`current paper provides a more in-depth development
`and analysis of this earlier work, while also present-
`ing a number of improvements in stability and feature
`invariance.
`There is a considerable body of previous research on
`identifying representations that are stable under scale
`change. Some of the first work in this area was by
`Crowley and Parker (1984), who developed a repre-
`sentation that identified peaks and ridges in scale space
`and linked these into a tree structure. The tree structure
`could then be matched between images with arbitrary
`scale change. More recent work on graph-based match-
`ing by Shokoufandeh et al. (1999) provides more dis-
`tinctive feature descriptors using wavelet coefficients.
`The problem of identifying an appropriate and con-
`sistent scale for feature detection has been studied in
`depth by Lindeberg (1993, 1994). He describes this as
`a problem of scale selection, and we make use of his
`results below.
`Recently, there has been an impressive body of
`work on extending local features to be invariant to
`full affine transformations (Baumberg, 2000; Tuyte-
`laars and Van Gool, 2000; Mikolajczyk and Schmid,
`2002; Schaffalitzky and Zisserman, 2002; Brown and
`Lowe, 2002). This allows for invariant matching to fea-
`tures on a planar surface under changes in orthographic
`3D projection, in most cases by resampling the image in
`a local affine frame. However, none of these approaches
`are yet fully affine invariant, as they start with initial
`feature scales and locations selected in a non-affine-
`invariant manner due to the prohibitive cost of explor-
`ing the full affine space. The affine frames are are also
`more sensitive to noise than those of the scale-invariant
`features, so in practice the affine features have lower
`repeatability than the scale-invariant features unless the
`affine distortion is greater than about a 40 degree tilt of
`a planar surface (Mikolajczyk, 2002). Wider affine in-
`variance may not be important for many applications,
`as training views are best taken at least every 30 de-
`grees rotation in viewpoint (meaning that recognition
`
`is within 15 degrees of the closest training view) in or-
`der to capture non-planar changes and occlusion effects
`for 3D objects.
`While the method to be presented in this paper is not
`fully affine invariant, a different approach is used in
`which the local descriptor allows relative feature posi-
`tions to shift significantly with only small changes in
`the descriptor. This approach not only allows the de-
`scriptors to be reliably matched across a considerable
`range of affine distortion, but it also makes the features
`more robust against changes in 3D viewpoint for non-
`planar surfaces. Other advantages include much more
`efficient feature extraction and the ability to identify
`larger numbers of features. On the other hand, affine
`invariance is a valuable property for matching planar
`surfaces under very large view changes, and further
`research should be performed on the best ways to com-
`bine this with non-planar 3D viewpoint invariance in
`an efficient and stable manner.
`Many other feature types have been proposed for use
`in recognition, some of which could be used in addition
`to the features described in this paper to provide fur-
`ther matches under differing circumstances. One class
`of features are those that make use of image contours
`or region boundaries, which should make them less
`likely to be disrupted by cluttered backgrounds near
`object boundaries. Matas et al. (2002) have shown that
`their maximally-stable extremal regions can produce
`large numbers of matching features with good stabil-
`ity. Mikolajczyk et al. (2003) have developed a new
`descriptor that uses local edges while ignoring unre-
`lated nearby edges, providing the ability to find stable
`features even near the boundaries of narrow shapes su-
`perimposed on background clutter. Nelson and Selinger
`(1998) have shown good results with local features
`based on groupings of image contours. Similarly, Pope
`and Lowe (2000) used features based on the hierarchi-
`cal grouping of image contours, which are particularly
`useful for objects lacking detailed texture.
`The history of research on visual recognition con-
`tains work on a diverse set of other image properties
`that can be used as feature measurements. Carneiro and
`Jepson (2002) describe phase-based local features that
`represent the phase rather than the magnitude of local
`spatial frequencies, which is likely to provide improved
`invariance to illumination. Schiele and Crowley (2000)
`have proposed the use of multidimensional histograms
`summarizing the distribution of measurements within
`image regions. This type of feature may be particu-
`larly useful for recognition of textured objects with
`
`AVIGILON EX. 2021
`IPR2019-00311
`Page 3 of 20
`
`

`

`94
`
`Lowe
`
`deformable shapes. Basri and Jacobs (1997) have
`demonstrated the value of extracting local region
`boundaries for recognition. Other useful properties to
`incorporate include color, motion, figure-ground dis-
`crimination, region shape descriptors, and stereo depth
`cues. The local feature approach can easily incorporate
`novel feature types because extra features contribute
`to robustness when they provide correct matches, but
`otherwise do little harm other than their cost of compu-
`tation. Therefore, future systems are likely to combine
`many feature types.
`
`3. Detection of Scale-Space Extrema
`
`As described in the introduction, we will detect key-
`points using a cascade filtering approach that uses effi-
`cient algorithms to identify candidate locations that are
`then examined in further detail. The first stage of key-
`point detection is to identify locations and scales that
`can be repeatably assigned under differing views of
`the same object. Detecting locations that are invariant
`to scale change of the image can be accomplished by
`searching for stable features across all possible scales,
`using a continuous function of scale known as scale
`space (Witkin, 1983).
`It has been shown by Koenderink (1984) and
`Lindeberg (1994) that under a variety of reasonable
`assumptions the only possible scale-space kernel is
`the Gaussian function. Therefore, the scale space of
`an image is defined as a function, L(x, y, σ ), that
`is produced from the convolution of a variable-scale
`Gaussian, G(x, y, σ ), with an input image, I (x, y):
`L(x, y, σ ) = G(x, y, σ ) ∗ I (x, y),
`where ∗ is the convolution operation in x and y,
`and
`
`−(x 2+y2)/2σ 2 .
`
`G(x, y, σ ) = 1
`2π σ 2 e
`To efficiently detect stable keypoint locations in scale
`space, we have proposed (Lowe, 1999) using scale-
`space extrema in the difference-of-Gaussian function
`convolved with the image, D(x, y, σ ), which can be
`computed from the difference of two nearby scales sep-
`arated by a constant multiplicative factor k:
`D(x, y, σ ) = (G(x, y, kσ ) − G(x, y, σ )) ∗ I (x, y)
`= L(x, y, kσ ) − L(x, y, σ ).
`(1)
`
`There are a number of reasons for choosing this
`function. First, it is a particularly efficient function to
`compute, as the smoothed images, L , need to be com-
`puted in any case for scale space feature description,
`and D can therefore be computed by simple image
`subtraction.
`In addition, the difference-of-Gaussian function pro-
`vides a close approximation to the scale-normalized
`Laplacian of Gaussian, σ 2∇2G, asstudied by
`Lindeberg (1994). Lindeberg showed that the normal-
`ization of the Laplacian with the factor σ 2 is required
`for true scale invariance. In detailed experimental com-
`parisons, Mikolajczyk (2002) found that the maxima
`and minima of σ 2∇2G produce the most stable image
`features compared to a range of other possible image
`functions, such as the gradient, Hessian, or Harris cor-
`ner function.
`The relationship between D and σ 2∇2G can be un-
`derstood from the heat diffusion equation (parameter-
`ized in terms of σ rather than the more usual t = σ 2):
`= σ∇2G.
`∂G
`∂σ
`From this, we see that ∇2G can be computed from the
`finite difference approximation to ∂G/∂σ , using the
`difference of nearby scales at kσ and σ :
`σ∇2G = ∂G
`
`≈ G(x, y, kσ ) − G(x, y, σ )
`kσ − σ
`
`∂σ
`
`and therefore,
`G(x, y, kσ ) − G(x, y, σ ) ≈ (k − 1)σ 2∇2G.
`
`This shows that when the difference-of-Gaussian func-
`tion has scales differing by a constant factor it already
`incorporates the σ 2 scale normalization required for
`the scale-invariant Laplacian. The factor (k − 1) in the
`equation is a constant over all scales and therefore does
`not influence extrema location. The approximation er-
`ror will go to zero as k goes to 1, but in practice we have
`found that the approximation has almost no impact on
`the stability of extrema detection or localization for
`2.
`An efficient approach to construction of D(x, y, σ )
`is shown in Fig. 1. The initial image is incrementally
`convolved with Gaussians to produce images separated
`by a constant factor k in scale space, shown stacked in
`the left column. We choose to divide each octave of
`scale space (i.e., doubling of σ ) into an integer num-
`ber, s, ofintervals, so k = 21/s . We must produce s + 3
`
`even significant differences in scale, such as k = √
`
`AVIGILON EX. 2021
`IPR2019-00311
`Page 4 of 20
`
`

`

`Distinctive Image Features from Scale-Invariant Keypoints
`
`95
`
`Figure 1. For each octave of scale space, the initial image is repeatedly convolved with Gaussians to produce the set of scale space images
`shown on the left. Adjacent Gaussian images are subtracted to produce the difference-of-Gaussian images on the right. After each octave, the
`Gaussian image is down-sampled by a factor of 2, and the process repeated.
`
`images in the stack of blurred images for each octave,
`so that final extrema detection covers a complete oc-
`tave. Adjacent image scales are subtracted to produce
`the difference-of-Gaussian images shown on the right.
`Once a complete octave has been processed, we resam-
`ple the Gaussian image that has twice the initial value
`of σ (it will be 2 images from the top of the stack) by
`taking every second pixel in each row and column. The
`accuracy of sampling relative to σ is no different than
`for the start of the previous octave, while computation
`is greatly reduced.
`
`3.1. Local Extrema Detection
`
`In order to detect the local maxima and minima of
`D(x, y, σ ), each sample point is compared to its eight
`neighbors in the current image and nine neighbors in
`the scale above and below (see Fig. 2). It is selected
`only if it is larger than all of these neighbors or smaller
`than all of them. The cost of this check is reasonably
`low due to the fact that most sample points will be
`eliminated following the first few checks.
`An important issue is to determine the frequency
`of sampling in the image and scale domains that is
`needed to reliably detect the extrema. Unfortunately,
`it turns out that there is no minimum spacing of sam-
`
`Figure 2. Maxima and minima of the difference-of-Gaussian im-
`ages are detected by comparing a pixel (marked with X) to its 26
`neighbors in 3 × 3 regions at the current and adjacent scales (marked
`with circles).
`
`ples that will detect all extrema, as the extrema can be
`arbitrarily close together. This can be seen by consid-
`ering a white circle on a black background, which will
`have a single scale space maximum where the circular
`positive central region of the difference-of-Gaussian
`function matches the size and location of the circle.
`For a very elongated ellipse, there will be two max-
`ima near each end of the ellipse. As the locations of
`maxima are a continuous function of the image, for
`some ellipse with intermediate elongation there will
`be a transition from a single maximum to two, with
`the maxima arbitrarily close to each other near the
`transition.
`
`AVIGILON EX. 2021
`IPR2019-00311
`Page 5 of 20
`
`

`

`96
`
`Lowe
`
`Figure 3. The top line of the first graph shows the percent of keypoints that are repeatably detected at the same location and scale in a
`transformed image as a function of the number of scales sampled per octave. The lower line shows the percent of keypoints that have their
`descriptors correctly matched to a large database. The second graph shows the total number of keypoints detected in a typical image as a function
`of the number of scale samples.
`
`Therefore, we must settle for a solution that trades
`off efficiency with completeness. In fact, as might be
`expected and is confirmed by our experiments, extrema
`that are close together are quite unstable to small per-
`turbations of the image. We can determine the best
`choices experimentally by studying a range of sampling
`frequencies and using those that provide the most reli-
`able results under a realistic simulation of the matching
`task.
`
`3.2. Frequency of Sampling in Scale
`
`The experimental determination of sampling frequency
`that maximizes extrema stability is shown in Figs. 3 and
`4. These figures (and most other simulations in this pa-
`per) are based on a matching task using a collection of
`32 real images drawn from a diverse range, including
`outdoor scenes, human faces, aerial photographs, and
`industrial images (the image domain was found to have
`almost no influence on any of the results). Each image
`was then subject to a range of transformations, includ-
`ing rotation, scaling, affine stretch, change in brightness
`and contrast, and addition of image noise. Because the
`changes were synthetic, it was possible to precisely
`predict where each feature in an original image should
`appear in the transformed image, allowing for measure-
`ment of correct repeatability and positional accuracy
`for each feature.
`Figure 3 shows these simulation results used to ex-
`amine the effect of varying the number of scales per
`octave at which the image function is sampled prior to
`
`Figure 4. The top line in the graph shows the percent of keypoint
`locations that are repeatably detected in a transformed image as a
`function of the prior image smoothing for the first level of each
`octave. The lower line shows the percent of descriptors correctly
`matched against a large database.
`
`extrema detection. In this case, each image was resam-
`pled following rotation by a random angle and scaling
`by a random amount between 0.2 of 0.9 times the orig-
`inal size. Keypoints from the reduced resolution image
`were matched against those from the original image so
`that the scales for all keypoints would be be present in
`the matched image. In addition, 1% image noise was
`added, meaning that each pixel had a random number
`added from the uniform interval [−0.01, 0.01] where
`pixel values are in the range [0, 1] (equivalent to pro-
`viding slightly less than 6 bits of accuracy for image
`pixels).
`
`AVIGILON EX. 2021
`IPR2019-00311
`Page 6 of 20
`
`

`

`Distinctive Image Features from Scale-Invariant Keypoints
`
`97
`
`The top line in the first graph of Fig. 3 shows the
`percent of keypoints that are detected at a matching
`location and scale in the transformed image. For all
`√
`examples in this paper, we define a matching scale as
`being within a factor of
`2 ofthe correct scale, and
`a matching location as being within σ pixels, where
`σ is the scale of the keypoint (defined from Eq. (1) as
`the standard deviation of the smallest Gaussian used
`in the difference-of-Gaussian function). The lower line
`on this graph shows the number of keypoints that are
`correctly matched to a database of 40,000 keypoints
`using the nearest-neighbor matching procedure to be
`described in Section 6 (this shows that once the key-
`point is repeatably located, it is likely to be useful for
`recognition and matching tasks). As this graph shows,
`the highest repeatability is obtained when sampling
`3 scales per octave, and this is the number of scale
`samples used for all other experiments throughout this
`paper.
`It might seem surprising that the repeatability does
`not continue to improve as more scales are sampled.
`The reason is that this results in many more local ex-
`trema being detected, but these extrema are on average
`less stable and therefore are less likely to be detected
`in the transformed image. This is shown by the second
`graph in Fig. 3, which shows the average number of
`keypoints detected and correctly matched in each im-
`age. The number of keypoints rises with increased sam-
`pling of scales and the total number of correct matches
`also rises. Since the success of object recognition often
`depends more on the quantity of correctly matched key-
`points, as opposed to their percentage correct match-
`ing, for many applications it will be optimal to use a
`larger number of scale samples. However, the cost of
`computation also rises with this number, so for the ex-
`periments in this paper we have chosen to use just 3
`scale samples per octave.
`To summarize, these experiments show that the
`scale-space difference-of-Gaussian function has a large
`number of extrema and that it would be very expensive
`to detect them all. Fortunately, we can detect the most
`stable and useful subset even with a coarse sampling
`of scales.
`
`3.3. Frequency of Sampling in the Spatial Domain
`
`Just as we determined the frequency of sampling per oc-
`tave of scale space, so we must determine the frequency
`of sampling in the image domain relative to the scale of
`smoothing. Given that extrema can be arbitrarily close
`
`together, there will be a similar trade-off between sam-
`pling frequency and rate of detection. Figure 4 shows
`an experimental determination of the amount of prior
`smoothing, σ , that is applied to each image level before
`building the scale space representation for an octave.
`Again, the top line is the repeatability of keypoint de-
`tection, and the results show that the repeatability con-
`tinues to increase with σ . However, there is a cost to
`using a large σ in terms of efficiency, so we have cho-
`sen to use σ = 1.6, which provides close to optimal
`repeatability. This value is used throughout this paper
`and was used for the results in Fig. 3.
`Of course, if we pre-smooth the image before ex-
`trema detection, we are effectively discarding the high-
`est spatial frequencies. Therefore, to make full use of
`the input, the image can be expanded to create more
`sample points than were present in the original. We
`double the size of the input image using linear inter-
`polation prior to building the first level of the pyra-
`mid. While the equivalent operation could effectively
`have been performed by using sets of subpixel-offset
`filters on the original image, the image doubling leads
`to a more efficient implementation. We assume that the
`original image has a blur of at least σ = 0.5 (the min-
`imum needed to prevent significant aliasing), and that
`therefore the doubled image has σ = 1.0 relative to
`its new pixel spacing. This means that little additional
`smoothing is needed prior to creation of the first oc-
`tave of scale space. The image doubling increases the
`number of stable keypoints by almost a factor of 4, but
`no significant further improvements were found with a
`larger expansion factor.
`
`4. Accurate Keypoint Localization
`
`Once a keypoint candidate has been found by com-
`paring a pixel to its neighbors, the next step is to per-
`form a detailed fit to the nearby data for location, scale,
`and ratio of principal curvatures. This information al-
`lows points to be rejected that have low contrast (and
`are therefore sensitive to noise) or are poorly localized
`along an edge.
`The initial implementation of this approach (Lowe,
`1999) simply located keypoints at the location and scale
`of the central sample point. However, recently Brown
`has developed a method (Brown and Lowe, 2002) for
`fitting a 3D quadratic function to the local sample points
`to determine the interpolated location of the maximum,
`and his experiments showed that this provides a sub-
`stantial improvement to matching and stability. His
`
`AVIGILON EX. 2021
`IPR2019-00311
`Page 7 of 20
`
`

`

`98
`
`Lowe
`
`approach uses the Taylor expansion (up to the quadratic
`terms) of the scale-space function, D(x, y, σ ), shifted
`so that the origin is at the sample point:
`
`D(x) = D + ∂ D
`∂x
`
`T
`
`x + 1
`2
`
`xT
`
`∂ 2 D
`∂x2 x
`
`(2)
`
`where D and its derivatives are evaluated at the sample
`point and x = (x, y, σ )T is the offset from this point.
`The location of the extremum, ˆx, isdetermined by tak-
`ing the derivative of this function with respect to x and
`setting it to zero, giving
`
`ˆx = − ∂ 2 D
`∂x2
`
`−1 ∂ D
`∂x
`
`.
`
`(3)
`
`As suggested by Brown, the Hessian and derivative
`of D are approximated by using differences of neigh-
`boring sample points. The resulting 3×3 linear system
`can be solved with minimal cost. If the offset ˆx is larger
`than 0.5 in any dimension, then it means that the ex-
`tremum lies closer to a different sample point. In this
`
`case, the sample point is changed and the interpolation
`performed instead about that point. The final offset ˆx
`is added to the location of its sample point to get the
`interpolated estimate for the location of the extremum.
`The function value at the extremum, D(ˆx), is useful
`for rejecting unstable extrema with low contrast. This
`can be obtained by substituting Eqs (3) into (2), giving
`
`D(ˆx) = D + 1
`2
`
`∂ D
`∂x
`
`T
`
`ˆx.
`
`For the experiments in this paper, all extrema with
`a value of D(ˆx) less than 0.03 were discarded (as
`before, we assume image pixel values in the range
`[0, 1]).
`Figure 5 shows the effects of keypoint selection on
`a natural image. In order to avoid too much clutter, a
`low-resolution 233 by 189 pixel image is used and key-
`points are shown as vectors giving the location, scale,
`and orientation of each keypoint (orientation assign-
`ment is described below). Figure 5(a) shows the origi-
`nal image, which is shown at reduced contrast behind
`
`Figure 5. This figure shows the stages of keypoint selection. (a) The 233 × 189 pixel original image. (b) The initial 832 keypoints locations
`at maxima and minima of the difference-of-Gaussian function. Keypoints are displayed as vectors indicating scale, orientation, and location.
`(c) After applying a threshold on minimum contrast, 729 keypoints remain. (d) The final 536 keypoints that remain following an additional
`threshold on ratio of principal curvatures.
`
`AVIGILON EX. 2021
`IPR2019-00311
`Page 8 of 20
`
`

`

`Distinctive Image Features from Scale-Invariant Keypoints
`
`99
`
`the subsequent figures. Figure 5(b) shows the 832
`keypoints at all detected maxima and minima of the
`difference-of-Gaussian function, while (c) shows the
`729 keypoints that remain following removal of those
`with a value of D(ˆx) less than 0.03. Part (d) will be
`explained in the following section.
`
`4.1. Eliminating Edge Responses
`
`For stability, it is not sufficient to reject keypoints with
`low contrast. The difference-of-Gaussian function will
`have a strong response along edges, even if the loca-
`tion along the edge is poorly determined and therefore
`unstable to small amounts of noise.
`A poorly defined peak in the difference-of-Gaussian
`function will have a large principal curvature across
`the edge but a small one in the perpendicular direction.
`The principal curvatures can be computed from a 2× 2
`Hessian matrix, H, computed at the location and scale
`of the keypoint:
`
`one, so that α = rβ. Then,
`= (α + β)2
`= (rβ + β)2
`Tr(H)2
`Det(H)
`
`rβ2
`
`αβ
`
`= (r + 1)2
`
`r
`
`,
`
`which depends only on the ratio of the eigenval-
`ues rather than their individual values. The quantity
`(r + 1)2/r is at a minimum when the two eigenvalues
`are equal and it increases with r. Therefore, to check
`that the ratio of principal curvatures is below some
`threshold, r, weonly need to check
`(r + 1)2
`r
`
`Tr(H)2
`Det(H)
`
`<
`
`.
`
`This is very efficient to compute, with less than 20
`floating point operations required to test each keypoint.
`The experiments in this paper use a value of r = 10,
`

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket