Object Recognition from Local Scale-Invariant Features

David G. Lowe
Computer Science Department
University of British Columbia
Vancouver, B.C., V6T 1Z4, Canada
lowe@cs.ubc.ca
Abstract

An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest-neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low-residual least-squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially-occluded images with a computation time of under 2 seconds.
1. Introduction

Object recognition in cluttered real-world scenes requires local image features that are unaffected by nearby clutter or partial occlusion. The features must be at least partially invariant to illumination, 3D projective transforms, and common object variations. On the other hand, the features must also be sufficiently distinctive to identify specific objects among many alternatives. The difficulty of the object recognition problem is due in large part to the lack of success in finding such image features. However, recent research on the use of dense local features (e.g., Schmid & Mohr [19]) has shown that efficient recognition can often be achieved by using local image descriptors sampled at a large number of repeatable locations.

This paper presents a new method for image feature generation called the Scale Invariant Feature Transform (SIFT). This approach transforms an image into a large collection of local feature vectors, each of which is invariant to image translation, scaling, and rotation, and partially invariant to illumination changes and affine or 3D projection. Previous approaches to local feature generation lacked invariance to scale and were more sensitive to projective distortion and illumination change. The SIFT features share a number of properties in common with the responses of neurons in inferior temporal (IT) cortex in primate vision. This paper also describes improved approaches to indexing and model verification.

The scale-invariant features are efficiently identified by using a staged filtering approach. The first stage identifies key locations in scale space by looking for locations that are maxima or minima of a difference-of-Gaussian function. Each point is used to generate a feature vector that describes the local image region sampled relative to its scale-space coordinate frame. The features achieve partial invariance to local variations, such as affine or 3D projections, by blurring image gradient locations. This approach is based on a model of the behavior of complex cells in the cerebral cortex of mammalian vision. The resulting feature vectors are called SIFT keys. In the current implementation, each image generates on the order of 1000 SIFT keys, a process that requires less than 1 second of computation time.

The SIFT keys derived from an image are used in a nearest-neighbour approach to indexing to identify candidate object models. Collections of keys that agree on a potential model pose are first identified through a Hough transform hash table, and then through a least-squares fit to a final estimate of model parameters. When at least 3 keys agree on the model parameters with low residual, there is strong evidence for the presence of the object. Since there may be dozens of SIFT keys in the image of a typical object, it is possible to have substantial levels of occlusion in the image and yet retain high levels of reliability.

The current object models are represented as 2D locations of SIFT keys that can undergo affine projection. Sufficient variation in feature location is allowed to recognize perspective projection of planar shapes at up to a 60 degree rotation away from the camera or to allow up to a 20 degree rotation of a 3D object.
2. Related research

Object recognition is widely used in the machine vision industry for the purposes of inspection, registration, and manipulation. However, current commercial systems for object recognition depend almost exclusively on correlation-based template matching. While very effective for certain engineered environments, where object pose and illumination are tightly controlled, template matching becomes computationally infeasible when object rotation, scale, illumination, and 3D pose are allowed to vary, and even more so when dealing with partial visibility and large model databases.

An alternative to searching all image locations for matches is to extract features from the image that are at least partially invariant to the image formation process, and to match only against those features. Many candidate feature types have been proposed and explored, including line segments [6], groupings of edges [11, 14], and regions [2], among many other proposals. While these features have worked well for certain object classes, they are often not detected frequently enough or with sufficient stability to form a basis for reliable recognition.

There has been recent work on developing much denser collections of image features. One approach has been to use a corner detector (more accurately, a detector of peaks in local image variation) to identify repeatable image locations, around which local image properties can be measured. Zhang et al. [23] used the Harris corner detector to identify feature locations for epipolar alignment of images taken from differing viewpoints. Rather than attempting to correlate regions from one image against all possible regions in a second image, large savings in computation time were achieved by only matching regions centered at corner points in each image.

For the object recognition problem, Schmid & Mohr [19] also used the Harris corner detector to identify interest points, and then created a local image descriptor at each interest point from an orientation-invariant vector of derivative-of-Gaussian image measurements. These image descriptors were used for robust object recognition by looking for multiple matching descriptors that satisfied object-based orientation and location constraints. This work was impressive both for the speed of recognition in a large database and the ability to handle cluttered images.

The corner detectors used in these previous approaches have a major failing, which is that they examine an image at only a single scale. As the change in scale becomes significant, these detectors respond to different image points. Also, since the detector does not provide an indication of the object scale, it is necessary to create image descriptors and attempt matching at a large number of scales. This paper describes an efficient method to identify stable key locations in scale space. This means that different scalings of an image will have no effect on the set of key locations selected. Furthermore, an explicit scale is determined for each point, which allows the image description vector for that point to be sampled at an equivalent scale in each image. A canonical orientation is determined at each location, so that matching can be performed relative to a consistent local 2D coordinate frame. This allows for the use of more distinctive image descriptors than the rotation-invariant ones used by Schmid and Mohr, and the descriptor is further modified to improve its stability to changes in affine projection and illumination.

Other approaches to appearance-based recognition include eigenspace matching [13], color histograms [20], and receptive field histograms [18]. These approaches have all been demonstrated successfully on isolated objects or pre-segmented images, but due to their more global features it has been difficult to extend them to cluttered and partially occluded images. Ohba & Ikeuchi [15] successfully apply the eigenspace approach to cluttered images by using many small local eigen-windows, but this then requires expensive search for each window in a new image, as with template matching.
3. Key localization

We wish to identify locations in image scale space that are invariant with respect to image translation, scaling, and rotation, and are minimally affected by noise and small distortions. Lindeberg [8] has shown that under some rather general assumptions on scale invariance, the Gaussian kernel and its derivatives are the only possible smoothing kernels for scale space analysis.

To achieve rotation invariance and a high level of efficiency, we have chosen to select key locations at maxima and minima of a difference-of-Gaussian function applied in scale space. This can be computed very efficiently by building an image pyramid with resampling between each level. Furthermore, it locates key points at regions and scales of high variation, making these locations particularly stable for characterizing the image. Crowley & Parker [4] and Lindeberg [9] have previously used the difference-of-Gaussian in scale space for other purposes. In the following, we describe a particularly efficient and stable method to detect and characterize the maxima and minima of this function.
As the 2D Gaussian function is separable, its convolution with the input image can be efficiently computed by applying two passes of the 1D Gaussian function in the horizontal and vertical directions:

    g(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \, e^{-x^2 / 2\sigma^2}

For key localization, all smoothing operations are done using σ = √2, which can be approximated with sufficient accuracy using a 1D kernel with 7 sample points.
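To make the smoothing step concrete, here is a minimal numpy sketch of the two-pass separable convolution with a 7-sample kernel for σ = √2. The function names and the use of np.convolve are our own illustration, not the paper's implementation.

    import numpy as np

    def gaussian_kernel_1d(sigma=np.sqrt(2), radius=3):
        """7-sample 1D Gaussian (radius 3), normalized to sum to 1."""
        x = np.arange(-radius, radius + 1)
        k = np.exp(-x**2 / (2 * sigma**2))
        return k / k.sum()

    def gaussian_blur(image, sigma=np.sqrt(2)):
        """Separable 2D smoothing: one horizontal pass, then one vertical pass."""
        k = gaussian_kernel_1d(sigma)
        rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, image)
        return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, rows)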
The input image is first convolved with the Gaussian function using σ = √2 to give an image A. This is then repeated a second time with a further incremental smoothing of σ = √2 to give a new image, B, which now has an effective smoothing of σ = 2 (successive Gaussian smoothings add in variance, so the effective σ is √(2 + 2) = 2). The difference-of-Gaussian function is obtained by subtracting image B from A, resulting in a ratio of 2/√2 = √2 between the two Gaussians.

To generate the next pyramid level, we resample the already smoothed image B using bilinear interpolation with a pixel spacing of 1.5 in each direction. While it may seem more natural to resample with a relative scale of √2, the only constraint is that sampling be frequent enough to detect peaks. The 1.5 spacing means that each new sample will be a constant linear combination of 4 adjacent pixels. This is efficient to compute and minimizes aliasing artifacts that would arise from changing the resampling coefficients.
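A sketch of how one pyramid level could be generated under these choices, reusing numpy and gaussian_blur from the previous sketch; the helper names are ours and the details are only illustrative.

    def bilinear_resample(img, spacing=1.5):
        """Resample on a grid with the given pixel spacing, each output
        pixel a fixed linear combination of its 4 adjacent pixels."""
        h, w = img.shape
        ys = np.arange(0, h - 1, spacing)
        xs = np.arange(0, w - 1, spacing)
        y0, x0 = ys.astype(int), xs.astype(int)
        fy, fx = (ys - y0)[:, None], (xs - x0)[None, :]
        return (img[np.ix_(y0, x0)] * (1 - fy) * (1 - fx)
                + img[np.ix_(y0, x0 + 1)] * (1 - fy) * fx
                + img[np.ix_(y0 + 1, x0)] * fy * (1 - fx)
                + img[np.ix_(y0 + 1, x0 + 1)] * fy * fx)

    def build_dog_pyramid(image, levels=4):
        """Per level: A = blur(I, sqrt(2)), B = blur(A, sqrt(2)) (effective
        sigma of 2), DoG = A - B; B is then resampled at 1.5 pixel spacing."""
        dogs, smoothed = [], []
        for _ in range(levels):
            a = gaussian_blur(image)
            b = gaussian_blur(a)
            dogs.append(a - b)
            smoothed.append(a)
            image = bilinear_resample(b)
        return dogs, smoothed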
Maxima and minima of this scale-space function are determined by comparing each pixel in the pyramid to its neighbours. First, a pixel is compared to its 8 neighbours at the same level of the pyramid. If it is a maximum or minimum at this level, then the closest pixel location is calculated at the next lowest level of the pyramid, taking account of the 1.5 times resampling. If the pixel remains higher (or lower) than this closest pixel and its 8 neighbours, then the test is repeated for the level above. Since most pixels will be eliminated within a few comparisons, the cost of this detection is small and much lower than that of building the pyramid.
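The staged comparison could look roughly like this: a sketch assuming the dogs list from the previous sketch (finest level first) and an interior pixel (i, j).

    def is_extremum(dogs, level, i, j):
        """8 same-level neighbours first, then the closest pixel and its 8
        neighbours at the adjacent levels, mapped through the 1.5x resampling."""
        v = dogs[level][i, j]
        patch = dogs[level][i - 1:i + 2, j - 1:j + 2]
        if v < patch.max() and v > patch.min():
            return False                       # eliminated by the cheap test
        sign = 1.0 if v == patch.max() else -1.0
        for other, scale in ((level - 1, 1.5), (level + 1, 1 / 1.5)):
            if not 0 <= other < len(dogs):
                continue
            oi, oj = int(round(i * scale)), int(round(j * scale))
            o = dogs[other]
            if not (1 <= oi < o.shape[0] - 1 and 1 <= oj < o.shape[1] - 1):
                return False
            if (sign * (v - o[oi - 1:oi + 2, oj - 1:oj + 2])).min() < 0:
                return False                   # beaten at a neighbouring scale
        return True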
If the first level of the pyramid is sampled at the same rate as the input image, the highest spatial frequencies will be ignored. This is due to the initial smoothing, which is needed to provide separation of peaks for robust detection. Therefore, we expand the input image by a factor of 2, using bilinear interpolation, prior to building the pyramid. This gives on the order of 1000 key points for a typical 512 × 512 pixel image, compared to only a quarter as many without the initial expansion.
3.1. SIFT key stability

To characterize the image at each key location, the smoothed image A at each level of the pyramid is processed to extract image gradients and orientations. At each pixel, A_{ij}, the image gradient magnitude, M_{ij}, and orientation, R_{ij}, are computed using pixel differences:

    M_{ij} = \sqrt{(A_{ij} - A_{i+1,j})^2 + (A_{ij} - A_{i,j+1})^2}

    R_{ij} = \operatorname{atan2}(A_{ij} - A_{i+1,j},\ A_{i,j+1} - A_{ij})

The pixel differences are efficient to compute and provide sufficient accuracy due to the substantial level of previous smoothing. The effective half-pixel shift in position is compensated for when determining key location.
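In code, these pixel differences are two array subtractions. A numpy sketch, continuing from the earlier sketches, where A is the smoothed image from one pyramid level:

    def gradients(A):
        """Gradient magnitude M and orientation R from pixel differences,
        following the two equations above."""
        dx = A[:-1, :-1] - A[1:, :-1]     # A_ij - A_(i+1)j
        dy = A[:-1, 1:] - A[:-1, :-1]     # A_i(j+1) - A_ij
        M = np.sqrt(dx**2 + dy**2)
        R = np.arctan2(dx, dy)
        return M, R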
Robustness to illumination change is enhanced by thresholding the gradient magnitudes at a value of 0.1 times the maximum possible gradient value. This reduces the effect of a change in illumination direction for a surface with 3D relief, as an illumination change may result in large changes to gradient magnitude but is likely to have less influence on gradient orientation.

Figure 1: The second image was generated from the first by rotation, scaling, stretching, change of brightness and contrast, and addition of pixel noise. In spite of these changes, 78% of the keys from the first image have a closely matching key in the second image. These examples show only a subset of the keys to reduce clutter.
Each key location is assigned a canonical orientation so that the image descriptors are invariant to rotation. In order to make this as stable as possible against lighting or contrast changes, the orientation is determined by the peak in a histogram of local image gradient orientations. The orientation histogram is created using a Gaussian-weighted window with σ of 3 times that of the current smoothing scale. These weights are multiplied by the thresholded gradient values and accumulated in the histogram at locations corresponding to the orientation, R_{ij}. The histogram has 36 bins covering the 360 degree range of rotations, and is smoothed prior to peak selection.
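A sketch of the orientation assignment under these choices, continuing from the earlier sketches. M is assumed to be already thresholded as above, and the circular smoothing weights are our own choice, since the paper does not give them.

    def canonical_orientation(M, R, ci, cj, sigma_w):
        """Peak of a 36-bin, Gaussian-weighted histogram of gradient
        orientations in a window around key location (ci, cj)."""
        hist = np.zeros(36)
        rad = int(3 * sigma_w)                   # window covers +/- 3 sigma
        for i in range(max(ci - rad, 0), min(ci + rad + 1, M.shape[0])):
            for j in range(max(cj - rad, 0), min(cj + rad + 1, M.shape[1])):
                w = np.exp(-((i - ci) ** 2 + (j - cj) ** 2) / (2 * sigma_w ** 2))
                b = int(np.degrees(R[i, j]) % 360.0 // 10) % 36   # 10-degree bins
                hist[b] += w * M[i, j]
        hist = (np.roll(hist, 1) + 2 * hist + np.roll(hist, -1)) / 4  # smooth
        return np.radians(10 * np.argmax(hist) + 5)  # centre of the peak bin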
The stability of the resulting keys can be tested by subjecting natural images to affine projection, contrast and brightness changes, and addition of noise. The location of each key detected in the first image can be predicted in the transformed image from knowledge of the transform parameters. This framework was used to select the various sampling and smoothing parameters given above, so that maximum efficiency could be obtained while retaining stability to changes.
Image transformation                 Match %   Ori %
A. Increase contrast by 1.2            89.0     86.6
B. Decrease intensity by 0.2           88.5     85.9
C. Rotate by 20 degrees                85.4     81.0
D. Scale by 0.7                        85.1     80.3
E. Stretch by 1.2                      83.5     76.1
F. Stretch by 1.5                      77.7     65.0
G. Add 10% pixel noise                 90.3     88.4
H. All of A,B,C,D,E,G.                 78.6     71.8

Figure 2: For various image transformations applied to a sample of 20 images, this table gives the percent of keys that are found at matching locations and scales (Match %) and that also match in orientation (Ori %).
Figure 1 shows a relatively small number of keys detected over a 2 octave range of only the larger scales (to avoid excessive clutter). Each key is shown as a square, with a line from the center to one side of the square indicating orientation. In the second half of this figure, the image is rotated by 15 degrees, scaled by a factor of 0.9, and stretched by a factor of 1.1 in the horizontal direction. The pixel intensities, in the range of 0 to 1, have 0.1 subtracted from their brightness values and the contrast reduced by multiplication by 0.9. Random pixel noise is then added to give less than 5 bits/pixel of signal. In spite of these transformations, 78% of the keys in the first image had closely matching keys in the second image at the predicted locations, scales, and orientations.
The overall stability of the keys to image transformations can be judged from Figure 2. Each entry in this table is generated from combining the results of 20 diverse test images and summarizes the matching of about 15,000 keys. Each line of the table shows a particular image transformation. The first column gives the percent of keys that have a matching key in the transformed image within σ in location (relative to scale for that key) and a factor of 1.5 in scale. The second column gives the percent that match these criteria as well as having an orientation within 20 degrees of the prediction.
4. Local image description

Given a stable location, scale, and orientation for each key, it is now possible to describe the local image region in a manner invariant to these transformations. In addition, it is desirable to make this representation robust against small shifts in local geometry, such as arise from affine or 3D projection. One approach to this is suggested by the response properties of complex neurons in the visual cortex, in which a feature position is allowed to vary over a small region while orientation and spatial frequency specificity are maintained. Edelman, Intrator & Poggio [5] have performed experiments that simulated the responses of complex neurons to different 3D views of computer graphic models, and found that the complex cell outputs provided much better discrimination than simple correlation-based matching. This can be seen, for example, if an affine projection stretches an image in one direction relative to another, which changes the relative locations of gradient features while having a smaller effect on their orientations and spatial frequencies.
This robustness to local geometric distortion can be obtained by representing the local image region with multiple images representing each of a number of orientations (referred to as orientation planes). Each orientation plane contains only the gradients corresponding to that orientation, with linear interpolation used for intermediate orientations. Each orientation plane is blurred and resampled to allow for larger shifts in positions of the gradients.

This approach can be efficiently implemented by using the same precomputed gradients and orientations for each level of the pyramid that were used for orientation selection. For each keypoint, we use the pixel sampling from the pyramid level at which the key was detected. The pixels that fall in a circle of radius 8 pixels around the key location are inserted into the orientation planes. The orientation is measured relative to that of the key by subtracting the key's orientation. For our experiments we used 8 orientation planes, each sampled over a 4 × 4 grid of locations, with a sample spacing 4 times that of the pixel spacing used for gradient detection. The blurring is achieved by allocating the gradient of each pixel among its 8 closest neighbors in the sample grid, using linear interpolation in orientation and the two spatial dimensions. This implementation is much more efficient than performing explicit blurring and resampling, yet gives almost equivalent results.
In order to sample the image at a larger scale, the same process is repeated for a second level of the pyramid one octave higher. However, this time a 2 × 2 rather than a 4 × 4 sample region is used. This means that approximately the same image region will be examined at both scales, so that any nearby occlusions will not affect one scale more than the other. Therefore, the total number of samples in the SIFT key vector, from both scales, is 8 × 4 × 4 + 8 × 2 × 2, or 160 elements, giving enough measurements for high specificity.
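The following sketch shows our reading of the accumulation for the key's own level only: the 8 × 4 × 4 plane set with linear interpolation in orientation and the two spatial dimensions. It assumes interior pixels, reuses the earlier numpy sketches, and omits the 2 × 2 set from the octave above, so it produces 128 of the 160 elements.

    def orientation_planes(M, R, ki, kj, key_ori):
        """Accumulate thresholded gradients within radius 8 of key (ki, kj)
        into 8 orientation planes on a 4x4 grid (sample spacing 4 pixels),
        splitting each gradient among its 8 closest plane/grid neighbours."""
        planes = np.zeros((8, 4, 4))
        for i in range(ki - 8, ki + 9):
            for j in range(kj - 8, kj + 9):
                if (i - ki) ** 2 + (j - kj) ** 2 > 64:
                    continue                          # outside radius-8 circle
                o = ((R[i, j] - key_ori) % (2 * np.pi)) / (2 * np.pi) * 8
                u = (i - ki) / 4.0 + 1.5              # fractional grid coords
                v = (j - kj) / 4.0 + 1.5
                o0, u0, v0 = int(o), int(np.floor(u)), int(np.floor(v))
                for do in (0, 1):
                    for du in (0, 1):
                        for dv in (0, 1):
                            uu, vv = u0 + du, v0 + dv
                            if not (0 <= uu < 4 and 0 <= vv < 4):
                                continue
                            w = ((1 - abs(o - (o0 + do))) *
                                 (1 - abs(u - uu)) * (1 - abs(v - vv)))
                            planes[(o0 + do) % 8, uu, vv] += w * M[i, j]
        return planes.ravel()                         # 8 * 4 * 4 = 128 values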
5. Indexing and matching

For indexing, we need to store the SIFT keys for sample images and then identify matching keys from new images.
Figure 3: Model images of planar objects are shown in the top row. Recognition results below show model outlines and image keys used for matching.
The problem of identifying the most similar keys for high-dimensional vectors is known to have high complexity if an exact solution is required. However, a modification of the k-d tree algorithm called the best-bin-first search method (Beis & Lowe [3]) can identify the nearest neighbors with high probability using only a limited amount of computation. To further improve the efficiency of the best-bin-first algorithm, the SIFT key samples generated at the larger scale are given twice the weight of those at the smaller scale. This means that the larger scale is in effect able to filter the most likely neighbours for checking at the smaller scale. This also improves recognition performance by giving more weight to the least-noisy scale. In our experiments, it is possible to have a cut-off for examining at most 200 neighbors in a probabilistic best-bin-first search of 30,000 key vectors with almost no loss of performance compared to finding an exact solution.
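As an illustration of the weighting idea (not of best-bin-first itself, which is Beis & Lowe's algorithm [3]), one could scale the key vectors before an off-the-shelf nearest-neighbour search. Here we assume, hypothetically, that the 32 large-scale samples occupy the last positions of the 160-element vector, and we use scipy's exact k-d tree in place of the approximate search.

    import numpy as np
    from scipy.spatial import cKDTree

    def match_keys(model_keys, image_keys):
        """Nearest-neighbour matching of 160-element SIFT key vectors, with
        the large-scale samples given twice the weight of small-scale ones."""
        w = np.ones(160)
        w[128:] = 2.0                        # assumed layout: large scale last
        tree = cKDTree(np.asarray(model_keys) * w)
        dist, idx = tree.query(np.asarray(image_keys) * w)
        return idx, dist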
An efficient way to cluster reliable model hypotheses is to use the Hough transform [1] to search for keys that agree upon a particular model pose. Each model key in the database contains a record of the key's parameters relative to the model coordinate system. Therefore, we can create an entry in a hash table predicting the model location, orientation, and scale from the match hypothesis. We use a bin size of 30 degrees for orientation, a factor of 2 for scale, and 0.25 times the maximum model dimension for location. These rather broad bin sizes allow for clustering even in the presence of substantial geometric distortion, such as due to a change in 3D viewpoint. To avoid the problem of boundary effects in hashing, each hypothesis is hashed into the 2 closest bins in each dimension, giving a total of 16 hash table entries for each hypothesis.
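A sketch of this double-binning, with the bin sizes given above. The helper names are ours, and the orientation wrap-around (orientation bins should also be taken modulo 12) is left out for brevity.

    import itertools
    import math
    from collections import defaultdict

    def hough_entries(model_id, ori_deg, log2_scale, x, y, model_dim):
        """Hash one pose hypothesis into the 2 closest bins per dimension
        (orientation, scale, x, y), i.e. 2**4 = 16 hash table entries."""
        coords = (ori_deg / 30.0,            # 30-degree orientation bins
                  log2_scale,                # factor-of-2 scale bins
                  x / (0.25 * model_dim),    # location bins of 0.25 * dim
                  y / (0.25 * model_dim))
        for offsets in itertools.product((0, 1), repeat=4):
            # floor(c - 0.5) and floor(c - 0.5) + 1 are the 2 closest bins
            yield (model_id,) + tuple(math.floor(c - 0.5) + o
                                      for c, o in zip(coords, offsets))

    # Accumulating votes: each bin key maps to the matches hashed into it.
    table = defaultdict(list)
    # for match in candidate_matches:        # hypothetical match records
    #     for entry in hough_entries(*match.pose):
    #         table[entry].append(match)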
6. Solution for affine parameters

The hash table is searched to identify all clusters of at least 3 entries in a bin, and the bins are sorted into decreasing order of size. Each such cluster is then subject to a verification procedure in which a least-squares solution is performed for the affine projection parameters relating the model to the image.

The affine transformation of a model point [x y]^T to an image point [u v]^T can be written as

    \begin{bmatrix} u \\ v \end{bmatrix} =
    \begin{bmatrix} m_1 & m_2 \\ m_3 & m_4 \end{bmatrix}
    \begin{bmatrix} x \\ y \end{bmatrix} +
    \begin{bmatrix} t_x \\ t_y \end{bmatrix}

where the model translation is [t_x t_y]^T and the affine rotation, scale, and stretch are represented by the m_i parameters.

We wish to solve for the transformation parameters, so the equation above can be rewritten as

    \begin{bmatrix}
      x & y & 0 & 0 & 1 & 0 \\
      0 & 0 & x & y & 0 & 1 \\
        &   &   & \dots & &
    \end{bmatrix}
    \begin{bmatrix} m_1 \\ m_2 \\ m_3 \\ m_4 \\ t_x \\ t_y \end{bmatrix} =
    \begin{bmatrix} u \\ v \\ \vdots \end{bmatrix}

This equation shows a single match, but any number of further matches can be added, with each match contributing two more rows to the first and last matrix. At least 3 matches are needed to provide a solution.

We can write this linear system as

    Ax = b
Figure 4: Top row shows model images for 3D objects with outlines found by background segmentation. Bottom image shows recognition results for 3D objects with model outlines and image keys used for matching.
The least-squares solution for the parameters x can be determined by solving the corresponding normal equations,

    x = (A^T A)^{-1} A^T b

which minimizes the sum of the squares of the distances from the projected model locations to the corresponding image locations. This least-squares approach could readily be extended to solving for 3D pose and internal parameters of articulated and flexible objects [12].
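In code, assembling A and b from the matched point pairs and solving is a few lines. A numpy sketch; np.linalg.lstsq is used as the numerically stabler equivalent of forming the normal equations explicitly.

    import numpy as np

    def solve_affine(matches):
        """Least-squares [m1, m2, m3, m4, tx, ty] from a list of
        ((x, y), (u, v)) model-to-image matches; needs at least 3."""
        rows, rhs = [], []
        for (x, y), (u, v) in matches:
            rows.append([x, y, 0, 0, 1, 0]); rhs.append(u)
            rows.append([0, 0, x, y, 0, 1]); rhs.append(v)
        A, b = np.array(rows, float), np.array(rhs, float)
        params, *_ = np.linalg.lstsq(A, b, rcond=None)  # minimizes |Ax - b|^2
        return params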
Outliers can now be removed by checking for agreement between each image feature and the model, given the parameter solution. Each match must agree within 15 degrees orientation, a factor of √2 change in scale, and 0.2 times maximum model size in terms of location. If fewer than 3 points remain after discarding outliers, then the match is rejected. If any outliers are discarded, the least-squares solution is re-solved with the remaining points.
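The verification loop might be sketched as follows, with residual_ok standing in (hypothetically) for the three per-match agreement tests just listed:

    def verify(matches):
        """Re-solve while discarding matches that disagree with the current
        affine solution; reject the cluster if fewer than 3 points remain."""
        while len(matches) >= 3:
            params = solve_affine(matches)
            inliers = [m for m in matches if residual_ok(m, params)]
            if len(inliers) == len(matches):
                return params, matches        # all matches agree: accept
            matches = inliers
        return None                           # match rejected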
7. Experiments

The affine solution provides a good approximation to perspective projection of planar objects, so planar models provide a good initial test of the approach. The top row of Figure 3 shows three model images of rectangular planar faces of objects. The figure also shows a cluttered image containing the planar objects, and the same image is shown overlaid with the models following recognition. The model keys that are displayed are the ones used for recognition and final least-squares solution. Since only 3 keys are needed for robust recognition, it can be seen that the solutions are highly redundant and would survive substantial occlusion. Also shown are the rectangular borders of the model images, projected using the affine transform from the least-squares solution. These closely agree with the true borders of the planar regions in the image, except for small errors introduced by the perspective projection. Similar experiments have been performed for many images of planar objects, and the recognition has proven to be robust to at least a 60 degree rotation of the object in any direction away from the camera.

Although the model images and affine parameters do not account for rotation in depth of 3D objects, they are still sufficient to perform robust recognition of 3D objects over about a 20 degree range of rotation in depth away from each model view. An example of three model images is shown in the top row of Figure 4. The models were photographed on a black background, and object outlines extracted by segmenting out the background region. An example of recognition is shown in the same figure, again showing the SIFT keys used for recognition. The object outlines are projected using the affine parameter solution, but this time the agreement is not as close because the solution does not account for rotation in depth. Figure 5 shows more examples in which there is significant partial occlusion.

Figure 5: Examples of 3D object recognition with occlusion.

The images in these examples are of size 384 × 512 pixels. The computation times for recognition of all objects in each image are about 1.5 seconds on a Sun Sparc 10 processor, with about 0.9 seconds required to build the scale-space pyramid and identify the SIFT keys, and about 0.6 seconds to perform indexing and least-squares verification. This does not include time to pre-process each model image, which would be about 1 second per image, but would only need to be done once for initial entry into a model database.

The illumination invariance of the SIFT keys is demonstrated in Figure 6. The two images are of the same scene from the same viewpoint, except that the first image is illuminated from the upper left and the second from the center right. The full recognition system is run to identify the second image using the first image as the model, and the second image is correctly recognized as matching the first. Only SIFT keys that were part of the recognition are shown. There were 273 keys that were verified as part of the final match, which means that in each case not only was the same key detected at the same location, but it also was the closest match to the correct corresponding key in the second image. Any 3 of these keys would be sufficient for recognition. While matching keys are not found in some regions where highlights or shadows change (for example on the shiny top of the camera), in general the keys show good invariance to illumination change.

Figure 6: Stability of image keys is tested under differing illumination. The first image is illuminated from upper left and the second from center right. Keys shown in the bottom image were those used to match second image to first.
8. Connections to biological vision

The performance of human vision is obviously far superior to that of current computer vision systems, so there is potentially much to be gained by emulating biological processes. Fortunately, there have been dramatic improvements within the past few years in understanding how object recognition is accomplished in animals and humans.

Recent research in neuroscience has shown that object recognition in primates makes use of features of intermediate complexity that are largely invariant to changes in scale, location, and illumination (Tanaka [21], Perrett & Oram [16]). Some examples of such intermediate features found in inferior temporal cortex (IT) are neurons that respond to a dark five-sided star shape, a circle with a thin protruding element, or a horizontal textured region within a triangular boundary. These neurons maintain highly specific responses to shape features that appear anywhere within a large portion of the visual field and over a several octave range of scales (Ito et al. [7]). The complexity of many of these features appears to be roughly the same as for the current SIFT features, although there are also some neurons that respond to more complex shapes, such as faces. Many of the neurons respond to color and texture properties in addition to shape. The feature responses have been shown to depend on previous visual learning from exposure to specific objects containing the features (Logothetis, Pauls & Poggio [10]). These features appear to be derived in the brain by a highly computation-intensive parallel process, which is quite different from the staged filtering approach given in this paper. However, the results are much the same: an image is transformed into a large set of local features that each match a small fraction of potential objects yet are largely invariant to common viewing transformations.

It is also known that object recognition in the brain depends on a serial process of attention to bind features to object interpretations, determine pose, and segment an object from a cluttered background [22]. This process presumably plays the same role in verification as the parameter solving and outlier detection used in this paper, since the accuracy of interpretations can often depend on enforcing a single viewpoint constraint [11].
9. Conclusions and comments
The SIFT features improve on previous approaches by being largely invariant to changes in scale, illumination, and local affine distortions. The large number of features in a typical image allows for robust recognition under partial occlusion in cluttered images. A final stage that solves for affine model parameters allows for more accurate verification and pose determination than in approaches that rely only on indexing.

An important area for further research is to build models from multiple views that represent the 3D structure of objects. This would have the further advantage that keys from multiple viewing conditions could be combined into a single model, thereby increasing the probability of finding matches in new views. The models could be true 3D representations based on structure-from-motion solutions, or could represent the space of appearance in terms of automated clustering and interpolation (Pope & Lowe [17]). An advantage of the latter approach is that it could also model non-rigid deformations.

The recognition performance could be further improved by adding new SIFT feature types to incorporate color, texture, and edge groupings, as well as varying feature sizes and offsets. Scale-invariant edge groupings that make local figure-ground discriminations would be particularly useful at object boundaries where background clutter can interfere with other features. The indexing and verification framework allows for all types of scale and rotation invariant features to be incorporated into a single model representation. Maximum robustness would be achieved by detecting many different feature types and relying on the indexing and clustering to select those that are most useful in a particular image.
References

[1] Ballard, D.H., "Generalizing the Hough transform to detect arbitrary patterns," Pattern Recognition, 13, 2 (1981), pp. 111-122.

[2] Basri, Ronen, and David W. Jacobs, "Recognition using region correspondences," International Journal of Computer Vision, 25, 2 (1996), pp. 141-162.

[3] Beis, Jeff, and David G. Lowe, "Shape indexing using approximate nearest-neighbour search in high-dimensional spaces," Conference on Computer Vision and Pattern Recognition, Puerto Rico (1997), pp. 1000-1006.
[4] Crowley, James L., and Alice C. Parker, "A representation for shape based on peaks and ridges in the difference of low-pass transform," IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 2 (1984), pp. 156-170.