`
`Computer Vision and Image Understanding 110 (2008) 346–359
`
`www.elsevier.com/locate/cviu
`
`Speeded-Up Robust Features (SURF)
`
`Herbert Bay a, Andreas Ess a,*, Tinne Tuytelaars b, Luc Van Gool a,b
`
`a ETH Zurich, BIWI, Sternwartstrasse 7, CH-8092 Zurich, Switzerland
`b K.U. Leuven, ESAT-PSI, Kasteelpark Arenberg 10, B-3001 Leuven, Belgium
`
`Received 31 October 2006; accepted 5 September 2007
`Available online 15 December 2007
`
`Abstract
`
`This article presents a novel scale- and rotation-invariant detector and descriptor, coined SURF (Speeded-Up Robust Features).
`SURF approximates or even outperforms previously proposed schemes with respect to repeatability, distinctiveness, and robustness,
`yet can be computed and compared much faster.
`This is achieved by relying on integral images for image convolutions; by building on the strengths of the leading existing detectors
`and descriptors (specifically, using a Hessian matrix-based measure for the detector, and a distribution-based descriptor); and by sim-
`plifying these methods to the essential. This leads to a combination of novel detection, description, and matching steps.
`The paper encompasses a detailed description of the detector and descriptor and then explores the effects of the most important param-
`eters. We conclude the article with SURF’s application to two challenging, yet converse goals: camera calibration as a special case of image
`registration, and object recognition. Our experiments underline SURF’s usefulness in a broad range of topics in computer vision.
© 2007 Elsevier Inc. All rights reserved.
`
`Keywords: Interest points; Local features; Feature description; Camera calibration; Object recognition
`
`1. Introduction
`
`The task of finding point correspondences between two
`images of the same scene or object is part of many com-
`puter vision applications. Image registration, camera cali-
`bration, object recognition, and image retrieval are just a
`few.
`The search for discrete image point correspondences can
`be divided into three main steps. First, ‘interest points’ are
`selected at distinctive locations in the image, such as cor-
`ners, blobs, and T-junctions. The most valuable property
`of an interest point detector is its repeatability. The repeat-
`ability expresses the reliability of a detector for finding the
`same physical interest points under different viewing condi-
`tions. Next, the neighbourhood of every interest point is
`represented by a feature vector. This descriptor has to be
`distinctive and at the same time robust to noise, detection
`
`* Corresponding author.
`E-mail address: aess@vision.ee.ethz.ch (A. Ess).
`
`1077-3142/$ - see front matter Ó 2007 Elsevier Inc. All rights reserved.
`doi:10.1016/j.cviu.2007.09.014
`
`displacements and geometric and photometric deforma-
`tions. Finally, the descriptor vectors are matched between
`different images. The matching is based on a distance
`between the vectors, e.g. the Mahalanobis or Euclidean dis-
`tance. The dimension of the descriptor has a direct impact
on the time this takes, and fewer dimensions are desirable for
`fast interest point matching. However, lower dimensional
`feature vectors are in general less distinctive than their
`high-dimensional counterparts.
`It has been our goal to develop both a detector and
`descriptor that, in comparison to the state-of-the-art, are
`fast to compute while not sacrificing performance. In order
to succeed, one has to strike a balance between the above
requirements, such as simplifying the detection scheme while
`keeping it accurate, and reducing the descriptor’s size while
`keeping it sufficiently distinctive.
`A wide variety of detectors and descriptors have already
`been proposed in the literature (e.g. [21,24,27,39,25]). Also,
`detailed comparisons and evaluations on benchmarking
`datasets have been performed [28,30,31]. Our fast detector
`and descriptor,
`called SURF (Speeded-Up Robust
`
`
`Features), was introduced in [4]. It is built on the insights
`gained from this previous work. In our experiments on
`these benchmarking datasets, SURF’s detector and
`descriptor are not only faster, but the former is also more
`repeatable and the latter more distinctive.
`We focus on scale and in-plane rotation-invariant detec-
`tors and descriptors. These seem to offer a good compromise
`between feature complexity and robustness to commonly
`occurring photometric deformations. Skew, anisotropic
`scaling, and perspective effects are assumed to be second
`order effects, that are covered to some degree by the overall
`robustness of the descriptor. Note that the descriptor can
`be extended towards affine-invariant regions using affine
`normalisation of the ellipse (cf. [31]), although this will have
`an impact on the computation time. Extending the detector,
`on the other hand, is less straightforward. Concerning the
`photometric deformations, we assume a simple linear model
with a bias (offset) and contrast change (scale factor). Neither detector nor descriptor uses colour information.
`The article is structured as follows. In Section 2, we give
a review of previous work in interest point detection and
`description. In Section 3, we describe the strategy applied
`for fast and robust interest point detection. The input
`image is analysed at different scales in order to guarantee
`invariance to scale changes. The detected interest points
`are provided with a rotation and scale-invariant descriptor
`in Section 4. Furthermore, a simple and efficient first-line
`indexing technique, based on the contrast of the interest
`point with its surrounding, is proposed.
`In Section 5, some of the available parameters and their
`effects are discussed, including the benefits of an upright
`version (not invariant to image rotation). We also investi-
`gate SURF’s performance in two important application
`scenarios. First, we consider a special case of image regis-
`tration, namely the problem of camera calibration for 3D
`reconstruction. Second, we will explore SURF’s applica-
`tion to an object recognition experiment. Both applications
`highlight SURF’s benefits in terms of speed and robustness
`as opposed to other strategies. The article is concluded in
`Section 6.
`
`2. Related work
`
`2.1. Interest point detection
`
`The most widely used detector is probably the Harris
`corner detector [15], proposed back in 1988. It is based
`on the eigenvalues of the second moment matrix. However,
Harris corners are not scale invariant. Lindeberg [21] introduced the concept of automatic scale selection. This makes it
possible to detect interest points in an image, each with their own
`characteristic scale. He experimented with both the deter-
`minant of the Hessian matrix as well as the Laplacian
`(which corresponds to the trace of the Hessian matrix) to
`detect blob-like structures. Mikolajczyk and Schmid [26]
`refined this method, creating robust and scale-invariant
`feature detectors with high repeatability, which they coined
`
`Harris-Laplace and Hessian-Laplace. They used a (scale-
`adapted) Harris measure or the determinant of the Hessian
`matrix to select the location, and the Laplacian to select the
`scale. Focusing on speed, Lowe [23] proposed to approxi-
`mate the Laplacian of Gaussians (LoG) by a Difference
`of Gaussians (DoG) filter.
`Several other scale-invariant interest point detectors
`have been proposed. Examples are the salient region detec-
`tor, proposed by Kadir and Brady [17], which maximises
`the entropy within the region, and the edge-based region
`detector proposed by Jurie and Schmid [16]. They seem less
`amenable to acceleration though. Also several affine-invari-
`ant feature detectors have been proposed that can cope
`with wider viewpoint changes. However, these fall outside
`the scope of this article.
`From studying the existing detectors and from published
`comparisons [29,30], we can conclude that Hessian-based
`detectors are more stable and repeatable than their Harris-
`based counterparts. Moreover, using the determinant of
`the Hessian matrix rather than its trace (the Laplacian)
`seems advantageous, as it fires less on elongated, ill-localised
`structures. We also observed that approximations like the
`DoG can bring speed at a low cost in terms of lost accuracy.
`
`2.2. Interest point description
`
`An even larger variety of feature descriptors has been
proposed, like Gaussian derivatives [11], moment invariants [32], complex features [1], steerable filters [12],
phase-based local features [6], and descriptors representing
`the distribution of smaller-scale features within the interest
`point neighbourhood. The latter, introduced by Lowe [24],
`have been shown to outperform the others [28]. This can be
`explained by the fact that they capture a substantial
`amount of information about the spatial intensity patterns,
`while at the same time being robust to small deformations
`or localisation errors. The descriptor in [24], called SIFT
`for short, computes a histogram of local oriented gradients
`around the interest point and stores the bins in a 128D vec-
tor (8 orientation bins for each of 4 × 4 location bins).
`Various refinements on this basic scheme have been pro-
`posed. Ke and Sukthankar [18] applied PCA on the gradi-
`ent image around the detected interest point. This PCA-
`SIFT yields a 36D descriptor which is fast for matching,
`but proved to be less distinctive than SIFT in a second
`comparative study by Mikolajczyk and Schmid [30]; and
`applying PCA slows down feature computation. In the
`same paper [30], the authors proposed a variant of SIFT,
`called GLOH, which proved to be even more distinctive
`with the same number of dimensions. However, GLOH is
`computationally more expensive as it uses again PCA for
`data compression.
`The SIFT descriptor still seems the most appealing
`descriptor for practical uses, and hence also the most
`widely used nowadays. It is distinctive and relatively fast,
`which is crucial for on-line applications. Recently, Se
`et al. [37] implemented SIFT on a Field Programmable
`
`
`Gate Array (FPGA) and improved its speed by an order of
`magnitude. Meanwhile, Grabner et al. [14] also used inte-
`gral images to approximate SIFT. Their detection step is
`based on difference-of-mean (without interpolation), their
`description step on integral histograms. They achieve
`about the same speed as we do (though the description step
`is constant in speed), but at the cost of reduced quality
`compared to SIFT. Generally, the high dimensionality of
`the descriptor is a drawback of SIFT at the matching step.
`For on-line applications relying only on a regular PC, each
`one of the three steps (detection, description, matching) has
`to be fast.
`An entire body of work is available on speeding up the
matching step. All of them come at the expense of an
approximate matching. Methods include the best-bin-first proposed by Lowe [24], balltrees [35], vocabulary
`trees [34], locality sensitive hashing [9], or redundant bit
`vectors [13]. Complementary to this, we suggest the use
`of the Hessian matrix’s trace to significantly increase the
`matching speed. Together with the descriptor’s low dimen-
`sionality, any matching algorithm is bound to perform
`faster.
`
`3. Interest point detection
`
`Our approach for interest point detection uses a very
`basic Hessian matrix approximation. This lends itself to
`the use of integral images as made popular by Viola and
`Jones [41], which reduces the computation time drastically.
`Integral images fit in the more general framework of box-
`lets, as proposed by Simard et al. [38].
`
`3.1. Integral images
`
`Fig. 1. Using integral images, it takes only three additions and four
`memory accesses to calculate the sum of intensities inside a rectangular
`region of any size.
`
determinant of the Hessian also for the scale selection, as
done by Lindeberg [21].
Given a point x = (x, y) in an image I, the Hessian
matrix H(x, σ) in x at scale σ is defined as follows:

    H(x, σ) = [ Lxx(x, σ)  Lxy(x, σ) ]
              [ Lxy(x, σ)  Lyy(x, σ) ],        (2)

where Lxx(x, σ) is the convolution of the Gaussian second
order derivative ∂²g(σ)/∂x² with the image I in point x, and
similarly for Lxy(x, σ) and Lyy(x, σ).
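As a rough numerical illustration of the entries of Eq. (2) (not the paper's box-filter implementation, which is introduced below), one can use derivative-of-Gaussian filtering in SciPy; the function names here are our own:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def hessian_at_scale(img, sigma):
    """Entries of Eq. (2): convolutions of the image with Gaussian
    second order derivatives at scale sigma, via SciPy's per-axis
    derivative orders (axis 1 = x, axis 0 = y)."""
    Lxx = gaussian_filter(img, sigma, order=(0, 2))  # d^2/dx^2
    Lyy = gaussian_filter(img, sigma, order=(2, 0))  # d^2/dy^2
    Lxy = gaussian_filter(img, sigma, order=(1, 1))  # d^2/(dx dy)
    return Lxx, Lyy, Lxy

def det_hessian(img, sigma):
    """Blob response: determinant of the Hessian at scale sigma."""
    Lxx, Lyy, Lxy = hessian_at_scale(img, sigma)
    return Lxx * Lyy - Lxy ** 2
```

At the centre of an isolated bright blob both Lxx and Lyy are negative and Lxy vanishes, so the determinant is positive there.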
Gaussians are optimal for scale-space analysis [19,20],
but in practice they have to be discretised and cropped
(Fig. 2, left half). This leads to a loss in repeatability
under image rotations around odd multiples of π/4. This
weakness holds for Hessian-based detectors in general.
Fig. 3 shows the repeatability rate of two detectors
based on the Hessian matrix for pure image rotation.
The repeatability attains a maximum around multiples
of π/2. This is due to the square shape of the filter. Nevertheless, the detectors still perform well, and the slight
decrease in performance does not outweigh the advantage of fast convolutions brought by the discretisation
and cropping. As real filters are non-ideal in any case,
and given Lowe's success with his LoG approximations,
we push the approximation for the Hessian matrix even
further with box filters (right half of Fig. 2).
These approximate second order Gaussian derivatives
and can be evaluated at a very low computational cost
`
`Fig. 2. Left to right: The (discretised and cropped) Gaussian second order
`partial derivative in y- (Lyy) and xy-direction (Lxy), respectively; our
`approximation for the second order Gaussian partial derivative in y- (Dyy)
`and xy-direction (Dxy). The grey regions are equal to zero.
`
In order to make the article more self-contained, we
briefly discuss the concept of integral images. They allow
for fast computation of box type convolution filters. The
entry of an integral image I_Σ(x) at a location x = (x, y)ᵀ
represents the sum of all pixels in the input image I within
a rectangular region formed by the origin and x:

    I_Σ(x) = Σ_{i=0}^{i≤x} Σ_{j=0}^{j≤y} I(i, j).        (1)
`
`Once the integral image has been computed, it takes
`three additions to calculate the sum of the intensities over
`any upright, rectangular area (see Fig. 1). Hence, the calcu-
`lation time is independent of its size. This is important in
`our approach, as we use big filter sizes.
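A minimal NumPy sketch of this idea (the names `integral_image` and `box_sum` are illustrative, not from the paper):

```python
import numpy as np

def integral_image(img):
    """Entry (y, x) holds the sum of all pixels above and to the
    left of (y, x), inclusive -- Eq. (1)."""
    return img.cumsum(axis=0).cumsum(axis=1)

def box_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1+1, x0:x1+1] from four lookups on the
    integral image and three additions/subtractions (Fig. 1)."""
    s = ii[y1, x1]
    if y0 > 0:
        s -= ii[y0 - 1, x1]
    if x0 > 0:
        s -= ii[y1, x0 - 1]
    if y0 > 0 and x0 > 0:
        s += ii[y0 - 1, x0 - 1]
    return s
```

The cost of `box_sum` is independent of the rectangle's size, which is exactly what makes the large filters of the following sections affordable.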
`
`3.2. Hessian matrix-based interest points
`
`We base our detector on the Hessian matrix because of
`its good performance in accuracy. More precisely, we
`detect blob-like structures at locations where the determi-
`nant is maximum. In contrast to the Hessian-Laplace
`detector by Mikolajczyk and Schmid [26], we rely on the
`
`
`3.3. Scale space representation
`
`Interest points need to be found at different scales, not
`least because the search of correspondences often requires
`their comparison in images where they are seen at different
`scales. Scale spaces are usually implemented as an image
`pyramid. The images are repeatedly smoothed with a
`Gaussian and then sub-sampled in order to achieve a
`higher level of the pyramid. Lowe [24] subtracts these pyr-
`amid layers in order to get the DoG (Difference of Gaussi-
`ans) images where edges and blobs can be found.
`Due to the use of box filters and integral images, we do
`not have to iteratively apply the same filter to the output of
`a previously filtered layer, but instead can apply box filters
`of any size at exactly the same speed directly on the original
`image and even in parallel (although the latter is not
`exploited here). Therefore, the scale space is analysed by
`up-scaling the filter size rather than iteratively reducing
the image size (Fig. 4). The output of the 9 × 9 filter, introduced in the previous section, is considered as the initial scale
layer, to which we will refer as scale s = 1.2 (approximating
Gaussian derivatives with σ = 1.2). The following layers
`are obtained by filtering the image with gradually bigger
`masks, taking into account the discrete nature of integral
`images and the specific structure of our filters.
`Note that our main motivation for this type of sampling
`is its computational efficiency. Furthermore, as we do not
`have to downsample the image, there is no aliasing. On
`the downside, box filters preserve high-frequency compo-
`nents that can get lost in zoomed-out variants of the same
`scene, which can limit scale-invariance. This was however
`not noticeable in our experiments.
`The scale space is divided into octaves. An octave repre-
`sents a series of filter response maps obtained by convolv-
`ing the same input image with a filter of increasing size. In
`total, an octave encompasses a scaling factor of 2 (which
`implies that one needs to more than double the filter size,
`see below). Each octave is subdivided into a constant num-
`ber of scale levels. Due to the discrete nature of integral
`images, the minimum scale difference between two subse-
`quent scales depends on the length l0 of the positive or neg-
`ative lobes of the partial second order derivative in the
`direction of derivation (x or y), which is set to a third of
the filter size length. For the 9 × 9 filter, this length l0 is
`3. For two successive levels, we must increase this size by
`
`Fig. 4. Instead of iteratively reducing the image size (left), the use of
`integral images allows the up-scaling of the filter at constant cost (right).
`
Fig. 3. Top: Repeatability score for image rotation of up to 180°. Hessian-
based detectors have in general a lower repeatability score for angles
around odd multiples of π/4. Bottom: Sample images from the sequence that
was used. Fast-Hessian is the more accurate version of our detector (FH-
15), as explained in Section 3.3.
`
using integral images. The calculation time therefore is
independent of the filter size. As shown in Section 5
and Fig. 3, the performance is comparable or better
than with the discretised and cropped Gaussians.
The 9 × 9 box filters in Fig. 2 are approximations of a
Gaussian with σ = 1.2 and represent the lowest scale (i.e.
highest spatial resolution) for computing the blob response
maps. We will denote them by Dxx, Dyy, and Dxy. The
`weights applied to the rectangular regions are kept simple
for computational efficiency. This yields

    det(H_approx) = Dxx Dyy − (w Dxy)².        (3)

The relative weight w of the filter responses is used to balance the expression for the Hessian's determinant. This is
needed for the energy conservation between the Gaussian
kernels and the approximated Gaussian kernels,

    w = ( |Lxy(1.2)|_F |Dyy(9)|_F ) / ( |Lyy(1.2)|_F |Dxy(9)|_F ) = 0.912… ≈ 0.9,        (4)

where |x|_F is the Frobenius norm. Notice that for theoretical correctness, the weighting changes depending on the
scale. In practice, we keep this factor constant, as this did
not have a significant impact on the results in our
experiments.
`Furthermore, the filter responses are normalised with
`respect to their size. This guarantees a constant Frobenius
`norm for any filter size, an important aspect for the scale
`space analysis as discussed in the next section.
`The approximated determinant of the Hessian repre-
`sents the blob response in the image at location x. These
`responses are stored in a blob response map over different
`scales, and local maxima are detected as explained in Sec-
`tion 3.4.
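The box-filter responses and the blob response of Eq. (3) can be sketched as follows. For clarity this uses plain convolution rather than integral images, and the exact lobe placement of the 9 × 9 kernels is our reading of Fig. 2, so treat it as a sketch:

```python
import numpy as np
from scipy.ndimage import convolve

def surf_box_kernels():
    """9x9 box-filter approximations Dyy and Dxy (Fig. 2).
    Grey regions are zero; lobe positions are our interpretation."""
    dyy = np.zeros((9, 9))
    dyy[0:3, 2:7] = 1      # top positive lobe (5 wide, 3 tall)
    dyy[3:6, 2:7] = -2     # central negative band
    dyy[6:9, 2:7] = 1      # bottom positive lobe
    dxy = np.zeros((9, 9))
    dxy[1:4, 1:4] = 1      # four 3x3 lobes around the centre
    dxy[1:4, 5:8] = -1
    dxy[5:8, 1:4] = -1
    dxy[5:8, 5:8] = 1
    return dyy, dxy

def hessian_response(img, w=0.9):
    """Blob response map det(H_approx) = Dxx*Dyy - (w*Dxy)^2, Eq. (3),
    with responses normalised by the filter area (Section 3.2)."""
    dyy_k, dxy_k = surf_box_kernels()
    dxx = convolve(img, dyy_k.T) / 81.0
    dyy = convolve(img, dyy_k) / 81.0
    dxy = convolve(img, dxy_k) / 81.0
    return dxx * dyy - (w * dxy) ** 2
```

In the actual detector each kernel reduces to a handful of `box_sum` calls on the integral image, so the cost stays constant for every filter size.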
`
`
`a minimum of 2 pixels (1 pixel on every side) in order to
`keep the size uneven and thus ensure the presence of the
`central pixel. This results in a total increase of the mask size
`by 6 pixels (see Fig. 5). Note that for dimensions different
`from l0 (e.g. the width of the central band for the vertical
`filter in Fig. 5), rescaling the mask introduces rounding-
`off errors. However, since these errors are typically much
`smaller than l0, this is an acceptable approximation.
The construction of the scale space starts with the 9 × 9
filter, which calculates the blob response of the image for
the smallest scale. Then, filters with sizes 15 × 15,
21 × 21, and 27 × 27 are applied, by which even more than
a scale change of two has been achieved. But this is needed,
`as a 3D non-maximum suppression is applied both spa-
`tially and over the neighbouring scales. Hence, the first
`and last Hessian response maps in the stack cannot contain
`such maxima themselves, as they are used for reasons of
comparison only. Therefore, after interpolation (see Section 3.4), the smallest possible scale is σ = 1.2 × (12/9) = 1.6, corresponding to a filter size of 12 × 12, and the highest is
σ = 1.2 × (24/9) = 3.2. For more details, we refer to [2].
`Similar considerations hold for the other octaves. For
`each new octave, the filter size increase is doubled (going
`from 6–12 to 24–48). At the same time, the sampling inter-
`vals for the extraction of the interest points can be doubled
`as well for every new octave. This reduces the computation
`time and the loss in accuracy is comparable to the image
`sub-sampling of the traditional approaches. The filter sizes
`for the second octave are 15, 27, 39, 51. A third octave is com-
`puted with the filter sizes 27, 51, 75, 99 and, if the original
`image size is still larger than the corresponding filter sizes,
`the scale space analysis is performed for a fourth octave,
`
`using the filter sizes 51, 99, 147, and 195. Fig. 6 gives an over-
`view of the filter sizes for the first three octaves. Further
`octaves can be computed in a similar way. In typical scale-
`space analysis however, the number of detected interest
`points per octave decays very quickly, cf. Fig. 7.
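The octave structure described above can be condensed into a small helper (hypothetical names; the sizes follow the paper's 9–15–21–27, 15–27–39–51, … progression):

```python
def surf_filter_sizes(n_octaves=4, base=9, levels=4):
    """Filter side lengths per octave: the size step doubles each
    octave (6, 12, 24, 48) and each octave starts at the second
    filter of the previous one, so the octaves overlap."""
    sizes, start, step = [], base, 6
    for _ in range(n_octaves):
        sizes.append([start + i * step for i in range(levels)])
        start = sizes[-1][1]   # overlap: next octave starts here
        step *= 2
    return sizes

def filter_scale(size):
    """Scale associated with a filter of given side length:
    s = 1.2 * size / 9 (the 9x9 filter corresponds to s = 1.2)."""
    return 1.2 * size / 9.0
```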
`The large scale changes, especially between the first fil-
`ters within these octaves (from 9 to 15 is a change of
`1.7), renders the sampling of scales quite crude. Therefore,
`we have also implemented a scale space with a finer sam-
`pling of the scales. This computes the integral image on
`the image up-scaled by a factor of 2, and then starts the
`first octave by filtering with a filter of size 15. Additional
`filter sizes are 21, 27, 33, and 39. Then a second octave
`starts, again using filters which now increase their sizes
`by 12 pixels, after which a third and fourth octave follow.
`Now the scale change between the first two filters is only
`1.4 (21/15). The lowest scale for the accurate version that
`can be detected through quadratic
`interpolation is
`9Þ=2 ¼ 1:2.
`s ¼ ð1:218
As the Frobenius norm remains constant for our filters
at any size, they are already scale normalised, and no further weighting of the filter response is required; for more
information on that topic, see [22].
`
`3.4. Interest point localisation
`
`In order to localise interest points in the image and over
scales, a non-maximum suppression in a 3 × 3 × 3 neighbourhood is applied. Specifically, we use a fast variant
`introduced by Neubeck and Van Gool [33]. The maxima
`of the determinant of the Hessian matrix are then interpo-
`lated in scale and image space with the method proposed
`by Brown and Lowe [5].
`Scale space interpolation is especially important in our
`case, as the difference in scale between the first layers of every
`octave is relatively large. Fig. 8 shows an example of the
`detected interest points using our ‘Fast-Hessian’ detector.
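A naive 3 × 3 × 3 non-maximum suppression (unoptimised, unlike the Neubeck–Van Gool variant used in the paper) might look like:

```python
import numpy as np

def nms_3x3x3(responses, threshold):
    """responses: stack of blob-response maps, shape (scale, y, x).
    Keep points that exceed the threshold and strictly dominate all
    26 neighbours in scale and space.  As in the paper, the first and
    last layers (and image borders) serve for comparison only."""
    peaks = []
    S, H, W = responses.shape
    for s in range(1, S - 1):
        for y in range(1, H - 1):
            for x in range(1, W - 1):
                v = responses[s, y, x]
                if v < threshold:
                    continue
                patch = responses[s-1:s+2, y-1:y+2, x-1:x+2]
                if v >= patch.max() and (patch == v).sum() == 1:
                    peaks.append((s, y, x))
    return peaks
```

The surviving maxima would then be interpolated in scale and image space as described above.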
`
`4. Interest point description and matching
`
`Our descriptor describes the distribution of the intensity
`content within the interest point neighbourhood, similar to
`
`Fig. 5. Filters Dyy (top) and Dxy (bottom) for two successive scale levels
(9 × 9 and 15 × 15). The length of the dark lobe can only be increased by
`an even number of pixels in order to guarantee the presence of a central
`pixel (top).
`
`Fig. 6. Graphical representation of the filter side lengths for three different
`octaves. The logarithmic horizontal axis represents the scales. Note that
`the octaves are overlapping in order to cover all possible scales seamlessly.
`
`
`4.1. Orientation assignment
`
`In order to be invariant to image rotation, we identify a
`reproducible orientation for the interest points. For that
`purpose, we first calculate the Haar wavelet responses in
`x and y direction within a circular neighbourhood of radius
`6s around the interest point, with s the scale at which the
`interest point was detected. The sampling step is scale
dependent and chosen to be s. In keeping with the rest, the
size of the wavelets is also scale dependent and set to a side
`length of 4s. Therefore, we can again use integral images
`for fast filtering. The used filters are shown in Fig. 9. Only
`six operations are needed to compute the response in x or y
`direction at any scale.
`Once the wavelet responses are calculated and weighted
`with a Gaussian (r ¼ 2s) centred at the interest point, the
`responses are represented as points in a space with the hor-
`izontal response strength along the abscissa and the vertical
response strength along the ordinate. The dominant orientation is estimated by calculating the sum of all responses
within a sliding orientation window of size π/3, see Fig. 10.
`The horizontal and vertical responses within the window
`are summed. The two summed responses then yield a local
`orientation vector. The longest such vector over all win-
`dows defines the orientation of the interest point. The size
`of the sliding window is a parameter which had to be cho-
`sen carefully. Small sizes fire on single dominating gradi-
`ents,
`large sizes tend to yield maxima in vector length
`that are not outspoken. Both result in a misorientation of
`the interest point.
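The sliding-window search can be sketched as follows, assuming the per-sample responses dx, dy (NumPy arrays) have already been Gaussian weighted; the 72 window start angles are an illustrative discretisation:

```python
import numpy as np

def dominant_orientation(dx, dy, window=np.pi / 3):
    """Slide an orientation window of size pi/3 over the sample
    responses, sum the responses inside each window, and return the
    angle of the longest summed vector (Section 4.1)."""
    angles = np.arctan2(dy, dx)
    best_len, best_ori = -1.0, 0.0
    for start in np.linspace(-np.pi, np.pi, 72, endpoint=False):
        # wrap-around membership test for [start, start + window)
        diff = (angles - start) % (2 * np.pi)
        mask = diff < window
        sx, sy = dx[mask].sum(), dy[mask].sum()
        length = sx * sx + sy * sy
        if length > best_len:
            best_len, best_ori = length, np.arctan2(sy, sx)
    return best_ori
```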
`Note that for many applications, rotation invariance is
not necessary. Experiments using the upright version
`of SURF (U-SURF, for short) for object detection can
`be found in [3,4]. U-SURF is faster to compute and can
`increase distinctivity, while maintaining a robustness to
`rotation of about ±15°.
`
`4.2. Descriptor based on sum of Haar wavelet responses
`
`For the extraction of the descriptor, the first step con-
`sists of constructing a square region centred around the
`interest point and oriented along the orientation selected
`in previous section. The size of this window is 20s. Exam-
`ples of such square regions are illustrated in Fig. 11.
The region is split up regularly into smaller 4 × 4 square
`sub-regions. This preserves important spatial information.
`For each sub-region, we compute Haar wavelet responses
`
Fig. 9. Haar wavelet filters to compute the responses in x (left) and y
direction (right). The dark parts have the weight −1 and the light parts +1.
`
`Fig. 7. Histogram of the detected scales. The number of detected interest
`points per octave decays quickly.
`
Fig. 8. Detected interest points for a Sunflower field. This kind of scene
shows the nature of the features obtained using Hessian-based detectors.
`
`the gradient information extracted by SIFT [24] and its
`variants. We build on the distribution of first order Haar
`wavelet responses in x and y direction rather than the gra-
`dient, exploit integral images for speed, and use only 64D.
`This reduces the time for feature computation and match-
`ing, and has proven to simultaneously increase the robust-
`ness. Furthermore, we present a new indexing step based
`on the sign of the Laplacian, which increases not only the
`robustness of the descriptor, but also the matching speed
`(by a factor of 2 in the best case). We refer to our detec-
`tor-descriptor
`scheme as SURF—Speeded-Up Robust
`Features.
`The first step consists of fixing a reproducible orienta-
`tion based on information from a circular region around
`the interest point. Then, we construct a square region
`aligned to the selected orientation and extract the SURF
`descriptor from it. Finally, features are matched between
`two images. These three steps are explained in the
`following.
`
`
Fig. 12. To build the descriptor, an oriented quadratic grid with 4 × 4
square sub-regions is laid over the interest point (left). For each square,
the wavelet responses are computed from 5 × 5 samples (for illustrative
purposes, we show only 2 × 2 sub-divisions here). For each field, we
collect the sums Σdx, Σ|dx|, Σdy, and Σ|dy|, computed relative to the
orientation of the grid (right).
`
feature vector. In order to bring in information about the
polarity of the intensity changes, we also extract the sums
of the absolute values of the responses, Σ|dx| and Σ|dy|.
Hence, each sub-region has a 4D descriptor vector
v = (Σdx, Σdy, Σ|dx|, Σ|dy|) for its underlying intensity
structure. Concatenating this for all 4 × 4 sub-regions
results in a descriptor vector of length 64.
`The wavelet responses are invariant to a bias in illumina-
`tion (offset). Invariance to contrast (a scale factor) is
`achieved by turning the descriptor into a unit vector.
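Given the 20 × 20 grid of oriented, Gaussian-weighted Haar responses (5 × 5 samples for each of the 4 × 4 sub-regions), the 64D descriptor of this section can be sketched as:

```python
import numpy as np

def surf_descriptor(dx, dy):
    """dx, dy: 20x20 arrays of oriented, weighted Haar responses.
    Collect (sum dx, sum dy, sum |dx|, sum |dy|) per 5x5 sub-region
    and normalise to a unit vector for contrast invariance."""
    desc = []
    for i in range(4):
        for j in range(4):
            sx = dx[5*i:5*i+5, 5*j:5*j+5]
            sy = dy[5*i:5*i+5, 5*j:5*j+5]
            desc += [sx.sum(), sy.sum(),
                     np.abs(sx).sum(), np.abs(sy).sum()]
    v = np.array(desc)
    n = np.linalg.norm(v)
    return v / n if n > 0 else v
```

The sums are invariant to an intensity offset by construction of the Haar wavelets, and the final normalisation removes a global contrast factor.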
`Fig. 13 shows the properties of the descriptor for three
`distinctively different
`image-intensity patterns within a
`sub-region. One can imagine combinations of such local
`intensity patterns, resulting in a distinctive descriptor.
SURF is, up to some point, similar in concept to SIFT,
`in that they both focus on the spatial distribution of gradi-
`ent information. Nevertheless, SURF outperforms SIFT in
`practically all cases, as shown in Section 5. We believe this
`is due to the fact that SURF integrates the gradient infor-
`mation within a subpatch, whereas SIFT depends on the
`orientations of the individual gradients. This makes SURF
`
Fig. 10. Orientation assignment: a sliding orientation window of size π/3
detects the dominant orientation of the Gaussian weighted Haar wavelet
responses at every sample point within a circular neighbourhood around
the interest point.
`
`Fig. 11. Detail of the Graffiti scene showing the size of the oriented
`descriptor window at different scales.
`
at 5 × 5 regularly spaced sample points. For reasons of
`simplicity, we call dx the Haar wavelet response in horizon-
`tal direction and dy the Haar wavelet response in vertical
direction (filter size 2s), see Fig. 9 again. ‘‘Horizontal’’
and ‘‘vertical’’ here are defined in relation to the selected
interest point orientation (see Fig. 12).¹ To increase the
`robustness towards geometric deformations and localisa-
`tion errors, the responses dx and dy are first weighted with
`a Gaussian (r ¼ 3:3s) centred at the interest point.
`Then, the wavelet responses dx and dy are summed up
`over each sub-region and form a first set of entries in the
`
¹ For efficiency reasons, the Haar wavelets are calculated in the
unrotated image and the responses are then interpolated, instead of
actually rotating the image.
`
Fig. 13. The descriptor entries of a sub-region represent the nature of the
underlying intensity pattern. Left: In case of a homogeneous region, all
values are relatively low. Middle: In presence of frequencies in x direction,
the value of Σ|dx| is high, but all others remain low. If the intensity is
gradually increasing in x direction, both values Σdx and Σ|dx| are high.
`
`
`Fig. 15. If the contrast between two interest points is different (dark on
`light background vs. light on dark background), the candidate is not
`considered a valuable match.
`
`Fig. 14. Due to the global integration of SURF’s descriptor, it stays more
`robust to various image perturbations than the more locally operating
`SIFT descriptor.
`
`extra information defines a meaningful hyperplane for
`splitting the data, as opposed to randomly choosing an ele-
`ment or using feature statistics.
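A sketch of sign-of-Laplacian gated matching follows; the second-nearest-neighbour ratio test is a common addition, not part of the paper's indexing step itself:

```python
import numpy as np

def match(desc1, lap1, desc2, lap2, ratio=0.7):
    """Nearest-neighbour matching that only compares features whose
    Laplacian signs (trace of the Hessian, stored at detection time)
    agree -- in the best case this halves the number of distance
    computations.  A ratio test prunes ambiguous matches."""
    matches = []
    for i, (d1, s1) in enumerate(zip(desc1, lap1)):
        cand = [j for j, s2 in enumerate(lap2) if s2 == s1]
        if len(cand) < 2:
            continue
        dists = [np.linalg.norm(d1 - desc2[j]) for j in cand]
        order = np.argsort(dists)
        if dists[order[0]] < ratio * dists[order[1]]:
            matches.append((i, cand[order[0]]))
    return matches
```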
`
`less sensitive to noise, as illustrated in the example of
`Fig. 14.
`In order to arrive at these SURF descriptors, we exper-
`imented with fewer and more wavelet features, second
`order derivatives, higher-order wavelets, PCA, median val-
`ues, average values, etc. From a thorough evaluation, the
`proposed sets turned out to perform best. We then varied
the number of sample points and sub-regions. The 4 × 4
`sub-region division solution provided the best results, see
`also Section 5. Considering finer subdivisions appeared to
`be less robust and would increase matching times too
much. On the other hand, the short descriptor with 3 × 3
sub-regions (SURF-36) performs slightly worse, but allows
`for very fast matching and is still acceptable in comparison
`to other descriptors in the literature.
`We also tested an alternative version of the SURF
`descriptor that adds a couple of similar features (SURF-
`128). It again uses the same sums as before, but now splits
these values up further. The sums of dx and |dx| are computed separately for dy < 0 and dy ≥ 0. Similarly, the sums
of dy and |dy| are split up according to the sign of dx,
`thereby doubling the number of features. The descriptor
`is more distinctive