the detector by focusing attention on promising regions of the image. The notion behind focus of attention approaches is that it is often possible to rapidly determine where in an image an object might occur [16, 7, 1]. More complex processing is reserved only for these promising regions. The key measure of such an approach is the "false negative" rate of the attentional process. It must be the case that all, or almost all, object instances are selected by the attentional filter.

We will describe a process for training an extremely simple and efficient classifier which can be used as a "supervised" focus of attention operator. The term supervised refers to the fact that the attentional operator is trained to detect examples of a particular class. In the domain of face detection it is possible to achieve fewer than 1% false negatives and 40% false positives using a classifier constructed from two Haar-like features. The effect of this filter is to reduce by over one half the number of locations where the final detector must be evaluated.

Those sub-windows which are not rejected by the initial classifier are processed by a sequence of classifiers, each slightly more complex than the last. If any classifier rejects the sub-window, no further processing is performed. The structure of the cascaded detection process is essentially that of a degenerate decision tree, and as such is related to the work of Geman and colleagues [1, 3].

An extremely fast face detector will have broad practical applications. These include user interfaces, image databases, and teleconferencing. In applications where rapid frame-rates are not necessary, our system will allow for significant additional post-processing and analysis. In addition our system can be implemented on a wide range of small low power devices, including hand-helds and embedded processors. In our lab we have implemented this face detector on the Compaq iPAQ handheld and have achieved detection at two frames per second (this device has a low power 200 MIPS StrongARM processor which lacks floating point hardware).

The remainder of the paper describes our contributions and a number of experimental results, including a detailed description of our experimental methodology. Discussion of closely related work takes place at the end of each section.

2. Features

Our object detection procedure classifies images based on the value of simple features. There are many motivations for using features rather than the pixels directly. The most common reason is that features can act to encode ad-hoc domain knowledge that is difficult to learn using a finite quantity of training data. For this system there is also a second critical motivation for features: the feature-based system operates much faster than a pixel-based system.

The simple features used are reminiscent of Haar basis functions which have been used by Papageorgiou et al. [9].

Figure 1: Example rectangle features shown relative to the enclosing detection window. The sum of the pixels which lie within the white rectangles is subtracted from the sum of pixels in the grey rectangles. Two-rectangle features are shown in (A) and (B). Figure (C) shows a three-rectangle feature, and (D) a four-rectangle feature.

More specifically, we use three kinds of features. The value of a two-rectangle feature is the difference between the sum of the pixels within two rectangular regions. The regions have the same size and shape and are horizontally or vertically adjacent (see Figure 1). A three-rectangle feature computes the sum within two outside rectangles subtracted from the sum in a center rectangle. Finally a four-rectangle feature computes the difference between diagonal pairs of rectangles.

Given that the base resolution of the detector is 24×24, the exhaustive set of rectangle features is quite large, over 180,000. Note that unlike the Haar basis, the set of rectangle features is overcomplete.¹

2.1. Integral Image

Rectangle features can be computed very rapidly using an intermediate representation for the image which we call the integral image.² The integral image at location $x, y$ contains the sum of the pixels above and to the left of $x, y$, inclusive:

$$ii(x, y) = \sum_{x' \le x,\, y' \le y} i(x', y'),$$

where $ii(x, y)$ is the integral image and $i(x, y)$ is the original image. Using the following pair of recurrences:

$$s(x, y) = s(x, y - 1) + i(x, y) \quad (1)$$

$$ii(x, y) = ii(x - 1, y) + s(x, y) \quad (2)$$

(where $s(x, y)$ is the cumulative row sum, $s(x, -1) = 0$, and $ii(-1, y) = 0$) the integral image can be computed in one pass over the original image.

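To make the pair of recurrences concrete, here is a minimal Python/NumPy sketch (our illustration, not code from the paper; the function name and the `i[y, x]` array layout are assumptions):

```python
import numpy as np

def integral_image(i):
    """One-pass computation of ii(x, y) = sum of i(x', y') over x' <= x, y' <= y.

    `i` is a 2-D array indexed as i[y, x]. This is equivalent to
    i.cumsum(axis=0).cumsum(axis=1), written out to mirror recurrences (1)-(2).
    """
    h, w = i.shape
    s = np.zeros((h, w), dtype=np.int64)   # cumulative row sum s(x, y)
    ii = np.zeros((h, w), dtype=np.int64)  # integral image ii(x, y)
    for y in range(h):
        for x in range(w):
            # (1) s(x, y) = s(x, y - 1) + i(x, y), with s(x, -1) = 0
            s[y, x] = (s[y - 1, x] if y > 0 else 0) + i[y, x]
            # (2) ii(x, y) = ii(x - 1, y) + s(x, y), with ii(-1, y) = 0
            ii[y, x] = (ii[y, x - 1] if x > 0 else 0) + s[y, x]
    return ii
```
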
¹A complete basis has no linear dependence between basis elements and has the same number of elements as the image space, in this case 576. The full set of 180,000 features is many times over-complete.

²There is a close relation to "summed area tables" as used in graphics [2]. We choose a different name here in order to emphasize its use for the analysis of images, rather than for texture mapping.

Figure 2: The sum of the pixels within rectangle D can be computed with four array references. The value of the integral image at location 1 is the sum of the pixels in rectangle A. The value at location 2 is A + B, at location 3 is A + C, and at location 4 is A + B + C + D. The sum within D can be computed as 4 + 1 - (2 + 3).

Using the integral image any rectangular sum can be computed in four array references (see Figure 2). Clearly the difference between two rectangular sums can be computed in eight references. Since the two-rectangle features defined above involve adjacent rectangular sums they can be computed in six array references, eight in the case of the three-rectangle features, and nine for four-rectangle features.

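As an illustration of these array-reference counts, a sketch of the four-reference rectangular sum and the six-reference two-rectangle feature (the helper names are our own; the corner bookkeeping follows Figure 2):

```python
def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the rectangle with top-left corner (x, y), width w and
    height h, via four references into the integral image ii (Figure 2:
    sum within D = 4 + 1 - (2 + 3))."""
    p1 = ii[y - 1, x - 1] if x > 0 and y > 0 else 0  # location 1
    p2 = ii[y - 1, x + w - 1] if y > 0 else 0        # location 2
    p3 = ii[y + h - 1, x - 1] if x > 0 else 0        # location 3
    p4 = ii[y + h - 1, x + w - 1]                    # location 4
    return p4 + p1 - (p2 + p3)

def two_rect_feature(ii, x, y, w, h):
    """Horizontally adjacent two-rectangle feature: left sum minus right sum.
    The two shared corners mean only six distinct references are needed."""
    return rect_sum(ii, x, y, w, h) - rect_sum(ii, x + w, y, w, h)
```
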
2.2. Feature Discussion

Rectangle features are somewhat primitive when compared with alternatives such as steerable filters [4, 6]. Steerable filters, and their relatives, are excellent for the detailed analysis of boundaries, image compression, and texture analysis. In contrast rectangle features, while sensitive to the presence of edges, bars, and other simple image structure, are quite coarse. Unlike steerable filters the only orientations available are vertical, horizontal, and diagonal. The set of rectangle features does however provide a rich image representation which supports effective learning. In conjunction with the integral image, the efficiency of the rectangle feature set provides ample compensation for their limited flexibility.

3. Learning Classification Functions

Given a feature set and a training set of positive and negative images, any number of machine learning approaches could be used to learn a classification function. In our system a variant of AdaBoost is used both to select a small set of features and train the classifier [5]. In its original form, the AdaBoost learning algorithm is used to boost the classification performance of a simple (sometimes called weak) learning algorithm. There are a number of formal guarantees provided by the AdaBoost learning procedure. Freund and Schapire proved that the training error of the strong classifier approaches zero exponentially in the number of rounds. More importantly a number of results were later proved about generalization performance [12]. The key insight is that generalization performance is related to the margin of the examples, and that AdaBoost achieves large margins rapidly.

Recall that there are over 180,000 rectangle features associated with each image sub-window, a number far larger than the number of pixels. Even though each feature can be computed very efficiently, computing the complete set is prohibitively expensive. Our hypothesis, which is borne out by experiment, is that a very small number of these features can be combined to form an effective classifier. The main challenge is to find these features.

In support of this goal, the weak learning algorithm is designed to select the single rectangle feature which best separates the positive and negative examples (this is similar to the approach of [15] in the domain of image database retrieval). For each feature, the weak learner determines the optimal threshold classification function, such that the minimum number of examples are misclassified. A weak classifier $h_j(x)$ thus consists of a feature $f_j$, a threshold $\theta_j$ and a polarity $p_j$ indicating the direction of the inequality sign:

$$h_j(x) = \begin{cases} 1 & \text{if } p_j f_j(x) < p_j \theta_j \\ 0 & \text{otherwise} \end{cases}$$

Here $x$ is a 24×24 pixel sub-window of an image. See Figure 3 for a summary of the boosting process.

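A minimal Python sketch of such a weak classifier (ours to illustrate, not the paper's implementation; the exhaustive threshold search shown here is the simplest correct version):

```python
def weak_classify(f_val, theta, p):
    """h_j(x) = 1 if p * f_j(x) < p * theta, else 0, with polarity p in {+1, -1}."""
    return 1 if p * f_val < p * theta else 0

def best_threshold(f_vals, labels, weights):
    """Return (error, theta, p) minimizing the weighted misclassification
    error for one feature. This O(n^2) scan over candidate thresholds is a
    sketch; sorting the feature values once allows an O(n log n) version."""
    best = (float("inf"), 0.0, 1)
    for theta in f_vals:
        for p in (1, -1):
            err = sum(w for f, y, w in zip(f_vals, labels, weights)
                      if weak_classify(f, theta, p) != y)
            if err < best[0]:
                best = (err, theta, p)
    return best
```
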
In practice no single feature can perform the classification task with low error. Features which are selected in early rounds of the boosting process have error rates between 0.1 and 0.3. Features selected in later rounds, as the task becomes more difficult, yield error rates between 0.4 and 0.5.

• Given example images $(x_1, y_1), \ldots, (x_n, y_n)$ where $y_i = 0, 1$ for negative and positive examples respectively.

• Initialize weights $w_{1,i} = \frac{1}{2m}, \frac{1}{2l}$ for $y_i = 0, 1$ respectively, where $m$ and $l$ are the number of negatives and positives respectively.

• For $t = 1, \ldots, T$:

  1. Normalize the weights, $w_{t,i} \leftarrow w_{t,i} / \sum_{j=1}^{n} w_{t,j}$, so that $w_t$ is a probability distribution.

  2. For each feature $j$, train a classifier $h_j$ which is restricted to using a single feature. The error is evaluated with respect to $w_t$: $\epsilon_j = \sum_i w_i \, |h_j(x_i) - y_i|$.

  3. Choose the classifier, $h_t$, with the lowest error $\epsilon_t$.

  4. Update the weights: $w_{t+1,i} = w_{t,i} \beta_t^{1 - e_i}$, where $e_i = 0$ if example $x_i$ is classified correctly, $e_i = 1$ otherwise, and $\beta_t = \frac{\epsilon_t}{1 - \epsilon_t}$.

• The final strong classifier is:

$$h(x) = \begin{cases} 1 & \text{if } \sum_{t=1}^{T} \alpha_t h_t(x) \ge \frac{1}{2} \sum_{t=1}^{T} \alpha_t \\ 0 & \text{otherwise} \end{cases}$$

where $\alpha_t = \log \frac{1}{\beta_t}$.

Figure 3: The AdaBoost algorithm for classifier learning. Each round of boosting selects one feature from the 180,000 potential features.

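A compact Python rendering of the boosting loop in Figure 3 (a sketch under the assumption that all feature values are precomputed into a matrix `F`; `best_threshold` and `weak_classify` are the helpers sketched above):

```python
import numpy as np

def adaboost(F, y, T):
    """Boosting loop of Figure 3. F is an (n_examples, n_features) array of
    precomputed feature values; y holds 0/1 labels. Returns a list of weak
    classifiers as (feature index j, theta, polarity, alpha_t)."""
    m, l = np.sum(y == 0), np.sum(y == 1)
    w = np.where(y == 0, 1.0 / (2 * m), 1.0 / (2 * l))   # initialize weights
    strong = []
    for t in range(T):
        w = w / w.sum()                                   # 1. normalize weights
        # 2.-3. train one single-feature classifier per feature, keep the best
        err, theta, p, j = min(best_threshold(F[:, j], y, w) + (j,)
                               for j in range(F.shape[1]))
        beta = err / (1.0 - err)        # a real implementation guards err == 0
        h = np.array([weak_classify(f, theta, p) for f in F[:, j]])
        e = (h != y).astype(float)      # e_i = 0 iff classified correctly
        w = w * beta ** (1.0 - e)       # 4. update the weights
        strong.append((j, theta, p, np.log(1.0 / beta)))  # alpha_t = log 1/beta
    return strong

def strong_classify(strong, f_row):
    """Final classifier: 1 iff sum_t alpha_t h_t(x) >= 0.5 * sum_t alpha_t."""
    total = sum(a * weak_classify(f_row[j], theta, p)
                for j, theta, p, a in strong)
    return int(total >= 0.5 * sum(a for _, _, _, a in strong))
```
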
3.1. Learning Discussion

Many general feature selection procedures have been proposed (see chapter 8 of [17] for a review). Our final application demanded a very aggressive approach which would discard the vast majority of features. For a similar recognition problem Papageorgiou et al. proposed a scheme for feature selection based on feature variance [9]. They demonstrated good results selecting 37 features out of a total of 1734 features.

Roth et al. propose a feature selection process based on the Winnow exponential perceptron learning rule [10]. The Winnow learning process converges to a solution where many of these weights are zero. Nevertheless a very large number of features are retained (perhaps a few hundred or thousand).

3.2. Learning Results

While details on the training and performance of the final system are presented in Section 5, several simple results merit discussion. Initial experiments demonstrated that a frontal face classifier constructed from 200 features yields a detection rate of 95% with a false positive rate of 1 in 14084. These results are compelling, but not sufficient for many real-world tasks. In terms of computation, this classifier is probably faster than any other published system, requiring 0.7 seconds to scan a 384 by 288 pixel image. Unfortunately, the most straightforward technique for improving detection performance, adding features to the classifier, directly increases computation time.

For the task of face detection, the initial rectangle features selected by AdaBoost are meaningful and easily interpreted. The first feature selected seems to focus on the property that the region of the eyes is often darker than the region of the nose and cheeks (see Figure 4). This feature is relatively large in comparison with the detection sub-window, and should be somewhat insensitive to the size and location of the face. The second feature selected relies on the property that the eyes are darker than the bridge of the nose.

Figure 4: The first and second features selected by AdaBoost. The two features are shown in the top row and then overlaid on a typical training face in the bottom row. The first feature measures the difference in intensity between the region of the eyes and a region across the upper cheeks. The feature capitalizes on the observation that the eye region is often darker than the cheeks. The second feature compares the intensities in the eye regions to the intensity across the bridge of the nose.

4. The Attentional Cascade

This section describes an algorithm for constructing a cascade of classifiers which achieves increased detection performance while radically reducing computation time. The key insight is that smaller, and therefore more efficient, boosted classifiers can be constructed which reject many of the negative sub-windows while detecting almost all positive instances (i.e. the threshold of a boosted classifier can be adjusted so that the false negative rate is close to zero). Simpler classifiers are used to reject the majority of sub-windows before more complex classifiers are called upon to achieve low false positive rates.

The overall form of the detection process is that of a degenerate decision tree, what we call a "cascade" (see Figure 5). A positive result from the first classifier triggers the evaluation of a second classifier which has also been adjusted to achieve very high detection rates. A positive result from the second classifier triggers a third classifier, and so on. A negative outcome at any point leads to the immediate rejection of the sub-window.

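The evaluation logic is simple enough to state in a few lines of Python (a sketch; `stages` is assumed to be a list of thresholded boosted classifiers, ordered simplest first):

```python
def cascade_classify(stages, window):
    """Run the attentional cascade on one sub-window. A negative outcome at
    any stage immediately rejects the sub-window; only windows that pass
    every stage are reported as detections."""
    for stage in stages:          # each stage: a boosted classifier whose
        if not stage(window):     # threshold is lowered for high detection
            return False          # rejected: no further processing
    return True
```
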
Stages in the cascade are constructed by training classifiers using AdaBoost and then adjusting the threshold to minimize false negatives. Note that the default AdaBoost threshold is designed to yield a low error rate on the training data. In general a lower threshold yields higher detection rates and higher false positive rates.

For example an excellent first stage classifier can be constructed from a two-feature strong classifier by reducing the threshold to minimize false negatives. Measured against a validation training set, the threshold can be adjusted to detect 100% of the faces with a false positive rate of 40%. See Figure 4 for a description of the two features used in this classifier.

Figure 5: Schematic depiction of the detection cascade. A series of classifiers is applied to every sub-window. The initial classifier eliminates a large number of negative examples with very little processing. Subsequent layers eliminate additional negatives but require additional computation. After several stages of processing the number of sub-windows has been reduced radically. Further processing can take any form such as additional stages of the cascade (as in our detection system) or an alternative detection system.

Computation of the two feature classifier amounts to about 60 microprocessor instructions. It seems hard to imagine that any simpler filter could achieve higher rejection rates. By comparison, scanning a simple image template, or a single layer perceptron, would require at least 20 times as many operations per sub-window.

The structure of the cascade reflects the fact that within any single image an overwhelming majority of sub-windows are negative. As such, the cascade attempts to reject as many negatives as possible at the earliest stage possible. While a positive instance will trigger the evaluation of every classifier in the cascade, this is an exceedingly rare event.

Much like a decision tree, subsequent classifiers are trained using those examples which pass through all the previous stages. As a result, the second classifier faces a more difficult task than the first. The examples which make it through the first stage are "harder" than typical examples. The more difficult examples faced by deeper classifiers push the entire receiver operating characteristic (ROC) curve downward. At a given detection rate, deeper classifiers have correspondingly higher false positive rates.

4.1. Training a Cascade of Classifiers

The cascade training process involves two types of trade-offs. In most cases classifiers with more features will achieve higher detection rates and lower false positive rates. At the same time classifiers with more features require more time to compute. In principle one could define an optimization framework in which: i) the number of classifier stages, ii) the number of features in each stage, and iii) the threshold of each stage, are traded off in order to minimize the expected number of evaluated features. Unfortunately finding this optimum is a tremendously difficult problem.

In practice a very simple framework is used to produce an effective classifier which is highly efficient. Each stage in the cascade reduces the false positive rate and decreases the detection rate. A target is selected for the minimum reduction in false positives and the maximum decrease in detection. Each stage is trained by adding features until the target detection and false positive rates are met (these rates are determined by testing the detector on a validation set). Stages are added until the overall target for false positive and detection rate is met.

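This framework can be summarized in the following Python-style sketch; `train_stage`, `cascade_rates`, and `surviving_negatives` are hypothetical helpers standing in for the per-stage AdaBoost training and the validation-set measurements described above:

```python
def train_cascade(f_overall, f_stage, d_stage, pos, neg, val):
    """Add stages until the overall false positive target f_overall is met.
    f_stage and d_stage are the per-stage targets (e.g. a false positive
    rate of at most 0.5 and a detection rate of at least 0.99, measured on
    the validation set val). All helper functions are assumptions."""
    stages, F = [], 1.0
    while F > f_overall:
        # Each stage adds features (and adjusts its threshold) until its own
        # detection and false positive targets are met on the validation set.
        stages.append(train_stage(pos, neg, f_stage, d_stage, val))
        F, _ = cascade_rates(stages, val)       # overall rates so far
        # Later stages train on the negatives that the cascade built so far
        # still accepts, i.e. its false positives.
        neg = surviving_negatives(stages, neg)
    return stages
```
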
4.2. Detector Cascade Discussion

The complete face detection cascade has 38 stages with over 6000 features. Nevertheless the cascade structure results in fast average detection times. On a difficult dataset, containing 507 faces and 75 million sub-windows, faces are detected using an average of 10 feature evaluations per sub-window. In comparison, this system is about 15 times faster than an implementation of the detection system constructed by Rowley et al. [11].³

A notion similar to the cascade appears in the face detection system described by Rowley et al. in which two detection networks are used [11]. Rowley et al. used a faster yet less accurate network to prescreen the image in order to find candidate regions for a slower more accurate network. Though it is difficult to determine exactly, it appears that Rowley et al.'s two network face system is the fastest existing face detector.⁴

The structure of the cascaded detection process is essentially that of a degenerate decision tree, and as such is related to the work of Amit and Geman [1]. Unlike techniques which use a fixed detector, Amit and Geman propose an alternative point of view where unusual co-occurrences of simple image features are used to trigger the evaluation of a more complex detection process. In this way the full detection process need not be evaluated at many of the potential image locations and scales. While this basic insight is very valuable, in their implementation it is necessary to first evaluate some feature detector at every location. These features are then grouped to find unusual co-occurrences. In practice, since the form of our detector and the features that it uses are extremely efficient, the amortized cost of evaluating our detector at every scale and location is much faster than finding and grouping edges throughout the image.

In recent work Fleuret and Geman have presented a face detection technique which relies on a "chain" of tests in order to signify the presence of a face at a particular scale and location [3]. The image properties measured by Fleuret and Geman, disjunctions of fine scale edges, are quite different than rectangle features which are simple, exist at all scales, and are somewhat interpretable. The two approaches also differ radically in their learning philosophy. The motivation for Fleuret and Geman's learning process is density estimation and density discrimination, while our detector is purely discriminative. Finally the false positive rate of Fleuret and Geman's approach appears to be higher than that of previous approaches like Rowley et al. and this approach. Unfortunately the paper does not report quantitative results of this kind. The included example images each have between 2 and 10 false positives.

³Henry Rowley very graciously supplied us with implementations of his detection system for direct comparison. Reported results are against his fastest system. It is difficult to determine from the published literature, but the Rowley-Baluja-Kanade detector is widely considered the fastest detection system and has been heavily tested on real-world problems.

⁴Other published detectors have either neglected to discuss performance in detail, or have never published detection and false positive rates on a large and difficult training set.

Figure 6: Example of frontal upright face images used for training.

5. Results

A 38 layer cascaded classifier was trained to detect frontal upright faces. To train the detector, a set of face and non-face training images were used. The face training set consisted of 4916 hand labeled faces scaled and aligned to a base resolution of 24 by 24 pixels. The faces were extracted from images downloaded during a random crawl of the world wide web. Some typical face examples are shown in Figure 6. The non-face subwindows used to train the detector come from 9544 images which were manually inspected and found to not contain any faces. There are about 350 million subwindows within these non-face images.

The number of features in the first five layers of the detector is 1, 10, 25, 25 and 50 features respectively. The remaining layers have increasingly more features. The total number of features in all layers is 6061.

Each classifier in the cascade was trained with the 4916 training faces (plus their vertical mirror images for a total of 9832 training faces) and 10,000 non-face sub-windows (also of size 24 by 24 pixels) using the AdaBoost training procedure. For the initial one feature classifier, the non-face training examples were collected by selecting random sub-windows from a set of 9544 images which did not contain faces. The non-face examples used to train subsequent layers were obtained by scanning the partial cascade across the non-face images and collecting false positives. A maximum of 10,000 such non-face sub-windows were collected for each layer.

Speed of the Final Detector

The speed of the cascaded detector is directly related to the number of features evaluated per scanned sub-window. Evaluated on the MIT+CMU test set [11], an average of 10 features out of a total of 6061 are evaluated per sub-window. This is possible because a large majority of sub-windows are rejected by the first or second layer in the cascade. On a 700 MHz Pentium III processor, the face detector can process a 384 by 288 pixel image in about 0.067 seconds (using a starting scale of 1.25 and a step size of 1.5 described below). This is roughly 15 times faster than the Rowley-Baluja-Kanade detector [11] and about 600 times faster than the Schneiderman-Kanade detector [13].

Image Processing

All example sub-windows used for training were variance normalized to minimize the effect of different lighting conditions. Normalization is therefore necessary during detection as well. The variance of an image sub-window can be computed quickly using a pair of integral images. Recall that $\sigma^2 = \frac{1}{N}\sum x^2 - m^2$, where $\sigma$ is the standard deviation, $m$ is the mean, and $x$ is the pixel value within the sub-window of $N$ pixels. The mean of a sub-window can be computed using the integral image. The sum of squared pixels is computed using an integral image of the image squared (i.e. two integral images are used in the scanning process). During scanning the effect of image normalization can be achieved by post-multiplying the feature values rather than pre-multiplying the pixels.

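As a sketch (reusing the `rect_sum` helper from the Section 2.1 example; illustrative, not the paper's code), the per-window standard deviation can be read off the two integral images:

```python
def window_sigma(ii, ii_sq, x, y, w, h):
    """Standard deviation of one sub-window, computed from an integral image
    of the pixels (ii) and one of the squared pixels (ii_sq). Feature values
    can then be normalized by post-multiplying with 1/sigma while scanning."""
    n = w * h
    m = rect_sum(ii, x, y, w, h) / n          # mean of the sub-window
    sq = rect_sum(ii_sq, x, y, w, h) / n      # mean of the squared pixels
    var = max(sq - m * m, 0.0)                # sigma^2 = E[x^2] - m^2
    return var ** 0.5
```
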
Scanning the Detector

The final detector is scanned across the image at multiple scales and locations. Scaling is achieved by scaling the detector itself, rather than scaling the image. This process makes sense because the features can be evaluated at any scale with the same cost. Good results were obtained using a set of scales a factor of 1.25 apart.

The detector is also scanned across location. Subsequent locations are obtained by shifting the window some number of pixels $\Delta$. This shifting process is affected by the scale of the detector: if the current scale is $s$ the window is shifted by $[s\Delta]$, where $[\cdot]$ is the rounding operation.

The choice of $\Delta$ affects both the speed of the detector as well as accuracy. The results we present are for $\Delta = 1.0$. We can achieve a significant speedup by setting $\Delta = 1.5$ with only a slight decrease in accuracy.

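A sketch of the scanning loop under these choices (the `detector` callable and its signature are assumptions for illustration):

```python
def scan(image_shape, detector, base=24, scale_step=1.25, delta=1.0):
    """Scan a detector of base resolution `base` across scales and locations.
    The detector itself is scaled, so at scale s the window is base*s pixels
    on a side and is shifted by round(s * delta) pixels."""
    h, w = image_shape
    detections, s = [], 1.0
    while round(base * s) <= min(h, w):
        size, step = round(base * s), max(1, round(s * delta))
        for y in range(0, h - size + 1, step):
            for x in range(0, w - size + 1, step):
                if detector(x, y, size):       # assumed detector callable
                    detections.append((x, y, size))
        s *= scale_step                        # scales a factor of 1.25 apart
    return detections
```
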
Table 1: Detection rates for various numbers of false positives on the MIT+CMU test set containing 130 images and 507 faces.

[Table comparing detection rates at matched false-detection counts for Viola-Jones, Viola-Jones (voting), Rowley-Baluja-Kanade, Schneiderman-Kanade, and Roth-Yang-Ahuja; the numeric entries are not recoverable from this copy apart from scattered fragments (e.g. 94.4%, 81.1%, 83.2%).]

Integration of Multiple Detections

Since the final detector is insensitive to small changes in translation and scale, multiple detections will usually occur around each face in a scanned image. The same is often true of some types of false positives. In practice it often makes sense to return one final detection per face. Toward this end it is useful to postprocess the detected sub-windows in order to combine overlapping detections into a single detection.

In these experiments detections are combined in a very simple fashion. The set of detections are first partitioned into disjoint subsets. Two detections are in the same subset if their bounding regions overlap. Each partition yields a single final detection. The corners of the final bounding region are the average of the corners of all detections in the set.

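In Python this combination step might look like the following sketch (boxes as (x1, y1, x2, y2) corner tuples; the names are our own):

```python
def merge_detections(dets):
    """Partition detections into disjoint subsets of overlapping bounding
    regions and return one detection per subset whose corners are the
    average of the corners of all detections in that subset."""
    def overlaps(a, b):
        ax1, ay1, ax2, ay2 = a
        bx1, by1, bx2, by2 = b
        return ax1 <= bx2 and bx1 <= ax2 and ay1 <= by2 and by1 <= ay2

    groups = []
    for d in dets:
        # Merge d with every existing group it overlaps, keeping the
        # partition's subsets disjoint and transitively connected.
        hit = [g for g in groups if any(overlaps(d, e) for e in g)]
        for g in hit:
            groups.remove(g)
        groups.append(sum(hit, []) + [d])

    # One final detection per group: average the corners.
    return [tuple(sum(c) / len(g) for c in zip(*g)) for g in groups]
```
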
Experiments on a Real-World Test Set

We tested our system on the MIT+CMU frontal face test set [11]. This set consists of 130 images with 507 labeled frontal faces. A ROC curve showing the performance of our detector on this test set is shown in Figure 7. To create the ROC curve the threshold of the final layer classifier is adjusted from $-\infty$ to $+\infty$. Adjusting the threshold to $+\infty$ will yield a detection rate of 0.0 and a false positive rate of 0.0. Adjusting the threshold to $-\infty$, however, increases both the detection rate and false positive rate, but only to a certain point. Neither rate can be higher than the rate of the detection cascade minus the final layer. In effect, a threshold of $-\infty$ is equivalent to removing that layer. Further increasing the detection and false positive rates requires decreasing the threshold of the next classifier in the cascade. Thus, in order to construct a complete ROC curve, classifier layers are removed. We use the number of false positives as opposed to the rate of false positives for the x-axis of the ROC curve to facilitate comparison with other systems. To compute the false positive rate, simply divide by the total number of sub-windows scanned. In our experiments, the number of sub-windows scanned is 75,081,800.

Unfortunately, most previous published results on face detection have only included a single operating regime (i.e. a single point on the ROC curve). To make comparison with our detector easier we have listed our detection rate for the false positive rates reported by the other systems. Table 1 lists the detection rate for various numbers of false detections for our system as well as other published systems. For the Rowley-Baluja-Kanade results [11], a number of different versions of their detector were tested, yielding a number of different results; they are all listed under the same heading. For the Roth-Yang-Ahuja detector [10], they reported their result on the MIT+CMU test set minus 5 images containing line-drawn faces.

Figure 7: ROC curve for our face detector on the MIT+CMU test set. The detector was run using a step size of 1.0 and starting scale of 1.0 (75,081,800 sub-windows scanned).

Figure 8 shows the output of our face detector on some test images from the MIT+CMU test set.

A simple voting scheme to further improve results

In Table 1 we also show results from running three detectors (the 38 layer one described above plus two similarly trained detectors) and outputting the majority vote of the three detectors. This improves the detection rate as well as eliminating more false positives. The improvement would be greater if the detectors were more independent. The correlation of their errors results in a modest improvement over the best single detector.

Figure 8: Output of our face detector on a number of test images from the MIT+CMU test set.

6. Conclusions

We have presented an approach for object detection which minimizes computation time while achieving high detection accuracy. The approach was used to construct a face detection system which is approximately 15 times faster than any previous approach.

This paper brings together new algorithms, representations, and insights which are quite generic and may well have broader application in computer vision and image processing.

Finally this paper presents a set of detailed experiments on a difficult face detection dataset which has been widely studied. This dataset includes faces under a very wide range of conditions including: illumination, scale, pose, and camera variation. Experiments on such a large and complex dataset are difficult and time consuming. Nevertheless systems which work under these conditions are unlikely to be brittle or limited to a single set of conditions. More importantly conclusions drawn from this dataset are unlikely to be experimental artifacts.

References

[1] Y. Amit, D. Geman, and K. Wilder. Joint induction of shape features and tree classifiers, 1997.

[2] F. Crow. Summed-area tables for texture mapping. In Proceedings of SIGGRAPH, volume 18(3), pages 207-212, 1984.

[3] F. Fleuret and D. Geman. Coarse-to-fine face detection. Int. J. Computer Vision, 2001.

[4] William T. Freeman and Edward H. Adelson. The design and use of steerable filters. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(9):891-906, 1991.

[5] Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. In Computational Learning Theory: Eurocolt '95, pages 23-37. Springer-Verlag, 1995.

[6] H. Greenspan, S. Belongie, R. Goodman, P. Perona, S. Rakshit, and C. Anderson. Overcomplete steerable pyramid filters and rotation invariance. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1994.

[7] L. Itti, C. Koch, and E. Niebur. A model of saliency-based visual attention for rapid scene analysis. IEEE Patt. Anal. Mach. Intell., 20(11):1254-1259, November 1998.

[8] Edgar Osuna, Robert Freund, and Federico Girosi. Training support vector machines: an application to face detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1997.

[9] C. Papageorgiou, M. Oren, and T. Poggio. A general framework for object detection. In International Conference on Computer Vision, 1998.

[10] D. Roth, M. Yang, and N. Ahuja. A SNoW-based face detector. In Neural Information Processing 12, 2000.

[11] H. Rowley, S. Baluja, and T. Kanade. Neural network-based face detection. IEEE Patt. Anal. Mach. Intell., volume 20, pages 22-38, 1998.

[12] Robert E. Schapire, Yoav Freund, Peter Bartlett, and Wee Sun Lee. Boosting the margin: A new explanation for the effectiveness of voting methods. In Proceedings of the Fourteenth International Conference on Machine Learning, 1997.

[13] H. Schneiderman and T. Kanade. A statistical method for 3D object detection applied to faces and cars. In International Conference on Computer Vision, 2000.

[14] K. Sung and T. Poggio. Example-based learning for view-based face detection. IEEE Patt. Anal. Mach. Intell., volume 20, pages 39-51, 1998.

[15] K. Tieu and P. Viola. Boosting image retrieval. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2000.

[16] J.K. Tsotsos, S.M. Culhane, W.Y.K. Wai, Y.H. Lai, N. Davis, and F. Nuflo. Modeling visual attention via selective tuning. Artificial Intelligence Journal, 78(1-2):507-545, October 1995.

[17] Andrew Webb. Statistical Pattern Recognition. Oxford University Press, New York, 1999.