IPR of U.S. Patent 9,007,420

Journal of Cognitive Neuroscience
Volume 3, No. 1, Winter 1991
ISSN 0898-929X

Editors

Michael S. Gazzaniga, Editor-in-Chief
Dartmouth Medical School

Ira B. Black
Robert Wood Johnson Medical School

Stephen M. Kosslyn
Harvard University

Gordon M. Shepherd
Yale University

Editorial Advisory Board

Gary S. Lynch
University of California, Irvine

George A. Miller
Princeton University

Mortimer Mishkin
National Institutes of Health

Associate Editors

Richard Andersen, Massachusetts Institute of Technology
Emilio Bizzi, Massachusetts Institute of Technology
Floyd E. Bloom, Scripps Clinic, La Jolla
Alfonso Caramazza, Johns Hopkins University
Patricia S. Churchland, University of California, San Diego
Daniel C. Dennett, Tufts University
Mitchell Glickstein, University College, London
Lila Gleitman, University of Pennsylvania
Patricia Goldman-Rakic, Yale University
Steven A. Hillyard, University of California
William Hirst, New School for Social Research
David H. Hubel, Harvard University
Jon H. Kaas, Vanderbilt University
Herbert P. Killackey, University of California, Irvine
Marta Kutas, University of California, San Diego
Ralph Linsker, IBM Research
J. Anthony Movshon, New York University
Terence W. Picton, University of Ottawa
Michael I. Posner, University of Oregon
David Premack, University of Pennsylvania
Zenon W. Pylyshyn, University of Western Ontario
Marcus E. Raichle, Washington University
Pasko Rakic, Yale University
David Rumelhart, Stanford University
Stanley Schachter, Columbia University
Terrence J. Sejnowski, The Salk Institute
Larry R. Squire, University of California, San Diego
David C. Van Essen, California Institute of Technology
Edgar Zurif, Brandeis University

Managing Editor
Charlotte Smylie

Editorial Address
Journal of Cognitive Neuroscience
Cognitive Neuroscience Institute
P.O. Box 1204
Norwich, VT 05055

Individuals wishing to submit manuscripts should follow the guidelines provided at the back of this issue.

Journal of Cognitive Neuroscience is indexed or abstracted in: Artificial Intelligence Abstracts, Current Contents, Excerpta Medica, and Linguistics and Language Behavior Abstracts.

Business Offices and Subscription Rates
Subscriptions, address changes, and mailing list correspondence should be addressed to MIT Press Journals, 55 Hayward Street, Cambridge, MA 02142; (617) 253-2889. Journal of Cognitive Neuroscience (ISSN 0898-929X) is published quarterly (Winter, Spring, Summer, and Fall) by The MIT Press, Cambridge, Massachusetts; $56.00 for individuals and $120.00 for institutions. Subscribers outside the United States add $14.00 for postage and handling. Single copies of current issues: $30.00. To be honored free, claims for missing issues must be made immediately upon receipt of the next published issue.

Postmaster
Send address changes to Journal of Cognitive Neuroscience, 55 Hayward Street, Cambridge, MA 02142. Second Class postage paid at Boston, MA, and at additional post offices.

Advertising
Inquiries may be addressed to the Advertising Manager, MIT Press Journals, 55 Hayward Street, Cambridge, MA 02142; (617) 253-2866.

Copyright Information
Permission to photocopy articles for internal or personal use, or the internal or personal use of specific clients, is granted by the copyright owner for users registered with the Copyright Clearance Center (CCC) Transactional Reporting Service, provided that the fee of $5.00 per article-copy is paid directly to CCC, 27 Congress Street, Salem, MA 01970. The fee code for users of the Transactional Reporting Service is 0898-929X/91 $5.00. For those organizations that have been granted a photocopy license with CCC, a separate system of payment has been arranged.
This publication is printed on acid-free paper.
© 1991 by the Massachusetts Institute of Technology

`Eigenfaces for Recognition
`
`Matthew Turk and Alex Pentland
`Vision and Modeling Group
The Media Laboratory
`Massachusetts Institute of Technology
`
Abstract

We have developed a near-real-time computer system that can locate and track a subject's head, and then recognize the person by comparing characteristics of the face to those of known individuals. The computational approach taken in this system is motivated by both physiology and information theory, as well as by the practical requirements of near-real-time performance and accuracy. Our approach treats the face recognition problem as an intrinsically two-dimensional (2-D) recognition problem rather than requiring recovery of three-dimensional geometry, taking advantage of the fact that faces are normally upright and thus may be described by a small set of 2-D characteristic views. The system functions by projecting face images onto a feature space that spans the significant variations among known face images. The significant features are known as "eigenfaces," because they are the eigenvectors (principal components) of the set of faces; they do not necessarily correspond to features such as eyes, ears, and noses. The projection operation characterizes an individual face by a weighted sum of the eigenface features, and so to recognize a particular face it is necessary only to compare these weights to those of known individuals. Some particular advantages of our approach are that it provides for the ability to learn and later recognize new faces in an unsupervised manner, and that it is easy to implement using a neural network architecture.
`
`INTRODUCTION
`
The face is our primary focus of attention in social intercourse, playing a major role in conveying identity and emotion. Although the ability to infer intelligence or character from facial appearance is suspect, the human ability to recognize faces is remarkable. We can recognize thousands of faces learned throughout our lifetime and identify familiar faces at a glance even after years of separation. This skill is quite robust, despite large changes in the visual stimulus due to viewing conditions, expression, aging, and distractions such as glasses or changes in hairstyle or facial hair. As a consequence the visual processing of human faces has fascinated philosophers and scientists for centuries, including figures such as Aristotle and Darwin.
`
Computational models of face recognition, in particular, are interesting because they can contribute not only to theoretical insights but also to practical applications. Computers that recognize faces could be applied to a wide variety of problems, including criminal identification, security systems, image and film processing, and human-computer interaction. For example, the ability to model a particular face and distinguish it from a large number of stored face models would make it possible to vastly improve criminal identification. Even the ability to merely detect faces, as opposed to recognizing them, can be important. Detecting faces in photographs, for instance, is an important problem in automating color film development, since the effect of many enhancement and noise reduction techniques depends on the picture content (e.g., faces should not be tinted green, while perhaps grass should).
Unfortunately, developing a computational model of face recognition is quite difficult, because faces are complex, multidimensional, and meaningful visual stimuli. They are a natural class of objects, and stand in stark contrast to sine wave gratings, the "blocks world," and other artificial stimuli used in human and computer vision research (Davies, Ellis, & Shepherd, 1981). Thus unlike most early visual functions, for which we may construct detailed models of retinal or striate activity, face recognition is a very high level task for which computational approaches can currently only suggest broad constraints on the corresponding neural activity.

We therefore focused our research toward developing a sort of early, preattentive pattern recognition capability that does not depend on having three-dimensional information or detailed geometry. Our goal, which we believe we have reached, was to develop a computational model of face recognition that is fast, reasonably simple, and accurate in constrained environments such as an office or a household. In addition the approach is biologically implementable and is in concert with preliminary findings in the physiology and psychology of face recognition.
The scheme is based on an information theory approach that decomposes face images into a small set of characteristic feature images called "eigenfaces," which may be thought of as the principal components of the initial training set of face images. Recognition is performed by projecting a new image into the subspace spanned by the eigenfaces ("face space") and then classifying the face by comparing its position in face space with the positions of known individuals.

Automatically learning and later recognizing new faces is practical within this framework. Recognition under widely varying conditions is achieved by training on a limited number of characteristic views (e.g., a "straight on" view, a 45° view, and a profile view). The approach has advantages over other face recognition schemes in its speed and simplicity, learning capacity, and insensitivity to small or gradual changes in the face image.
`
Background and Related Work

Much of the work in computer recognition of faces has focused on detecting individual features such as the eyes, nose, mouth, and head outline, and defining a face model by the position, size, and relationships among these features. Such approaches have proven difficult to extend to multiple views, and have often been quite fragile, requiring a good initial guess to guide them. Research in human strategies of face recognition, moreover, has shown that individual features and their immediate relationships comprise an insufficient representation to account for the performance of adult human face identification (Carey & Diamond, 1977). Nonetheless, this approach to face recognition remains the most popular one in the computer vision literature.
Bledsoe (1966a,b) was the first to attempt semiautomated face recognition with a hybrid human-computer system that classified faces on the basis of fiducial marks entered on photographs by hand. Parameters for the classification were normalized distances and ratios among points such as eye corners, mouth corners, nose tip, and chin point. Later work at Bell Labs (Goldstein, Harmon, & Lesk, 1971; Harmon, 1971) developed a vector of up to 21 features, and recognized faces using standard pattern classification techniques. The chosen features were largely subjective evaluations (e.g., shade of hair, length of ears, lip thickness) made by human subjects, each of which would be quite difficult to automate.
`
An early paper by Fischler and Elschlager (1973) attempted to measure similar features automatically. They described a linear embedding algorithm that used local feature template matching and a global measure of fit to find and measure facial features. This template matching approach has been continued and improved by the recent work of Yuille, Cohen, and Hallinan (1989) (see Yuille, this volume). Their strategy is based on "deformable templates," which are parameterized models of the face and its features in which the parameter values are determined by interactions with the image.
Connectionist approaches to face identification seek to capture the configurational, or gestalt-like, nature of the task. Kohonen (1989) and Kohonen and Lehtio (1981) describe an associative network with a simple learning algorithm that can recognize (classify) face images and recall a face image from an incomplete or noisy version input to the network. Fleming and Cottrell (1990) extend these ideas using nonlinear units, training the system by backpropagation. Stonham's WISARD system (1986) is a general-purpose pattern recognition device based on neural net principles. It has been applied with some success to binary face images, recognizing both identity and expression. Most connectionist systems dealing with faces (see also Midorikawa, 1988; O'Toole, Millward, & Anderson, 1988) treat the input image as a general 2-D pattern, and can make no explicit use of the configurational properties of a face. Moreover, some of these systems require an inordinate number of training examples to achieve a reasonable level of performance. Only very simple systems have been explored to date, and it is unclear how they will scale to larger problems.
Others have approached automated face recognition by characterizing a face by a set of geometric parameters and performing pattern recognition based on the parameters (e.g., Kaya & Kobayashi, 1972; Cannon, Jones, Campbell, & Morgan, 1986; Craw, Ellis, & Lishman, 1987; Wong, Law, & Tsang, 1989). Kanade's (1973) face identification system was the first (and still one of the few) systems in which all steps of the recognition process were automated, using a top-down control strategy directed by a generic model of expected feature characteristics. His system calculated a set of facial parameters from a single face image and used a pattern classification technique to match the face from a known set, a purely statistical approach depending primarily on local histogram analysis and absolute gray-scale values.

Recent work by Burt (1988a,b) uses a "smart sensing" approach based on multiresolution template matching. This coarse-to-fine strategy uses a special-purpose computer built to calculate multiresolution pyramid images quickly, and has been demonstrated identifying people in near-real-time. This system works well under limited circumstances, but should suffer from the typical problems of correlation-based matching, including sensitivity to image size and noise. The face models are built by hand from face images.
`
`'
`
`THE EIGENFACE APPROACH
`
`Much of the previous work on automated face l‘CC0gI1l‘
`tion has ignored the issue of just what aspects of the face
`stimulus are important for identification. This suggested
`to us that an information theory approach of coding and
`
`_.
`
`72
`
`Journal of Cognitive Neuroscience
`
`Volume 3, Number 1
`
`0004
`
`
`
`into the infor-
`decoding face images may give insight
`mation content of face images, emphasizing the signifi-
`cant local and global “features.“ Such features may or
`may not be directly related to our intuitive notion of face
`features such as the eyes, nose, lips, and hair. This may
`have important implications for the use of identification
`tools such as Identikit and Photofit (Bruce, 1988).
In the language of information theory, we want to extract the relevant information in a face image, encode it as efficiently as possible, and compare one face encoding with a database of models encoded similarly. A simple approach to extracting the information contained in an image of a face is to somehow capture the variation in a collection of face images, independent of any judgment of features, and use this information to encode and compare individual face images.
In mathematical terms, we wish to find the principal components of the distribution of faces, or the eigenvectors of the covariance matrix of the set of face images, treating an image as a point (or vector) in a very high dimensional space. The eigenvectors are ordered, each one accounting for a different amount of the variation among the face images.

These eigenvectors can be thought of as a set of features that together characterize the variation between face images. Each image location contributes more or less to each eigenvector, so that we can display the eigenvector as a sort of ghostly face which we call an eigenface. Some of the faces we studied are illustrated in Figure 1, and the corresponding eigenfaces are shown in Figure 2. Each eigenface deviates from uniform gray where some facial feature differs among the set of training faces; they are a sort of map of the variations between faces.
`
Each individual face can be represented exactly in terms of a linear combination of the eigenfaces. Each face can also be approximated using only the "best" eigenfaces, those that have the largest eigenvalues, and which therefore account for the most variance within the set of face images. The best M eigenfaces span an M-dimensional subspace, "face space," of all possible images.

The idea of using eigenfaces was motivated by a technique developed by Sirovich and Kirby (1987) and Kirby and Sirovich (1990) for efficiently representing pictures of faces using principal component analysis. Starting with an ensemble of original face images, they calculated a best coordinate system for image compression, where each coordinate is actually an image that they termed an eigenpicture. They argued that, at least in principle, any collection of face images can be approximately reconstructed by storing a small collection of weights for each face and a small set of standard pictures (the eigenpictures). The weights describing each face are found by projecting the face image onto each eigenpicture.
It occurred to us that if a multitude of face images can be reconstructed by weighted sums of a small collection of characteristic features or eigenpictures, perhaps an efficient way to learn and recognize faces would be to build up the characteristic features by experience over time and recognize particular faces by comparing the feature weights needed to (approximately) reconstruct them with the weights associated with known individuals. Each individual, therefore, would be characterized by the small set of feature or eigenpicture weights needed to describe and reconstruct them, an extremely compact representation when compared with the images themselves.
`
This approach to face recognition involves the following initialization operations:

1. Acquire an initial set of face images (the training set).
2. Calculate the eigenfaces from the training set, keeping only the M images that correspond to the highest eigenvalues. These M images define the face space. As new faces are experienced, the eigenfaces can be updated or recalculated.
3. Calculate the corresponding distribution in M-dimensional weight space for each known individual, by projecting their face images onto the "face space."

These operations can also be performed from time to time whenever there is free excess computational capacity.
Having initialized the system, the following steps are then used to recognize new face images (a code sketch of both procedures follows the list):

1. Calculate a set of weights based on the input image and the M eigenfaces by projecting the input image onto each of the eigenfaces.
2. Determine if the image is a face at all (whether known or unknown) by checking to see if the image is sufficiently close to "face space."
3. If it is a face, classify the weight pattern as either a known person or as unknown.
4. (Optional) Update the eigenfaces and/or weight patterns.
5. (Optional) If the same unknown face is seen several times, calculate its characteristic weight pattern and incorporate it into the known faces.
`
`Calculating Eigenfaces
`
Let a face image I(x, y) be a two-dimensional N by N array of (8-bit) intensity values. An image may also be considered as a vector of dimension N², so that a typical image of size 256 by 256 becomes a vector of dimension 65,536, or, equivalently, a point in 65,536-dimensional space. An ensemble of images, then, maps to a collection of points in this huge space.
Images of faces, being similar in overall configuration, will not be randomly distributed in this huge image space and thus can be described by a relatively low dimensional subspace. The main idea of the principal component analysis (or Karhunen-Loeve expansion) is to find the vectors that best account for the distribution of face images within the entire image space. These vectors define the subspace of face images, which we call "face space." Each vector is of length N², describes an N by N image, and is a linear combination of the original face images. Because these vectors are the eigenvectors of the covariance matrix corresponding to the original face images, and because they are face-like in appearance, we refer to them as "eigenfaces." Some examples of eigenfaces are shown in Figure 2.

Figure 1. (a) Face images used as the training set.
`,
`.
`.
`.
`Let the training set of face images be T1, F2, F5,
`FM. The average face of the set
`is defined by ‘I’ =
`if L: Fn- Each face differs from the average by the
`vector 11>, = F, — ‘I’. An example training set is shown
`in Figure 1a, with the average face ‘I’ shown in Figure
`1b. This set of very large vectors is then subject to prin-
`cipal component analysis, which seeks a set of M ortho-
`normal vectors, u,«., which best describes the distribution
`of the data. The lath vector, us, is chosen such that
`
`At = 1 § (u:.'tI> )2
`Mn=1
`H
`
`(1)
`
`is a maximum, subject to
`
`lltrus = 3»; ={
`
`1,
`0,
`
`if!=ie
`otherwise
`
`(2) '
`
`The vectors 11.: and scalars M are the eigenvectors and
`eigenvalues, respectively, of the covariance matrix
`
`T
`1”
`c=—2«1>,,ct-,.
`Mn=1
`
`: MT
`
`(5)_
`
where the matrix A = [Φ_1 Φ_2 ... Φ_M]. The matrix C, however, is N² by N², and determining the N² eigenvectors and eigenvalues is an intractable task for typical image sizes. We need a computationally feasible method to find these eigenvectors.
If the number of data points in the image space is less than the dimension of the space (M < N²), there will be only M − 1, rather than N², meaningful eigenvectors. (The remaining eigenvectors will have associated eigenvalues of zero.) Fortunately we can solve for the N²-dimensional eigenvectors in this case by first solving for the eigenvectors of an M by M matrix (e.g., solving a 16 × 16 matrix rather than a 16,384 × 16,384 matrix) and then taking appropriate linear combinations of the face images Φ_i. Consider the eigenvectors v_i of A^T A such that

A^T A v_i = \mu_i v_i \qquad (4)

Premultiplying both sides by A, we have

A A^T A v_i = \mu_i A v_i \qquad (5)

from which we see that A v_i are the eigenvectors of C = A A^T.
`
`
`
Following this analysis, we construct the M by M matrix L = A^T A, where L_{mn} = Φ_m^T Φ_n, and find the M eigenvectors, v_l, of L. These vectors determine linear combinations of the M training set face images to form the eigenfaces u_l:

u_l = \sum_{k=1}^{M} v_{lk} \Phi_k, \qquad l = 1, \ldots, M \qquad (6)

With this analysis the calculations are greatly reduced, from the order of the number of pixels in the images (N²) to the order of the number of images in the training set (M). In practice, the training set of face images will be relatively small (M ≪ N²), and the calculations become quite manageable. The associated eigenvalues allow us to rank the eigenvectors according to their usefulness in characterizing the variation among the images. Figure 2 shows the top seven eigenfaces derived from the input images of Figure 1.

Figure 1. (b) The average face Ψ.

Figure 2. Seven of the eigenfaces calculated from the input images of Figure 1.
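A compact NumPy rendering may make this reduction concrete. This is our sketch of Eqs. (1)-(6), not the authors' code; the function name and the unit-normalization of each eigenface are our choices (the derivation only determines the eigenfaces up to scale).

```python
# A minimal NumPy sketch of Eqs. (1)-(6): eigenfaces via the M x M
# matrix L = A^T A rather than the N^2 x N^2 covariance matrix A A^T.
import numpy as np

def train_eigenfaces(face_images, num_eigenfaces):
    """face_images: array of shape (M, N, N). Returns (psi, U)."""
    M = face_images.shape[0]
    gammas = face_images.reshape(M, -1).astype(np.float64)  # Gamma_n as rows
    psi = gammas.mean(axis=0)                               # average face Psi
    A = (gammas - psi).T                                    # columns Phi_n = Gamma_n - Psi
    L = A.T @ A                                             # L_mn = Phi_m^T Phi_n (M x M)
    mu, v = np.linalg.eigh(L)                               # eigenvalues, ascending
    top = np.argsort(mu)[::-1][:num_eigenfaces]             # keep the largest M'
    U = A @ v[:, top]                                       # u_l = sum_k v_lk Phi_k (Eq. 6)
    U /= np.linalg.norm(U, axis=0)                          # normalize each eigenface
    return psi, U                                           # U: (N^2, M'), columns u_l
```

For the 16-image example above, eigh factors a 16 × 16 matrix instead of a 16,384 × 16,384 one; the columns A v_i inherit the nonzero eigenvalues of C.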
`
Using Eigenfaces to Classify a Face Image

The eigenface images calculated from the eigenvectors of L span a basis set with which to describe face images. Sirovich and Kirby (1987) evaluated a limited version of this framework on an ensemble of M = 115 images of Caucasian males, digitized in a controlled manner, and found that about 40 eigenfaces were sufficient for a very good description of the set of face images. With M' = 40 eigenfaces, RMS pixel-by-pixel errors in representing cropped versions of face images were about 2%.

Since the eigenfaces seem adequate for describing face images under very controlled conditions, we decided to investigate their usefulness as a tool for face identification. In practice, a smaller M' is sufficient for identification, since accurate reconstruction of the image is not a requirement. In this framework, identification becomes a pattern recognition task. The eigenfaces span an M'-dimensional subspace of the original N² image space. The M' significant eigenvectors of the L matrix are chosen as those with the largest associated eigenvalues. In many of our test cases, based on M = 16 face images, M' = 7 eigenfaces were used.
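The RMS reconstruction error quoted above can be checked directly. A small sketch, assuming psi and U from the training code; the normalization by the gray-level range is our choice:

```python
# Sketch: RMS pixel-by-pixel error of the face-space reconstruction,
# expressed as a fraction of the 8-bit gray range (our convention).
import numpy as np

def rms_reconstruction_error(image, psi, U, gray_range=255.0):
    phi = image.reshape(-1).astype(np.float64) - psi
    phi_f = U @ (U.T @ phi)                  # projection onto face space
    return np.sqrt(np.mean((phi - phi_f) ** 2)) / gray_range
```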
A new face image (Γ) is transformed into its eigenface components (projected into "face space") by a simple operation,

\omega_k = u_k^T (\Gamma - \Psi) \qquad (7)

for k = 1, ..., M'. This describes a set of point-by-point image multiplications and summations, operations performed at approximately frame rate on current image processing hardware. Figure 3 shows an image and its projection into the seven-dimensional face space.

The weights form a vector Ω^T = [ω_1, ω_2, ..., ω_{M'}] that describes the contribution of each eigenface in representing the input face image, treating the eigenfaces as a basis set for face images. The vector may then be used in a standard pattern recognition algorithm to find which of a number of predefined face classes, if any, best describes the face. The simplest method for determining which face class provides the best description of an input face image is to find the face class k that minimizes the Euclidean distance

\epsilon_k^2 = \lVert \Omega - \Omega_k \rVert^2 \qquad (8)

where Ω_k is a vector describing the kth face class. The face classes Ω_k are calculated by averaging the results of the eigenface representation over a small number of face images (as few as one) of each individual. A face is classified as belonging to class k when the minimum ε_k is below some chosen threshold θ_ε. Otherwise the face is classified as "unknown," and optionally used to create a new face class.
`
Because creating the vector of weights is equivalent to projecting the original face image onto the low-dimensional face space, many images (most of them looking nothing like a face) will project onto a given pattern vector. This is not a problem for the system, however, since the distance ε between the image and the face space is simply the squared distance between the mean-adjusted input image Φ = Γ − Ψ and Φ_f = Σ_{i=1}^{M'} ω_i u_i, its projection onto face space:

\epsilon^2 = \lVert \Phi - \Phi_f \rVert^2 \qquad (9)
`
Thus there are four possibilities for an input image and its pattern vector: (1) near face space and near a face class, (2) near face space but not near a known face class, (3) distant from face space and near a face class, and (4) distant from face space and not near a known face class.

In the first case, an individual is recognized and identified. In the second case, an unknown individual is present. The last two cases indicate that the image is not a face image. Case three typically shows up as a false positive in most recognition systems; in our framework, however, the false recognition may be detected because of the significant distance between the image and the subspace of expected face images. Figure 4 shows some images and their projections into face space and gives a measure of distance from the face space for each.
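To make the decision rules concrete, here is a hedged sketch of Eqs. (7)-(9) and the four cases. psi and U are the outputs of the earlier training sketch, class_vectors is an assumed mapping from each known person to Ω_k, and the thresholds are placeholders a designer must choose.

```python
# Sketch of Eqs. (7)-(9) and the four-case decision rule; names and
# thresholds are illustrative, with psi (average face) and U (columns
# of eigenfaces) as produced by the training sketch above.
import numpy as np

def project(image, psi, U):
    """Eq. (7): omega_k = u_k^T (Gamma - Psi), for all k at once."""
    phi = image.reshape(-1).astype(np.float64) - psi
    return U.T @ phi  # the pattern vector Omega

def distance_from_face_space(image, omega, psi, U):
    """Eq. (9): distance between Phi and its face-space projection Phi_f."""
    phi = image.reshape(-1).astype(np.float64) - psi
    return np.linalg.norm(phi - U @ omega)  # epsilon

def classify(image, psi, U, class_vectors, theta_class, theta_face):
    """Return a person, "unknown face", or None for a non-face image."""
    omega = project(image, psi, U)
    if distance_from_face_space(image, omega, psi, U) > theta_face:
        return None  # cases (3) and (4): far from face space
    person, eps_k = min(
        ((p, np.linalg.norm(omega - w)) for p, w in class_vectors.items()),  # Eq. (8)
        key=lambda pair: pair[1])
    # case (1) if within the class threshold, case (2) otherwise
    return person if eps_k < theta_class else "unknown face"
```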
`
Summary of Eigenface Recognition Procedure

To summarize, the eigenfaces approach to face recognition involves the following steps:

1. Collect a set of characteristic face images of the known individuals. This set should include a number of images for each person, with some variation in expression and in the lighting. (Say four images of ten people, so M = 40.)
2. Calculate the (40 × 40) matrix L, find its eigenvectors and eigenvalues, and choose the M' eigenvectors with the highest associated eigenvalues. (Let M' = 10 in this example.)
3. Combine the normalized training set of images according to Eq. (6) to produce the (M' = 10) eigenfaces u_k.
4. For each known individual, calculate the class vector Ω_k by averaging the eigenface pattern vectors Ω [from Eq. (8)] calculated from the original (four) images of the individual. Choose a threshold θ_ε that defines the maximum allowable distance from any face class, and a threshold θ_δ that defines the maximum allowable distance from face space [according to Eq. (9)].
5. For each new face image to be identified, calculate its pattern vector Ω, the distances ε_k to each known class, and the distance ε to face space. If the minimum distance ε_k < θ_ε and the distance ε < θ_δ, classify the input face as the individual associated with class vector Ω_k. If the minimum distance ε_k > θ_ε but distance ε < θ_δ, then the image may be classified as "unknown," and optionally used to begin a new face class.
6. If the new image is classified as a known individual, this image may be added to the original set of familiar face images, and the eigenfaces may be recalculated (steps 1-4). This gives the opportunity to modify the face space as the system encounters more instances of known faces.
`
In our current system calculation of the eigenfaces is done offline as part of the training. The recognition currently takes about 400 msec running rather inefficiently in Lisp on a Sun4, using face images of size 128 × 128. With some special-purpose hardware, the current version could run at close to frame rate (33 msec).

Designing a practical system for face recognition within this framework requires assessing the tradeoffs between generality, required accuracy, and speed. If the face recognition task is restricted to a small set of people (such as the members of a family or a small company), a small set of eigenfaces is adequate to span the faces of interest. If the system is to learn new faces or represent many people, a larger basis set of eigenfaces will be required. The results of Sirovich and Kirby (1987) and Kirby and Sirovich (1990) for coding of face images give some evidence that even if it were necessary to represent a large segment of the population, the number of eigenfaces needed would still be relatively small.
`
Locating and Detecting Faces

The analysis in the preceding sections assumes we have a centered face image, the same size as the training images and the eigenfaces. We need some way, then, to locate a face in a scene to do the recognition. We have developed two schemes to locate and/or track faces, using motion detection and manipulation of the images in "face space."
`
`76
`
`Journal of Cognitive Neuroscience
`
`Volume 3. Number
`
`0008
`
`
`
Figure 3. An original face image and its projection onto the face space defined by the eigenfaces of Figure 2.
`
Motion Detecting and Head Tracking

People are constantly moving. Even while sitting, we fidget and adjust our body position, nod our heads, look around, and such. In the case of a single person moving in a static environment, a simple motion detection and tracking algorithm, depicted in Figure 5, will locate and track the position of the head. Simple spatiotemporal filtering (e.g., frame differencing) accentuates image locations that change with time, so a moving person "lights up" in the filtered image. If the image "lights up" at all, motion is detected and the presence of a person is postulated.

After thresholding the filtered image to produce a binary motion image, we analyze the "motion blobs" over time to decide if the motion is caused by a person moving and to determine head position. A few simple rules are applied, such as "the head is the small upper blob above a larger blob (the body)," and "head motion must be reasonably slow and contiguous" (heads are not expected to jump around the image erratically). Figure 6 shows an image with the head located, along with the path of the head in the preceding sequence of frames.

The motion image also allows for an estimate of scale. The size of the blob that is assumed to be the moving head determines the size of the subimage to send to the recognition stage. This subimage is rescaled to fit the dimensions of the eigenfaces.
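The motion scheme lends itself to a short sketch: frame differencing, a binary threshold, and the "small blob above a larger blob" rule. This is our guess at one plausible rendering (scipy.ndimage for blob labeling, a hand-picked threshold), not the original implementation.

```python
# Sketch of the motion-based head locator: frame differencing,
# thresholding to a binary motion image, then a crude "small blob
# above a larger blob" rule. Parameters and the use of scipy.ndimage
# are our choices, not details from the original system.
import numpy as np
from scipy import ndimage

def locate_head(prev_frame, frame, motion_threshold=20):
    """Return the bounding slices of the putative head blob, or None."""
    diff = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16))
    binary = diff > motion_threshold          # binary motion image
    labels, n = ndimage.label(binary)         # connected "motion blobs"
    if n == 0:
        return None                           # no motion: no person postulated
    boxes = ndimage.find_objects(labels)
    sizes = ndimage.sum(binary, labels, index=range(1, n + 1))
    body = boxes[int(np.argmax(sizes))]       # largest blob: assume the body
    # head: topmost blob whose bottom edge sits at or above the body's top
    candidates = [b for b in boxes if b is not body and b[0].stop <= body[0].start]
    head = min(candidates, key=lambda b: b[0].start) if candidates else body
    return head  # frame[head] is then rescaled to the eigenface dimensions
```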
`
We can also use knowledge of face space to locate faces in single images, either as an alternative to locating faces from motion (e.g., if there is too little motion or many moving objects) or as a method of achieving more precision than is possible by use of motion tracking alone. This method allows us to recognize the presence of faces apart from the task of identifying them.

As seen in Figure 4, images of faces do not change radically when projected into the face space, while the projection of nonface images appears quite different. This basic idea is used to detect the presence of faces in a scene: at every location in the image, calculate the distance ε between the local subimage and face space. This distance from face space is used as a measure of "faceness," so the result of calculating the distance from face space