`IPR of U.S. Patent 9,007,420
`
`0001
`
`
`
`IEEE
`
`9
`
`COMPUTER
`SOCIETY
`
`The [I2 EE Computer Society is an association of pi.-npie with professioml interest in tlie field at’ cornputers. All inenibi.-rt: til the IEEE are eligible for membership in
`the Computer Society. as are Il.IEI'I1bE‘.l'5 ol oi:-rtain pmfcssional sociei-is-.9 and other computer prrifessioiials. Computer Society meiribers will reteive this T1‘-Insacfions
`upon P.1y|'|'|e|'|t of the annual Society meml:-ersliip fee ($35 fur IEEE nii:-nilaers, 578 for all ulhetsl plus an zinnusil suliscription Fae [paper uiil5r:$33:u1-actmiic ni1|y:53fl:
`curribinritiuii: $49). For .1c1diiional membership and 5|ll.'l‘.‘iC1‘lPIl(\11 iiiiorniatioii, visit our Web site at rIttp:#cnrrip4.'ler.orgIsubscr|:e. send email to helpficornputenorg.
`or write to IEEE Con1piitcrSL>i:iety, 111662 Los Vriqneros Circle. PO Box 3014. 1.05 Aliiniltus, CA 90720-1314 USA.
`lnclividual slibscription copies of Transartiuns ':l1‘i.‘
`for personal use only
`
IEEE TRANSACTIONS ON
PATTERN ANALYSIS AND MACHINE INTELLIGENCE

EDITOR-IN-CHIEF
Rama Chellappa
Center for Automation Research
University of Maryland
College Park, MD 20742 USA
chella@eng.umd.edu

ASSOCIATE EDITOR-IN-CHIEF
David J. Kriegman
Beckman Institute
University of Illinois at Urbana-Champaign
405 N. Mathews Avenue
Urbana, IL 61801 USA
kriegman@uiuc.edu
`
Editorial Board
P. Anandan, Microsoft Corp.
Peter N. Belhumeur, Yale University
J. Ross Beveridge, Colorado State University
Carla E. Brodley, Purdue University
Henrik I. Christensen, Swedish Royal Inst. of Technology
Alberto Del Bimbo, Università degli Studi di Firenze
Sven J. Dickinson, University of Toronto
David J. Fleet, Xerox Palo Alto Research Center
David A. Forsyth, Univ. of California, Berkeley
William T. Freeman, Mitsubishi Electric Research Lab
Brendan J. Frey, University of Toronto
Venu Govindaraju, State Univ. of New York, Buffalo
Edwin Hancock, University of York
Tin Kam Ho, Bell Laboratories
Michal Irani, Weizmann Inst. of Science
David Jacobs, NEC Research Institute
Alireza Khotanzad, Southern Methodist University
Rakesh Kumar, Sarnoff Corp.
Amlan Kundu, Mitre Corp.
Peter Meer, Rutgers University
Randal C. Nelson, University of Rochester
Matti Pietikäinen, University of Oulu
Long Quan, Hong Kong Univ. of Science & Technology
Sudeep Sarkar, University of South Florida
Cordelia Schmid, INRIA Rhône-Alpes
Stan Sclaroff, Boston University
Steven M. Seitz, University of Washington
Mubarak Shah, University of Central Florida
Rajeev Sharma, Penn State University
Amnon Shashua, Hebrew University of Jerusalem
Vic Solo, University of New South Wales
Luc Vincent, LizardTech Inc.
Juyang (John) Weng, Michigan State University
Kazuhiko Yamamoto, Gifu University
Zhengyou Zhang, Microsoft Research
`
SUBMISSIONS / STATUS: For information on submitting a manuscript or on a paper awaiting publication, please contact: Transactions Assistant, TPAMI, IEEE Computer Society, 10662 Los Vaqueros Circle, PO Box 3014, Los Alamitos, CA 90720-1314 USA; EMAIL: tpami@computer.org; PHONE: +1 714 821 8380; FAX: +1 714 821 4010.
`
`
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE is published monthly by the IEEE Computer Society. IEEE Corporate Office: Three Park Avenue, 17th Floor, New York, NY 10016-5997 USA. Responsibility for the content rests upon the authors and not upon the IEEE or the IEEE Computer Society. IEEE Computer Society Publications Office: 10662 Los Vaqueros Circle, PO Box 3014, Los Alamitos, CA 90720-1314 USA. IEEE Computer Society Headquarters: 1730 Massachusetts Ave. NW, Washington, DC 20036-1992 USA. Back issues: IEEE members $10.00 (first copy only), nonmembers $10.00 per copy. (Note: Add $4.00 postage and handling charge to any order from $1.00 to $50.00, including prepaid orders.) Complete price information available on request. Also available in microfiche. Copyright and Reprint Permissions: Abstracting is permitted with credit to the source. Libraries are permitted to photocopy for private use of patrons, provided the per-copy fee indicated in the code at the bottom of the first page is paid through the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923 USA. For other copying, reprint, or republication permission, write to: Copyright and Permissions Department, IEEE Publications Administration, 445 Hoes Lane, PO Box 1331, Piscataway, NJ 08855-1331. Copyright © 2002 by The Institute of Electrical and Electronics Engineers, Inc. All rights reserved. Periodicals postage paid at New York, NY, and at additional mailing offices. Postmaster: Send address changes to IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, IEEE Service Center, 445 Hoes Lane, PO Box 1331, Piscataway, NJ 08855-1331 USA. GST Registration No. 125634188. Canada Post Publications Mail Agreement Number 0457767. Printed in USA.
`
`0002
`
`
`
`'“""“"""'VB'B'I°0e¢ebv6aurvI:uumnu|vu.1oaaoJ
`
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 24, NO. 1, JANUARY 2002
`
Detecting Faces in Images: A Survey
Ming-Hsuan Yang, Member, IEEE, David J. Kriegman, Senior Member, IEEE, and
Narendra Ahuja, Fellow, IEEE
`
Abstract—Images containing faces are essential to intelligent vision-based human computer interaction, and research efforts in face processing include face recognition, face tracking, pose estimation, and expression recognition. However, many reported methods assume that the faces in an image or an image sequence have been identified and localized. To build fully automated systems that analyze the information contained in face images, robust and efficient face detection algorithms are required. Given a single image, the goal of face detection is to identify all image regions which contain a face regardless of its three-dimensional position, orientation, and lighting conditions. Such a problem is challenging because faces are nonrigid and have a high degree of variability in size, shape, color, and texture. Numerous techniques have been developed to detect faces in a single image, and the purpose of this paper is to categorize and evaluate these algorithms. We also discuss relevant issues such as data collection, evaluation metrics, and benchmarking. After analyzing these algorithms and identifying their limitations, we conclude with several promising directions for future research.

Index Terms—Face detection, face recognition, object recognition, view-based recognition, statistical pattern recognition, machine learning.
1 INTRODUCTION

With the ubiquity of new information technology and media, more effective and friendly methods for human computer interaction (HCI) are being developed which do not rely on traditional devices such as keyboards, mice, and displays. Furthermore, the ever decreasing price/performance ratio of computing coupled with recent decreases in video image acquisition cost imply that computer vision systems can be deployed in desktop and embedded systems [111], [112], [113]. The rapidly expanding research in face processing is based on the premise that information about a user's identity, state, and intent can be extracted from images, and that computers can then react accordingly, e.g., by observing a person's facial expression. In the last five years, face and facial expression recognition have attracted much attention though they have been studied for more than 20 years by psychophysicists, neuroscientists, and engineers. Many research demonstrations and commercial applications have been developed from these efforts. A first step of any face processing system is detecting the locations in images where faces are present. However, face detection from a single image is a challenging task because of variability in scale, location, orientation (up-right, rotated), and pose (frontal, profile). Facial expression, occlusion, and lighting conditions also change the overall appearance of faces.

We now give a definition of face detection: Given an arbitrary image, the goal of face detection is to determine whether or not there are any faces in the image and, if present, return the image location and extent of each face. The challenges associated with face detection can be attributed to the following factors:

• Pose. The images of a face vary due to the relative camera-face pose (frontal, 45 degree, profile, upside down), and some facial features such as an eye or the nose may become partially or wholly occluded.
• Presence or absence of structural components. Facial features such as beards, mustaches, and glasses may or may not be present and there is a great deal of variability among these components including shape, color, and size.
• Facial expression. The appearance of faces is directly affected by a person's facial expression.
• Occlusion. Faces may be partially occluded by other objects. In an image with a group of people, some faces may partially occlude other faces.
• Image orientation. Face images directly vary for different rotations about the camera's optical axis.
• Imaging conditions. When the image is formed, factors such as lighting (spectra, source distribution, and intensity) and camera characteristics (sensor response, lenses) affect the appearance of a face.
There are many closely related problems of face detection. Face localization aims to determine the image position of a single face; this is a simplified detection problem with the assumption that an input image contains only one face [85], [103]. The goal of facial feature detection is to detect the presence and location of features, such as eyes, with the assumption that there is only one face in an image. Face recognition or face identification compares an input image (probe) against a database (gallery) and reports a match, if

• M.-H. Yang is with Honda Fundamental Research Labs, 800 California Street, Mountain View, CA 94041. E-mail: myang@hra.com.
• D.J. Kriegman is with the Department of Computer Science and Beckman Institute, University of Illinois at Urbana-Champaign, Urbana, IL 61801. E-mail: kriegman@uiuc.edu.
• N. Ahuja is with the Department of Electrical and Computer Engineering and Beckman Institute, University of Illinois at Urbana-Champaign, Urbana, IL 61801. E-mail: ahuja@vision.ai.uiuc.edu.

Manuscript received 5 May 2000; revised 15 Jan. 2001; accepted 7 Mar. 2001.
Recommended for acceptance by K. Bowyer.
For information on obtaining reprints of this article, please send e-mail to: tpami@computer.org, and reference IEEECS Log Number 112058.

0162-8828/02/$17.00 © 2002 IEEE
`0003
`
`
`
`YANG ET AL.: DETECTING FACES IN IMAGES: A SURVEY
`
any [163], [133], [18]. The purpose of face authentication is to
verify the claim of the identity of an individual in an input
image [158], [82], while face tracking methods continuously
estimate the location and possibly the orientation of a face
in an image sequence in real time [30], [39], [33]. Facial
expression recognition concerns identifying the affective
states (happy, sad, disgusted, etc.) of humans [40], [35].
Evidently, face detection is the first step in any automated
system which solves the above problems.
It is worth mentioning that many papers use the term "face detection,"
`but the methods and the experimental results only show
`that a single face is localized in an input image. In this
`paper, we differentiate face detection from face localization
`since the latter is a simplified problem of the former.
`Meanwhile, we focus on face detection methods rather than
`tracking methods.
While numerous methods have been proposed to detect
faces in a single image of intensity or color images, we are
unaware of any surveys on this particular topic. A survey of
early face recognition methods before 1991 was written by
Samal and Iyengar [133]. Chellappa et al. wrote a more recent
survey on face recognition and some detection methods [18].
Among the face detection methods, the ones based on
learning algorithms have attracted much attention recently
and have demonstrated excellent results. Since these data-
driven methods rely heavily on the training sets, we also
discuss several databases suitable for this task. A related
`
and important problem is how to evaluate the performance
of the proposed detection methods. Many recent face
detection papers compare the performance of several
methods, usually in terms of detection and false alarm
rates. It is also worth noticing that many metrics have been
adopted to evaluate algorithms, such as learning time,
execution time, the number of samples required in training,
and the ratio between detection rates and false alarms.
`Evaluation becomes more difficult when researchers use
`different definitions for detection and false alarm rates. In
`
this paper, detection rate is defined as the ratio between the
number of faces correctly detected and the number of faces
determined by a human. An image region identified as a
`face by a classifier is considered to be correctly detected if
`the image region covers more than a certain percentage of a
`face in the image (See Section 3.3 for details). In general,
`detectors can make two types of errors: false negatives in
`which faces are missed resulting in low detection rates and
false positives in which an image region is declared to be a
face, but it is not. A fair evaluation should take these factors
`
`into consideration since one can tune the parameters of
`one’s method to increase the detection rates while also
`
`increasing the number of false detections. In this paper, we
`discuss the benchmarking data sets and the related issues in
`a fair evaluation.
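To make these definitions concrete, here is a minimal sketch of such an evaluation. The rectangular box format, the one-match-per-face policy, and the 50 percent overlap threshold are our own assumptions for illustration, not a standard fixed by the surveyed papers:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def evaluate(detections, ground_truth, overlap=0.5):
    """Return (detection_rate, false_positives) for one image.

    A detection counts as correct if it covers a ground-truth face
    beyond the overlap threshold; each face may be matched only once,
    and every unmatched detection counts as a false positive.
    """
    matched = set()
    false_pos = 0
    for d in detections:
        hit = None
        for i, g in enumerate(ground_truth):
            if i not in matched and iou(d, g) >= overlap:
                hit = i
                break
        if hit is None:
            false_pos += 1
        else:
            matched.add(hit)
    rate = len(matched) / len(ground_truth) if ground_truth else 1.0
    return rate, false_pos
```

Note how the trade-off described above appears directly: loosening `overlap` or emitting more candidate boxes raises the detection rate while also raising the false-positive count.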
`
With over 150 reported approaches to face detection, the
research in face detection has broader implications for
computer vision research on object recognition. Nearly all
model-based or appearance-based approaches to 3D object
recognition have been limited to rigid objects while
attempting to robustly perform identification over a broad
range of camera locations and illumination conditions. Face
`detection can be viewed as a two-class recognition problem
`
`35
`
`in which an image region is classified as being a "face” or
`“nonface.” Consequently, face detection is one of the few
`attempts to recognize from images (not abstract representa-
`tions) a class of objects for which there is a great deal of
`within-class variability (described previously). It is also one
of the few classes of objects for which this variability has
`been captured using large training sets of images and, so,
`some of the detection techniques may be applicable to a
much broader class of recognition problems.
`Face detection also provides interesting challenges to the
`underlying pattern classification and learning techniques.
`When a raw or filtered image is considered as input to a
pattern classifier, the dimension of the feature space is
extremely large (i.e., the number of pixels in normalized
`training images). The classes of face and nonface images are
`decidedly characterized by multimodal distribution func-
`tions and effective decision boundaries are likely to be
`nonlinear in the image space. To be effective, either classifiers
`must be able to extrapolate from a modest number of training
samples or be efficient when dealing with a very large
`number of these high-dimensional training samples.
`With an aim to give a comprehensive and critical survey
`of current face detection methods, this paper is organized as
`follows:
`In Section 2, we give a detailed review of
`techniques to detect faces in a single image. Benchmarking
`databases and evaluation criteria are discussed in Section 3.
`
`We conclude this paper with a discussion of several
promising directions for face detection in Section 4.
`Though we report error rates for each method when
`available, tests are often done on unique data sets and, so,
`comparisons are often difficult. We indicate those methods
`that have been evaluated with a publicly available test set. It
can be assumed that a unique data set was used if we do not
`indicate the name of the test set.
`
2 DETECTING FACES IN A SINGLE IMAGE
`
`In this section, we review existing techniques to detect faces
`from a single intensity or color image. We classify single
`image detection methods into four categories; some
`methods clearly overlap category boundaries and are
`discussed at the end of this section.
`
`1. Knowledge-based methods. These rule-based meth-
`ods encode human knowledge of what constitutes a
`typical face. Usually, the rules capture the relation-
`ships between facial features. These methods are
`designed mainly for face localization.
2. Feature invariant approaches. These algorithms aim
to find structural features that exist even when the
pose, viewpoint, or lighting conditions vary, and
then use these to locate faces. These methods are
designed mainly for face localization.
3. Template matching methods. Several standard patterns
of a face are stored to describe the face as a whole
or the facial features separately. The correlations
between an input image and the stored patterns are
`
1. An earlier version of this survey paper appeared at http://vision.ai.uiuc.edu/mhyang/face-detection-survey.html in March 1999.
`
`0004
`
`
`
`IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 24. NO. 1.
`
`JANUARY 2002
`
TABLE 1
Categorization of Methods for Face Detection in a Single Image

Approach: Representative Works

Knowledge-based: Multiresolution rule-based method [170]

Feature invariant
- Facial Features: Grouping of edges [87], [178]
- Texture: Space Gray-Level Dependence matrix (SGLD) of face pattern [32]
- Skin Color: Mixture of Gaussian [172], [98]
- Multiple Features: Integration of skin color, size, and shape [79]

Template matching
- Predefined face templates: Shape template [28]
- Deformable Templates: Active Shape Model (ASM) [86]

Appearance-based method
- Eigenface: Eigenvector decomposition and clustering [163]
- Distribution-based: Gaussian distribution and multilayer perceptron [154]
- Neural Network: Ensemble of neural networks and arbitration schemes [128]
- Support Vector Machine (SVM): SVM with polynomial kernel [107]
- Naive Bayes Classifier: Joint statistics of local appearance and position [140]
- Hidden Markov Model (HMM): Higher order statistics with HMM [123]
- Information-Theoretical Approach: Kullback relative information [89], [24]
`
`computed for detection. These methods have been
`used for both face localization and detection.
4. Appearance-based methods. In contrast to template
`matching, the models (or templates) are learned from
`a set of training images which should capture the
`representative variability of facial appearance. These
`learned models are then used for detection. These
`methods are designed mainly for face detection.
`Table 1 summarizes algorithms and representative
`works for face detection in a single image within these
`four categories. Below, we discuss the motivation and
`general approach of each category. This is followed by a
`review of specific methods including a discussion of their
`pros and cons. We suggest ways to further improve these
`methods in Section 4.
`
2.1 Knowledge-Based Top-Down Methods
In this approach, face detection methods are developed
based on the rules derived from the researcher's knowledge
of human faces. It is easy to come up with simple rules to
describe the features of a face and their relationships. For
example, a face often appears in an image with two eyes
that are symmetric to each other, a nose, and a mouth. The
relationships between features can be represented by their
relative distances and positions. Facial features in an input
image are extracted first, and face candidates are identified
based on the coded rules. A verification process is usually
applied to reduce false detections.
One problem with this approach is the difficulty in
translating human knowledge into well-defined rules. If the
rules are detailed (i.e., strict), they may fail to detect faces
that do not pass all the rules. If the rules are too general,
they may give many false positives. Moreover, it is difficult
to extend this approach to detect faces in different poses
since it is challenging to enumerate all possible cases. On
the other hand, heuristics about faces work well in detecting
frontal faces in uncluttered scenes.
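As a toy illustration of what such coded rules look like, the check below accepts a candidate set of feature points only when the eyes are roughly level and symmetric about the nose and the mouth lies below the nose. The specific layout and thresholds are our own invented example, not rules taken from any cited system:

```python
def plausible_face(eyes, nose, mouth, tol=0.25):
    """Toy rule check on candidate feature points (x, y), y growing downward.

    Rules: the two eyes are roughly level, the nose is roughly centered
    between them, and eyes, nose, and mouth appear in vertical order.
    Thresholds are illustrative only.
    """
    (lx, ly), (rx, ry) = eyes
    nx, ny = nose
    mx, my = mouth
    eye_dist = abs(rx - lx)
    if eye_dist == 0:
        return False
    level = abs(ly - ry) <= tol * eye_dist              # eyes roughly level
    centered = abs((lx + rx) / 2 - nx) <= tol * eye_dist  # nose between the eyes
    ordering = ly < ny < my                              # eyes above nose above mouth
    return level and centered and ordering
```

The brittleness discussed above is visible here: tightening `tol` rejects valid but expressive or rotated faces, while loosening it admits false positives.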
Yang and Huang used a hierarchical knowledge-based
method to detect faces [170]. Their system consists of three
levels of rules. At the highest level, all possible face
candidates are found by scanning a window over the input
image and applying a set of rules at each location. The rules
at a higher level are general descriptions of what a face
looks like while the rules at lower levels rely on details of
facial features. A multiresolution hierarchy of images is
created by averaging and subsampling, and an example is
shown in Fig. 1. Examples of the coded rules used to locate
face candidates in the lowest resolution include: "the center
Fig. 1. (a) n = 1, original image. (b) n = 4. (c) n = 8. (d) n = 16. Original and corresponding low resolution images. Each square cell consists of n x n pixels in which the intensity of each pixel is replaced by the average intensity of the pixels in that cell.
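The averaging step behind Fig. 1 can be sketched as follows. This is our own illustrative NumPy implementation of cell averaging, not code from [170]; cropping the image to a multiple of n is our simplification:

```python
import numpy as np

def mosaic(image, n):
    """Replace each n x n cell of a grayscale image by its average
    intensity, producing the low-resolution images of Fig. 1."""
    h, w = image.shape
    h, w = h - h % n, w - w % n              # crop to a multiple of n
    cells = image[:h, :w].reshape(h // n, n, w // n, n).mean(axis=(1, 3))
    # Expand each cell average back to n x n pixels, as in the figure.
    return np.repeat(np.repeat(cells, n, axis=0), n, axis=1)
```

Running the same image through `mosaic` with n = 4, 8, 16 yields the hierarchy over which the coarse-to-fine rules are applied.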
`
`0005
`
`
`
`Fr
`
`YANG ET AL: DETECTING FACES IN IMAGES: A SURVEY
`
Fig. 2. A typical face used in knowledge-based top-down methods. Rules are coded based on human knowledge about the characteristics (e.g., intensity distribution and difference) of the facial regions [170].
`
part of the face (the dark shaded parts in Fig. 2) has four
cells with a basically uniform intensity," "the upper round
part of a face (the light shaded parts in Fig. 2) has a basically
uniform intensity," and "the difference between the average
gray values of the center part and the upper round part is
significant." The lowest resolution (Level 1) image is
searched for face candidates and these are further processed
at finer resolutions. At Level 2, local histogram equalization
is performed on the face candidates received from Level 1,
followed by edge detection. Surviving candidate regions are
then examined at Level 3 with another set of rules that
respond to facial features such as the eyes and mouth.
Evaluated on a test set of 60 images, this system located
faces in 50 of the test images while there are 28 images in
which false alarms appear. One attractive feature of this
method is that a coarse-to-fine or focus-of-attention strategy
is used to reduce the required computation. Although it
does not result in a high detection rate, the ideas of using a
multiresolution hierarchy and rules to guide searches have
been used in later face detection works [81].
Kotropoulos and Pitas [81] presented a rule-based
localization method which is similar to [71] and [170]. First,
facial features are located with a projection method that
Kanade successfully used to locate the boundary of a face [71].
Let I(x, y) be the intensity value of an m x n image at position
(x, y); the horizontal and vertical projections of the image are
defined as HI(x) = Σ_{y=1}^{n} I(x, y) and VI(y) = Σ_{x=1}^{m} I(x, y).
The horizontal profile of an input image is obtained first, and
then the two local minima, determined by detecting abrupt
changes in HI, are said to correspond to the left and right side
of the head. Similarly, the vertical profile is obtained and the
local minima are determined for the locations of mouth lips,
nose tip, and eyes. These detected features constitute a facial
candidate. Fig. 3a shows one example where the boundaries
`37
`
of the face correspond to the local minimum where abrupt
intensity changes occur. Subsequently, eyebrow/eyes, nos-
trils/nose, and the mouth detection rules are used to validate
these candidates. The proposed method has been tested using
a set of faces in frontal views extracted from the European
ACTS M2VTS (MultiModal Verification for Teleservices and
Security applications) database [116] which contains video
sequences of 37 different people. Each image sequence
contains only one face in a uniform background. Their
method provides correct face candidates in all tests. The
detection rate is 86.5 percent if successful detection is defined
as correctly identifying all facial features. Fig. 3b shows one
example in which it becomes difficult to locate a face in a
complex background using the horizontal and vertical
profiles. Furthermore, this method cannot readily detect
multiple faces as illustrated in Fig. 3c. Essentially, the
projection method can be effective if the window over
which it operates is suitably located to avoid misleading
interference.
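The projections HI and VI and the minima search can be sketched as follows. The neighbor-comparison minima test is our own simplified stand-in for the abrupt-change detection actually used in [81]:

```python
import numpy as np

def projections(image):
    """Horizontal and vertical projections of an image indexed [y, x]:
    HI(x) sums the column at x over y; VI(y) sums the row at y over x."""
    hi = image.sum(axis=0)   # HI(x): one value per column x
    vi = image.sum(axis=1)   # VI(y): one value per row y
    return hi, vi

def local_minima(profile):
    """Indices where the profile dips below both neighbors; a crude
    stand-in for detecting abrupt changes in the projection."""
    p = np.asarray(profile)
    return [i for i in range(1, len(p) - 1) if p[i] < p[i - 1] and p[i] < p[i + 1]]
```

On a frontal face in a plain background, the two strongest minima of `HI` bracket the head, and the minima of `VI` fall near the eyes, nose tip, and mouth, matching the procedure described above.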
`
2.2 Bottom-Up Feature-Based Methods
In contrast to the knowledge-based top-down approach,
researchers have been trying to find invariant features of
faces for detection. The underlying assumption is based on
the observation that humans can effortlessly detect faces
and objects in different poses and lighting conditions and,
so, there must exist properties or features which are
invariant over these variabilities. Numerous methods have
been proposed to first detect facial features and then to infer
the presence of a face. Facial features such as eyebrows,
eyes, nose, mouth, and hairline are commonly extracted
using edge detectors. Based on the extracted features, a
statistical model is built to describe their relationships and
to verify the existence of a face. One problem with these
feature-based algorithms is that the image features can be
severely corrupted due to illumination, noise, and occlu-
sion. Feature boundaries can be weakened for faces, while
shadows can cause numerous strong edges which together
render perceptual grouping algorithms useless.
`
`2.2.1 Facial Features
`
`Sirohey proposed a localization method to segment a face
`from a cluttered background for face identification [145]. It
`uses an edge map (Canny detector [15]) and heuristics to
`remove and group edges so that only the ones on the face
`
`1".
`-2.:
`....L
`
Fig. 3. (a) and (b) n = 8. (c) n = 4. Horizontal and vertical profiles. It is feasible to detect a single face by searching for the peaks in horizontal and vertical profiles. However, the same method has difficulty detecting faces in complex backgrounds or multiple faces as shown in (b) and (c).
`
`0006
`
`
`
`33
`
`JANUARY 2uo2
`
`IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELUGENCE. VOL. 24, N0. 1.
graph correspond to features on a face, and the arcs represent
the distances between different features. Ranking of
constellations is based on a probability density function that
a constellation corresponds to a face versus the probability it
was generated by an alternative mechanism (i.e., nonface).
They used a set of 150 images for experiments in which a face
is considered correctly detected if any constellation correctly
locates three or more features on the faces. This system is able
to achieve a correct localization rate of 86 percent.
Instead of using mutual distances to describe the
relationships between facial features in constellations, an
alternative method for modeling faces was also proposed
by Leung et al. [13], [88]. The representation and
ranking of the constellations is accomplished using the
statistical theory of shape, developed by Kendall [75] and
Mardia and Dryden [95]. The shape statistics is a joint
probability density function over N feature points, repre-
sented by (x_i, y_i) for the ith feature, under the assumption
that the original feature points are positioned in the plane
according to a general 2N-dimensional Gaussian distribu-
tion. They applied the same maximum-likelihood (ML)
method to determine the location of a face. One advantage
of these methods is that partially occluded faces can be
located. However, it is unclear whether these methods can
be adapted to detect multiple faces effectively in a scene.
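A minimal sketch of this shape-statistics idea fits a 2N-dimensional Gaussian to training constellations and scores candidates by log-likelihood, taking the maximum-likelihood candidate as the face hypothesis. The fitting procedure and the covariance regularization below are our own illustrative choices, not those of [13], [88]:

```python
import numpy as np

def fit_shape_model(constellations):
    """Fit a 2N-dimensional Gaussian to training constellations, each an
    (N, 2) array of (x_i, y_i) feature locations. The small ridge added
    to the covariance is our own numerical safeguard."""
    X = np.stack([c.ravel() for c in constellations])   # rows of length 2N
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])
    return mean, cov

def log_likelihood(constellation, mean, cov):
    """Log-density of a candidate constellation under the fitted Gaussian;
    higher values indicate more face-like feature arrangements."""
    d = constellation.ravel() - mean
    k = d.size
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (k * np.log(2 * np.pi) + logdet + d @ np.linalg.solve(cov, d))
```

Because the density is over the joint configuration rather than individual features, a candidate missing or displacing one feature can still score well, which is one way to see why partially occluded faces remain locatable.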
In [177], [178], Yow and Cipolla presented a feature-
based method that uses a large amount of evidence from the
visual image and their contextual evidence. The first stage
applies a second derivative Gaussian filter, elongated at an
aspect ratio of three to one, to a raw image. Interest points,
detected at the local maxima in the filter response, indicate
the possible locations of facial features. The second stage
examines the edges around these interest points and groups
them into regions. The perceptual grouping of edges is
based on their proximity and similarity in orientation and
strength