`IPR of U.S. Patent 9,007,420
`
`0001
`
`
`
`Computer Vision and Image Understanding
`
`Volume 83, Number 3, September 2001
`
`Copyright © 2001 by Academic Press
`All Rights Reserved
`
No part of this publication may be reproduced or transmitted in any form or by any means, electronic
or mechanical, including photocopy, recording, or any information storage and retrieval system,
without permission in writing from the Publisher. Exceptions: Explicit permission from Academic
Press is not required to reproduce a maximum of two figures or tables from an Academic Press article
in another scientific or research publication provided that the material has not been credited to another
source and that full credit to the Academic Press article is given. In addition, authors of work contained
herein need not obtain permission in the following cases only: (1) to use their original figures or tables
in their future works; (2) to make copies of their papers for use in their classroom teaching; and (3) to
include their papers as part of their dissertations.
The appearance of the code at the bottom of the first page of an article in this journal indicates the
Publisher's consent that copies of the article may be made for personal or internal use, or for the
personal or internal use of specific clients. This consent is given on the condition, however, that
the copier pay the stated per-copy fee through the Copyright Clearance Center, Inc. (222 Rosewood
Drive, Danvers, Massachusetts 01923), for copying beyond that permitted by Sections 107 or 108 of
the U.S. Copyright Law. This consent does not extend to other kinds of copying, such as copying for
general distribution, for advertising or promotional purposes, for creating new collective works, or
for resale. Copy fees for pre-2001 articles are the same as those for current articles.
`
`1077-3142101 $35.00
`
Printed in the United States of America
This journal is printed on acid-free paper.
`
`COMPUTER VISION AND IMAGE UNDERSTANDING
`
(ISSN 1077-3142)
`
`Published monthly by Academic Press
Editorial and Production Offices: 525 B Street, Suite 1900, San Diego, CA 92101-4495
Accounting and Circulation Offices: 6277 Sea Harbor Drive, Orlando, FL 32887-4900
`
2001: Volumes 81-84. Price: $899.00 U.S.A. and Canada; $999.00 all other countries
`All prices include postage and handling.
`
Information concerning personal subscription rates may be obtained by writing to the Publishers. All correspondence, permission requests,
and subscription orders should be addressed to the office of the Publishers at 6277 Sea Harbor Drive, Orlando, FL 32887-4900 (telephone:
407-345-2000). Send notices of change of address to the office of the Publishers at least 6 to 8 weeks in advance. Please include both old and
new addresses. POSTMASTER: Send changes of address to Computer Vision and Image Understanding, 6277 Sea Harbor Drive, Orlando, FL
32887-4900.
Periodicals postage paid at Orlando, FL, and at additional mailing offices.
`
Copyright © 2001 by Academic Press
`
`0002
`
`0002
`
`
`
`This maternal may be protected by Copynghl law tTIIIe 1'-' U.S. Code)
`
Computer Vision and Image Understanding 83, 236-274 (2001)
doi:10.1006/cviu.2001.0921, available online at http://www.idealibrary.com
`
`Face Detection: A Survey
`
Erik Hjelmås¹
`
Department of Informatics, University of Oslo, P.O. Box 1080 Blindern, N-0316 Oslo, Norway
`
`and
`
Boon Kee Low

Department of Meteorology, University of Edinburgh, JCMB, Kings Buildings, Mayfield Road,
Edinburgh EH9 3JZ, Scotland, United Kingdom
E-mail: erikh@hig.no; boon@met.ed.ac.uk
`
Received October 23, 2000; accepted April 17, 2001
`
`
In this paper we present a comprehensive and critical survey of face detection
algorithms. Face detection is a necessary first step in face recognition systems, with
the purpose of localizing and extracting the face region from the background. It also
has several applications in areas such as content-based image retrieval, video coding,
video conferencing, crowd surveillance, and intelligent human-computer interfaces.
However, it was not until recently that the face detection problem received consider-
able attention among researchers. The human face is a dynamic object and has a high
degree of variability in its appearance, which makes face detection a difficult problem
in computer vision. A wide variety of techniques have been proposed, ranging from
simple edge-based algorithms to composite high-level approaches utilizing advanced
pattern recognition methods. The algorithms presented in this paper are classified
as either feature-based or image-based and are discussed in terms of their technical
approach and performance. Due to the lack of standardized tests, we do not provide
a comprehensive comparative evaluation, but in cases where results are reported on
common datasets, comparisons are presented. We also give a presentation of some
proposed applications and possible application areas. © 2001 Academic Press
Key Words: face detection; face localization; facial feature detection; feature-
based approaches; image-based approaches.
`
`1. INTRODUCTION
`
`The current evolution of computer technologies has envisaged an advanced machinery
`world, where human life is enhanced by artificial intelligence. Indeed, this trend has already
`
¹ Also with the Faculty of Technology, Gjøvik University College, Norway.
`
`10??-3142301 $35.00
`Copyright © 20t]I by Academic Press
`All rights of reproduction in any form reserved.
`
`236
`
`0003
`
`0003
`
`
`
`FACE DETECTION
`
`237
`
`
`
`FIG. 1. Typical training images for face recognition.
`
`prompted an active development in machine intelligence. Computer vision, for example,
`aims to duplicate human vision. Traditionally, computer vision systems have been used
`in specific tasks such as performing tedious and repetitive visual tasks of assembly line
`inspection. Current development in this area is moving toward more generalized vision
`applications such as face recognition and video coding techniques.
`Many of the current face recognition techniques assume the availability of frontal faces
`of similar sizes [14, 163]. In reality, this assumption may not hold due to the varied nature
of face appearance and environment conditions. Consider the pictures in Fig. 1.² These
`pictures are typical test images used in face classification research. The exclusion of the
`background in these images is necessary for reliable face classification techniques. However,
`in realistic application scenarios such as the example in Fig. 2, a face could occur in a
`complex background and in many different positions. Recognition systems that are based
`on standard face images are likely to mistake some areas of the background as a face. In
`order to rectify the problem, a visual front—end processor is needed to localize and extract
`the face region from the background.
`Face detection is one of the visual tasks which humans can do effortlessly. However, in
`computer vision terms, this task is not easy. A general statement of the problem can be
`defined as follows: Given a still or video image, detect and localize an unknown number (if
`any) of faces. The solution to the problem involves segmentation, extraction, and verification
`of faces and possibly facial features from an uncontrolled background. As a visual front-
`end processor, a face detection system should also be able to achieve the task regardless
`of illumination, orientation, and camera distance. This survey aims to provide insight into
`
the contemporary research of face detection in a structural manner. Chellappa et al. [14]
have conducted a detailed survey on face recognition research. In their survey, several
issues, including segmentation and feature extraction, related to face recognition have been
reviewed. One of the conclusions from Chellappa et al. was that the face detection problem
`has received surprisingly little attention. This has certainly changed over the past five years
`as we show in this survey.
`
`The rest of this paper is organized as follows: In Section 2 we briefly present the evolution
`of research in the area of face detection. Sections 3 and 4 provide a more detailed survey and
`
`discussion of the different subareas shown in Fig. 3, while in Section 5 we present some of
`the possible and implemented applications of face detection technology. Finally, summary
`and conclusions are in Section 6.
`
² The images are courtesy of the ORL (The Olivetti and Oracle Research Laboratory) face database at
http://www.cam-orl.co.uk/facedatabase.html.
`
`0004
`
`0004
`
`
`
`238
`
`HJELMAS AND LOW
`
`
`
`FIG. 2. A realistic face detection scenario.
`
[Figure 3 is a taxonomy diagram: face detection is divided into feature-based approaches (low-level analysis, feature analysis, and active shape models such as snakes, deformable templates, and point distribution models) and image-based approaches (neural networks and statistical approaches).]

FIG. 3. Face detection divided into approaches.
`
`0005
`
`0005
`
`
`
`FACE DETECTION
`
`239
`
`2. EVOLUTION OF FACE DETECTION RESEARCH
`
Early efforts in face detection date back as early as the beginning of the 1970s,
when simple heuristic and anthropometric techniques [162] were used. These techniques
are largely rigid due to various assumptions such as plain background and frontal face—a
typical passport photograph scenario. To these systems, any change of image conditions
would mean fine-tuning, if not a complete redesign. Despite these problems, the growth of
research interest remained stagnant until the 1990s [14], when practical face recognition
and video coding systems started to become a reality. Over the past decade there has been
a great deal of research interest spanning several important aspects of face detection. More
robust segmentation schemes have been presented, particularly those using motion, color,
and generalized information. The use of statistics and neural networks has also enabled faces
to be detected from cluttered scenes at different distances from the camera. Additionally,
there are numerous advances in the design of feature extractors such as the deformable
templates and the active contours which can locate and track facial features accurately.
Because face detection techniques require a priori information of the face, they can be
effectively organized into two broad categories distinguished by their different approach
to utilizing face knowledge. The techniques in the first category make explicit use of face
knowledge and follow the classical detection methodology in which low-level features are
derived prior to knowledge-based analysis [9, 193]. The apparent properties of the face such
as skin color and face geometry are exploited at different system levels.
`techniques face detection tasks are accomplished by manipulating distance, angles, and area
`measurements of the visual features derived from the scene. Since features are the main
`
ingredients, these techniques are termed the feature-based approach. These approaches have
embodied the majority of interest in face detection research starting as early as the 1970s and
`therefore account for most of the literature reviewed in this paper. Taking advantage of the
`current advances in pattern recognition theory, the techniques in the second group address
`face detection as a general recognition problem. Image-based [33] representations of faces,
`
`for example in 2D intensity arrays, are directly classified into a face group using training
`algorithms without feature derivation and analysis. Unlike the feature-based approach,
`these relatively new techniques incorporate face knowledge implicitly [193] into the system
`through mapping and training schemes.
`
`3. FEATURE-BASED APPROACH
`
`The development of the feature-based approach can be further divided into three areas.
`Given a typical face detection problem in locating a face in a cluttered scene, low-level
`analysis first deals with the segmentation of visual features using pixel properties such as
`gray-scale and color. Because of the low-level nature, features generated from this analysis
`are ambiguous. In feature analysis, visual features are organized into a more global concept
`of face and facial features using information of face geometry. Through feature analysis,
`
`feature ambiguities are reduced and locations of the face and facial features are determined.
The next group involves the use of active shape models. These models, ranging from snakes,
proposed in the late 1980s, to the more recent point distributed models (PDM), have been
`developed for the purpose of complex and nonrigid feature extraction such as eye pupil and
`lip tracking.
`
`0006
`
`0006
`
`
`
`240
`
`HJBLMAS AND LOW
`
`3.1. Low-Level Analysis
`
3.1.1. Edges. As the most primitive feature in computer vision applications, edge rep-
resentation was applied in the earliest face detection work by Sakai et al. [162]. The work
was based on analyzing line drawings of the faces from photographs, aiming to locate facial
features. Craw et al. [26] later proposed a hierarchical framework based on Sakai et al.'s
work to trace a human head outline. The work includes a line-follower implemented with
curvature constraint to prevent it from being distracted by noisy edges. Edge features within
the head outline are then subjected to feature analysis using shape and position information
of the face. More recent examples of edge-based techniques can be found in [9, 16, 59,
115, 116] for facial feature extraction and in [32, 50, 58, 60, 70, 76, 108, 201, 222] for
face detection. Edge-based techniques have also been applied to detecting glasses in facial
images [79, 80].
Edge detection is the foremost step in deriving edge representation. So far, many different
types of edge operators have been applied. The Sobel operator was the most common filter
among the techniques mentioned above [9, 16, 32, 76, 87, 108]. The Marr-Hildreth edge
operator [124] is a part of the proposed systems in [50, 70]. A variety of first and second
derivatives (Laplacians) of Gaussians have also been used in the other methods. For instance,
a Laplacian of large scale was used to obtain a line drawing in [162] and steerable and multi-
scale-orientation filters in [59] and [58], respectively. The steerable filtering in [59] consists
of three sequential edge detection steps which include detection of edges, determination of
the filter orientation of any detected edges, and stepwise tracking of neighboring edges using
the orientation information. The algorithm has allowed an accurate extraction of several key
points in the eye.
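As a concrete illustration of the kind of low-level edge extraction these systems build on, the following Python sketch applies the Sobel operator and thresholds the gradient magnitude. The function name and the threshold value are our illustrative choices, not taken from any of the cited systems.

import numpy as np

def sobel_edges(image, threshold=0.25):
    # `image` is a 2-D float array in [0, 1]; returns a boolean edge map.
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    pad = np.pad(image, 1, mode="edge")
    gx = np.zeros_like(image)
    gy = np.zeros_like(image)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            window = pad[i:i + 3, j:j + 3]
            gx[i, j] = np.sum(window * kx)  # horizontal gradient
            gy[i, j] = np.sum(window * ky)  # vertical gradient
    magnitude = np.hypot(gx, gy)
    # keep only the strongest responses (threshold is illustrative)
    return magnitude > threshold * magnitude.max()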
`In an edge-detection-based approach to face detection, edges need to be labeled and
`matched to a face model in order to verify correct detections. Govindaraju [50] accomplishes
this by labeling edges as the left side, hairline, or right side of a front view face and matches
these edges against a face model by using the golden ratio³ [40] for an ideal face:
`
\frac{\text{height}}{\text{width}} = \frac{1 + \sqrt{5}}{2}.    (1)
`
Govindaraju's edge-based feature extraction works in the following steps:

Edge detection: the Marr-Hildreth edge operator
Thinning: a classical thinning algorithm from Pavlidis [141]
Spur removal: each connected component is reduced to its central branch
Filtering: the components with non-face-like properties are removed
Corner detection: the components are split based on corner detection
Labeling: the final components are labeled as belonging to the left side, hairline, or
right side of a face
`
`The labeled components are combined to form possible face candidate locations based on
`a cost function (which uses the golden ratio defined above). On a test set of 60 images with
`complex backgrounds containing 90 faces, the system correctly detected 76% of the faces
`with an average of two false alarms per image.
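To make the role of Eq. (1) concrete, here is a minimal sketch of the ratio term of such a cost function. This is our own illustration: Govindaraju's actual cost function also scores the labeled edge components, which is not shown here.

import math

GOLDEN_RATIO = (1 + math.sqrt(5)) / 2  # ideal height/width from Eq. (1)

def golden_ratio_cost(height, width):
    # Deviation of a candidate face box's aspect ratio from the ideal;
    # lower cost means a more face-like proportion.
    return abs(height / width - GOLDEN_RATIO)

print(golden_ratio_cost(162, 100))  # ~0.002, near-ideal candidate
print(golden_ratio_cost(100, 100))  # ~0.618, square region penalized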
`
³ An aesthetically proportioned rectangle used by artists.
`
`0007
`
`0007
`
`
`
`FACE DETECTION
`
`241
`
3.1.2. Gray information. Besides edge details, the gray information within a face can
also be used as a feature. Facial features such as eyebrows, pupils, and lips appear generally
darker than their surrounding facial regions. This property can be exploited to differentiate
various facial parts. Several recent facial feature extraction algorithms [5, 53, 100] search
for local gray minima within segmented facial regions. In these algorithms, the input images
are first enhanced by contrast-stretching and gray-scale morphological routines to improve
the quality of local dark patches and thereby make detection easier. The extraction of dark
patches is achieved by low-level gray-scale thresholding. On the application side, Wong
et al. [207] implement a robot that also looks for dark facial regions within face candidates
obtained indirectly from color analysis. The algorithm makes use of a weighted human eye
template to determine possible locations of an eye pair. In Hoogenboom and Lew [64], local
maxima, which are defined by a bright pixel surrounded by eight dark neighbors, are used
instead to indicate the bright facial spots such as nose tips. The detection points are then
aligned with feature templates for correlation measurements.
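A minimal sketch of this dark-patch idea, assuming a gray image in [0, 1] and a precomputed facial-region mask. The specific morphological routine and the percentile threshold are illustrative choices, not the exact routines of [5, 53, 100].

import numpy as np
from scipy import ndimage

def dark_patches(gray, region_mask, percentile=10):
    # Contrast-stretch using the segmented region's dynamic range.
    region = gray[region_mask]
    lo, hi = region.min(), region.max()
    stretched = (gray - lo) / (hi - lo + 1e-9)
    # Gray-scale morphological closing fills small dark patches, so the
    # residue (closed - original) is large exactly at those patches.
    closed = ndimage.grey_closing(stretched, size=(5, 5))
    residue = closed - stretched
    # Threshold the residue within the facial region (low-level thresholding).
    thresh = np.percentile(residue[region_mask], 100 - percentile)
    return (residue >= thresh) & region_mask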
`
Yang and Huang [214], on the other hand, explore the gray-scale behavior of faces
in mosaic (pyramid) images. When the resolution of a face image is reduced gradually,
either by subsampling or averaging, macroscopic features of the face will disappear. At low
resolution, the face region becomes uniform. Based on this observation, Yang proposed a
hierarchical face detection framework. Starting at low resolution images, face candidates
are established by a set of rules that search for uniform regions. The face candidates are
then verified by the existence of prominent facial features using local minima at higher
resolutions. The technique of Yang and Huang was recently incorporated into a system for
rotation invariant face detection by Lv et al. [119] and an extension of the algorithm is
presented in Kotropoulos and Pitas [95].
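The mosaic idea can be sketched as follows. The block size, tolerance, and single uniformity rule are our simplifications; Yang and Huang's system applies a richer hierarchy of rules.

import numpy as np

def mosaic(image, cell):
    # Reduce resolution by averaging cell x cell blocks.
    h, w = image.shape
    h, w = h - h % cell, w - w % cell
    blocks = image[:h, :w].reshape(h // cell, cell, w // cell, cell)
    return blocks.mean(axis=(1, 3))

def uniform_regions(mosaic_img, tolerance=0.1):
    # Flag mosaic cells whose 3x3 neighborhood is near-uniform; such
    # cells are crude face-candidate seeds at low resolution.
    out = np.zeros_like(mosaic_img, dtype=bool)
    for i in range(1, mosaic_img.shape[0] - 1):
        for j in range(1, mosaic_img.shape[1] - 1):
            win = mosaic_img[i - 1:i + 2, j - 1:j + 2]
            out[i, j] = np.ptp(win) < tolerance * (win.mean() + 1e-9)
    return out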
`
3.1.3. Color. Whilst gray information provides the basic representation for image fea-
tures, color is a more powerful means of discerning object appearance. Due to the extra
dimensions that color has, two shapes of similar gray information might appear very dif-
ferent in a color space. It was found that human skin color gives rise to a tight
cluster in color spaces even when faces of different races are considered [75, 125, 215].
This means the color composition of human skin differs little across individuals.
One of the most widely used color models is the RGB representation, in which different colors
are defined by combinations of red, green, and blue primary color components. Since the
main variation in skin appearance is largely due to luminance change (brightness) [215],
normalized RGB colors are generally preferred [27, 53, 75, 88, 92, 165, 181, 196, 202, 207,
213, 215], so that the effect of luminance can be filtered out. The normalized colors can be
derived from the original RGB components as follows:
`
r = \frac{R}{R + G + B}    (2)

g = \frac{G}{R + G + B}    (3)

b = \frac{B}{R + G + B}    (4)
`
From Eqs. (2)-(4), it can be seen that r + g + b = 1. The normalized colors can be
effectively represented using only the r and g values, as b can be obtained by noting b =
1 - r - g. In skin color analysis, a color histogram based on r and g shows that face color
`
`0008
`
`0008
`
`
`
`242
`
`I-IJELMAS AND LOW
`
`occupies a small cluster in the histogram [215]. By comparing color information of a pixel
`with respect to the r and g values of the face cluster, the likelihood of the pixel belonging
`to a flesh region of the face can be deduced.
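In code, the conversion of Eqs. (2)-(4) and the reduction to the (r, g) pair is direct. This is a minimal numpy sketch; the epsilon guard against division by zero is our addition.

import numpy as np

def normalized_rg(rgb):
    # Convert an H x W x 3 RGB array to the chromaticity pair (r, g) of
    # Eqs. (2)-(3); b is redundant since r + g + b = 1.
    rgb = rgb.astype(float)
    total = rgb.sum(axis=2, keepdims=True) + 1e-9  # avoid divide-by-zero
    norm = rgb / total
    return norm[..., 0], norm[..., 1]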
`Besides RGB color models, there are several other alternative models currently being
`
used in face detection research. In [106] the HSI color representation has been shown to
have advantages over other models in giving large variance among facial feature color
clusters. Hence this model is used to extract facial features such as lips, eyes, and eyebrows.
Since the representation strongly relates to the human perception of color [106, 178], it is also
widely used in face segmentation schemes [51, 83, 113, 125, 175, 178, 187, 220].
The YIQ color model has been applied to face detection in [29, 205]. By converting
RGB colors into the YIQ representation, it was found that the I-component, which includes
colors ranging from orange to cyan, manages to enhance the skin region of Asians [29].
The conversion also effectively suppresses the background of other colors and allows the
detection of small faces in a natural environment. Other color models applied to face de-
tection include HSV [48, 60, 81, 216], YES [160], YCrCb [2, 48, 84, 130, 191, 200], YUV
[1, 123], CIE-xyz [15], L*a*b* [13, 109], L*u*v* [63], CSN (a modified rg represen-
tation) [90, 91], and UCS/Farnsworth (a perceptually uniform color system proposed
by Farnsworth [210]) [208].
Terrillon et al. [188] recently presented a comparative study of several widely used color
spaces (or more appropriately named chrominance spaces in this context, since all spaces
seek luminance-invariance) for face detection. In their study they compare normalized TSL
(tint-saturation-luminance [188]), rg, and CIE-xy chrominance spaces, and CIE-DSH, HSV,
YIQ, YES, CIE-L*u*v*, and CIE-L*a*b* chrominance spaces by modeling skin
color distributions with either a single Gaussian or a Gaussian mixture density model in each
space. Hu's moments [68] are used as features and a multilayer perceptron neural network
is trained to classify the face candidates. In general, they show that skin color in normalized
chrominance spaces can be modeled with a single Gaussian and perform very well, while a
mixture-model of Gaussians is needed for the unnormalized spaces. In their face detection
test, the normalized TSL space provides the best results, but the general conclusion is that
the most important criterion for face detection is the degree of overlap between skin and
nonskin distributions in a given space (and this is highly dependent on the number of skin
and nonskin samples available for training).
Color segmentation can basically be performed using appropriate skin color thresholds
where skin color is modeled through histograms or charts [13, 63, 88, 118, 178, 207, 220].
More complex methods make use of statistical measures that model face variation within a
wide user spectrum [3, 27, 75, 125, 139, 215]. For instance, Oliver et al. [139] and Yang and
Waibel [215] employ a Gaussian distribution to represent a skin color cluster of thousands of
skin color samples taken from different races. The Gaussian distribution is characterized
by its mean (μ) and covariance matrix (Σ). Pixel color from an input image can be compared
with the skin color model by computing the Mahalanobis distance. This distance measure
then gives an idea of how closely the pixel color resembles the skin color of the model.
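A minimal sketch of this comparison, assuming the Gaussian model has already been fitted to (r, g) skin samples; the threshold in the usage comment is an illustrative choice.

import numpy as np

def skin_distance(pixels_rg, mean, cov):
    # Mahalanobis distance of N pixels in (r, g) space to a Gaussian
    # skin model with mean (2,) and covariance (2, 2).
    diff = pixels_rg - mean                     # shape (N, 2)
    inv_cov = np.linalg.inv(cov)
    return np.sqrt(np.einsum("ni,ij,nj->n", diff, inv_cov, diff))

# Fit the model from labeled skin samples, then threshold distances:
#   mean = samples.mean(axis=0); cov = np.cov(samples.T)
#   skin_mask = skin_distance(all_pixels, mean, cov) < 3.0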
An advantage of the statistical color model is that color variation of new users can be
adapted into the general model by a learning approach. Using an adaptive method, color
detection can be more robust against changes in environment factors such as illumination
conditions and camera characteristics. Examples of such a learning approach have been used
by Oliver et al. and Yang and Waibel according to Eq. (6), which updates the parameters
of the Gaussian distribution [139] (a similar approach can be found in the face recognition
`0009
`
`0009
`
`
`
`FACE DETECTION
`
system of McKenna et al. [127]):
`
`2...... = [E;,,',m1 + E;;,] “‘
`
`is... = new [2;..'..... x “general + 2.1:. x mm]
`
`243
`
`(5)
`
`(6)
`
3.1.4. Motion. If the use of a video sequence is available, motion information is a
convenient means of locating moving objects. A straightforward way to achieve motion
segmentation is by frame difference analysis. This approach, whilst simple, is able to discern
a moving foreground efficiently regardless of the background content. In [5, 53, 152, 192,
218], moving silhouettes that include face and body parts are extracted by thresholding
accumulated frame differences. Besides the face region, Luthon and Lievin [118], Crowley and
Berard [27], and Low [115, 116] also employ frame difference to locate facial features. In
[27], the existence of an eye-pair is hypothesized by measuring the horizontal and the vertical
displacements between two adjacent candidate regions obtained from frame difference.
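A minimal sketch of accumulated frame differencing over a gray-scale sequence; both thresholds are illustrative and not drawn from the cited systems.

import numpy as np

def moving_silhouette(frames, diff_thresh=0.05):
    # `frames` is a list of gray images in [0, 1]. Pixels that change
    # between many consecutive frames form the moving silhouette.
    acc = np.zeros_like(frames[0])
    for prev, cur in zip(frames, frames[1:]):
        acc += (np.abs(cur - prev) > diff_thresh)
    # Keep pixels that moved in at least half of the frame pairs.
    return acc >= 0.5 * (len(frames) - 1)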
Another way of measuring visual motion is through the estimation of moving image
contours. Compared to frame difference, results generated from moving contours are always
more reliable, especially when the motion is insignificant [126]. A spatio-temporal Gaussian
filter has been used by McKenna et al. [126] to detect moving boundaries of faces and
human bodies. The process involves convolution of the gray image I(x, y) with the second-
order temporal edge operator m(x, y, t), which is defined from the Gaussian filter G(x, y, t)
as follows [126]:
`
`(7)
`
`m(x.)'.t)=— V +—— G(x,y.r),
`:13 312
`
`< 2
`
`I 32)
`
where u is a time scaling factor and σ is the filter width. The temporal edge operator is then
convolved with consecutive frames from an image sequence by
`
S(x, y, t) = m(x, y, t) \otimes I(x, y, t).    (8)
`
The result of the temporal convolution process S(x, y, t) contains zero-crossings which
provide a direct indication of moving edges in I(x, y, t). The locations of the detected
zero-crossings are then clustered to finally infer the location of moving objects. More so-
phisticated motion analysis techniques have also been applied in some of the most recent
face detection research. Unlike the methods described above which identify moving edges
and regions, these methods rely on the accurate estimation of the apparent brightness veloc-
ities called optical flow. Because the estimation is based on short-range moving patterns, it
is sensitive to fine motion. Lee et al. [106] use optical flow to measure face motion. Based
on the motion information, a moving face in an image sequence is segmented. The optical
flow is modeled by the image flow constraint equation [106]

I_x V_x + I_y V_y + I_t = 0,    (9)

where I_x, I_y, and I_t are the spatio-temporal derivatives of the image intensity and V_x and
V_y are the image velocities. By solving the above equation for V_x and V_y, an optical flow
`
`0010
`
`0010
`
`
`
`244
`
`HJELMAS AND Low
`
field that contains moving pixel trajectories is obtained. Regions corresponding to different
motion trajectories are then classified into motion and nonmotion regions. Since Eq. (9) has
two unknowns, additional constraints are required. The choice of additional constraints is a
major consideration in optical flow estimation and has been proposed by many researchers
in motion analysis. Lee et al. proposed a line clustering algorithm which is a modified
and faster version of the original algorithm by Schunck [172]. By thresholding the image
velocities, moving regions of the face are obtained. Because the extracted regions do not
exactly define the complete face area, an ellipse fitting algorithm is employed to complete
the face region extraction.
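To illustrate why Eq. (9) alone is under-determined, the sketch below adds the common assumption of constant flow over a small window and solves in the least-squares sense. This is a Lucas-Kanade-style constraint used purely for illustration, not the Schunck-derived line clustering that Lee et al. actually use.

import numpy as np

def window_flow(Ix, Iy, It):
    # Solve Ix*Vx + Iy*Vy = -It (Eq. (9)) over all pixels of a small
    # window, assuming one (Vx, Vy) for the whole window.
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)  # shape (N, 2)
    b = -It.ravel()
    (vx, vy), *_ = np.linalg.lstsq(A, b, rcond=None)
    return vx, vy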
`
`3.! .5. Generalized measures. Visual features such as edges, color, and motion are de-
`rived in the early stage of the human visual system, shown by the various visual response
`patterns in our inner retina [19D, 206]. This pre-attentive processing allows visual informa-
`tion to be organized in various bases prior to high-level visual activities in the brain. Based
`on this observation, Reisfeld er al. [1531 proposed that machine vision systems should begin
`with pre-attentive low—level computation of generalized image properties. In their earlier
`work, Reisfeld and Yeshurun [154] introduced a generalized symmetry operator that is
`based on edge pixel operation. Since facial features are symmetrical in nature, the operator
`which does not rely on higher level a priori knowledge of the face effectively produces
`a representation that gives high responses to facial feature locations. The symmetry mea-
`sure assigns a magnitude at every pixel location in an image based on the contribution of
`surrounding pixels. The symmetry magnitude, M, (p), for pixel p is described as
`
M_\sigma(p) = \sum_{(i, j) \in \Gamma(p)} C(i, j),    (10)
`
where C(i, j) is the contribution of the surrounding pixels i and j (of pixel p) in the set of
pixels defined by \Gamma(p). Both the contribution and the pixel set are defined by the following
equations:
`
`1"<p)= [(:.r),*”‘*2'*”" =p]
`
`C(13)") = Da(I'. .I')P(='. J°)r:r,:.
`
`an
`
where D_\sigma(i, j) is a distance weight function, P(i, j) is a phase weight function, and r_i and
r_j are defined as below:
`
D_\sigma(i, j) = \frac{1}{\sqrt{2\pi\sigma}}\, e^{-\frac{\|p_i - p_j\|}{2\sigma}}

P(i, j) = (1 - \cos(\theta_i + \theta_j - 2\alpha_{ij}))(1 - \cos(\theta_i - \theta_j))    (12)

r_k = \log(1 + \|\nabla p_k\|)

\theta_k = \arctan\left( \frac{\partial p_k / \partial y}{\partial p_k / \partial x} \right),
`
where p_k is any point (x_k, y_k), k = 1, \ldots, K, \nabla p_k is the gradient of the intensity at
point p_k, and \alpha_{ij} is the counterclockwise angle between the line passing through p_i and p_j
and the horizon [154]. Figure 4 shows an example of M_\sigma computed from the gradient
of a frontal facial image. The symmetry magnitude map clearly shows the locations of facial
`
`0011
`
`0011
`
`
`
`FACE DETECTION
`
`245
`
`
`
FIG. 4. Original image and M_\sigma of original image. Reisfeld et al. [153].
`
features such as the eyes and mouth. Using the magnitude map, Reisfeld et al. managed a
success rate of 95% in detecting eyes and mouths of various similar faces in a database.
The face images in this database also contain various backgrounds and orientations. A more
recent work by Reisfeld and Yeshurun is [155].
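A naive, unoptimized sketch of the symmetry magnitude of Eqs. (10)-(12), restricted to strong-gradient pixels for tractability. Parameter values are illustrative, and the pairwise loop is quadratic in the number of edge points; it is meant only to make the operator's structure concrete.

import numpy as np

def symmetry_map(gray, sigma=4.0, radius=8, grad_thresh=0.05):
    gy, gx = np.gradient(gray)
    mag = np.hypot(gx, gy)
    theta = np.arctan2(gy, gx)          # gradient orientation theta_k
    r = np.log1p(mag)                   # r_k = log(1 + ||grad||)
    ys, xs = np.nonzero(mag > grad_thresh * mag.max())
    pts = list(zip(ys, xs))
    M = np.zeros_like(gray)
    for a in range(len(pts)):
        for b in range(a + 1, len(pts)):
            (yi, xi), (yj, xj) = pts[a], pts[b]
            d = np.hypot(yi - yj, xi - xj)
            if d == 0 or d > radius:
                continue
            alpha = np.arctan2(yj - yi, xj - xi)   # line through p_i, p_j
            D = np.exp(-d / (2 * sigma)) / np.sqrt(2 * np.pi * sigma)
            P = ((1 - np.cos(theta[yi, xi] + theta[yj, xj] - 2 * alpha))
                 * (1 - np.cos(theta[yi, xi] - theta[yj, xj])))
            my, mx = (yi + yj) // 2, (xi + xj) // 2  # midpoint p
            M[my, mx] += D * P * r[yi, xi] * r[yj, xj]
    return M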
Later developments in generalized symmetry include [86, 94] and the work by Lin and
Lin [111]. Lin and Lin proposed a dual-mask operator that exploits the radial symmetry
distribution of \theta (see Eq. (12)) on bright and dark facial parts. Similar to Reisfeld et al.'s
operator, the dual-mask operator also managed to extract facial features from different
backgrounds and under various poses, but the complexity of the latter is reported to be less
than that of the original symmetry operator [111]. A new attentional operator based on smooth
convex and concave shapes was presented recently by Tankus et al. [185]. Different from
previous approaches, they make use of the derivative of the gradient orientation, \theta_k, with respect
to the y-direction, which is termed Y-Phase. It was shown by Tankus et al. that the Y-Phase
of concave and convex objects (paraboloids) has a strong response at the negative x-axis.
Because facial features generally appear to be parabolic, their Y-Phase response will also
resemble that of the paraboloids, giving a strong response at the x-axis. By proposing a
theorem and comparing the Y-Phase of log(log(log(I))) and exp(exp(exp(I))), where I is
the image, Tankus et al. also proved that Y-Phase is invariant under a very large variety of
illumination conditions. Further experiments also indicate that the operator is insensitive to
strong edges from nonconvex objects and texture backgrounds.
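The Y-Phase quantity itself is simple to compute with finite differences. This is a sketch; Tankus et al.'s exact discretization is not specified here.

import numpy as np

def y_phase(gray):
    # Gradient orientation, then its derivative with respect to y.
    gy, gx = np.gradient(gray)
    orientation = np.arctan2(gy, gx)
    # Unwrap along y before differencing so the pi -> -pi jump does
    # not create spurious responses.
    return np.gradient(np.unwrap(orientation, axis=0), axis=0)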
`
`3.2. Feature Analysis
`
`Features generated from low-level analysis are likely to be ambiguous. For instance, in
`locating facial regions using a skin color model, background objects of similar color can
also be detected. This is a classical many-to-one mapping problem which can be solved
`
`by higher level feature analysis. In many face detection techniques, the knowledge of face
`geometry has been employed to characterize and subsequently verify various features from
`their ambiguous state. There are two approaches in the application of face geometry among
`the literature surveyed. The first approach involves sequential feature searching strategies
`based on the relative positioning of individual facial features. The confidence of a feature
`
`0012
`
`0012
`
`
`
`246
`
`HJELMAS AND Low
`
`existence is enhanced by the detection of nearby features. The techniques in the second
`approach group features as flexible constellations using various face models.
`
`3.2.}. Feature searching. Feature searching techniques begin with the determination
`of prominent facial features. The detection of the prominent features then allows for the
`existence of other less prominent features to be hypothesized using anthropometric mea-
`surements of face geometry. For instance, a small area on top of a larger area in a head and
`shoulder sequence implies a “face on top of shoulder" scenario, and a pair of dark regions
`found in the face area increase the confidence of a face existence. Among the literature
`
`survey, a pair of eyes is the most commonly applied reference feature [5, 2?, 52, 61, 207,
`214] due to its distinct side-by-side appearance. Other features include a main face axis [26,
`165], outline (top of the head) [26, 32, 162] and body (below the head) [192, 207].
`The facial feature extraction algorithm by De Silva et al. [32] is a good example of feature