(12) United States Patent
(10) Patent No.: US 6,792,135 B1
(45) Date of Patent: Sep. 14, 2004
(75) Inventor: Kentaro Toyama, Redmond, WA (US)
(73) Assignee: Microsoft Corporation, Redmond, WA
(*) Notice: Subject to any disclaimer, the term of this
patent is extended or adjusted under 35
U.S.C. 154(b) by 0 days.
(21) Appl. No.: 09/430,560
(22) Filed: Oct. 29, 1999
(51) Int. Cl. .................................................. G06K 9/00
(52) U.S. Cl. ......................... 382/118; 725/10; 382/173;
(58) Field of Search ................................. 382/118, 209,
382/199, 195, 191, 173; 725/10
Primary Examiner: Bhavesh M. Mehta
39 Claims, 9 Drawing Sheets
The present invention relates in general to object detection
and more particularly to a system and a method for
detecting a face within an image using a relational template
over a geometric distribution of a non-intensity image
property.
eyes and light skin, other people have light eyes and dark
skin. In addition, a face having a thick beard tends to have
a dark cheek region, while the same cheek region for a
smoothly shaven face appears light. This wide range of
possible image intensities can drastically reduce the accuracy
and reliability of a face detection system.
Accordingly, there exists a need for a face detection
system that utilizes relational templates based on an image
property other than image intensity. Further, this face detection
system would not require immense amounts of training
data for initialization. The face detection system would
accurately, efficiently and reliably detect any type of generally
upright and forward-facing human face within an image.
Whatever the merits of the above-mentioned systems and
methods, they do not achieve the benefits of the present
invention.
To overcome the limitations in the prior art as described
above and other limitations that will become apparent upon
reading and understanding the present specification, the
present invention is a system and method for detecting a face
within an image using a relational template over a geometric
distribution of a non-intensity image property. The present
invention provides accurate, efficient and reliable face detection
for computer vision systems. In particular, the present
invention is especially insensitive to illumination changes, is
applicable to faces having a wide variety of appearances and
does not require vast amounts of training data for initialization.
In general, the system of the present invention detects a
face within an image and includes a hypothesis module for
defining an area within the image to be searched, a preprocessing
module for performing resizing and other enhancements
of the area, and a feature extraction module for extracting
image feature values based on a non-intensity image property.
In a preferred embodiment the image property used is
edge density, although other suitable properties (such as
pixel color) may also be used. The face detection system also
includes a feature averaging module, for grouping image
feature values into facial regions, and a relational template
module that uses a relational template and the facial regions
to determine whether a face has been detected.
The present invention also includes a method for detecting
a face in an image using a relational template over a
geometric distribution of a non-intensity image property.
The method of the present invention includes determining an
area of an image to examine, performing feature extraction
on the area using a non-intensity image property (such as
edge density), grouping the extracted image feature values
into geometrically distributed regions called facial regions,
averaging the image feature values for each facial region and
using a relational template to determine whether a face has
been detected. In addition, the method includes preprocessing
the image either before or after feature extraction.
Preprocessing may include any suitable image processing
operations that enhance the image. Preferably, preprocessing
includes a resizing module, for rescaling the image to a
canonical image size, and, optionally, an equalization
module, for enhancing the contrast of the image.
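The two preprocessing operations named above can be sketched as follows. This is a minimal illustration only, not the patented implementation: nearest-neighbor sampling, histogram equalization, the function names `resize_nearest` and `equalize`, and the representation of an image as a list of rows of integer gray values are all assumptions made for the example.

```python
def resize_nearest(image, out_h, out_w):
    """Rescale a 2-D list of gray values to out_h x out_w (nearest neighbor)."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def equalize(image, levels=256):
    """Histogram-equalize a 2-D list of integer gray values in [0, levels)."""
    flat = [p for row in image for p in row]
    hist = [0] * levels
    for p in flat:
        hist[p] += 1
    # Cumulative count of pixels at or below each gray level.
    cdf, total = [], 0
    for count in hist:
        total += count
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(flat)
    if n == cdf_min:  # constant image: there is no contrast to stretch
        return [row[:] for row in image]
    scale = (levels - 1) / (n - cdf_min)
    return [[round((cdf[p] - cdf_min) * scale) for p in row] for row in image]
```

A caller would typically apply `resize_nearest` with whatever canonical size the system has chosen and then, optionally, `equalize`; the canonical size itself is not fixed by this sketch.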
Other aspects and advantages of the present invention as
well as a more complete understanding thereof will become
apparent from the following detailed description, taken in
conjunction with the accompanying drawings, illustrating by
way of example the principles of the invention. Moreover, it
is intended that the scope of the invention be limited by the
claims and not by the preceding summary or the following
detailed description.
Determination of the location and size of a human face
within an image, or face detection, is a critical part of many
computer vision applications. Face detection is an important
first step for many types of machine vision systems (such as
an automatic face recognition and interpretation system)
because a face must first be detected before any further
processing (such as recognition and interpretation) can
occur. Thus, accurate and reliable face detection is a crucial
foundation for higher processing of a face image.
Face detection is used in diverse applications such as
systems that index and search image databases by content,
surveillance and security systems, vision-based interfaces
and video conferencing. Once a face has been detected by a
face detection system the resulting face image may be used
in several ways. For instance, a system that identifies and
recognizes a person by their face (known as face
recognition) can be used to detect and recognize a user's
face when they sit in front of a computer. This system could
then use the person's face as a substitute for a password and
automatically provide the user with the user's preferred
workspace environment. A detected face can also be examined
to interpret the facial expression (known as face
interpretation). Facial expression is a non-verbal form of
communication that helps determine a person's emotion,
intent and focus of attention. For example, eye tracking can
be used to determine whether the user is looking at a
computer screen and where on the screen the user's eyes are
Each human face, however, is a unique and complex
pattern, and detecting faces within an image is a significant
problem. This problem includes the difficulty of varying
illumination on a face and differences in facial appearance
(such as skin color, facial hair and eye color). Some systems
attempt to overcome this problem by trying to model (using,
for example, neural networks) clusters of variations depending
on their occurrence in a training set. These systems,
however, often have significant machinery surrounding their
basic statistical model and thus require immense amounts of
training data to construct a statistical model of facial images.
An alternative approach used by some systems is based on
"relational templates" over image intensity values. A relational
template is a set of constraints that compares and
classifies different regions of an image based on relative
values of a regional image property. These types of systems
typically contain, for example, a constraint that an eye
region (such as the left eye region) must be darker than the
cheek region (such as the right cheek region).
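A relational template of the kind just described can be sketched as a list of "darker than" constraints over named regions. The region names, the box layout `(top, left, bottom, right)`, the helper names and the idea of counting satisfied constraints are illustrative assumptions, not details taken from any particular prior-art system.

```python
# Each pair (a, b) is read as: region a must be darker than region b.
EXAMPLE_TEMPLATE = [
    ("left_eye", "right_cheek"),
    ("right_eye", "left_cheek"),
]


def region_mean(image, box):
    """Mean gray value inside box = (top, left, bottom, right)."""
    top, left, bottom, right = box
    values = [image[r][c] for r in range(top, bottom) for c in range(left, right)]
    return sum(values) / len(values)


def satisfied_constraints(image, regions, template):
    """Count how many 'darker than' constraints the image satisfies."""
    means = {name: region_mean(image, box) for name, box in regions.items()}
    return sum(1 for darker, lighter in template if means[darker] < means[lighter])
```

Because every constraint compares raw intensities, the count changes whenever lighting or complexion reverses the eye/cheek ordering.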
Although the relational template approach is sound, one
problem with using a relational template over image intensity
values is that pixel intensity of an image can vary
drastically depending on the lighting conditions and the
types of faces. For instance, while some people have dark
The present invention can be further understood by reference
to the following description and attached drawings
that illustrate the preferred embodiments. Other features and
advantages will be apparent from the following detailed
description of the invention, taken in conjunction with the
accompanying drawings, which illustrate, by way of
example, the principles of the present invention.
Referring now to the drawings in which like reference
numbers represent corresponding parts throughout:
FIG. 1 is a block diagram illustrating an apparatus for
carrying out the invention.
FIG. 2 is an overall block diagram of a computer vision
system incorporating the present invention.
FIG. 3 is a general block-flow diagram illustrating the
face detection system of the present invention.
FIG. 4 is a detailed block diagram illustrating the hypothesis
module of the face detection system shown in FIG. 3.
FIG. 5 is a detailed block diagram illustrating the preprocessing
module of the face detection system shown in FIG. 3.
FIG. 6 is a detailed block diagram illustrating the feature
extraction module of the face detection system shown in
FIG. 3.
FIG. 7 is a detailed block diagram illustrating the feature
averaging module shown in FIG. 3.
FIG. 8 is a detailed block diagram illustrating the relational
template module shown in FIG. 3.
FIG. 9 illustrates an example of facial regions
used in the present invention.
FIG. 10 is a working example of a relational template over
edge density that is used in the present invention.
FIG. 11A is a raw image used in a working example of the
present invention.
FIG. 11B shows the location of a detected face from the
raw image of FIG. 11A.
FIG. 11C shows the result of partial region averaging
performed on the image of FIG. 11B.
In the following description of the invention, reference is
made to the accompanying drawings, which form a part
thereof, and in which is shown by way of illustration a
specific example whereby the invention may be practiced. It
is to be understood that other embodiments may be utilized
and structural changes may be made without departing from
the scope of the present invention.
I. Introduction
The present invention is embodied in a system and
method for detecting a face in an image. The present
invention uses a relational template over a geometric distribution
of a non-intensity image property to detect a face
within the image and determine the size and location of the
face. Specifically, the present invention generates a hypothesis
and defines a sub-region within an image where a face
may be located, extracts feature information from that
sub-region using a non-intensity image property, groups the
feature information into facial regions and uses a relational
template to determine whether a face has been detected. In
a preferred embodiment, the image property is edge density,
which is generally a measure of the total length and strength
of edges present in a given area.
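One way to approximate edge density as described above is to average a per-pixel edge strength over the area of interest. The sketch below is an assumption-laden illustration: the central-difference gradient, the function names `gradient_magnitude` and `edge_density`, and the box layout `(top, left, bottom, right)` are hypothetical choices, and the text above does not state which edge operator is actually used.

```python
import math


def gradient_magnitude(image):
    """Per-pixel edge strength from central differences (border pixels stay 0)."""
    h, w = len(image), len(image[0])
    mag = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = (image[r][c + 1] - image[r][c - 1]) / 2.0
            gy = (image[r + 1][c] - image[r - 1][c]) / 2.0
            mag[r][c] = math.hypot(gx, gy)
    return mag


def edge_density(image, box):
    """Mean edge strength inside box = (top, left, bottom, right)."""
    top, left, bottom, right = box
    mag = gradient_magnitude(image)
    total = sum(mag[r][c] for r in range(top, bottom) for c in range(left, right))
    return total / ((bottom - top) * (right - left))
```

In this sketch, adding a constant brightness offset to every pixel leaves the differences, and therefore the density, unchanged, which is one reason a gradient-based property is less sensitive to overall lighting than raw intensity.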
The present invention may be used to detect a generally
upright face in an image where the face is either directly
facing or slightly offset from the camera plane. In particular,
some portion of the face must be present for detection.
Preferably an entirely forward-facing view of a face is
present in an image. The system and method of the present
invention are independent of illumination and thus may be
used under various lighting conditions. In addition, because
the image intensity is not used as the image property, the
present invention can be used to detect faces having a wide
variety of appearances without requiring lengthy initialization.
II. Exemplary Operating Environment
FIG. 1 and the following discussion are intended to
provide a brief, general description of a suitable computing
environment in which the invention may be implemented.
Although not required, the invention will be described in the
general context of computer-executable instructions, such as
program modules, being executed by a computer. Generally,
program modules include routines, programs, objects,
components, data structures, etc. that perform particular
tasks or implement particular abstract data types. Moreover,
those skilled in the art will appreciate that the invention may
be practiced with a variety of computer system
configurations, including personal computers, server
computers, hand-held devices, multiprocessor systems,
microprocessor-based or programmable consumer
electronics, network PCs, minicomputers, mainframe
computers, and the like. The invention may also be practiced
in distributed computing environments where tasks are
performed by remote processing devices that are linked
through a communications network. In a distributed computing
environment, program modules may be located on
both local and remote computer storage media including
memory storage devices.
With reference to FIG. 1, an exemplary system for implementing
the invention includes a general-purpose computing
device in the form of a conventional personal computer 100,
including a processing unit 102, a system memory 104, and
a system bus 106 that couples various system components
including the system memory 104 to the processing unit 102.
The system bus 106 may be any of several types of bus
structures including a memory bus or memory controller, a
peripheral bus, and a local bus using any of a variety of bus
architectures. The system memory includes read only
memory (ROM) 110 and random access memory (RAM)
112. A basic input/output system (BIOS) 114, containing the
basic routines that help to transfer information between
elements within the personal computer 100, such as during
start-up, is stored in ROM 110. The personal computer 100
further includes a hard disk drive 116 for reading from and
writing to a hard disk, not shown, a magnetic disk drive 118
for reading from or writing to a removable magnetic disk
120, and an optical disk drive 122 for reading from or
writing to a removable optical disk 124 such as a CD-ROM
or other optical media. The hard disk drive 116, magnetic
disk drive 118 and optical disk drive 122 are connected to
the system bus 106 by a hard disk drive interface 126, a
magnetic disk drive interface 128 and an optical disk drive
interface 130, respectively. The drives and their associated
computer-readable media provide nonvolatile storage of
computer readable instructions, data structures, program
modules and other data for the personal computer 100.
Although the exemplary environment described herein
employs a hard disk, a removable magnetic disk 120 and a
removable optical disk 124, it should be appreciated by
those skilled in the art that other types of computer readable
media that can store data that is accessible by a computer,
such as magnetic cassettes, flash memory cards, digital
video disks, Bernoulli cartridges, random access memories
(RAMs), read-only memories (ROMs), and the like, may
also be used in the exemplary operating environment.
A number of program modules may be stored on the hard
disk, magnetic disk 120, optical disk 124, ROM 110 or RAM
112, including an operating system 132, one or more application
programs 134, other program modules 136 and program
data 138. A user (not shown) may enter commands and
information into the personal computer 100 through input
devices such as a keyboard 140 and a pointing device 142.
In addition, a camera 143 (or other types of imaging devices)
may be connected to the personal computer 100 as well as
other input devices (not shown) including, for example, a
microphone, joystick, game pad, satellite dish, scanner, or
the like. These other input devices are often connected to the
processing unit 102 through a serial port interface 144 that
is coupled to the system bus 106, but may be connected by
other interfaces, such as a parallel port, a game port or a
universal serial bus (USB). A monitor 146 or other type of
display device is also connected to the system bus 106 via
an interface, such as a video adapter 148. In addition to the
monitor 146, personal computers typically include other
peripheral output devices (not shown), such as speakers and
printers.
The personal computer 100 may operate in a networked
environment using logical connections to one or more
remote computers, such as a remote computer 150. The
remote computer 150 may be another personal computer, a
server, a router, a network PC, a peer device or other
common network node, and typically includes many or all of
the elements described above relative to the personal computer
100, although only a memory storage device 152 has
been illustrated in FIG. 1. The logical connections depicted
in FIG. 1 include a local area network (LAN) 154 and a wide
area network (WAN) 156. Such networking environments
are commonplace in offices, enterprise-wide computer
networks, intranets and the Internet.
When used in a LAN networking environment, the personal
computer 100 is connected to the local network 154
through a network interface or adapter 158. When used in a
WAN networking environment, the personal computer 100
typically includes a modem 160 or other means for establishing
communications over the wide area network 156,
such as the Internet. The modem 160, which may be internal
or external, is connected to the system bus 106 via the serial
port interface 144. In a networked environment, program
modules depicted relative to the personal computer 100, or
portions thereof, may be stored in the remote memory
storage device 152. It will be appreciated that the network
connections shown are exemplary and other means of establishing
a communications link between the computers may
be used.
III. General Overview
As shown in FIGS. 2-11 for the purposes of illustration,
the invention is embodied in a system and a method for
detecting a face within an image using a relational
template over a geometric distribution of an image
property. This image property may be any property of the
image other than intensity, such as, for example, edge
density or color. Using an image property other than image
intensity alleviates problems arising from intensity variations
due to lighting conditions or facial features.
FIG. 2 is an overall block diagram of a computer vision
system incorporating the present invention. This computer
vision system is only one example of several types of
systems that could incorporate the face detection system of
the present invention. In general, the input to the computer
vision system is an unprocessed image or raw image 200 that
may contain a human face. The raw image 200 may be
obtained from a storage device (such as a hard drive or an
optical disk) or live from a still or video camera.
The raw image 200 is received by a face detection
system 210 of the present invention that searches for and
detects any faces present in the raw image 200. As explained
in detail below, a hypothesis is generated for where in the
image 200 to search for a face and a sub-region is subsequently
defined. The raw image 200 is preprocessed, information
about any features present in the image 200 is
extracted based on an image property and a relational
template is used to determine whether a human face has been
detected. Face information 220, which includes a face image
and the location and dimensions (or size) of the sub-region
containing the face, is then transmitted from the face detection
system 210 to additional processing modules 230 that
output relevant data 240 from the modules 230. The additional
processing modules 230 can include, for example,
face identification and recognition modules (which may
form a part of a computer vision security system) and face
interpretation and tracking modules (which may be part of a
vision-based computer interface system).
FIG. 3 is a general block-flow diagram illustrating the
face detection system shown in FIG. 2. Generally, the face
detection system 210 of the present invention inputs an
image to be examined, determines a sub-region of the image
to examine, performs preprocessing on the sub-region, performs
feature extraction based on an image property and uses
a relational template to determine if a face is present in the
sub-region. The raw image 200 is received by the face
detection system 210 and sent to a hypothesis module 300
that generates a hypothesis and defines the dimensions of a
sub-region in the raw image 200 (or cropped image) where
a face may be found. The cropped image is sent as output
(box 310) to a preprocessing module 320, which prepares
the raw image 200 for further processing. The preprocessed
cropped image is then sent to a feature extraction module
330.
The feature extraction module 330 extracts any facial
features present in the preprocessed cropped image by using
a feature template based on an image property. Further,
image feature values are obtained by the feature extraction
module 330 and sent to a feature averaging module 340. The
feature averaging module 340 determines a number of facial
regions, places the image feature values into the facial
regions and determines a combined image feature value for
each facial region. The combined values are then sent to a
relational template module 350 that builds a relational table
and determines a relational value based on each region's
combined image feature value.
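The averaging step and the relational table can be pictured with the following sketch. It assumes, purely for illustration, that the combined value of a region is the arithmetic mean of its feature values and that the table stores the sign of each pairwise comparison; the names `average_regions` and `relational_table` and the box layout `(top, left, bottom, right)` are hypothetical, and how the table is reduced to a single relational value is not specified in this passage.

```python
from itertools import combinations


def average_regions(feature_map, regions):
    """Return one combined (mean) feature value per named facial region."""
    averages = {}
    for name, (top, left, bottom, right) in regions.items():
        values = [
            feature_map[r][c]
            for r in range(top, bottom)
            for c in range(left, right)
        ]
        averages[name] = sum(values) / len(values)
    return averages


def relational_table(averages):
    """Map each region pair (a, b) to +1, 0 or -1 by comparing a with b."""
    table = {}
    for a, b in combinations(sorted(averages), 2):
        diff = averages[a] - averages[b]
        table[(a, b)] = (diff > 0) - (diff < 0)
    return table
```

For instance, a feature map whose "eyes" region has a higher mean edge density than its "cheeks" region produces the entry `("cheeks", "eyes"): -1`.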
Based on a comparison between the relational value and a
threshold value, the system 210 determines whether a face
has been detected in the cropped image (box 360). If not,
then a face is not within the sub-region that was examined
and a different sub-region needs to be generated (box 370).
This occurs by returning to the hypothesis module 300
where a different hypothesis is generated about where a face
may be located within the image 200. In addition, based on
the hypothesis generated a different cropped image is
defined for examination as described previously. If a face is
detected in the cropped image then face information is sent
as output (box 380). Face information includes, for example,
an image of the face, the location of the face within the image
200, and the location and dimensions of the cropped image
where the face was found.
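The hypothesize, score and retry flow of FIG. 3 can be summarized in a short sketch. Several points are assumptions rather than statements from the text: a face is taken to be detected when the relational value exceeds the threshold (the direction of the comparison is not given here), `score_sub_region` is a hypothetical callable standing in for the preprocessing, feature extraction, feature averaging and relational template modules, and the dictionary keys are illustrative.

```python
def detect_face(raw_image, hypotheses, score_sub_region, threshold):
    """Try each hypothesized sub-region until one scores above the threshold.

    hypotheses yields boxes as (top, left, bottom, right).
    Returns face information for the first accepted box, or None.
    """
    for top, left, bottom, right in hypotheses:
        # Crop the hypothesized sub-region out of the raw image.
        cropped = [row[left:right] for row in raw_image[top:bottom]]
        relational_value = score_sub_region(cropped)
        if relational_value > threshold:  # assumed direction of comparison
            return {
                "face_image": cropped,
                "location": (top, left),
                "dimensions": (bottom - top, right - left),
            }
        # Otherwise fall through to the next hypothesis (box 370).
    return None
```

Passing the scoring function in as a parameter keeps the control flow separate from any particular choice of image property or template.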
IV. System and Operational Details
FIG. 4 is a detailed block diagram illustrating the hypothesis
module of the face detection system shown in FIG. 3.
Generally, the hypothesis module 300 generates an assumption
as to the location of a face within the raw image 200 and
defines the dimensions of a sub-region (within the image
200) in which to look for a face. The hypothesis module 300
includes a generation module 400, for generating a hypothesis
about where a face may be located, and a cropping
module 410, for defining a sub-region to examine.
The generation module 400 receives the raw image (box 420)
and generates a hypothesis about the location of a face in the
raw image (box 430). The hypothesis may include, for
example, information about which image scales, aspect
ratios and locations to examine. In a preferred embodiment
of the invention, hypotheses are generated that include
rectangular sub-regions of the image within a range of scales
and at all possible image locations. Alternatively, hypothesis
generation may include other types of vision processing that
target regions of the image most likely to contain a face
(such as regions of the image that contain skin color or
ellipse-shaped blobs). The generated hypothesis is then sent
as output (box 440) to the cropping module 410.
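Exhaustive hypothesis generation over rectangular sub-regions, as in the preferred embodiment above, might look like the following sketch. The function name `generate_hypotheses`, the `step` parameter and the representation of scales as explicit `(height, width)` window sizes are assumptions introduced for the example.

```python
def generate_hypotheses(image_h, image_w, window_sizes, step=1):
    """Yield (top, left, bottom, right) boxes for every window size and position.

    window_sizes is an iterable of (height, width) pairs; windows larger
    than the image produce no boxes.
    """
    for win_h, win_w in window_sizes:
        for top in range(0, image_h - win_h + 1, step):
            for left in range(0, image_w - win_w + 1, step):
                yield (top, left, top + win_h, left + win_w)
```

A step larger than 1 trades completeness for speed; a step of 1 corresponds to examining all possible image locations.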
`The cr

