`
(12) United States Patent
Bortolussi et al.

(10) Patent No.: US 6,681,032 B2
(45) Date of Patent: *Jan. 20, 2004

(54) REAL-TIME FACIAL RECOGNITION AND VERIFICATION SYSTEM

(75) Inventors: Jay F. Bortolussi, Andover, MA (US); Francis J. Cusack, Jr., Groton, MA (US); Dennis C. Ehn, Newton Centre, MA (US); Thomas M. Kuzeja, Norfolk, MA (US); Michael S. Saulnier, Stoneham, MA (US)

(73) Assignee: Viisage Technology, Inc., Littleton, MA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days. This patent is subject to a terminal disclaimer.

(21) Appl. No.: 09/932,832
(22) Filed: Aug. 16, 2001
(65) Prior Publication Data: US 2002/0136448 A1, Sep. 26, 2002

Related U.S. Application Data
(63) Continuation of application No. 09/119,485, filed on Jul. 20, 1998, now Pat. No. 6,292,575.

(51) Int. Cl.7 .................. G06K 9/00
(52) U.S. Cl. .................. 382/118; 382/170; 382/165; 382/190; 382/257
(58) Field of Search .......... 382/107, 116, 117, 118, 165, 190, 257, 173, 225, 170, 171; 345/629

(56) References Cited

U.S. PATENT DOCUMENTS

5,386,103 A             DeBan et al.
5,675,663 A   10/1997   Koerner et al. .............. 382/181
5,710,833 A    1/1998   Moghaddam et al. ............ 382/228
5,719,951 A    2/1998   Shackleton et al. ........... 382/118
5,740,274 A    4/1998   Ono et al. .................. 382/190

OTHER PUBLICATIONS

Chai et al., "Face segmentation using skin-color map in videophone applications," IEEE Transactions on Circuits and Systems for Video Technology, vol. 9, pp. 551-564 (Jun. 1999).
Chai et al., "Locating facial region of a head-and-shoulders color image," Third IEEE International Conference on Automatic Face and Gesture Recognition, 1998, Proceedings, pp. 124-129 (Apr. 1998).
Imagawa et al., "Color-based hands tracking system for sign language recognition," Third IEEE International Conference on Automatic Face and Gesture Recognition, 1998, Proceedings, pp. 462-467 (Apr. 1998).
T. S. Jebara et al., "Parametrized Structure from Motion for 3D Adaptive Feedback Tracking of Faces," IEEE, pp. 144-150 (1997).
P. N. Belhumeur et al., "Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, pp. 711-720 (Jul. 1997).

Primary Examiner—Phuoc Tran
(74) Attorney, Agent, or Firm—Lahive & Cockfield, LLP

(57) ABSTRACT

A system and method for acquiring, processing, and comparing an image with a stored image to determine if a match exists. In particular, the system refines the image data associated with an object based on pre-stored color values, such as flesh tone color. The system includes a storage element for storing flesh tone colors of a plurality of people, and a defining stage for localizing a region of interest in the image. A combination stage combines the unrefined region of interest with one or more pre-stored flesh tone colors to refine the region of interest based on color. This flesh tone color matching ensures that at least a portion of the image corresponding to the unrefined region of interest having flesh tone color is incorporated into the refined region of interest. Hence, the system can localize the head, based on the flesh tone color of the skin of the face, in a rapid manner. According to one practice, the refined region of interest is smaller than or about equal to the unrefined region of interest.

62 Claims, 13 Drawing Sheets
`
`(56)
`
`-
`pepsitae
`5,386,
`LOZ
`5,675,663 A
`5,710,833 A
`5,719,951 A
`
`34
`30
`
`as a 26-
`
`
`
`
`Image
`Frame
`
`
`
`
`Image
`Acquisition
`Grabber
`
`
`Device
`Manipulation
`
`
`20
`
`36
`
`
`
`Discrimination
`Stage
`
`Compression
`(PCA )
`
`
[Sheet 1 of 13. FIG. 1: block diagram of the real time facial recognition system 20: Image Acquisition Device, Frame Grabber, Image Manipulation, Compression (PCA) 36, Discrimination Stage.]
`
[Sheet 2 of 13. FIG. 2: image acquisition and detection portions of the system: camera, frame grabber stage, detection stage, compression (PCA).]
`
[Sheet 3 of 13. FIG. 3: detection stage detail: motion detection with inputs (frame width/height, maximum # frames, sampling pixels, motion threshold, maximum head size, color adjustment, RGB index threshold table), blob detection 56, frame signal.]
`
[Sheet 4 of 13. FIG. 4A: detection stage 50 detail: color reduction stage, ROI define and threshold stage, erosion stage, blob detection, dilation stage 88, color table values.]
`
[Sheet 5 of 13. FIG. 4B: rough ROI 66 combined with color table values to yield refined ROI 74. FIG. 5: eigenhead generation stage.]
`
`
[Sheet 6 of 13. FIG. 6: detailed block diagram of the recognition system: frame acquisition, head find and eye find stages with eigenhead and eye templates, scaling stage, discrimination stage.]
`
[Sheet 7 of 13. FIGS. 7A, 7B and 7C: center-weighted windowing functions 200, 206, 208 and 212, each plotted as weight against image width.]
`
[Sheet 8 of 13. FIG. 8: fast Fourier transform stage: facial ROI and eigen template inputs produce a correlation map.]
`
[Sheet 9 of 13. FIG. 9 (flow chart, steps 305 through 345): Provide Unknown Face; Provide Reference Set of Faces; Normalize Reference Faces Employing DFFT; Obtain Eigenfaces for Reference Set; Obtain Ω for Reference Faces; Normalize Unknown Face for Contrast and Brightness Employing DFFT; Provide Windowing Function; Calculate Ω for Unknown Face; Compare Ω for Unknown Face and Reference Faces.]
`
[Sheet 10 of 13. FIG. 10 (flow chart): ROI acquisition logic: list of motion ROIs 220 and head ROIs 222; tests "Is List of Head ROIs > 0?" and "Is Last Motion ROI Empty?"; Set Last Motion ROI Counter to Zero 224; Calculate Head Counter 226; Calculate Eye Scale 228; Calculate Coordinates of ROI; Locate Head in ROI; Locate Eyes in ROI; "Were Eyes Found?"; on success, Set Eye Error Counter and Last ROI Counter to Zero and continue To Compression; otherwise compare the eye error counter with the eye error maximum threshold and either Reacquire Image or return To Image Acquisition Stage.]
`
`
[Sheet 11 of 13. FIG. 10A (flow chart): Increment Eye Error Counter 262; Set Last ROI in List to Last Motion ROI; To Image Acquisition Stage.]
`
`
[Sheet 12 of 13. FIG. 11: image manipulation stage detail, including normalization stage 122.]
`
[Sheet 13 of 13. FIG. 12 (flow chart): 405 Store Eigen Coefficients for Acquired Image; Search for Eigen Image in Database; 415 Is Face in the Database?; 420 Allow Access.]
`
`1
`REAL-TIMEFACIAL RECOGNITION AND
`VERIFICATION SYSTEM
`
`US 6,681,032 B2
`
`15
`
`20
`
`25
`
`30
`
`2
`comparing an image with a stored image to determine if a
`match exists. The facial recognition system determines the
`match in substantially real time. In particular, the present
`This application is a continuation application of Ser. No.
`invention employs a motion detection stage, blob stage and
`09/119,485,
`filed on Jul. 20, 1998, now U.S. Pat. No,
`a color matching stage at the input to localize a region of
`6,292,575.
`interest (ROT) in the image. The ROI is then processed by
`BACKGROUND OF THE INVENTION
`the system to locate the head, and then the eyes, in the image
`by employingaseries of templates, such as eigen templates.
`The present invention relates to systemsfor identifying an
`The system then thresholds the resultant eigenimage to
`individual, and in another case, verifying the individual's
`determine if the acquired image matchesa pre-stored image.
`identity to perform subsequent
`tasks, such as allowing
`This invention attains the foregoing and other objects with
`access to a secured facility or permit selected monetary
`transactions to occur.
`a system for refining an object within an image based on
`color. The system includes a storage element for storing
`flesh tone colors of a plurality of people, and a defining stage
`for localizing a regionof interest in the image. Generally, the
`region is captured from a camera, and hence the ROI is from
`image data corresponding to real-time video. This ROI is
`generally unrefined in that the system processes the image to
`localize or refine image data corresponding to preferred
`ROI, such as a person’s head. In this case, the unrefined
`region of interest includes flesh tone colors. A combination
`stage combines the unrefined region of interest with one or
`more pre-stored flesh tone colors to refine the region of
`interest based on the color, This flesh tone color matching
`ensuresthat at least a portion of the image correspondingto
`the unrefined region of interest having flesh tone color is
`incorporated into the refined region ofinterest. Hence, the
`system can localize the head, based on the flesh tone color
`of the skin of the face in a rapid manner. According to one
`practice, the refined region of interest
`is smaller than or
`about equal to the unrefined region of interest.
`According to one aspect, the system includes a motion
`detector for detecting motion of the image within a field of
`view, and the flesh tone colors are stored in any suitable
`storage element, such as a look-up-table. The flesh tone
`colors are compiled by generating a color histogram from a
`plurality of reference people. The resultant histogram is
`representative of the distribution of colors that constitute
`flesh tone color.
`
`Modern identification and verification systems typically
`provide components that capture an image of a person, and
`then with associated circuitry and hardware, process the
`image and then compare the image with stored images, if
`desired. In a secured access environment, a positive match
`between the acquired image of the individual and a pre-
`stored image allows access to the facility.
`The capture and manipulation of image data with modern
`identification systems places an enormous processing bur-
`den on the system. Prior art systems have addressed this
`problem by using Principal Component Analysis (PCA) on
`image data to reduce the amount of dala that needs to be
`stored to operate the system efficiently. An example of such
`a system is set forth in U.S. Pat. No. 5,164,992, the contents
`of which are hereby incorporated by reference. However,
`certain environmental standards need still be present
`to
`ensure the accuracy of the comparison between the newly
`acquired image of the pre-stored image. In particular, the
`individual is generally positioned at a certain location prior
`to capturing the image of the person, Additionally,
`the
`alignment of the body and face of the individual is controlled
`lo some degree to ensure the accuracy of the comparison.
`Lighting effects and other optical parameters are addressed
`to further ensure accuracy. Once the individual is positioned -
`at the selected location, the system then takes a snapshot of
`the person, and this still image is processed by the system to
`determine whether access is granted or denied.
`The foregoing system operation suffers from a real time
`cost
`that slows the overall performance of the system.
`Modern system applications require more rigorous determi-
`nations in terms of accuracy and time in order to minimize
`the inconvenience to people seeking access to the facility or
`attempting to perform a monetary transaction, such as at an
`automated teller machine (ATM). Typical
`time delays in
`order to properly position and capture an image of the
`person, and then compare the image with pre-stored images,
`is in the order of 3 to 5 seconds or even longer.
`Consequently,
`these near real-time systems are quickly
`becoming antiquated in today’s fast paced and technology
`dependent society. There thus exists a need in the art
`to
`develop a real-time facial
`identification and verification
`system that in real-time acquires and processes imagesofthe
`individual.
`
`40
`
`According to another aspect, a blob stage is also
`employed for connecting together selected pixels of the
`object in the image to form a selected numberof blobs. This
`stage in connection with the motion detector rapidly and
`with minimal overhead cost localize a ROI within the image.
`According to another aspect, the system when generating
`the flesh tone colors employs a first histogram stage for
`sampling the flesh tone colors of the reference people to
`generate a first flesh tone color histogram. The color is then
`transformed into ST color space. The system can also
`optionally employ a second histogram stage for generating
`a second color histogram not associated with the face within
`the image, and which is also transformed into ST color
`space.
`Accordingto still another aspect, the system comprises an
`erosion operation to the image data corresponding,
`for
`example, to a face, to separate pixels corresponding to hair
`from pixels corresponding to face, as well as to reduce the
`size of an object within the image, thereby reducingthe size
`of the unrefined region of interest.
`According to yet another aspect, the system also performs
`a dilation operation to expand oneofthe region of interests
`to obtain the object (e.g., face or eyes) within the image.
`The present invention also contemplates a facial recog-
`nition and identification system for identifying an object in
`an image. The system includes an image acquisition element
`for acquiring the image, a defining stage for defining an
`
`$0
`
`55
`
`Accordingly, an object of this invention is to provide a
`real-tine identification and verification system.
`Another object of this invention is to provide an identi-
`fication system that simplifies the processing of the acquired
`image while concomitantly enhancing the accuracy ofthe
`system.
`Other general and more specific objects of the invention
`will
`in part be obvious and will in part appear from the
`drawings and description which follow.
`SUMMARY OF THE INVENTION
`
`The present invention provides systems and methods of a
`facial
`recognition system for acquiring, processing, and
`
`60
`
`65
`
`
`
unrefined region of interest corresponding to the object in the image, and optionally a combination stage for combining the unrefined region of interest with pre-stored flesh tone colors to refine the region of interest to ensure at least a portion of the image corresponding to the unrefined region of interest includes flesh tone color. The refined region of interest can be smaller than or about equal to the unrefined region of interest.

According to another aspect, the system also includes a detection module for detecting a feature of the object.

According to another aspect, the combination stage combines blobs with one or more of the flesh tone colors to develop or generate the ROI.

According to another aspect, the system further includes a compression module for generating a set of eigenvectors of a training set of people in the multi-dimensional image space, and a projection stage for projecting the feature onto the multi-dimensional image space to generate a weighted vector that represents the person's feature corresponding to the ROI. A discrimination stage compares the weighted vector corresponding to the feature with a pre-stored vector to determine whether there is a match.
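The compression, projection and discrimination aspects just described lend themselves to a short linear-algebra illustration. The sketch below is not the patent's implementation; the use of NumPy's SVD, the component count, and the distance threshold are assumptions made only for the example:

    import numpy as np

    def build_eigenfaces(training_faces, n_components=20):
        # training_faces: (num_images, height*width) array of flattened faces.
        # Returns the mean face and the top principal components (eigenfaces).
        mean_face = training_faces.mean(axis=0)
        centered = training_faces - mean_face
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        return mean_face, vt[:n_components]

    def project(face, mean_face, eigenfaces):
        # Weighted vector of eigen coefficients representing the face.
        return eigenfaces @ (face - mean_face)

    def is_match(weights, stored_weights, threshold=2500.0):
        # Compare against a pre-stored vector; the threshold is a placeholder.
        return float(np.linalg.norm(weights - stored_weights)) < threshold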
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
The foregoing and other objects, features and advantages of the invention will be apparent from the following description and from the accompanying drawings, in which like reference characters refer to the same parts throughout the different views. The drawings illustrate principles of the invention and, although not to scale, show relative dimensions.

FIG. 1 is a schematic block diagram of a real time facial recognition system according to the teachings of the present invention.

FIG. 2 is a schematic block diagram of the image acquisition and detection portions of the real time facial recognition system of FIG. 1 in accordance with the teachings of the present invention.

FIG. 3 is a more detailed schematic depiction of the detection stage of FIG. 2, which includes a color matching stage in accord with the teachings of the present invention.

FIG. 4A is another detailed schematic block diagram depiction of the detection stage illustrating the erosion and dilation operations performed on the image according to the teachings of the present invention.

FIG. 4B is a schematic illustrative depiction of the manner in which color values stored in the color table are combined with a region of interest generated by the detection stage of FIG. 3 in accordance with the teachings of the present invention.

FIG. 5 is a schematic depiction of the scaling and low resolution eigenhead feature of the present invention.

FIG. 6 is a more detailed schematic block diagram depiction of the real time facial recognition system of FIG. 1 according to the teachings of the present invention.

FIGS. 7A through 7C illustrate various embodiments of center-weighted windowing functions employed by the facial recognition system according to the teachings of the present invention.

FIG. 8 is a block diagram depiction of the fast Fourier transform stage for generating a correlation map.

FIG. 9 is a flow-chart diagram illustrating the generation of the eigenfaces by employing a dot product in accordance with the teachings of the present invention.
`
`15
`
`20
`
`25
`
`30
`
`40
`
`$0
`
`55
`
`60
`
`65
`
`4
`FIGS. 10 and 10A are flow-chart diagramsillustrating the
`acquisition and determinationof a selected regionof interest
`by the facial recognition system according to the teachings
`of the present invention.
`FIG. 11 is a more detailed schematic block diagram
`depiction of the image manipulation stage of FIG. 1 in
`accordance with the teachings of the present invention.
`FIG, 12 is a fow-chart diagram illustrating the discrimi-
`nation performedby the real time facial recognition system
`of FIG. 1 according to the teachings ofthe present invention.
`
DESCRIPTION OF ILLUSTRATED EMBODIMENTS
`
The present invention relates to an image identification and verification system that can be used in a multitude of environments, including access control facilities, monetary transaction sites and other secured installations. The present invention has wide applicability to a number of different fields and installations, but for purposes of clarity will be discussed below in connection with an access control verification and identification system. The following use of this example is not to be construed in a limiting sense.

FIG. 1 illustrates a facial identification and verification system according to the teachings of the present invention. The illustrated system includes a multitude of serially connected stages. These stages include an image acquisition stage 22, a frame grabber stage 26, a head find stage 28, an eye find stage 30, and an image manipulation stage 34. These stages function to acquire an image of an object, such as a person, and digitize it. The head and eyes are then located within the image. The image manipulation stage 34 places the image in suitable condition for compression and subsequent comparison with pre-stored image identification information. Specifically, the output of the image manipulation stage 34 serves as the input to a compression stage 36, which can be a principal component analysis compression stage. This stage produces eigenvectors from a reference set of images projected into a multi-dimensional image space. The vectors are then used to characterize the acquired image. The compression stage 36 in turn generates an output signal which serves as an input to a discrimination stage 38, which determines whether the acquired image matches a pre-stored image.
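Read as a data flow, the serially connected stages of FIG. 1 amount to a simple pipeline. The sketch below is illustrative only; the stage callables are hypothetical placeholders standing in for the hardware stages 22 through 38:

    def recognize(raw_signal, stages):
        # stages is a dict of callables mirroring FIG. 1; each entry is a
        # stand-in for the corresponding stage described in the text.
        frame = stages["frame_grabber"](raw_signal)          # 26: digitize
        head_roi = stages["head_find"](frame)                # 28: localize head
        eyes = stages["eye_find"](head_roi)                  # 30: locate eyes
        normalized = stages["manipulate"](head_roi, eyes)    # 34: align/scale
        weights = stages["compress"](normalized)             # 36: PCA projection
        return stages["discriminate"](weights)               # 38: match decision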
FIG. 2 illustrates in further detail the front end portion of the system 20. The image acquisition stage 22 includes a video camera 40, which produces an S-video output stream 42 at conventional frame rates. Those of ordinary skill will appreciate that the video camera used herein may be a monochrome camera, a full color camera, or a camera that is sensitive to non-visible portions of the spectrum. Those skilled in the art will also appreciate that the image acquisition stage 22 may be realized as a variety of different types of video cameras and, in general, any suitable mechanism for providing an image of a subject may be used as the image acquisition stage 22. The image acquisition stage 22 may, alternatively, be an interface to a storage device, such as a magnetic storage medium or other components for storing images or image data. As used herein, "image data" refers to data such as luminance values, chrominance values, grey scale and other data associated with, defining or characterizing an image.

The video output stream 42 is received by a frame grabber 26, which serves to latch frames of the S-video input stream and to convert the S-video analog signal into a digitized output signal, which is then processed by the remainder of the system 20.
`
`
`
It is known that conventional video cameras produce an analog video output stream of 30 frames per second, and thus the frame grabber 26 is conventionally configured to capture and digitize image frames at this video rate. The video camera need not be limited to S-video, and can include near-IR or IR modes, which utilize RS-170 video.
`
`6
`This filtering schemeis particularly advantageous in moni-
`toring transaction environments where an individual seeking
`access to, for example, an ATM machine, would have to
`approach the ATM machine, and thus create motion within
`the field of view.
`
`According to one practice, once the detection stage 30 has
`detected motion and determinesthat the motion of the object
`within the field of view exceeds a selected threshold, the
`blob detection stage 56 analyzes the binary motion image
`generated by the motion detection stage 54 to determine
`whether motion occurs within the field of view, for example,
`by sensing a change in pixel content over time. From this
`information, the blob detection stage 56 defines a region of
`interest (ROT) roughly corresponding to the head position of
`the person in the field of view. This ROI is truly a rough
`approximation of the region corresponding to the head and
`practically is an area larger than the head of the person,
`although it may also be a region of about the same size. The
`blob detection stage employs known techniques to define
`and then correlate an object(e.g., the head ofa person) in the
`image. The present invention realizes that the motion infor-
`mation can be employed to roughly estimate the region of
`interest within the image that corresponds to the person’s
`head. In particular, the blob detection stage 56 designates a
`“blob” corresponding roughly to the head or ROI of the
`person within the field of view. A blob is defined as a
`contiguousarea of pixels having the same uniform property,
`such as grey scale, luminance, chrominance, and so forth.
`Hence, the human body can be modeled using a connected
`set of blobs. Each blob has a spatial and color Gaussian
`distribution, and can have associated therewith a support
`map, which indicates which pixels are members of a par-
`ticular blob. The ability to define blobs through hardware
`(such as that associated with the blob detection stage 56) is
`well known in the art, although the blob detection stage 56
`can also be implemented in software. The system therefore
`clusters or blobs together pixels to create adjacent blobs, one
`of which corresponds to a person’s head, and hence is
`defined as the ROT.
`
`According to another practice and with further reference
`to FIG. 3, the color table 60 can be employed to further
`refine the ROI corresponding to the head. The word “refine”
`is intended to mean the enhancement, increase or improve-
`ment in the clarity, definition and stability of the region of
`interest, as well as a further refinement in the area defined as
`the region corresponding to the person’s head. For example,
`as discussed above,
`the ROI established by the motion
`detection stage is a rough region, larger than the head, that
`defines a general area within which the head can be found.
`Flesh tone colors can be employedto “lighten” or reduce the
`ROI characterizing the person’s head to better approximate
`the area corresponding to the head. This process serves to
`overall refine the region of interest. The color table is
`intended to be representative of any suitable data storage
`medium that is accessible by the system in a known manner,
`such as RAM, ROM, EPROM, EEPROM, andthe like, and
`is preferably a look-up table (LUT)
`that stores values
`associated with flesh tone colors of a sample group.
`The present
`invention realizes that people of different
`races have similar flesh tones. These flesh tones when
`analyzed in a three-dimensional color or RGB space are
`similarly distributed therein and hence lie essentially along
`a similar vector.It is this realization that enables the system
`to store flesh tone colors in a mannerthat allows for the rapid
`retrieval of color information. The flesh tone color valuesare
`created by sampling a reference set of people, e.g., 12-20
`people, and then creating a histogram or spatial distribution
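The frame-differencing and thresholding just described can be sketched in a few lines; the per-pixel threshold and the changed-pixel count threshold below are arbitrary stand-ins, not values from the patent:

    import numpy as np

    def binary_motion_image(prev_frame, curr_frame, pixel_threshold=25):
        # Frames are 2-D arrays of luminance values (e.g., 256x256).
        diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
        return (diff > pixel_threshold).astype(np.uint8)

    def motion_detected(prev_frame, curr_frame, pixel_threshold=25,
                        count_threshold=500):
        # Motion is deemed to have occurred when enough pixels change.
        changed = int(binary_motion_image(prev_frame, curr_frame,
                                          pixel_threshold).sum())
        return changed > count_threshold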
`
This filtering scheme is particularly advantageous in monitoring transaction environments where an individual seeking access to, for example, an ATM machine would have to approach the ATM machine, and thus create motion within the field of view.

According to one practice, once the detection stage 50 has detected motion and determines that the motion of the object within the field of view exceeds a selected threshold, the blob detection stage 56 analyzes the binary motion image generated by the motion detection stage 54 to determine whether motion occurs within the field of view, for example, by sensing a change in pixel content over time. From this information, the blob detection stage 56 defines a region of interest (ROI) roughly corresponding to the head position of the person in the field of view. This ROI is truly a rough approximation of the region corresponding to the head and practically is an area larger than the head of the person, although it may also be a region of about the same size. The blob detection stage employs known techniques to define and then correlate an object (e.g., the head of a person) in the image. The present invention realizes that the motion information can be employed to roughly estimate the region of interest within the image that corresponds to the person's head. In particular, the blob detection stage 56 designates a "blob" corresponding roughly to the head or ROI of the person within the field of view. A blob is defined as a contiguous area of pixels having the same uniform property, such as grey scale, luminance, chrominance, and so forth. Hence, the human body can be modeled using a connected set of blobs. Each blob has a spatial and color Gaussian distribution, and can have associated therewith a support map, which indicates which pixels are members of a particular blob. The ability to define blobs through hardware (such as that associated with the blob detection stage 56) is well known in the art, although the blob detection stage 56 can also be implemented in software. The system therefore clusters or blobs together pixels to create adjacent blobs, one of which corresponds to a person's head, and hence is defined as the ROI.
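A software version of this blob clustering could use a standard connected-component pass over the binary motion image; the 4-connectivity and the minimum blob size below are assumptions made for illustration, not details from the patent:

    import numpy as np
    from collections import deque

    def find_blobs(binary, min_pixels=50):
        # Label 4-connected regions of a binary motion image and return their
        # bounding boxes as (top, left, bottom, right); the largest box can
        # serve as the rough head ROI.
        labels = np.zeros(binary.shape, dtype=np.int32)
        boxes = []
        next_label = 1
        rows, cols = np.nonzero(binary)
        for r0, c0 in zip(rows, cols):
            if labels[r0, c0]:
                continue
            queue = deque([(r0, c0)])
            labels[r0, c0] = next_label
            pixels = []
            while queue:
                r, c = queue.popleft()
                pixels.append((r, c))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < binary.shape[0] and 0 <= nc < binary.shape[1]
                            and binary[nr, nc] and not labels[nr, nc]):
                        labels[nr, nc] = next_label
                        queue.append((nr, nc))
            if len(pixels) >= min_pixels:
                rs = [p[0] for p in pixels]
                cs = [p[1] for p in pixels]
                boxes.append((min(rs), min(cs), max(rs), max(cs)))
            next_label += 1
        return boxes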
According to another practice, and with further reference to FIG. 3, the color table 60 can be employed to further refine the ROI corresponding to the head. The word "refine" is intended to mean the enhancement, increase or improvement in the clarity, definition and stability of the region of interest, as well as a further refinement in the area defined as the region corresponding to the person's head. For example, as discussed above, the ROI established by the motion detection stage is a rough region, larger than the head, that defines a general area within which the head can be found. Flesh tone colors can be employed to tighten or reduce the ROI characterizing the person's head to better approximate the area corresponding to the head. Overall, this process serves to refine the region of interest. The color table is intended to be representative of any suitable data storage medium that is accessible by the system in a known manner, such as RAM, ROM, EPROM, EEPROM, and the like, and is preferably a look-up table (LUT) that stores values associated with flesh tone colors of a sample group.

The present invention realizes that people of different races have similar flesh tones. These flesh tones, when analyzed in a three-dimensional color or RGB space, are similarly distributed therein and hence lie essentially along a similar vector. It is this realization that enables the system to store flesh tone colors in a manner that allows for the rapid retrieval of color information.
`
`15
`
`20
`
`25
`
`30
`
`40
`
`$0
`
`55
`
`60
`
`65
`
`
`
The flesh tone color values are created by sampling a reference set of people, e.g., 12-20 people, and then creating a histogram or spatial distribution representative of each of the three primary colors that constitute flesh tone, e.g., red, blue and green, using the reference set of people as a basis in ST color space (H_f). Alternatively, separate histograms for each color can be created. The color histogram is obtained by first reducing the 24-bit color to 18-bit color, generating the color histogram, and then transforming or converting it into ST color space from the intensity profile in the RGB space. The system then obtains the non-face color histogram in ST color space (H_n). This is obtained by assuming that non-face color is also uniformly distributed in the RGB space. The histogram is then converted into ST color space. The transformation into ST color space is performed according to the following two equations:

    S = (B - G)/(R + G + B)          (Eq. 1)

    T = (2R - G - B)/(R + G + B)     (Eq. 2)

The color histograms are then normalized by converting H_f and H_n to P_f and P_n according to Bayes Rule, which determines the face probability within the color space. Consequently, the normalized face can be represented as:

    P_face = P_f/(P_f + P_n)         (Eq. 3)

The system then calculates the width and height of the table, as well as the values of the face probability look-up table 60, according to the following formula:

    LUT[i] = P_face[i] * 255         (Eq. 4)
`
`8
`
`object set member. The symbol ® is used to signify the
`erosion of one set by another. In equation 5, A is the set
`representing the image (ROI), B is the set representing the
`structuring element, and b is a memberofthe structuring
`element set B. Additionally, the symbol (A)_,, denotes the
`translation of A by —b. After the erosion operation is
`completed, the detection stage 50 performs the connected
`component blob analysis 56 on the ROI.
`After the blob analysis is performed on the image by the
`blob detection stage 56, a dilation stage 88 performs a
`dilation operation thereon to obtain the face regions within
`the ROI. The dilation operation is employed to expand or
`thicken the ROI, and is thus the inverse operationof erosion.
`Furthermore,
`the dilation operation is the union of all
`translations of the image by the structuring element
`members, and is defined as follows:
`
`A@B= UA)
`bebo
`
`ob
`
`(Eq. 6)
`
`The symbol @ signifies the erosion of one set by another.In
`equation 6, A is the set representing the image, B is the set
`representing the structuring element, and b is a member of
`the structuring element set B. Additionally, the term (A),,
`represents the translation of A by b. According to one
`practice, the set B can be defined as including the following
`coordinates {(0, 0), (0, 1), (1, 0), (1, 1)}. The output of the
`dilation stage is the ROI. The system can further process the
`image data by defining the largest area as the dominant face
`region, and merge other smaller face regions into the domi-
`nant face region. The center of the ROI is then determined
`by placing a 128x128 pixel box on the ROI(e.g., face) by
`setling its center as:
`
`X center=¥ (mean of dominant face region)
`
`Y center=top of the face region+average_sampled_face__height/4
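A direct transcription of the erosion and dilation definitions, using the sample structuring element B given above, might read as follows; operating on boolean masks is an implementation choice for the sketch, not something the patent specifies:

    import numpy as np

    B = [(0, 0), (0, 1), (1, 0), (1, 1)]   # structuring element from the text

    def translate(mask, dr, dc):
        # Shift a boolean mask by (dr, dc), filling exposed borders with False.
        out = np.zeros_like(mask)
        src = mask[max(-dr, 0):mask.shape[0] - max(dr, 0),
                   max(-dc, 0):mask.shape[1] - max(dc, 0)]
        out[max(dr, 0):max(dr, 0) + src.shape[0],
            max(dc, 0):max(dc, 0) + src.shape[1]] = src
        return out

    def erode(mask):
        # A ⊖ B: intersection of the translations of A by -b (Eq. 5).
        result = np.ones_like(mask)
        for dr, dc in B:
            result &= translate(mask, -dr, -dc)
        return result

    def dilate(mask):
        # A ⊕ B: union of the translations of A by b (Eq. 6).
        result = np.zeros_like(mask)
        for dr, dc in B:
            result |= translate(mask, dr, dc)
        return result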
`
The foregoing detection stage 50 hence compares the rough ROI with the contents of the color table 60, performs selected erosion and dilation operations to obtain the pixels associated with the face (by analyzing chrominance values), and ultimately refines the ROI based on the contents of the color table 60. The entire operation is illustratively shown as a logic operation in FIG. 4B. Speci