`
1Yu-Hao Huang, 2Chiou-Shann Fuh

1Dept. of Computer Science and Information Engineering, National Taiwan University
E-mail: r94013@csie.ntu.edu.tw
2Dept. of Computer Science and Information Engineering, National Taiwan University
E-mail: fuh@csie.ntu.edu.tw
`
`
`
`
`ABSTRACT
`
`
Due to the rapid development of computer hardware design and software technology, user demands on electronic products are increasing. Beyond traditional user interfaces such as the keyboard and mouse, new human-computer interaction systems, like the multi-touch technology of the Apple iPhone and the touch screen support of Windows 7, are attracting more and more attention. For medical treatment, eye-gaze tracking systems have been developed for cerebral palsy and multiple sclerosis patients. In this paper, we propose a real-time, accurate, and robust smile detection system and compare our method with the smile shutter function of the Sony DSC T300. We have better performance than Sony on slight smiles.
`
`
`
`1. INTRODUCTION
`
`
`1.1. Motivation
`
Since the year 2000, the rapid development of hardware technology and software environments has made friendly and fancy user interfaces more and more feasible. For example, for severely injured patients who cannot type or use a mouse, there are eye-gaze tracking systems by which the user can control the mouse simply by looking at the word or picture shown on the monitor. In 2007, Sony released its first consumer camera with a smile shutter function, the Cyber-shot DSC T200. The smile shutter function can detect at most three human faces in the scene and automatically take a photograph
if a smile is detected. Many users have reported that Sony's smile shutter function is not as accurate as expected, and we find that Sony's smile shutter is only capable of detecting a big smile but not a slight smile. On the other hand, the smile shutter is also triggered if the user makes a grimace with teeth showing. Therefore we propose a more accurate smile detection system that runs on a common personal computer with a common webcam.

1.2. Related Work

The problem most closely related to smile detection is facial expression recognition. There is much academic research on facial expression recognition, such as [12] and [4], but not much research on smile detection. Sony's smile shutter algorithm and detection rate are not publicly available. The sensing component company Omron [11] has recently released smile measurement software. It can automatically detect and identify the faces of one or more people and assign each smile a factor from 0% to 100%. Omron uses 3D face mapping technology and claims a detection rate of more than 90%, but the software is not available, so we cannot test how it performs. Therefore we test our program against the Sony DSC T300 and show that we have better performance in detecting slight smiles and a lower false alarm rate on grimace expressions.
Sections 2 to 4 describe our algorithms for face detection, facial feature tracking, and smile detection, and Section 5 describes our real-time feature refinement. In Section 6, we run experiments on the FGNET face database [3] and show results with an 88.5% detection rate and a 12.04% false alarm rate, while the Sony T300 achieves a 72.7% detection rate and a 0.5% false alarm rate, and we compare our method with the Sony smile shutter on real-case video sequences.


2. FACE DETECTION


2.1. Histogram Equalization

Histogram equalization is a method for contrast enhancement. Pictures are often taken under-exposed or over-exposed due to uncontrolled environment lightness, which makes the details of the images difficult to recognize. Figure 1 is a gray image from Wikipedia [17] that shows a scene with the pixel values very concentrated. Figure 2 is the result after histogram equalization.
`
`
`
`
`Figure 1: Before histogram equalization [17].
`
`
`
`
`
`Figure 2: After histogram equalization [17].
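For illustration, the following is a minimal sketch of histogram equalization on a gray image; it assumes OpenCV [10] and NumPy are available, and the function name is ours, not part of the original implementation.

```python
import cv2
import numpy as np

def equalize_gray(path):
    # Load the image as a single-channel gray image.
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)

    # Built-in histogram equalization (spreads the concentrated pixel values).
    equalized = cv2.equalizeHist(gray)

    # Equivalent manual version: map each gray level through the normalized CDF.
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    lut = np.round((cdf - cdf_min) / float(gray.size - cdf_min) * 255).astype(np.uint8)
    manual = lut[gray]

    return equalized, manual
```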
`
2.2. AdaBoost Face Detection

To obtain real-time face detection, we use the method proposed by Viola and Jones [15]. There are three main components in that method. The first is the concept of the "Integral Image", a new representation of an image that allows features to be computed quickly. The second is the AdaBoost algorithm introduced by Freund and Schapire [5] in 1997, which selects the most important features from all the others. The last component is the "cascaded" classifiers, which eliminate non-face regions in the first few stages. With this method, we can detect faces in 320 by 240 pixel images at 60 frames per second on an Intel Pentium M 740 at 1.73 GHz. We briefly describe the three major components here.
`
2.2.1. Integral Image
Given an image I, we define an integral image I'(x, y) by
$$ I'(x, y) = \sum_{x' < x,\ y' < y} I(x', y') $$
The value of the integral image at location (x, y) is the summation over all the left and upper pixel values of the original image I.

Figure 3: Integral image [15].
If we have the integral image, then we can define some rectangle features, shown in Figure 4:
`
`
`Figure 4: Rectangle feature [15].
The most commonly used features are the two-rectangle feature, the three-rectangle feature, and the four-rectangle feature. The value of a two-rectangle feature is the difference between the pixel sum over the gray rectangle and the pixel sum over the white rectangle. These two regions have the same size and are horizontally or vertically adjacent, as shown in blocks A and B. Block C is a three-rectangle feature whose value is also defined as the difference between the pixel sum over the gray region and the pixel sum over the white regions. Block D is an example of the four-rectangle features. Since these features have different areas, the values must be normalized after calculating the difference. With the integral image computed in advance, it is easy to obtain the pixel sum of one rectangular region with one addition and two subtractions. For example, to calculate the sum of pixels within rectangle D in Figure 5, we simply compute 4 + 1 - (2 + 3) from the values of the integral image.
`
`
`
`
`
`
`
`
`
`Figure 5: Rectangle sum [15].
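The integral image and the one-addition, two-subtraction rectangle sum can be sketched as follows (NumPy assumed; the helper names are ours):

```python
import numpy as np

def integral_image(img):
    # I'(x, y): cumulative sum over rows and columns, padded so that
    # I'[0, :] = I'[:, 0] = 0 (the "strictly left and above" convention).
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, top, left, height, width):
    # Sum of the pixels inside the rectangle from the four corner values:
    # 4 + 1 - (2 + 3) in the notation of Figure 5.
    return (ii[top + height, left + width] + ii[top, left]
            - ii[top, left + width] - ii[top + height, left])

def two_rect_feature(ii, top, left, height, width):
    # A two-rectangle feature: difference between two horizontally adjacent regions.
    gray = rect_sum(ii, top, left, height, width)
    white = rect_sum(ii, top, left + width, height, width)
    return gray - white
```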
`
`2.2.2. AdaBoost
`There will be a large number of rectangle features with
`different sizes. For example, for a 24 by 24 pixel image,
`there are 160,000 features. Adaboost is a machine-
`learning algorithm used to find the T best classifiers
`with minimum error. To obtain the T classifiers, we
`will repeat the following algorithm for T iterations:
`
`
`
`
`Figure 6: Boosting algorithm [15].
`
After running the boosting algorithm for a target object, we have T weak classifiers with different weights. Finally we combine them into a stronger classifier C(x).
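As a rough sketch of one boosting round in the spirit of Figure 6, the following selects the decision stump with minimum weighted error and reweights the samples; this is a simplified illustration, not the exact procedure of [15], and all names are ours:

```python
import numpy as np

def boost_round(feature_values, labels, weights):
    """One AdaBoost round over decision stumps.

    feature_values: (n_features, n_samples) array of rectangle-feature responses.
    labels: 0/1 array (1 = face). weights: current sample weights (sum to 1)."""
    best = None
    for f, values in enumerate(feature_values):
        for threshold in np.unique(values):
            for polarity in (1, -1):
                pred = (polarity * values < polarity * threshold).astype(int)
                err = float(np.sum(weights * (pred != labels)))
                if best is None or err < best[0]:
                    best = (err, f, threshold, polarity, pred)
    err, f, threshold, polarity, pred = best
    err = max(err, 1e-10)
    beta = err / (1.0 - err)
    alpha = np.log(1.0 / beta)              # weight of this weak classifier
    # Down-weight the correctly classified samples, then renormalize.
    weights = weights * beta ** (pred == labels)
    return (f, threshold, polarity, alpha), weights / weights.sum()
```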
`
`2.2.3. Cascade Classifier
Since we have the T best object detection classifiers, we can tune our cascade classifier with user-specified targets: the detection rate and the false positive rate. The algorithm is shown below:
`
`
`Figure 7: Training algorithm for building cascade
`detector [15].
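The run-time behavior of the cascade can be sketched as follows, assuming the stage classifiers have already been trained; the data layout is our assumption:

```python
def cascade_predict(window, stages):
    """Return True only if the window passes every stage.

    `stages` is a list of (weak_classifiers, stage_threshold) pairs, where each
    weak classifier is (feature_fn, threshold, polarity, alpha) as selected by
    AdaBoost. Most non-face windows are rejected in the first few stages."""
    for weak_classifiers, stage_threshold in stages:
        score = 0.0
        for feature_fn, threshold, polarity, alpha in weak_classifiers:
            value = feature_fn(window)              # rectangle-feature response
            if polarity * value < polarity * threshold:
                score += alpha                      # weak classifier votes "face"
        if score < stage_threshold:
            return False                            # rejected early: not a face
    return True
```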
`
`
`3. FACIAL FEATURE DETECTION AND
`TRACKING
`
`
`3.1. Facial feature location
`
Although there are many features on the human face, most of them are not very useful for facial expression representation. To obtain the facial features we need, we analyze these features on the BioID face database [6]. The database consists of 1521 gray-level images with a resolution of 384x286 pixels. There are 23 persons in the database, and every image contains a frontal-view face of one of them. In addition, each image has 20 manually marked feature points, as shown in Figure 8.
`
`
`Figure 8: Face and marked facial features [6].
`Here is the list of the feature points:
`0 = right eye pupil
`1 = left eye pupil
`2 = right mouth corner
`3 = left mouth corner
`4 = outer end of right eye brow
`5 = inner end of right eye brow
`6 = inner end of left eye brow
`7 = outer end of left eye brow
`
`
`
`
`8 = right temple
`9 = outer corner of right eye
`10 = inner corner of right eye
`11 = inner corner of left eye
`12 = outer corner of left eye
`13 = left temple
`14 = tip of nose
`15 = right nostril
`16 = left nostril
`17 = centre point on outer edge of upper lip
`18 = centre point on outer edge of lower lip
`19 = tip of chin
We first use the AdaBoost algorithm to detect the face region in each image with a scale factor of 1.05 to get as precise a position as possible, and then normalize the face size and calculate the relative feature positions and their standard deviations.
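Assuming the face detector is the pretrained Viola-Jones cascade shipped with OpenCV [10], this detection step could look like the sketch below; the cascade file name and the parameter choices other than the 1.05 scale factor and the 80 by 80 minimum face size are assumptions:

```python
import cv2

def detect_faces(gray):
    # Pretrained Viola-Jones cascade shipped with OpenCV (file path is an assumption).
    cascade = cv2.CascadeClassifier("haarcascade_frontalface_default.xml")
    # A small scale factor (1.05) trades speed for a more precise face position.
    faces = cascade.detectMultiScale(gray, scaleFactor=1.05, minNeighbors=3,
                                     minSize=(80, 80))
    return faces  # list of (x, y, w, h) face rectangles
```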
`
`
`
`
`
`
`
`
`
`
`
`Figure 9: Original image.
`
`
`Figure 10: Image with
`face detection and features
`marked.
We detect 1467 faces in the 1521 images, a detection rate of 96.45%; after dropping some false positive samples we finally obtain 1312 useful samples. Figure 11 shows one result, in which the center of each feature rectangle is the mean feature position and the width and height correspond to four times the x and y standard deviations of that feature point. This lets us find the initial feature positions quickly.
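A minimal sketch of how the normalized statistics can be mapped back into a detected face rectangle to obtain the initial feature positions and their search windows; the constants come from Table 1 and the helper names are ours:

```python
# Mean positions and standard deviations from Table 1 (faces normalized to 100x100).
FEATURE_STATS = {
    "right_eye_pupil":    (30.70, 37.98, 1.64, 1.95),
    "left_eye_pupil":     (68.86, 38.25, 1.91, 1.91),
    "right_mouth_corner": (34.70, 78.29, 2.49, 2.99),
    "left_mouth_corner":  (64.68, 78.38, 4.10, 4.15),
}

def initial_features(face_x, face_y, face_w, face_h):
    """Map the normalized statistics into a detected face rectangle.

    Returns, for each feature, its expected position and a search rectangle whose
    width and height are four times the x and y standard deviations."""
    features = {}
    for name, (mx, my, sx, sy) in FEATURE_STATS.items():
        cx = face_x + mx / 100.0 * face_w
        cy = face_y + my / 100.0 * face_h
        w = 4.0 * sx / 100.0 * face_w
        h = 4.0 * sy / 100.0 * face_h
        features[name] = ((cx, cy), (cx - w / 2, cy - h / 2, w, h))
    return features
```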
`
`
`Figure 11: Face and initial feature position with blue
`rectangle.
Table 1 shows the experimental results for the first four feature points.
`
Landmark Index          X (Pixel)   Y (Pixel)   X St Dev (Pixel)   Y St Dev (Pixel)
0: right eye pupil        30.70       37.98          1.64              1.95
1: left eye pupil         68.86       38.25          1.91              1.91
2: right mouth corner     34.70       78.29          2.49              2.99
3: left mouth corner      64.68       78.38          4.10              4.15
Table 1 Four facial feature locations and error mean with faces normalized to 100x100 pixels.

3.2. Optical flow

Optical flow is the pattern of motion of objects [18], and is usually used for motion detection and object segmentation. In our research, we use optical flow to find the displacement vectors of feature points. Figure 12 shows the corresponding feature points in two images. Optical flow has three basic assumptions. The first is brightness consistency, which means that the brightness of a small region remains the same. The second is spatial coherence, which means that the neighbors of a feature point usually have motions similar to the feature. The third is temporal persistence, which means that the motion of a feature point should change gradually over time.
`
`
`
`Figure 12: Feature point correspondence in two images.
`Let I(x, y, t) be the pixel value at location (x, y) at
`time t. From the assumptions, the pixel value would be
`I(x + u, y + v, t + 1) with displacement (u, v) at time t
`+ 1. Vector (u, v) is also called the optical flow of (x, y).
`Then we have I(x, y, t) = I(x + u, y + v, t + 1). To find
`the best (u, v), we select a region around the pixel (for
`example, a window of size 10 x 10 pixels) and try to
`minimize the sum of the square error as below:
$$ E(u, v) = \sum_{(x, y) \in R} \big[ I(x + u,\, y + v,\, t + 1) - I(x, y, t) \big]^2 $$
We use a Taylor series to expand I(x + u, y + v, t + 1) to first order as
$$ I(x + u,\, y + v,\, t + 1) = I(x, y, t) + u\, I_x(x, y, t) + v\, I_y(x, y, t) + I_t(x, y, t) $$
`
Substituting the expansion into the original equation, we have
$$ E(u, v) = \sum_{(x, y) \in R} ( I_x u + I_y v + I_t )^2 . $$
The equation $I_x u + I_y v + I_t = 0$ is also called the optical flow constraint equation. To find the extreme value, the two equations below should be satisfied:
$$ \frac{dE}{du} = \sum_{(x, y) \in R} 2\, ( I_x u + I_y v + I_t )\, I_x = 0 $$
$$ \frac{dE}{dv} = \sum_{(x, y) \in R} 2\, ( I_x u + I_y v + I_t )\, I_y = 0 $$
Finally we have the linear equation:
$$ \begin{bmatrix} \sum_R I_x^2 & \sum_R I_x I_y \\ \sum_R I_x I_y & \sum_R I_y^2 \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = - \begin{bmatrix} \sum_R I_x I_t \\ \sum_R I_y I_t \end{bmatrix} $$
By solving the linear equation, we can obtain the optical flow vector (u, v) for (x, y). We use the method of Lucas and Kanade [8] to solve for (u, v) iteratively, in a way similar to Newton's method:
1. Choose a (u, v) arbitrarily, shift (x, y) to (x + u, y + v), and calculate the corresponding Ix and Iy.
2. Solve for the new (u', v') and update (u, v) to (u + u', v + v').
3. Repeat from Step 1 until (u', v') converges.
To track feature points quickly, we build pyramid images of the current and previous frames with four levels. At each level we search for the corresponding point in a 10 by 10 pixel window and stop the search, moving on to the next level, once an accuracy of 0.01 pixels is reached.
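OpenCV's pyramidal Lucas-Kanade tracker [1, 8, 10] provides this procedure directly; below is a minimal sketch using the window size, pyramid depth, and 0.01 pixel termination accuracy described above (the remaining parameter choices and names are assumptions):

```python
import cv2
import numpy as np

def track_features(prev_gray, curr_gray, prev_points):
    """Track feature points from the previous frame to the current frame."""
    prev_pts = np.float32(prev_points).reshape(-1, 1, 2)
    curr_pts, status, err = cv2.calcOpticalFlowPyrLK(
        prev_gray, curr_gray, prev_pts, None,
        winSize=(10, 10),          # 10 x 10 search window at each level
        maxLevel=3,                # four pyramid levels (0..3)
        criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 30, 0.01))
    return curr_pts.reshape(-1, 2), status.ravel() == 1
```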
`
4. SMILE DETECTION SCHEME

We have proposed a fast video-based smile detector with generally low misdetection and low false alarm rates: an 11.5% smile misdetection rate and a 12.04% false alarm rate on the FGNET database. Our smile detection algorithm is as follows:
1. Detect the first human face in the first image frame and locate the twenty standard facial feature positions.
2. In every image frame, use optical flow to track the positions of the left mouth corner and the right mouth corner with an accuracy of 0.01 pixels, and update the standard facial feature positions by face tracking and detection.
3. If the x direction distance between the tracked left mouth corner and right mouth corner is larger than the standard distance plus a threshold Tsmile, then we claim a smile is detected.
4. Repeat from Step 2 to Step 3.
In the smile detector application, we consider the x direction distance between the right mouth corner and the left mouth corner to play the most important role in the human smile action. We do not consider the y direction displacement, since the user can have a little up or down head rotation and that would falsely trigger our detector. How do we decide the Tsmile threshold? As shown in Table 1, we have a mean distance of 29.98 pixels between the left mouth corner and the right mouth corner, and standard deviation values of 2.49 and 2.99 pixels. Let Dmean be 29.98 pixels and Dstd be 2.49 + 2.99 = 5.48 pixels. In each frame, let Dx be the x distance between the two mouth corners. If Dx is greater than Dmean + Tsmile, then it is a smile; otherwise, it is not. With a large Tsmile, we have a high misdetection rate and a low false alarm rate, and with a small Tsmile a low misdetection rate and a high false alarm rate. We run different values of Tsmile on the FGNET database and the results are shown in Table 2. We use 0.55 Dstd = 3.014 pixels as our standard Tsmile to obtain an 11.5% misdetection rate and a 12.04% false alarm rate.
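A minimal sketch of this decision rule with the constants taken from Table 1 and Table 2; it assumes the mouth corners are expressed in the normalized 100x100 face coordinates, and the names are ours:

```python
D_MEAN = 29.98           # mean x distance between mouth corners (pixels, 100x100 face)
D_STD = 2.49 + 2.99      # = 5.48 pixels (Table 1)
T_SMILE = 0.55 * D_STD   # = 3.014 pixels, the threshold chosen from Table 2

def is_smile(left_corner, right_corner):
    """Claim a smile when the tracked mouth-corner x distance exceeds
    the standard distance plus the threshold T_smile."""
    dx = abs(left_corner[0] - right_corner[0])
    return dx > D_MEAN + T_SMILE
```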
Threshold      Misdetection Rate    False Alarm Rate
0.4*Dstd            6.66%                19.73%
0.5*Dstd            9.25%                14.04%
0.55*Dstd          11.50%                12.04%
0.6*Dstd           13.01%                 8.71%
0.7*Dstd           18.82%                 4.24%
0.8*Dstd           25.71%                 2.30%
Table 2 Misdetection rate and false alarm rate with different thresholds.
`
`
`5. REAL-TIME SMILE DETECTION
`
`
`
It is important to note that the feature tracking accumulates errors as time goes by, and that leads to misdetections or false alarms. We do not want users to take an initial neutral photograph every few seconds, which would be annoying and unrealistic. Moreover, it is difficult to identify the right timing to refine the feature positions: if the user is performing some facial expression when we refine the feature locations, we would be led to a wrong point to track. Here we propose a method to refine the features automatically for real-time usage. Section 5.1 describes our algorithm and Section 5.2 shows some experiments.
`
`5.1. Feature Refinement
`
From the very first image, we have the user's face with a neutral facial expression, and we build the user's mouth pattern grey image at that time. The mouth rectangle is bounded by four feature points: the right mouth corner, the center point of the upper lip, the left mouth corner, and the center point of the lower lip. We actually expand the rectangle by one standard deviation in each direction. Figure 13 shows the user's face and Figure 14 shows the mouth pattern image. For each following image, we use normalized cross correlation (NCC) block matching to find the block around the new mouth region that best matches the pattern image, and we calculate their cross correlation value. The NCC equation is:
$$ C = \frac{\sum_{(x, y) \in R,\ (u, v) \in R'} \big( f(x, y) - \bar{f} \big)\big( g(u, v) - \bar{g} \big)}{\sqrt{\sum_{(x, y) \in R} \big( f(x, y) - \bar{f} \big)^2 \ \sum_{(u, v) \in R'} \big( g(u, v) - \bar{g} \big)^2}} $$
`The equation shows the cross correlation between
`two blocks R and R’. If the correlation value is larger
`
`
`
`
`
than some threshold, which we describe more precisely in Section 5.2.2, it means the mouth state is very close to the neutral one rather than an open mouth, a smiling mouth, or some other state. Then we relocate the feature positions. To avoid spending too much computation time on finding the matching block, we center the search region at the initial position. To overcome non-sub-pixel block matching, we set the search range to a three by three block and take the largest correlation value as our result.

Figure 13: User face and mouth region (blue rectangle).

Figure 14: Grey image of mouth pattern [39x24 pixels].
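A minimal sketch of the mouth-pattern matching and refinement test; it uses OpenCV's normalized cross correlation (cv2.matchTemplate with TM_CCOEFF_NORMED) as a stand-in for the NCC block matching above, the 0.12 margin is taken from Section 5.2.2, and the names are ours:

```python
import cv2

REFINE_MARGIN = 0.12   # 3 x 0.04, see Section 5.2.2

def mouth_correlation(frame_gray, pattern, center, search_radius=1):
    """Best NCC value of the stored mouth pattern near the initial mouth center."""
    h, w = pattern.shape
    cx, cy = int(center[0]), int(center[1])
    x0, y0 = cx - w // 2 - search_radius, cy - h // 2 - search_radius
    region = frame_gray[y0:y0 + h + 2 * search_radius, x0:x0 + w + 2 * search_radius]
    result = cv2.matchTemplate(region, pattern, cv2.TM_CCOEFF_NORMED)
    return float(result.max())

def should_refine(correlation, neutral_correlation):
    # Refine the feature positions only when the mouth looks neutral again.
    return correlation > neutral_correlation - REFINE_MARGIN
```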
5.2. Experiment

As we have mentioned, we want to know the threshold value to use for the refinement. Section 5.2.1 presents a real-time case that shows how the correlation value changes with the smile expression, and Section 5.2.2 presents an off-line case on the FGNET face database to decide the proper threshold.

5.2.1. Real-Time Case
Table 3 shows a sequence of images and their correlation values with respect to the initial mouth pattern. These images give us some confidence that using correlation to distinguish the neutral and smile expressions is possible. To show stronger evidence, we run a real-time case with seven smile activities over 244 frames and record the correlation values. Table 4 shows the image indices and their correlation values. If we set 0.7 as our threshold, we have a mean correlation value of 0.868 with standard deviation 0.0563 for the neutral face, and a mean value of 0.570 with standard deviation 0.0676 for the smile face. The difference of the mean values, 0.298 = 0.868 - 0.570, is greater than two times the sum of the standard deviations, 0.2478 = 2 x (0.0563 + 0.0676). To obtain more persuasive evidence, we run on the FGNET face database in Section 5.2.2.
[Image sequence of mouth patterns]
Initial neutral expression. Initial mouth pattern 39x25 pixels.
Cross correlation value 0.925
Cross correlation value 0.767
Cross correlation value 0.502
Table 3 Cross correlation value of mouth pattern for smile activity.
`
`
`
[Plot "Cross Correlation Value": correlation (0.4-1.0) versus image index (1-244)]
Table 4 Cross correlation value of mouth pattern with seven smile activities.
`
`5.2.2. Face Database
In Section 5.2.1 we have shown clear evidence that the neutral expression and the smile expression have a great difference in correlation value. We obtain a more convincing threshold value by computing the mean and standard deviation of the cross correlation values on the FGNET face database. There are eighteen people, each with three sets of image sequences. Each set has 101 or 151 images, and roughly half of them are neutral faces while the others are smile faces. We drop some falsely performed datasets. Setting the threshold value to 0.7, the neutral faces have a mean correlation value of 0.956 with standard deviation 0.040, while the smile faces have 0.558 and 0.097. It is not surprising that the smile faces have higher variance than
`
`
`
`
`
`
`
the neutral faces, since different users have different smile types. We set a distance of three standard deviations, 0.12 = 3 x 0.04, as our threshold. If the correlation value is above the original value minus 0.12, we can refine the user's feature positions automatically and correctly.
`
`
`6. EXPERIMENTS
`
`
We test our smile detector on the happy part of the FGNET facial expression database [3]. There are fifty-four video streams from eighteen persons, three video sequences each. We drop four videos in which the smile procedure failed because the users were out of control. The ground truth of each image is labeled manually. Figures 15 to 20 are six sequential images which show the procedure of smiling. In each frame there are twenty blue facial features (the fixed initial positions), twenty red facial features (the dynamically updated positions), and a green label at the bottom left of the image. We also put the word "Happy" at the top of the image if a smile is detected. Figure 15 and Figure 20 are correctly detected images, while Figures 16 to 19 are false alarm results; the false alarm samples, however, are somewhat ambiguous even to human observers.
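We read the reported rates as per-frame ratios over the manually labeled ground truth; a small sketch of that computation under this interpretation (which the text does not spell out):

```python
def sequence_rates(ground_truth, predictions):
    """Per-frame detection and false alarm rates for one video sequence.

    ground_truth, predictions: equal-length lists of booleans (True = smile)."""
    smile_frames = [p for g, p in zip(ground_truth, predictions) if g]
    neutral_frames = [p for g, p in zip(ground_truth, predictions) if not g]
    detection_rate = sum(smile_frames) / float(len(smile_frames)) if smile_frames else 0.0
    false_alarm_rate = sum(neutral_frames) / float(len(neutral_frames)) if neutral_frames else 0.0
    return detection_rate, false_alarm_rate
```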
`
`
`
`Figure 15: Frame 1 with
`correct detection.
`
`
`Figure 16: Frame 2 with
`false alarm (Ground truth:
`Non Smile, Detector:
`Happy).
`
`
`Figure 17: Frame 3 with
`false alarm (Ground truth:
`Non Smile, Detector:
`Happy).
`
`
`Figure 18: Frame 4 with
`false alarm (Ground truth:
`Non Smile, Detector:
`Happy).
`
`
`Figure 19: Frame 5 with
`false alarm (Ground truth:
`Non Smile, Detector:
`Happy).
`
`
`Figure 20: Frame 6 with
`correct smile detection.
`
Total Detection Rate: 90.6%
Total False Alarm Rate: 10.4%
Table 5 compares our detection results with the Sony T300 for Person 1 in the FGNET face database. Figure 21 and Figure 22 show the total detection rate and false alarm rate results for the fifty video sequences in FGNET. We have a normalized detection rate of 88.5% and a false alarm rate of 12%, while the Sony T300 has a normalized detection rate of 72.7% and a false alarm rate of 0.5%.
Sony T300:
Image index 63: misdetection (Ground Truth: Smile, Detector: Non Smile).
Image index 64: misdetection (Ground Truth: Smile, Detector: Non Smile).
Image index 65: correct detection (Ground Truth: Smile, Detector: Sony).
Image index 66: correct detection (Ground Truth: Smile, Detector: Sony).
Total Detection Rate: 96.7%. Total False Alarm Rate: 0%.

Our program:
Image index 63: correct detection (Ground Truth: Smile, Detector: Happy).
Image index 64: correct detection (Ground Truth: Smile, Detector: Happy).
Image index 65: correct detection (Ground Truth: Smile, Detector: Happy).
Image index 66: correct detection (Ground Truth: Smile, Detector: Happy).
Total Detection Rate: 100%. Total False Alarm Rate: 0%.

Table 5 Detection results of Person 1 in FGNET.
`
`
`
`
`
[Bar chart "Compare Detection Rate": detection rate versus sequence index (1-49), legend: Sony, Ours]
Figure 21: Comparison of detection rate.

[Bar chart "Compare False Alarm Rate": false alarm rate versus sequence index (1-49), legend: Sony, Ours]
Figure 22: Comparison of false alarm rate.
`
`
`7. CONCLUSION
`
`
We have proposed a relatively simple and accurate real-time smile detection system that easily runs on a common personal computer with a webcam. Our program only needs an image resolution of 320 by 240 pixels and a minimum face size of 80 by 80 pixels. Our intuition is that the features around the right and left mouth corners have optical flow vectors pointing up and outward, and that the feature with the most significant flow vector lies right on the corner. Meanwhile, we can support a small head rotation and the user moving toward and away from the camera. In the future, we will try to update our mouth pattern so that we can support larger head rotations and face size scaling.
`
`
`REFERENCES
`
`
`[1] J. Y. Bouguet, “Pyramidal Implementation of the Lucas
`Kanade Feature Tracker Description of the Algorithm,”
`http://robots.stanford.edu/cs223b04/algo_tracking.pdf,
`2009.
[2] G. R. Bradski, “Computer Vision Face Tracking for Use in a Perceptual User Interface,” Intel Technology Journal, Vol. 2, No. 2, pp. 1-15, 1998.
[3] J. L. Crowley and T. Cootes, “FGNET, Face and Gesture Recognition Working Group,” http://www-prima.inrialpes.fr/FGnet/html/home.html, 2009.
[4] B. Fasel and J. Luettin, “Automatic Facial Expression Analysis: A Survey,” Pattern Recognition, Vol. 36, pp. 259-275, 2003.
[5] Y. Freund and R. E. Schapire, “A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting,” Journal of Computer and System Sciences, Vol. 55, No. 1, pp. 119-139, 1997.
[6] HumanScan, “BioID-Technology Research,” http://www.bioid.com/downloads/facedb/index.php, 2009.
[7] R. E. Kalman, “A New Approach to Linear Filtering and Prediction Problems,” Transactions of the American Society of Mechanical Engineers, Journal of Basic Engineering, Vol. 82, pp. 35-45, 1960.
[8] B. D. Lucas and T. Kanade, “An Iterative Image Registration Technique with an Application to Stereo Vision,” Proceedings of International Joint Conference on Artificial Intelligence, Vancouver, pp. 674-679, 1981.
[9] S. Millborrow and F. Nicolls, “Locating Facial Features with an Extended Active Shape Model,” Proceedings of European Conference on Computer Vision, Marseille, France, Vol. 5305, pp. 504-513, http://www.milbo.users.sonic.net/stasm, 2008.
[10] OpenCV, “Open Computer Vision Library,” http://opencv.willowgarage.com/wiki/, 2009.
[11] Omron, “OKAO Vision,” http://www.omron.com/r_d/coretech/vision/okao.html, 2009.
[12] M. Pantic and L. J. M. Rothkrantz, “Automatic Analysis of Facial Expressions: The State of the Art,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, pp. 1424-1445, 2000.
[13] M. J. Swain and D. H. Ballard, “Color Indexing,” International Journal of Computer Vision, Vol. 7, No. 1, pp. 11-32, 1991.
[14] J. Shi and C. Tomasi, “Good Features to Track,” IEEE Conference on Computer Vision and Pattern Recognition, pp. 593-600, 1994.
[15] P. Viola and M. J. Jones, “Robust Real-Time Face Detection,” International Journal of Computer Vision, Vol. 57, No. 2, pp. 137-154, 2004.
[16] P. Wang, F. Barrett, E. Martin, M. Milonova, R. E. Gur, R. C. Gur, C. Kohler, and R. Verma, “Automated Video-Based Facial Expression Analysis of Neuropsychiatric Disorders,” Neuroscience Methods, Vol. 168, pp. 224-238, 2008.
[17] Wikipedia, “Histogram Equalization,” http://en.wikipedia.org/wiki/Histogram_equalization, 2009.
[18] Wikipedia, “Optical Flow,” http://en.wikipedia.org/wiki/Optic_flow, 2009.
[19] M. H. Yang, D. J. Kriegman, and N. Ahuja, “Detecting Faces in Images: A Survey,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, No. 1, pp. 34-58, 2002.
[20] C. Zhan, W. Li, F. Safaei, and P. Ogunbona, “Emotional States Control for On-Line Game Avatars,” Proceedings of ACM SIGCOMM Workshop on Network and System Support for Games, Melbourne, Australia, pp. 31-36, 2007.