`
`Xiaoguang Lu
`Dept. of Computer Science & Engineering
`Michigan State University, East Lansing, MI, 48824
`Email: lvxiaogu@cse.msu.edu
`
`Abstract
`
In recent years face recognition has received substantial attention from both research communities and the market, but it still remains very challenging in real applications. Many face recognition algorithms, along with their modifications, have been developed during the past decades. A number of typical algorithms are presented, categorized into appearance-based and model-based schemes. For appearance-based methods, three linear subspace analysis schemes are presented, and several non-linear manifold analysis approaches for face recognition are briefly described. The model-based approaches are introduced, including Elastic Bunch Graph matching, Active Appearance Model and 3D Morphable Model methods. A number of face databases available in the public domain and several published performance evaluation results are digested. Future research directions based on the current recognition results are pointed out.
`
1 Introduction
`
In recent years face recognition has received substantial attention from researchers in the biometrics, pattern recognition, and computer vision communities [1][2][3][4]. The machine learning and computer graphics communities are also increasingly involved in face recognition. This common interest among researchers working in diverse fields is motivated by our remarkable ability to recognize people and by the fact that human activity is a primary concern both in everyday life and in cyberspace. In addition, a large number of commercial, security, and forensic applications require face recognition technologies. These applications include automated crowd surveillance, access control, mugshot identification (e.g., for issuing driver's licenses), face reconstruction, design of human-computer interfaces (HCI), multimedia communication (e.g., generation of synthetic faces),
`
`1
`
`exocad Ex. 1024
`exocad v. 3Shape, IPR2018-00788
`
`Page 1 of 37
`
`
`
`and content-based image database management. A number of commercial face recognition systems
`have been deployed, such as Cognitec [5], Eyematic [6], Viisage [7], and Identix [8].
`Facial scan is an effective biometric attribute/indicator. Different biometric indicators are suited
`for different kinds of identification applications due to their variations in intrusiveness, accuracy,
`cost, and ease of sensing [9] (see Fig. 1(a)). Among the six biometric indicators considered in [10],
`facial features scored the highest compatibility, shown in Fig. 1(b), in a machine readable travel
`documents (MRTD) system based on a number of evaluation factors [10].
`
`(a)
`
`(b)
`
`Figure 1: Comparison of various biometric features: (a) based on zephyr analysis [9]; (b) based on
`MRTD compatibility [10].
`
Global industry revenues of $601 million in 2002 are expected to reach $4.04 billion by 2007 [9], driven by large-scale public-sector biometric deployments, the emergence of transactional revenue models, and the adoption of standardized biometric infrastructures and data formats. Among emerging biometric technologies, facial recognition and middleware are projected to reach $200 million and $215 million, respectively, in annual revenues in 2005.
Face recognition scenarios can be classified into two types: (i) face verification (or authentication) and (ii) face identification (or recognition). In the Face Recognition Vendor Test (FRVT) 2002 [11], conducted by the National Institute of Standards and Technology (NIST), a third scenario was added, called the "watch list".
• Face verification ("Am I who I say I am?") is a one-to-one match that compares a query
`
`
`
`
`(a)
`
`(b)
`
`Figure 2: Face recognition market [9]. (a) Total biometric revenues 2002 - 2007. (b) Comparative
`market share by technology.
`
face image against a template face image whose identity is being claimed. To evaluate verification performance, the verification rate (the rate at which legitimate users are granted access) is plotted against the false accept rate (the rate at which impostors are granted access), producing the ROC (receiver operating characteristic) curve. A good verification system should balance these two rates based on operational needs.
• Face identification ("Who am I?") is a one-to-many matching process that compares a query face image against all the template images in a face database to determine the identity of the query face (see Fig. 3). The test image is identified by locating the image in the database that has the highest similarity with the test image. The identification process is a "closed" test, which means the sensor takes an observation of an individual that is known to be in the database. The test subject's (normalized) features are compared to the other features in the system's database, and a similarity score is found for each comparison. These similarity scores are then ranked in descending order. The percentage of times that the highest similarity score is the correct match, over all individuals, is referred to as the "top match score." If any of the top r similarity scores corresponds to the test subject, it is considered a correct match in terms of the cumulative match. The percentage of times that one of those r similarity scores is the correct match, over all individuals, is referred to as the "Cumulative Match Score." The Cumulative Match Score curve plots rank n versus the percentage of correct identification, where rank n is the number of top similarity scores reported.
`
`
`
`
`Figure 3: Face identification scenario.
`
• The watch list ("Are you looking for me?") method is an open-universe test. The test individual may or may not be in the system database. That person is compared to the others in the system's database, and a similarity score is reported for each comparison. These similarity scores are then ranked so that the highest similarity score is first. If a similarity score is higher than a preset threshold, an alarm is raised, meaning the system believes that individual is in its database. There are two main quantities of interest for watch-list applications. The first is the percentage of times the system raises the alarm and correctly identifies a person on the watch list; this is called the "Detection and Identification Rate." The second is the percentage of times the system raises the alarm for an individual who is not on the watch list (database); this is called the "False Alarm Rate."
`
`In this report, all the experiments are conducted in the identification scenario.
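The top match score and Cumulative Match Score described above can be computed directly from a similarity matrix. The following is a minimal NumPy sketch (function and variable names are ours, chosen for illustration, not taken from any published evaluation code):

```python
import numpy as np

def cumulative_match_scores(similarity, true_ids, gallery_ids):
    """Cumulative Match Score for ranks 1..num_gallery.

    similarity : (num_probes, num_gallery) similarity scores.
    true_ids   : identity of each probe image.
    gallery_ids: identity of each template (gallery) image.
    cms[r-1] is the fraction of probes whose correct identity appears
    among the top-r ranked gallery entries; cms[0] is the top match score.
    """
    num_probes, num_gallery = similarity.shape
    ranks = np.empty(num_probes, dtype=int)
    for i in range(num_probes):
        order = np.argsort(-similarity[i])              # descending scores
        ranked_ids = np.asarray(gallery_ids)[order]
        ranks[i] = np.flatnonzero(ranked_ids == true_ids[i])[0] + 1
    return np.array([(ranks <= r).mean() for r in range(1, num_gallery + 1)])

# Toy example: 3 probes against a 3-subject gallery
sim = np.array([[0.9, 0.2, 0.1],    # probe of subject A: correct at rank 1
                [0.3, 0.5, 0.8],    # probe of subject B: correct at rank 2
                [0.1, 0.2, 0.9]])   # probe of subject C: correct at rank 1
cms = cumulative_match_scores(sim, ['A', 'B', 'C'], ['A', 'B', 'C'])
```

Note that the resulting curve is non-decreasing in the rank, as the cumulative match definition requires.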
`Human face image appearance has potentially very large intra-subject variations due to
`• 3D head pose
`
`
`
`
`• Illumination (including indoor / outdoor)
`• Facial expression
`• Occlusion due to other objects or accessories (e.g., sunglasses, scarf, etc.)
`• Facial hair
`• Aging [12].
`
On the other hand, inter-subject variations are small due to the similarity of individual appearances. Fig. 4 gives examples of appearance variations of one subject, and Fig. 5 illustrates examples of appearance variations of different subjects. Currently, image-based face recognition techniques can be mainly categorized into two groups based on the face representation they use: (i) appearance-based, which uses holistic texture features; and (ii) model-based, which employs the shape and texture of the face, along with 3D depth information.
`
`Figure 4: Appearance variations of the same subject under different lighting conditions and different
`facial expressions [13].
`
`A number of face recognition algorithms, along with their modifications, have been developed
`during the past several decades (see Fig. 6). In section 2, three leading linear subspace analysis
`schemes are presented, and several non-linear manifold analysis approaches for face recognition are
`briefly described. The model-based approaches are introduced in section 3, including Elastic Bunch
`Graph matching, Active Appearance Model and 3D Morphable Model methods. A number of face
`databases available in the public domain and several published performance evaluation results are
`provided in section 4. Concluding remarks and future research directions are summarized in section 5.
`
`
`
`
Figure 5: Inter-subject variations versus intra-subject variations. (a) and (b) are images from different subjects, but their appearance variations represented in the input space can be smaller than those of images from the same subject, (b), (c) and (d). These images are taken from the Yale database B.
`
`2 Appearance-based (View-based) face recognition
`
`Many approaches to object recognition and to computer graphics are based directly on images
`without the use of intermediate 3D models. Most of these techniques depend on a representation of
`images that induces a vector space structure and, in principle, requires dense correspondence.
`Appearance-based approaches represent an object in terms of several object views (raw intensity
`images). An image is considered as a high-dimensional vector, i.e., a point in a high-dimensional
`vector space. Many view-based approaches use statistical techniques to analyze the distribution of
`the object image vectors in the vector space, and derive an efficient and effective representation
(feature space) according to different applications. Given a test image, the similarity between the stored prototypes and the test view is then computed in the feature space.
`
`
`
`
`Figure 6: Face recognition methods covered in this report.
`
`This image vector representation allows the use of learning techniques for the analysis and for
`the synthesis of images. Face recognition can be treated as a space-searching problem combined
`with a machine-learning problem.
`
`2.1 Vector representation of images
`
Image data can be represented as vectors, i.e., as points in a high-dimensional vector space. For example, a $p \times q$ 2D image can be mapped to a vector $x \in R^{pq}$ by lexicographic ordering of the pixel elements (such as by concatenating each row or column of the image). Despite this high-dimensional
`embedding, the natural constraints of the physical world (and the imaging process) dictate that the
`data will, in fact, lie in a lower-dimensional (though possibly disjoint) manifold. The primary goal
`of the subspace analysis is to identify, represent, and parameterize this manifold in accordance with
`some optimality criteria.
Let $X = (x_1, x_2, \dots, x_i, \dots, x_N)$ represent the $n \times N$ data matrix, where each $x_i$ is a face vector of dimension $n$, concatenated from a $p \times q$ face image, with $n = p \times q$. Here $n$ represents the total number of pixels in the face image and $N$ is the number of different face images in the training set. The mean vector of the training images, $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$, is subtracted from each image vector.
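As an illustration of this vector representation, the following NumPy sketch (sizes chosen arbitrarily for the example) maps a stack of p x q images to an n x N data matrix by lexicographic ordering and subtracts the mean vector:

```python
import numpy as np

# A p x q image becomes a vector of dimension n = p * q by lexicographic
# (row-wise) ordering; one column of X per face image.
p, q, N = 4, 3, 5                     # tiny sizes for illustration
n = p * q
rng = np.random.default_rng(0)
images = rng.random((N, p, q))        # N "face images" of size p x q

X = images.reshape(N, n).T            # n x N data matrix (rows concatenated)

mu = X.mean(axis=1, keepdims=True)    # mean face vector (n x 1)
X_centered = X - mu                   # subtract the mean from each column

assert X.shape == (n, N)
assert np.allclose(X_centered.mean(axis=1), 0.0)
```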
`
`
`
`
`2.2 Linear (subspace) Analysis
`
`Three classical linear appearance-based classifiers, PCA [14], ICA [15] and LDA [16][17] are intro-
`duced in the following. Each classifier has its own representation (basis vectors) of a high dimensional
`face vector space based on different statistical viewpoints. By projecting the face vector to the basis
`vectors, the projection coefficients are used as the feature representation of each face image. The
`matching score between the test face image and the training prototype is calculated (e.g., as the
`cosine value of the angle) between their coefficients vectors. The larger the matching score, the
`better the match.
All three representations can be considered as a linear transformation from the original image vector to a projection feature vector, i.e.,

$$Y = W^T X, \qquad (1)$$

where $Y$ is the $d \times N$ feature vector matrix, $d$ is the dimension of the feature vector, and $W$ is the transformation matrix. Note that $d \ll n$.
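This common pipeline, projecting faces by Eq. (1) and matching by the cosine of the angle between coefficient vectors, can be sketched as follows. The transformation matrix here is a random stand-in; in practice $W$ would come from PCA, ICA or LDA:

```python
import numpy as np

def cosine_match_scores(W, X_train, x_test):
    """Project faces with Y = W^T X and score by the cosine of the angle
    between coefficient vectors (larger score = better match). W is n x d."""
    Y_train = W.T @ X_train                      # d x N training coefficients
    y_test = W.T @ x_test                        # d-dim test coefficients
    num = Y_train.T @ y_test
    den = np.linalg.norm(Y_train, axis=0) * np.linalg.norm(y_test)
    return num / den                             # one score per training face

rng = np.random.default_rng(1)
n, N, d = 100, 8, 3                              # d << n
W = rng.standard_normal((n, d))                  # stand-in basis (illustrative)
X_train = rng.standard_normal((n, N))
scores = cosine_match_scores(W, X_train, X_train[:, 2])
best = int(np.argmax(scores))                    # recovers training face 2
```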
`
`2.2.1 PCA
`
`The Eigenface algorithm uses the Principal Component Analysis (PCA) for dimensionality reduction
`to find the vectors which best account for the distribution of face images within the entire image
`space [14]. These vectors define the subspace of face images and the subspace is called face space.
`All faces in the training set are projected onto the face space to find a set of weights that describes
`the contribution of each vector in the face space. To identify a test image, it requires the projection
`of the test image onto the face space to obtain the corresponding set of weights. By comparing the
`weights of the test image with the set of weights of the faces in the training set, the face in the test
`image can be identified.
The key procedure in PCA is based on the Karhunen-Loève transformation [18]. If the image elements are considered to be random variables, the image may be seen as a sample of a stochastic process. The Principal Component Analysis basis vectors are defined as the eigenvectors of the scatter matrix $S_T$,

$$S_T = \sum_{i=1}^{N} (x_i - \mu)(x_i - \mu)^T. \qquad (2)$$

The transformation matrix $W_{PCA}$ is composed of the eigenvectors corresponding to the $d$ largest eigenvalues. A 2D example of PCA is demonstrated in Fig. 7. The eigenvectors (a.k.a. eigenfaces)
`
`
`
`
`Figure 7: Principal components (PC) of a two-dimensional set of points. The first principal compo-
`nent provides an optimal linear dimension reduction from 2D to 1D, in the sense of the mean square
`error.
`
`corresponding to the 7 largest eigenvalues, derived from ORL face database [19], are shown in
`Fig. 9. The corresponding average face is given in Fig. 8. ORL face samples are provided in Fig. 26.
`After applying the projection, the input vector (face) in an n-dimensional space is reduced to a
`feature vector in a d-dimensional subspace. Also the eigenvectors corresponding to the 7 smallest
`eigenvalues are provided in Fig. 10. For most applications, these eigenvectors corresponding to very
`small eigenvalues are considered as noise, and not taken into account during identification. Several
`extensions of PCA are developed, such as modular eigenspaces [20] and probabilistic subspaces [21].
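A compact way to obtain $W_{PCA}$ is via the SVD of the centered data matrix, whose left singular vectors are exactly the eigenvectors of the scatter matrix $S_T$ in Eq. (2). A NumPy sketch with illustrative sizes:

```python
import numpy as np

def pca_basis(X_centered, d):
    """Eigenface basis: top-d eigenvectors of S_T = sum (x_i - mu)(x_i - mu)^T,
    computed via SVD of the centered data matrix (columns are
    mean-subtracted face vectors)."""
    U, s, _ = np.linalg.svd(X_centered, full_matrices=False)
    return U[:, :d]                    # n x d transformation matrix W_PCA

rng = np.random.default_rng(2)
n, N, d = 64, 10, 4                    # e.g., 8x8 "images", 10 samples
X = rng.standard_normal((n, N))
Xc = X - X.mean(axis=1, keepdims=True)
W = pca_basis(Xc, d)

features = W.T @ Xc                    # d x N projection coefficients
```

The columns of `W` are orthonormal, and projecting a face onto them reduces it from the n-dimensional image space to a d-dimensional feature vector, as described above.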
`
`Figure 8: The average face (derived from the ORL face database [19]).
`
`2.2.2 ICA
`
Independent Component Analysis (ICA) [22] is similar to PCA, except that the distributions of the components are designed to be non-Gaussian. Maximizing non-Gaussianity promotes statistical
`
`
`
`
`Figure 9: Eigenvectors corresponding to the 7 largest eigenvalues, shown as p × p images, where
`p × p = n (derived from the ORL face database [19]).
`
`Figure 10: Eigenvectors corresponding to the 7 smallest eigenvalues, shown as p × p images, where
`p × p = n (derived from the ORL face database [19]).
`
`independence. Figure 11 presents the different feature extraction properties between PCA and ICA.
Bartlett et al. [15] proposed two ICA-based architectures for the face recognition task: statistically independent basis images and a factorial code representation. ICA separates the high-order moments of the input in addition to the second-order moments utilized in PCA. Both architectures lead to similar performance. The basis vectors obtained with the fast fixed-point algorithm [24] for the ICA factorial code representation are illustrated in Fig. 12. There is no special order imposed on the ICA basis vectors.
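For illustration, a minimal symmetric fixed-point ICA iteration in the spirit of the fast fixed-point algorithm [24] can be written as follows. This is our simplified re-implementation (tanh nonlinearity, SVD whitening), not the published code; as in ICA generally, component order and scale are arbitrary:

```python
import numpy as np

def fastica(X, n_components, n_iter=200, seed=0):
    """Simplified symmetric fixed-point ICA sketch (tanh nonlinearity).
    X: (n_features, n_samples) zero-mean data. Returns an unmixing matrix;
    recovered components are decorrelated and, ideally, independent."""
    rng = np.random.default_rng(seed)
    # Whitening via SVD: rows of Z have identity covariance
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    K = (U[:, :n_components] / s[:n_components]).T
    Z = K @ X * np.sqrt(X.shape[1])
    W = rng.standard_normal((n_components, n_components))
    for _ in range(n_iter):
        G = np.tanh(W @ Z)
        W_new = (G @ Z.T) / Z.shape[1] - np.diag((1 - G**2).mean(axis=1)) @ W
        # Symmetric decorrelation: W <- (W W^T)^(-1/2) W
        vals, vecs = np.linalg.eigh(W_new @ W_new.T)
        W = vecs @ np.diag(np.maximum(vals, 1e-12) ** -0.5) @ vecs.T @ W_new
    return W @ K

# Toy demo: unmix two linearly mixed uniform (non-Gaussian) sources
rng = np.random.default_rng(3)
S = rng.uniform(-1, 1, size=(2, 2000))
A = np.array([[1.0, 0.6], [0.4, 1.0]])           # mixing matrix
X = A @ S
X = X - X.mean(axis=1, keepdims=True)
recovered = fastica(X, 2) @ X                    # sources up to order/sign/scale
```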
`
`2.2.3 LDA
`
Both PCA and ICA construct the face space without using face class (category) information: the whole training set is treated as one ensemble. As in PCA and ICA, the goal in LDA is to find an efficient way to represent the face vector space, but here the class information is exploited, which can be helpful for identification tasks; see Fig. 13 for an example.
`The Fisherface algorithm [16] is derived from the Fisher Linear Discriminant (FLD), which uses
`class specific information. By defining different classes with different statistics, the images in the
`learning set are divided into the corresponding classes. Then, techniques similar to those used
`in Eigenface algorithm are applied. The Fisherface algorithm results in a higher accuracy rate in
`
`
`
`
`Figure 11: Top: Example 3D data distribution and the corresponding principal component and
`independent component axes. Each axis is a direction found by PCA or ICA. Note the PC axes are
`orthogonal while the IC axes are not. If only 2 components are allowed, ICA chooses a different
`subspace than PCA. Bottom left: Distribution of the first PCA coordinate of the data. Bottom
`right: distribution of the first ICA coordinate of the data [23]. For this example, ICA tends to
`extract more intrinsic structure of the original data clusters.
`
Figure 12: ICA basis vectors shown as p × p images; there is no special order for ICA basis vectors (derived from the ORL face database [19], based on the second architecture [23]). The ICA code package used to compute these basis vectors was downloaded from http://www.cis.hut.fi/projects/ica/fastica/.
`
`
`
`
`Figure 13: A comparison of principal component analysis (PCA) and Fisher’s linear discriminant
`(FLD) for a two class problem where data for each class lies near a linear subspace [16]. It shows
`that FLD is better than PCA in the sense of discriminating the two classes.
`
recognizing faces than the Eigenface algorithm.
The Linear Discriminant Analysis finds a transform $W_{LDA}$ such that

$$W_{LDA} = \arg\max_W \frac{|W^T S_B W|}{|W^T S_W W|}, \qquad (3)$$
`
where $S_B$ is the between-class scatter matrix and $S_W$ is the within-class scatter matrix, defined as

$$S_B = \sum_{i=1}^{c} N_i (\mu_i - \mu)(\mu_i - \mu)^T, \qquad (4)$$

$$S_W = \sum_{i=1}^{c} \sum_{x_k \in X_i} (x_k - \mu_i)(x_k - \mu_i)^T. \qquad (5)$$

In the above expressions, $N_i$ is the number of training samples in class $i$, $c$ is the number of distinct classes, $\mu_i$ is the mean vector of samples belonging to class $i$, and $X_i$ represents the set of samples belonging to class $i$. The LDA basis vectors are demonstrated in Fig. 14.
`
`Figure 14: First seven LDA basis vectors shown as p×p images (derived from the ORL face database
`[19]).
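Equations (3)-(5) can be implemented directly: build $S_B$ and $S_W$ from the labeled training data and take the leading eigenvectors of $S_W^{-1} S_B$. A NumPy sketch on synthetic 2D data (the small ridge term on $S_W$ is our addition for numerical stability):

```python
import numpy as np

def lda_basis(X, labels, d):
    """Fisher/LDA sketch: top-d eigenvectors of S_W^{-1} S_B (Eqs. 3-5).
    X: (n, N) data matrix, labels: length-N class labels."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n, _ = X.shape
    mu = X.mean(axis=1, keepdims=True)
    Sb = np.zeros((n, n))
    Sw = np.zeros((n, n))
    for c in np.unique(labels):
        Xi = X[:, labels == c]
        mui = Xi.mean(axis=1, keepdims=True)
        Sb += Xi.shape[1] * (mui - mu) @ (mui - mu).T   # between-class scatter
        D = Xi - mui
        Sw += D @ D.T                                   # within-class scatter
    Sw += 1e-6 * np.eye(n)                              # regularization (ours)
    vals, vecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(-vals.real)
    return vecs[:, order[:d]].real                      # n x d W_LDA

# Two 2D classes separated along the first axis, spread along the second
rng = np.random.default_rng(4)
X0 = rng.normal([0, 0], [0.1, 2.0], size=(50, 2)).T
X1 = rng.normal([3, 0], [0.1, 2.0], size=(50, 2)).T
X = np.hstack([X0, X1])
y = np.array([0] * 50 + [1] * 50)
W = lda_basis(X, y, 1)
w = W[:, 0] / np.linalg.norm(W[:, 0])   # aligns with the separating axis
```

As in Fig. 13, the recovered direction follows the class separation rather than the direction of maximum variance.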
`
`2.3 Non-linear (manifold) Analysis
`
The face manifold is more complicated than a linear model can capture; linear subspace analysis is only an approximation of this non-linear manifold. Direct non-linear manifold modeling schemes have been explored to learn it. In the following, kernel principal component analysis (KPCA) is introduced, and several other manifold learning algorithms are listed.
`
`2.4 Kernel PCA
`
Kernel PCA [25] applies a nonlinear mapping from the input space $R^M$ to a feature space $R^L$, denoted by $\Psi(x)$, where $L$ is larger than $M$. This mapping is made implicit by the use of kernel functions satisfying Mercer's theorem,

$$k(x_i, x_j) = \Psi(x_i) \cdot \Psi(x_j), \qquad (6)$$

where kernel functions $k(x_i, x_j)$ in the input space correspond to inner products in the higher-dimensional feature space. Because computing the covariance is based on inner products, performing PCA in the feature space can be formulated with kernels in the input space without the explicit
computation of $\Psi(x)$. Suppose the covariance in the feature space is calculated as

$$\Sigma_K = \langle \Psi(x_i)\Psi(x_i)^T \rangle. \qquad (7)$$

The corresponding eigen-problem is $\lambda V = \Sigma_K V$. It can be proved [25] that $V$ can be expressed as $V = \sum_{i=1}^{N} w_i \Psi(x_i)$, where $N$ is the total number of training samples. The equivalent eigenvalue problem can be formulated in terms of kernels in the input space:

$$N\lambda w = Kw, \qquad (8)$$

where $w$ is an $N$-dimensional vector and $K$ is an $N \times N$ matrix with $K_{ij} = k(x_i, x_j)$. The projection of a sample $x$ onto the $n$th eigenvector $V^n$ can be calculated by

$$p_n = (V^n \cdot \Psi(x)) = \sum_{i=1}^{N} w^n_i k(x_i, x). \qquad (9)$$

Figure 15 gives an example of KPCA.
`
`Figure 15: Contour plots of the first six principal component projections. Each contour contains
`the same projection values onto the corresponding eigenvectors. Data is generated by 3 Gaussian
`clusters. A RBF kernel is used. The corresponding eigenvalues are given above each subplot. Notice
`that the first three components have the potential to extract the individual clusters [25].
`
Similar to traditional PCA, the projection coefficients are used as features for classification. Yang [26] explored the use of KPCA for the face recognition problem. Unlike traditional PCA, KPCA can use more eigenvector projections than the input dimensionality. However, a suitable kernel and its corresponding parameters can only be determined empirically.
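A minimal KPCA sketch with an RBF kernel: form the kernel matrix, center it in feature space, solve the eigenproblem of Eq. (8), and project the training samples as in Eq. (9). The data layout and parameter values below are illustrative:

```python
import numpy as np

def kpca_features(X, d, gamma=1.0):
    """Kernel PCA sketch with an RBF kernel (Eqs. 6-9).
    X: (N, M) samples in rows. Returns the top-d projections of the
    training data themselves."""
    sq = np.sum(X**2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))
    N = K.shape[0]
    One = np.full((N, N), 1.0 / N)
    Kc = K - One @ K - K @ One + One @ K @ One      # center in feature space
    vals, vecs = np.linalg.eigh(Kc)
    idx = np.argsort(-vals)[:d]
    # Normalize eigenvectors so the implicit feature-space axes are unit length
    alphas = vecs[:, idx] / np.sqrt(np.maximum(vals[idx], 1e-12))
    return Kc @ alphas                               # N x d projections

# Three Gaussian clusters, as in the Fig. 15 example
rng = np.random.default_rng(5)
X = np.vstack([rng.normal(c, 0.1, size=(20, 2)) for c in ([0, 0], [3, 3], [0, 3])])
P = kpca_features(X, 2, gamma=0.5)
```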
`
`
`
`
Manifold learning has recently attracted considerable effort in the machine learning community. ISOMAP [27] and LLE [28] have been proposed to learn non-linear manifolds, and the learned manifolds have been demonstrated on some simple face images. Yang [29] applied LDA to face recognition using geodesic-distance features, which are the basis of ISOMAP. These manifold learning algorithms are interesting, but further exploration is needed to demonstrate their performance on face recognition in real applications.
`
`2.5 Small Sample Size
`
In real applications, current appearance-based face recognition systems encounter difficulties due to the small number of available training face images and the complex facial variations encountered during testing. Human face appearance varies greatly with lighting conditions, head pose, and facial expression, yet in real-world situations only a small number of samples per subject are available for training. Martinez and Kak [30] have shown that if sufficient representative data is not available, the switch from nondiscriminant techniques (e.g., PCA) to discriminant approaches (e.g., LDA) is not always warranted and may sometimes lead to poor system design when small and nonrepresentative training sets are used. Figure 16 gives an example.
Therefore, face synthesis, whereby additional training samples can be generated, is helpful for enhancing face recognition systems [31][32].
`
`3 Model-based face recognition
`
The model-based face recognition scheme aims to construct a model of the human face that can capture facial variations. Prior knowledge of the human face is heavily exploited in designing the model. For example, feature-based matching derives distance and relative-position features from the placement of internal facial elements (e.g., the eyes). Kanade [33] developed one of the earliest face recognition algorithms based on automatic feature detection. By localizing the corners of the eyes, nostrils, etc. in frontal views, his system computed parameters for each face, which were compared (using a Euclidean metric) against the parameters of known faces. A more recent feature-based system, based on elastic bunch graph matching, was developed by Wiskott et al. [34] as an extension to their original graph matching system [35]. By integrating both shape
`
`
`
`
Figure 16: Suppose there are two different classes embedded in two different "Gaussian-like" distributions, but only two samples per class are supplied to the learning procedure (PCA or LDA). The classification result of the PCA procedure (using only the first eigenvector) is more desirable than the result of the LDA. $D_{PCA}$ and $D_{LDA}$ represent the decision thresholds obtained by using nearest-neighbor classification [30].
`
and texture, Cootes et al. [36][37] developed a 2D morphable face model, through which the face variations are learned. A more advanced 3D morphable face model has been explored to capture the true 3D structure of the human face surface. Both morphable model methods come under the framework of 'interpretation through synthesis'.
The model-based scheme usually contains three steps: (1) construct the model; (2) fit the model to the given face image; (3) use the parameters of the fitted model as the feature vector to calculate the similarity between the query face and the prototype faces in the database to perform recognition.
`
`3.1 Feature-based Elastic Bunch Graph Matching
`
`3.1.1 Bunch Graph
`
`All human faces share a similar topological structure. Wiskott et al. present a general in-class
`recognition method for classifying members of a known class of objects. Faces are represented as
`
`
`
`
`graphs, with nodes positioned at fiducial points (such as the eyes, the tip of the nose, some contour
`points, etc.; see Fig. 17), and edges labeled with 2-D distance vectors.
`
`Figure 17: Multiview faces overlaid with labeled graphs [34].
`
`Each node contains a set of 40 complex Gabor wavelet coefficients, including both phase and
`magnitude, known as a jet (shown in Fig. 18). Wavelet coefficients are extracted using a family of
`Gabor kernels with 5 different spatial frequencies and 8 orientations; all kernels are normalized to
`be of zero mean.
`
`Figure 18: Jet [35].
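A jet as just described can be sketched as follows. The frequency spacing (5 scales, factor $\sqrt{2}$, 8 orientations) follows the usual EBGM convention, but the window size and Gaussian width here are our illustrative choices, and the DC component is removed only approximately:

```python
import numpy as np

def gabor_jet(image, x, y, n_freq=5, n_orient=8, size=16):
    """Sketch of a jet: responses of n_freq x n_orient complex Gabor
    kernels at pixel (x, y). Returns magnitudes a_j and phases phi_j
    of the 40 coefficients."""
    half = size // 2
    patch = image[y - half:y + half + 1, x - half:x + half + 1].astype(float)
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1]
    coeffs = []
    for nu in range(n_freq):
        k = (np.pi / 2) / (np.sqrt(2) ** nu)            # kernel frequency
        for mu in range(n_orient):
            theta = mu * np.pi / n_orient
            kx, ky = k * np.cos(theta), k * np.sin(theta)
            envelope = np.exp(-(xs**2 + ys**2) * k**2 / (2 * (2 * np.pi) ** 2))
            wave = np.exp(1j * (kx * xs + ky * ys))
            kernel = envelope * (wave - wave.mean())    # approx. zero-mean
            coeffs.append(np.sum(patch * kernel))
    coeffs = np.array(coeffs)
    return np.abs(coeffs), np.angle(coeffs)             # (a_j, phi_j)

rng = np.random.default_rng(6)
img = rng.random((64, 64))
a, phi = gabor_jet(img, 32, 32)                         # 40 magnitudes/phases
```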
`
`Face recognition is based on labeled graphs. A labeled graph is a set of nodes connected by
`edges; nodes are labeled with jets; edges are labeled with distances. Thus, the geometry of an object
`is encoded by the edges while the gray value distribution is patch-wise encoded by the nodes (jets).
`An example is shown in Fig. 19.
`
`
`
`
`Figure 19: Labeled graph [35].
`
`While individual faces can be represented by simple labeled graphs, a face class requires a more
`comprehensive representation in order to account for all kinds of variations within the class. The
`Face Bunch Graph has a stack-like structure that combines graphs of individual sample faces, as
`demonstrated in Fig. 20. It is crucial that the individual graphs all have the same structure and
`that the nodes refer to the same fiducial points. All jets referring to the same fiducial point, e.g., all
`left-eye jets, are bundled together in a bunch, from which one can select any jet as an alternative
`description. The left-eye bunch might contain a male eye, a female eye, both closed or open, etc.
Each fiducial point is represented by such a set of alternatives, and from each bunch any jet can be selected independently of the jets selected from the other bunches. This gives the representation full combinatorial power even though it is constituted from only a few graphs.
`
`3.1.2 Elastic Graph Matching
`
To identify a new face, the face graph is positioned on the face image using elastic bunch graph matching. The goal of elastic graph matching is to find the fiducial points on a query image and thus to extract from the image a graph which maximizes the graph similarity function. This is performed automatically if the face bunch graph (FBG) is appropriately initialized. A face bunch graph consists of a collection of individual face model graphs combined into a stack-like
`
`
`
`
Figure 20: The left figure shows a sketch of a face bunch graph [34]. Each of the nine nodes is labeled with a bunch of six jets. From each bunch, one particular jet has been selected, indicated in gray. The actual selection depends on the situation, e.g., the face onto which the face bunch graph is matched. Though constructed from only six sample faces, this bunch graph can potentially represent $6^9 = 10{,}077{,}696$ different faces. The right figure shows the same concept interpreted slightly differently by Tullio Pericoli ("Unfinished Portrait", 1985) [http://www.cnl.salk.edu/~wiskott/Projects/BunchGraph.html].
`
`structure, in which each node contains the jets of all previously initialized faces from the database.
`To position the grid on a new face, the graph similarity between the image graph and the existing
`FBG is maximized. Graph similarity is defined as the average of the best possible match between
`the new image and any face stored within the FBG minus a topographical term (see Eq. 11), which
`accounts for distortion between the image grid and the FBG. Let Sφ be the similarity between two
`jets, defined as
`
$$S_\phi(J, J') = \frac{\sum_j a_j a'_j \cos(\phi_j - \phi'_j - \vec{d} \cdot \vec{k}_j)}{\sqrt{\sum_j a_j^2 \sum_j a'^2_j}}, \qquad (10)$$

where $a_j$ and $\phi_j$ are the magnitude and phase of the Gabor coefficients in the $j$th jet, respectively; $\vec{d}$ is the displacement between the locations of the two jets; and $\vec{k}_j$ determines the wavelength and orientation of the Gabor wavelet kernels [35]. For an image graph $G^I$ with nodes $n = 1, \dots, N$ and edges $e = 1, \dots, E$, and an FBG $B$ with model graphs $m = 1, \dots, M$, the graph similarity is defined as

$$S_B(G^I, B) = \frac{1}{N} \sum_n \max_m S_\phi(J^I_n, J^{B_m}_n) - \frac{\lambda}{E} \sum_e \frac{(\Delta\vec{x}^I_e - \Delta\vec{x}^B_e)^2}{(\Delta\vec{x}^B_e)^2}, \qquad (11)$$
where $\lambda$ determines the relative importance of jets and metric structure, $J_n$ are the jets at nodes $n$, and $\Delta\vec{x}_e$ are the distance vectors used as labels at edges $e$. After the grid has been positioned on the new face, the face is identified by comparing the similarity between that face and every face stored in the FBG. Graphs can easily be translated, rotated, scaled, and elastically deformed, thus compensating for the variance in face images commonly encountered in the recognition process.
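The phase-sensitive jet similarity of Eq. (10) is straightforward to implement. In this sketch the displacement-compensation term $\vec{d} \cdot \vec{k}_j$ is optional (with zero displacement it vanishes):

```python
import numpy as np

def jet_similarity(a, phi, a2, phi2, d=None, k=None):
    """Jet similarity S_phi of Eq. (10). a, phi (and a2, phi2) are the
    magnitudes and phases of two jets; d is the displacement between the
    jet locations and k the (n_coeffs, 2) kernel wave vectors. With
    d omitted, the phase-compensation term d . k_j is zero."""
    comp = 0.0 if d is None else (k @ d)      # d . k_j per coefficient
    num = np.sum(a * a2 * np.cos(phi - phi2 - comp))
    den = np.sqrt(np.sum(a**2) * np.sum(a2**2))
    return num / den

rng = np.random.default_rng(7)
a = rng.random(40) + 0.1
phi = rng.uniform(-np.pi, np.pi, 40)
# A jet compared with itself gives the maximum similarity of 1
s_self = jet_similarity(a, phi, a, phi)
```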
`
`3.2 AAM - A 2D Morphable Model
`
`An Active Appearance Model (AAM) is an integrated statistical model which combines a model of
`shape variation with a model of the appearance variations in a shape-normalized frame. An AAM
`contains a statistical model of the shape and gray-level appearance of the object of interest which
`can generalize to almost any valid example. Matching to an image involves finding model parameters
`which minimize the difference between the image and a synthesized model example, projected onto
`the image. The potentially large number of parameters makes this a difficult problem.
`
`3.2.1 AAM Construction
`
`The AAM is constructed based on a training set of labeled images, where landmark points are
`marked on each example face at key positions to outline the main features (shown in Fig. 21).
`
`Figure 21: The training image is split into shape and shape-normalized texture [38].
`
`The shape of a face is represented by a vector consisting of the positions of the landmarks,
`s = (x1, y1, . . . , xn, yn)T , where (xj, yj) denotes the 2D image coordinate of the jth landmark
`
`
`
`
`point. All shape vectors of faces are normalized into a common coordinate system. The principal
`component analysis is applied to this set of shape vectors to construct the face shape model, denoted
`as: s = ¯s + Psbs, where s is a shape vector, ¯s is the mean shape, Ps is a set of orthogonal modes of
`shape variation and bs is a set of shape parameters.
`In order to construct the appearance model, the example image is warped to make the control
`points match the mean shape. Then the warped image region covered by the mean shape is sampled
`to extract the gray level intensity (texture) information. Similar to the shape model construction,
`a vector is generated as the representation, g = (I1, . . . , Im)T ,where Ij denotes the intensity of the
`sampled pixel in the warped image. PCA is also applied to construct a linear model g = ¯g + Pgbg ,
`where ¯g is the mean appearance vector, Pg is a set of orthogonal modes of gray-level variation and
`bg is a set of gray-level model parameters.
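The shape model construction just described ($s = \bar{s} + P_s b_s$, keeping enough modes to retain a chosen fraction of the variance) can be sketched as follows; the data here are synthetic stand-ins for aligned landmark vectors:

```python
import numpy as np

def build_shape_model(shapes, var_kept=0.98):
    """Sketch of the statistical shape model s = s_bar + P_s b_s.
    shapes: (N, 2n) rows of aligned landmark vectors (x1, y1, ..., xn, yn).
    Keeps the smallest number of modes explaining var_kept of the variance."""
    s_bar = shapes.mean(axis=0)
    D = shapes - s_bar
    U, sv, Vt = np.linalg.svd(D, full_matrices=False)
    var = sv**2
    t = int(np.searchsorted(np.cumsum(var) / var.sum(), var_kept)) + 1
    Ps = Vt[:t].T                       # (2n, t) orthogonal shape modes
    return s_bar, Ps

def shape_params(s, s_bar, Ps):
    """b_s = P_s^T (s - s_bar): shape parameters of a new shape."""
    return Ps.T @ (s - s_bar)

rng = np.random.default_rng(8)
base = rng.random(10)                   # 5 landmarks -> 10-dim shape vectors
shapes = base + 0.05 * rng.standard_normal((30, 10))
s_bar, Ps = build_shape_model(shapes)
b = shape_params(shapes[0], s_bar, Ps)
recon = s_bar + Ps @ b                  # approximate reconstruction of shape 0
```

The gray-level model $g = \bar{g} + P_g b_g$ described above is built the same way from the warped, shape-normalized texture vectors.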
Thus, the shape and texture of any example face can be summarized by the vectors $b_s$ and $b_g$. The combined model is the concatenated version of $b_s$ and $b_g$:

$$b = \begin{pmatrix} W_s b_s \\ b_g \end{pmatrix} = \begin{pmatrix} W_s P_s^T (s - \bar{s}) \\ P_g^T (g - \bar{g}) \end{pmatrix},$$

where $W_s$ is a diagonal matrix of weights for each shape parameter, allowing for the difference in units between the shape and gray-scale models. PCA is applied to $b$ as well, $b = Qc$, where $c$ is the vector of parameters for the combined model.
`The model was built based on 400 face images, each with 122 landmark points [37]. A shape
`model with 23 parameters, a shape-normalized texture model with 113 parameters and a combined
`appearance model with 80 parameters (containing 98% variations of the observation) are g