`6,112,195
`(11) Patent Number:
`p15
`United States Patent
`
`Burges
`[45] Date of Patent:
`*Aug. 29, 2000
`
`[54] ELIMINATING INVARIANCES BY
`NEEROCESSING FOR KERNEL-BASED
`
`[75]
`
`Inventor: Christopher John Burges, Freehold,
`N.J.
`
`[*] Notice:
`
`[73] Assignee: Lucent Technologies Inc., Murray Ilill,
`“
`This patent issued on a continued pros-
`ecution application filed under 37 CFR
`1.53(d), and is subject to the twenty year
`patent
`term provisions of 35 U.S.C.
`154(a)(2).
`
[21] Appl. No.: 08/825,287

[22] Filed: Mar. 27, 1997

[51] Int. Cl. .......................... G06F 1/035; G06E 5/00

[52] U.S. Cl. ............................................ 706/20

[58] Field of Search ............. 382/159, 181, 382/276; 706/20

[56] References Cited

U.S. PATENT DOCUMENTS

4,838,644   6/1989   Ochoa et al. ......................... 359/9
5,054,094  10/1991   Barski ............................. 382/192
5,263,107  11/1993   Ueda et al. ........................ 382/157
5,729,662   3/1998   Rozmus ............................ 706/20
5,926,568   7/1999   Cheney et al. ...................... 382/217
5,950,146   9/1999   Vapnik ............................ 702/153
5,956,427   9/1999   Greenspan et al. ................... 382/249
5,969,524  10/1999   Pierpaoli et al. ................... 324/307

OTHER PUBLICATIONS

A. Khotanzad and J.-H. Lu, "Classification of Invariant Image Representations Using a Neural Network," IEEE Trans. on Acoustics, Speech, and Signal Processing, vol. 38(6), pp. 1028-1038, Jun. 1990.

J. Minnix et al., "Multistaged Self-Organizing Neural Network with Biologically Inspired Preprocessing Features for Rotation and Scale Invariant Pattern Recognition," vol. 3, pp. 1605-1610, Oct. 1991.

S. J. Perantonis and P. J. G. Lisboa, "Translation, Rotation, and Scale Invariant Pattern Recognition by High-Order Neural Networks and Moment Classifiers," IEEE Trans. on Neural Networks, vol. 3(2), pp. 241-251, Mar. 1992.

C. Yuceer and K. Oflazer, "A Rotation, Scaling and Translation Invariant Pattern Classification System," Proc. 11th IAPR Int'l Conf. on Pattern Recognition, vol. II, pp. 422-425, Sep. 1992.

Starzyk and S. Chai, "Vector Contour Representation for Object Recognition in Neural Networks," 1992 IEEE Int'l Conf. on Systems, Man and Cybernetics, vol. 1, pp. 399-404, Oct. 1992.

Gilbert Strang, Linear Algebra and Its Applications, Third Edition, pp. 341-342, 1988.

Microsoft Press, Computer Dictionary, Third Edition, pp. 272, 375, 1997.

R. P. Lippmann, "Pattern Classification Using Neural Networks," IEEE Communications Magazine, vol. 27(11), pp. 47-64, Nov. 1989.

(List continued on next page.)

Primary Examiner—Tariq R. Hafiz
Assistant Examiner—Michael Pender
Attorney, Agent, or Firm—Joseph J. Opalach

[57] ABSTRACT

A kernel-based method and apparatus includes a preprocessor, which operates on input data in such a way as to provide invariance under some symmetry transformation.

14 Claims, 2 Drawing Sheets
`
`
`
`
`
`
`6,112,195
`
`Page 2
`
`OTHER PUBLICATIONS
`
`Y. Onodera, et al., “Translation and Rotation—Invariant
`Pattern Recognition Method Using Neural Network with
`Back—Propagation,” Communications on the Move, ICCS/
`ISITA °92, vol. 2, pp. 548-552, Nov. 1992.
`PY. Simard,et al., “Memory—Based Character Recognition
`Using a Transformation Invariant Metric,” Proc. 12th Int’l
`Conf. on Pattern Recognition, pp. 262-267, Oct. 1994.
`QS. Chen, et al., “Symmetric Phase—Only Matched Fuil-
`tering of Fourier—Mellin Transforms for Image Registration
`and Recognition,” ILEE Trans. on Pattern Analysis and
`Machine Intelligence, vol. 16(12), pp. 1156-1168, Dec.
`1994.
`
`J.'Yanomaru and A. Inubushi, “A Compact Representation of
`Binary Patterns for Invarinat Recognition,” 1995 IEEE Int’1
`Conf. on Systems, Man and Cybernetics, vol. 2, pp.
`1550-1555, Oct. 1995.
`E.A. Heredia and G.R. Arce, “Piecewise Polynomial Kernel
`Networks,” Proc.
`Int’l. Conf. on Neural Networks, pp.
`1558-1563, Jun. 1996.
`C.J. C. Burgeset al., “Improving the Accuracy and speed of
`support vector machines”, Advances in Neural Information
`Processing Systems 9. Proceedings of the 1996 Conference,
`Advances in Neural Information Processing Systems 9,
`Proceedings of the 1996 Conference, Denver, CO, USA,
`Dee. 2-5, 1996, pp. 375-381, XP002088166,
`ISBN
`0-262-10065-7, 1997, London, UK, MIT Press, UK, *the
`whole document* particularly relevant if taken alone.
`
`P. Y. Simardet al., “Memory—Based Character Recognition
`Using a Transformation Invariant Mctric”, Proccedings of
`the IAPR International Conference on Pattern Recognition,
`Jerusalem, Oct. 9-13, 1994 Conference B: Pattern Recog-
`nition and Neural Networks, vol. 2, No. Conf. 12, Oct. 9,
`1994, pp. 262-267, XP000509894, Institute of Electrical
`and Electronics Engineers, *Abstract* technological back-
`ground.
`Hidefumi Sawai, “Axially Symmetric Neural Network
`Architecture for Rotation—Invariant Patter Recognition’,
`International Conference on Neural Networks/World Con-
`gress on Computational Intelliges, Orlando, Jun. 27-19,
`1994, vol.7, Jun. 27, 1994, pp. 4253-4258, XP000531357,
`Institute of Electrical and Electronics Engineers, *Abstract*.
`Mas J. et al: Invariant Perception for Symmetry Related
`Patterns, International Joint Conference on Neural Networks
`(IJCNN), Washington, Jun. 19-22, 1989., vol. 1, Jun. 19,
`1989, pp. 85-89, XP000088251, Institute of Electrical and
`Electronic Engineers—* Abstract*.
`“Extracting Support Data For A Given Task,” B. Scholkopf,
`etal., Eds. Proceedings, International Conference on Knowl-
`edge Discovery & Data Mining, AAAI Press, Menlo Park,
`CA 1995.
`
`“Incorporating Invariances in Support Vector Learning
`Machines”, B. Schoklopfct al., Proceedings ICANN’96—
`International Conference on Artificial Neural Networks,
`Springer Verlag, Berlin 1996.
`
`Page 2 of 11
`
`Page 2 of 11
`
`
`
[U.S. Patent, Aug. 29, 2000, Sheet 1 of 2]

FIG. 1 (flow chart): STEP 10: INPUT DATA (DIMENSION n). STEP 20: PREPROCESS THE DATA (IMPOSE LOCAL SYMMETRIES ON THE DATA, GENERATE PROCESSED DATA OF DIMENSION m, WHERE m < n). STEP 30: OPERATE ON THE PREPROCESSED DATA BY APPLYING A KERNEL-BASED METHOD. END.

FIG. 2 (block diagram): KERNEL-BASED MACHINE LEARNING SYSTEM, comprising a PREPROCESSOR and a PROCESSOR operating on an input IMAGE (reference numerals 100, 105, 120, 125).

[U.S. Patent, Aug. 29, 2000, Sheet 2 of 2]

FIG. 3 (flow chart): STEP 305: GENERATE INPUT DATA. STEP 310: PREPROCESS THE INPUT DATA. STEP 315: DETERMINE THE PARAMETERS OF KERNEL-BASED CLASSIFICATION.

FIG. 4 (flow chart): STEP 405: GENERATE INPUT DATA. STEP 410: PREPROCESS THE INPUT DATA. STEP 415: OPERATE ON THE PREPROCESSED DATA BY APPLYING THE KERNEL-BASED METHOD TRAINING COEFFICIENTS DETERMINED DURING THE TRAINING PHASE OF FIG. 3.
`
`
`6,112,195
`
`1
ELIMINATING INVARIANCES BY PREPROCESSING FOR KERNEL-BASED METHODS
`
`2
`the SVM to operate more efficiently in terms of, e.g.,
`memory size, and training time, yet classify more patterns
`than in the prior art for a given-size SVM.
`
`FIELD OF THE INVENTION
`
BRIEF DESCRIPTION OF THE DRAWING

FIG. 1 shows an illustrative flow chart in accordance with the principles of the invention;

FIG. 2 shows a block diagram of a portion of a recognition system embodying the principles of the invention;

FIG. 3 shows an illustrative method for training the system of FIG. 2 in accordance with the principles of the invention; and

FIG. 4 shows an illustrative method for operating the system of FIG. 2 in accordance with the principles of the invention.
`
`DETAILED DESCRIPTION
`
Before describing an illustrative embodiment of the invention, the inventive concept itself is described. (Other than the inventive concept, it is assumed that the reader is familiar with the mathematical notation used to generally represent kernel-based methods as known in the art.) Also, the inventive concept is illustratively described in the context of pattern recognition. However, the inventive concept is applicable to all kernel-based methods. Some examples of the classes of problems covered by kernel-based methods are: regression estimation, density estimation, etc.

Introduction

As used herein, "kernel-based methods" means methods which approximate an unknown function G(x) by F(x), where F(x) has the form:

    F(x) = Σ_q a_q K(p_q, x) + b,    a_q, b ∈ R^1, x ∈ R^n, p_q ∈ R^n    (1)
`
`
where a_q, b, and p_q are parameters that are to be determined from empirical data by a training procedure, and K is a kernel function, whose form is usually chosen in advance. Additive models (e.g., see T. J. Hastie and R. J. Tibshirani, Generalized Additive Models, Chapman and Hall, 1st edition, 1990), Radial Basis Functions (e.g., see M. J. D. Powell, Radial basis functions for multivariable interpolation: A review, in Algorithms for Approximation, J. C. Mason and M. G. Cox (Eds.), pages 143-167, Oxford Clarendon Press, 1987; F. Girosi, M. Jones, and T. Poggio, Regularization theory and neural networks architectures, Neural Computation, 7(2):219-269, 1995; and C. M. Bishop, Neural Networks for Pattern Recognition, Clarendon Press, Oxford, 1995), and Support Vector Machines (e.g., see C. Cortes and V. Vapnik, Support vector networks, Machine Learning, 20:273-297, 1995; and V. Vapnik, The Nature of Statistical Learning Theory, Springer Verlag, New York, 1995) are examples of such methods. Pattern recognition, regression estimation, density estimation, and operator inversion are examples of problems tackled with these approaches (e.g., see A. Smola, V. Vapnik, S. Golowich, Support vector method for function approximation, regression estimation, and signal processing, Advances in Neural Information Processing Systems, 9, 1996). Thus, for example, in the density estimation case, x is a point (in a vector space) at which the probability density is required, and F(x) is the approximation to that density; for the classification case, x is a test pattern to be classified, and sgn(F(x)) gives the corresponding label. Similarly, for pattern recognition, an SVM is first trained to recognize a target
`
image, as described earlier. During operation, or testing, the SVM indicates classification of input data (representing an input image) according to equation (1). If the SVM classifies the input data as the target image, F(x) is, e.g., equal to a positive number.
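Evaluating equation (1) is mechanical once the parameters are known. The following minimal Python sketch (ours, not the patent's; the polynomial kernel, all names, and the random stand-in values are illustrative assumptions, since in practice a_q, b, and p_q come from training) makes the classification rule concrete:

```python
import numpy as np

def poly_kernel(p, x, degree=2):
    """An illustrative kernel: K(p, x) = (p . x + 1)^degree."""
    return (np.dot(p, x) + 1.0) ** degree

def F(x, params, coeffs, b):
    """Equation (1): F(x) = sum_q a_q K(p_q, x) + b.
    `params` holds the parameter vectors p_q (for an SVM, the support
    vectors); `coeffs` holds the trained coefficients a_q."""
    return sum(a_q * poly_kernel(p_q, x) for a_q, p_q in zip(coeffs, params)) + b

rng = np.random.default_rng(0)
params = rng.normal(size=(5, 256))   # five p_q for 16 x 16 = 256 pixel images
coeffs = rng.normal(size=5)          # stand-ins for trained a_q
x = rng.normal(size=256)             # a test pattern
print(np.sign(F(x, params, coeffs, b=0.1)))  # sgn(F(x)) is the class label
```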
Clearly this approach is general. However, the generality brings with it the problem of model selection, i.e., which family of functions to choose for the kernels. While some work has been done on this (e.g., for the support vector pattern recognition case see B. Scholkopf, C. Burges, and V. Vapnik, Extracting support data for a given task, in U. M. Fayyad and R. Uthurusamy, editors, Proceedings, First International Conference on Knowledge Discovery & Data Mining, AAAI Press, Menlo Park, Calif., 1995), it remains a significant unsolved problem.
In accordance with the inventive concept, the description below explores how imposing constraints arising from domain knowledge can restrict the family of kernels. In the following, bold face is used for vector and matrix quantities, and light face for their components.
Incorporating Local Invariances

This section describes the general approach of the inventive concept. The object is to find those kernels in equation (1) such that, if any given test point x is transformed under a known symmetry of the problem, the result F is unchanged.
Given two test points x and y, equation (1) defines an affine structure with distance function:

    ρ(x, y) = Σ_q a_q (K(p_q, x) − K(p_q, y))    (2)

(An affine structure is a mathematical term for a type of vector space as known in the art.) For example, ρ can be thought of as counting contours between patterns in the pattern recognition case.
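As a small companion sketch (ours, not the patent's; `kernel` is any kernel callable, such as the illustrative poly_kernel above), equation (2) is just the difference of two kernel expansions, with the bias cancelling:

```python
def rho(x, y, params, coeffs, kernel):
    """Equation (2): rho(x, y) = sum_q a_q (K(p_q, x) - K(p_q, y)).
    Note that rho(x, y) = F(x) - F(y): the bias b drops out."""
    return sum(a_q * (kernel(p_q, x) - kernel(p_q, y))
               for a_q, p_q in zip(coeffs, params))
```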
In accordance with the inventive concept, it is preferable to choose a class of kernels so that ρ(x, y) is close to zero if y is a transform of x along some symmetry direction. If y = x + dx, then

    dρ = Σ_{q,i} a_q dx_i ∂_i K(p_q, x)    (3)

where we define ∂_i ≡ ∂/∂x_i. Requiring that this be zero for all a_q gives:

    Σ_i dx_i ∂_i K(p_q, x) = 0    (4)

Note that for a particular problem, for which the a_q are known to satisfy certain constraints, equation (4) may be more restrictive than is necessary to ensure that dρ = 0, but in this work no such assumption about the a_q is made.
We will consider transformations that take the general form:

    x'_i = x_i + α f_i(x),    α ∈ R^1    (5)

for which, in the limit as α → 0, equation (4) takes the form:

    Σ_i f_i ∂_i K(p_q, x) ≡ O K(p_q, x) = 0    (6)

which defines the operator O. From now on the parameter vector p_q in K is not explicitly written.
Operator Invariance for the Linear Case

The following two simple theorems are proven for the case in which the transformation represented by equation (5) is both linear and invertible.

Theorem: For linear, invertible transformations (equation (5)), the operator O is itself invariant under the transformation.

Proof: Let U denote the unit matrix, "T" denote transpose, and ∂ denote the vector with components ∂_i = ∂/∂x_i. Denote the transformation by

    x' = (U + αM)x.

Then the operator O in equation (6) is given by

    O = x^T M^T ∂    (7)

and, since M commutes with (U + αM),

    O' = x'^T M^T ∂' = Σ_{i,j,k,l} M_{ij}(U + αM)_{jk} x_k ((U + αM)^{-1})_{li} ∂_l = x^T M^T ∂ = O    (8)

Theorem: For linear, invertible transformations (equation (5)), denoting the argument of K() by its components x_i, if K(x_i) satisfies equation (6), then so does K(x_i + αf_i(x)), for finite α.

Proof:

    O K(x) = 0 ⟹ O K(x + αf(x)) = O' K(x + αf(x)) = 0    (9)

since O is an invariant by the above theorem.
Multiple Symmetries

For a set of M symmetries, there will be M simultaneous equations of the form of equation (3), which will result in a system of M equations of the form

    O_m u(x_0, ..., x_{n−1}) = 0,  m = 1, ..., M    (10)

where {O_m} is a set of linear differential operators, each of which takes the general form

    O_m = Σ_i f_i^m(x) ∂_i    (11)

analogous to equation (6).

The questions arise: when do non-trivial integrals of equation (10) exist, and how many such independent integrals are there? Following "Techniques in Partial Differential Equations," Clive R. Chester, McGraw-Hill, 1971, the following definitions are made:

Definition: A system of operators {O_i} is called complete if all commutators take the form

    [O_i, O_j] = Σ_k c_{ijk} O_k,  c_{ijk} ∈ R^1    (12)

(Note that, from their definition, the c_{ijk} satisfy the known Jacobi identity, so they are in fact the structure constants of some Lie algebra (e.g., see Peter J. Olver, Applications of Lie Groups to Differential Equations, Springer-Verlag, 1986, New York).)
`
Definition: A system of equations (10) is called complete if the corresponding operators form a complete set.

Theorem: Any complete system of r < n independent equations, of the form of equation (10), has exactly n − r independent integrals.

Thus, non-trivial integrals will only exist if the number of operators r in the complete set is less than n, and if so, there will be n − r of them. In the latter case, the general solution of the system of equations (10) has the form

    F(u_0(x_0, ..., x_{n−1}), ..., u_{n−r−1}(x_0, ..., x_{n−1}))    (13)

where u_0, ..., u_{n−r−1} are the set of independent integrals of the system of equations (10) and F is any C^1 (i.e., differentiable) function (for reasons that will become clear below, the vector components are indexed starting with index 0). Now, the recipe for finding the complete set is clear: one generates new operators by computing commutators of the existing set of operators, and stops when either one has found a complete set, or when r > n.
Note that this gives us capacity control exactly where we want it. For example, for a set of commuting symmetries (i.e., all c_{ijk} vanish), the number of degrees of freedom in the problem is reduced by exactly the number of symmetries imposed.

Thus, and in accordance with the inventive concept, generating an invariant kernel amounts to preprocessing the data, and then using any existing kernel. By imposing one symmetry, only one degree of freedom is lost (e.g., in a pattern recognition problem, a 256 pixel image becomes a 255 pixel, preprocessed image).
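The commutator computations in this recipe are mechanical. A minimal sympy sketch (ours, not the patent's; the two generators are illustrative assumptions) checks completeness for a pair of operators of the form of equation (11):

```python
import sympy as sp

x0, x1, x2 = X = sp.symbols("x0 x1 x2")

def commutator(f, g):
    """Coefficients of [O_f, O_g], where O_f = sum_i f_i(x) d/dx_i
    as in equation (11)."""
    return [sp.expand(sum(f[i] * sp.diff(g[j], X[i]) -
                          g[i] * sp.diff(f[j], X[i]) for i in range(3)))
            for j in range(3)]

f = [x1 - x0, x2 - x1, x0 - x2]   # cyclic shift generator (cf. equation (22))
g = [x0, x1, x2]                  # uniform scaling generator
print(commutator(f, g))           # [0, 0, 0]: these two operators commute
```

Here every commutator already lies in the span of the set (all c_{ijk} = 0), so the pair is complete; with r = 2 and n = 3 there is exactly n − r = 1 independent integral.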
Building Locally Invariant Kernels

Given the general solution to the system of equations (10), construction of a kernel function remains. It may not be possible to simply substitute any F in equation (13) for K in equation (1), since the set of functions K may have further constraints placed on them by the particular method being used, for example, their dependence on the parameter set p_q. However, such constraints are easily satisfied, since the only requirement on the F's is that they are differentiable.
For example, for support vector machines, the kernels take the form K(s_i, x), where the s_i are the support vectors (which have dimension equal to that of x). There are two constraints on the form the kernels may take: they must be symmetric, K(x, y) = K(y, x) ∀ x, y ∈ R^n, and they must satisfy Mercer's positivity constraint (e.g., see B. E. Boser, I. M. Guyon, and V. Vapnik, Extracting support data for a given task, in U. M. Fayyad and R. Uthurusamy, editors, Proceedings, First International Conference on Knowledge Discovery & Data Mining, AAAI Press, Menlo Park, Calif., 1995; and R. Courant and D. Hilbert, Methods of Mathematical Physics, Interscience, 1953), namely

    ∫∫ K(x, y) g(x) g(y) dx dy > 0  ∀ g ∈ L² such that ∫ (g(x))² dx > 0    (14)

The data are points in the vector space R^n. Below, this space is referred to as I for "input." (Call the vector space for the preprocessed data P for "preprocessed.") Now, suppose that some number of symmetries have been imposed, resulting in a complete set of size r < n, so that the number of independent integrals (and hence, the dimension of P) is n − r. Thus, the solutions have the general form of equation (13) above. We can then, in accordance with the principles of the invention, simply choose F to have the form of a kernel function which is known to satisfy the constraints, but which takes n − r arguments instead of n. As a specific example, we might take degree p polynomial kernels, for which the F will have the form

    F(u, v) = (u · v + 1)^p,  u, v ∈ R^{n−r}    (15)

Since such kernels are symmetric and satisfy Mercer's condition, all the desired constraints are satisfied. What was a polynomial support vector machine has, in accordance with the inventive concept, become a polynomial support vector machine acting on preprocessed data. However, the functional form that the overall kernels take, when considered as functions of the input data x, will no longer in general be polynomial.
An Example: Vertical Translation Invariance

In this section, a detailed example is provided in the context of pattern recognition using an illustrative example of vertical translation invariance with cyclic boundary conditions. A cyclic boundary condition simply means that each image wraps around, e.g., if a column of pixels is "shifted up" one pixel, the top pixel "wraps around" to the bottom.

Consider an image of N rows and one column. Cyclic boundary conditions are imposed to simplify the analysis. Here, the local invariance is a shift of up to one pixel, i.e., α is between zero and one (larger shifts could be handled by extending the analysis below). In this case the transformations take the form:

    x'_i = (1 − α)x_i + αx_{i+1},  i = 0, ..., N − 1    (16)

where the pixel values x_i are considered to be components of a vector x. (The pixel values are image dependent, e.g., a binary image pixel has a value of one (dark) and zero (light).) Note that here and below the following convention is adopted: all indices i, j, k, and sub-expressions involving them, are to be taken modulo n.

Note that solving the N by 1 problem amounts to solving the general N by M problem. An N by M image is converted to an NM by 1 image by pasting the top of each column of pixels to the bottom of the previous column. To make the cyclic boundary conditions a realistic assumption, a border of one blank pixel is added to the top and bottom of each image.
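A minimal numpy sketch of this preprocessing setup (ours, not the patent's; function names are illustrative) adds the blank border, pastes the columns end to end, and applies equation (16):

```python
import numpy as np

def to_column(image):
    """Convert an N x M image to an (N + 2) * M by 1 image: add a blank
    pixel at the top and bottom of each column, then paste the top of
    each column to the bottom of the previous one."""
    n, m = image.shape
    padded = np.vstack([np.zeros((1, m)), image, np.zeros((1, m))])
    return padded.flatten(order="F")   # column-major: columns end to end

def shift(x, alpha):
    """Equation (16): x'_i = (1 - alpha) x_i + alpha x_{i+1}, cyclically."""
    return (1 - alpha) * x + alpha * np.roll(x, -1)

x = to_column(np.arange(12.0).reshape(4, 3))   # a toy 4 x 3 "image"
print(shift(x, 0.5))                           # half-pixel cyclic shift
```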
The Relation with Group Action

We start by using the example of vertical translation invariance to make some general observations regarding the rather subtle relation of the transformations (5) to a group action. (As known in the art, if a group action is proven, then other mathematical transformations apply.) Since pixels are an approximation to a continuous function which can be considered as a representation of the group of translations, let us first make the connection between pixels and the continuous case. Let I(z) represent the original image field for which the N-pixel column is an approximation. Thus

    x_i = I(i),  i = 0, ..., N − 1    (17)
`
`40
`
`wn2
`
`[K@y@)gVidedy>OVgEL*, J(g(2))"dx>0
`
`(4)
`
Translating by a distance α means replacing I(z) by I'(z) = I(z − α). The new pixel values become
`
`
    x'_i = I'(i) = I(i − α) ≈ I(i) − α(∂_z I)(i)    (18)
`
Approximating (∂_z I)(i) by I(i) − I(i − 1) then gives equation (16).

The binning of the data into a vector has the consequence that equation (5), for finite α, is not necessarily a group action, although it does constitute the desired transformation, even for finite α. This point is illustrated by the following simple specific case, namely vertical translation invariance for a column of three pixels, whose values are labeled by x, y, z. Then equation (16) for x becomes:

    x' = g_α x = (1 − α)x + αy    (19)
`
where g_α is the operator instantiating the transformation. Then
`
    (g_β ∘ g_α)x = x(1 − α)(1 − β) + y(α + β − 2αβ) + zαβ    (20)
`
so there exists no γ such that g_γ = g_β ∘ g_α. However, to first order in α and β, the above transformations do form a group, with g_α^{-1} = g_{−α}. Thus, the action may be viewed as that of a group only for infinitesimal values of the parameters (e.g., see the above-mentioned reference by Peter J. Olver). Despite this fact, the transformation represented by equation (19) does constitute a translation of the binned data for finite values of α (in fact for any α ∈ [0, 1)).

But if the action is a group action for infinitesimal values of the parameters, then the corresponding differential operators are necessarily generators for some Lie group. Indeed, one may explicitly compute the exponentiation of the generator corresponding to equation (16) acting on a particular point, for example:

    e^{α[(y−x)∂_x + (z−y)∂_y + (x−z)∂_z]} x = x + (y − x)h_1 + (x − 2y + z)h_2 + (y − z)h_3 + (x + y − 2z)h_4 + (x − z)h_5 + (−2x + y + z)h_6    (21)

where the h_i are functions of α alone. This only corresponds to the transformation (equation (19)) to first order in α. To summarize, the transformation (equation (19)) coincides with that of a Lie group only for infinitesimal values of the parameters; for finite values, it is no longer a group, but it is still the desired transformation.
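This composition can be checked symbolically. In the sketch below (ours, not the patent's; sympy assumed), the αβ coefficient on z in the composed map is what rules out any single g_γ:

```python
import sympy as sp

a, b = sp.symbols("alpha beta")
x, y, z = sp.symbols("x y z")

def g(t, v):
    """Equation (19) extended cyclically to a 3-pixel column (x, y, z)."""
    return ((1 - t) * v[0] + t * v[1],
            (1 - t) * v[1] + t * v[2],
            (1 - t) * v[2] + t * v[0])

composed = g(b, g(a, (x, y, z)))
print(sp.expand(composed[0]))
# Collects to (1-a)*(1-b)*x + (a + b - 2*a*b)*y + a*b*z, matching equation
# (20); a single g_gamma puts no z term in x', so no gamma reproduces it.
```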
`
A Simple Example: The 4 Pixel Stack

To show how things work, the following is a simple example of an image which consists of a stack of 4 pixels. Equation (6) becomes:

    ((x_1 − x_0)∂_{x_0} + (x_2 − x_1)∂_{x_1} + (x_3 − x_2)∂_{x_2} + (x_0 − x_3)∂_{x_3}) u(x) = 0    (22)

The general solution to this is:

    u(x_0, x_1, x_2, x_3) = F(u_0, u_1, u_2)    (23)

where

    u_0 = x_0 + x_1 + x_2 + x_3    (24)

    u_1 = ln[(x_0 − x_1 + x_2 − x_3) / ((x_0 − x_2)² + (x_3 − x_1)²)]    (25)

    u_2 = arctan((x_3 − x_1) / (x_0 − x_2)) + (1/2) ln((x_0 − x_2)² + (x_3 − x_1)²)    (26)

and where F is any C^1 function.

This solution has two properties to which attention should be drawn.

First, u_0, and only u_0, is "globally" invariant (invariant for any of the allowed values of α in equation (16)).

Second, all three independent integrals have a property referred to herein as "additive invariance."
Additive Invariance

Below, transformed coordinates are denoted with a prime:

    x'_j = (1 − α)x_j + αx_{j+1},  j = 0, ..., 3    (27)

Definition: An "additive invariant" integral is defined to be one that has the property:

    u_i(x') = u_i(x) + f_i(α),  i = 0, ..., 2    (28)

for some functions f_i.
`
`
`8
`Denote by ® the mapping whichtakes points in I to points
`in P. Additive invariance is a desirable property of the
`mapping ®. It means that two points in I which are related
`by a vertical translation, map into two points in P, whose
`difference is independentof the original data, and depends
`only on the transformation parameter. Thus, and in accor-
`dance with the principles of the invention, to distinguish two
`translated images from cach other, the Icarning machine has
`only to learn a vector valued function of one parameter,
`where that parameter is independentof the original data(i.e.
`the pixel values of the un-translated image). For example, as
`described above, by imposing (upwards) vertical translation
`invariance on a 4 pixel image, the preprocessed image has
`oneless pixel. This, in effect, reduces the numberof degrees
`of freedom in the problem. Thus, this can be viewed as a
`form of capacity control, where the number of variables
`decreases in a controlled way as the number of imposed
`symmetries increases.
`Note also that the original data does not have this prop-
`erty: for pixel data, the difference between two translated
`images depends on the original image.
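This additive invariance is easy to check numerically. The sketch below (ours, not the patent's; it assumes the forms of u_0, u_1, and u_2 given in equations (24)-(26), with an absolute value guarding the logarithm) applies equation (27) to random stacks; the printed difference u(x') − u(x) is the same for every x:

```python
import numpy as np

def integrals(x):
    """The independent integrals (u0, u1, u2) of equations (24)-(26)."""
    v, w = x[0] - x[2], x[3] - x[1]
    s = x[0] - x[1] + x[2] - x[3]
    u0 = x.sum()
    u1 = np.log(abs(s) / (v**2 + w**2))
    u2 = np.arctan2(w, v) + 0.5 * np.log(v**2 + w**2)
    return np.array([u0, u1, u2])

def translate(x, alpha):
    """Equation (27): the cyclic vertical shift by alpha."""
    return (1 - alpha) * x + alpha * np.roll(x, -1)

alpha, rng = 0.3, np.random.default_rng(7)
for _ in range(3):
    x = rng.normal(size=4)
    d = integrals(translate(x, alpha)) - integrals(x)
    d[2] %= 2 * np.pi        # the arctan term is only defined modulo 2*pi
    print(np.round(d, 12))   # same (0, f1(alpha), f2(alpha)) for every x
```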
Note that the solution of a partial differential equation (PDE) has a large degree of arbitrariness in the independent integrals found. For the above example, nonlinear functions of the u_i could have equally been taken as the integrals instead of the u_i. The former do not have the additive invariance property. Thus for a general problem, one can only hope to find particular additive invariant integrals, rather than prove that all integrals will be additive invariant.

Finally, it should be noted that u_0 and u_1 above are also solutions of the corresponding differential equations for vertical translation invariance in the opposite direction. Since the two operators commute, there will be a total of two independent integrals. Consequently, the full solution is found, and this solution is clearly additive invariant.

The example given in this Section raises the following questions for the case of arbitrarily sized images: First, can the general solution be obtained? (After all, solving a PDE with an arbitrary number of dimensions could turn out to be analytically intractable.) Second, in general, is there only one globally invariant independent integral analogous to u_0 above? If so, it will make looking for additive invariant solutions of considerable interest. Finally, can one find those special integrals which are also additive invariant? Answers to these questions are in the next Section.
The N-pixel Stack

As mentioned above, in order to answer the above questions for an arbitrarily sized image, we only need consider the case of an N-pixel stack. The PDE analogous to equation (22) is

    Σ_{i=0}^{n−1} (x_{i+1} − x_i) ∂_{x_i} u(x) = 0    (29)

A No-Go Theorem

Consider the following:

Theorem: The general invariant solution of equation (29) has the form F(Σ_i x_i), where F ∈ C^1.

Proof: As used herein, the notation PRM denotes cyclic permutation of the indices. By definition, an invariant solution must satisfy ∂_α u(x') = 0, where x' is the linear function of α defined in equation (27). However,
`
`9
`
`10
`
`6,112,195
`
`Og U(X") = (x2 — 41 )0, ue) + PRM.
`
`G30)
`
`((ey-1-¥0)otPRM)u=0
`
`(42)
`
`(43)
`
`Here, the notation d,=0/da and d',=0/dx',, etc. is intro-
`duced. A set of PDEsthat u must satisfy can be generated by
`expressing the x; in terms of the x';. Note first that the
`transformations (equation (16)) can be written
`
`‘These are independent equations, since the matrix of
`coefficients is of rank n—1 for some choicesofthe x,. Finally,
`it is straightforward to check that all the operators appearing
`in equation (40-43) commute,so this set of n-1 PDEs forms
`a complete set. Thus, it has only one integral solution. By
`substitution it is easily checked that u(x)=2,x, is a solution;
`thus the general solution takes the form F(2,x;), where
`TEC". This completes the proof of the theorem.
`This theorem demonstrates that one cannot hope to find
`globally invariant solutions of the equation (29) other than
`F(2;,x,). Thus, one needs to search for “second best”
`solutions, such as additive invariant ones.
`The General Solution
`A general solution for the N-pixel stack is now derived.
`Equation (29) may be viewed as the dot product of the
`gradient of an unknownfunction u with a vector field in n
`dimensions, andto find the general solution one must solve
`the set of PDE’s which describe the characteristic surfaces,
`which are themselves parameterized by some t€R'
`(e.g.,
`see, E. C. Zachmanoglou, Dale W. Thoe, “/ntroduction to
`Partial Differential Equations with Applications,’ Dover,
`(+1)? 1S(2 - 11) = @” + -(@— 1" Ny -
`n
`Mineola, N.Y., 1986):
`2— jaf-2 yo
`Me-v Ja
`x,
`dx
`(44)
`Fe
`— =Ax
`dt
`
`x'=Mx,
`
`where
`
`M,=6,(1-a)+8,;_,00,j=0, ... 2-1
`
`Note also that
`
`detM=S=(1-a)"-(-1)"a"
`
`i)
`
`10
`
`(32)
`
`(33)
`
`One can verify directly, by matrix multiplication, that
`
`)=C/s\o-ryeC
`
`G4)
`
`Consequently,
`
`(35)
`
`i}wn
`
`Byusing the fact that both M and M~? are cyclic matrices,
`it is straightforward to show that the expression for x,,,-x,;
`can be obtained from equation (35) by cycling the x’, on the
`right hand side.
`Lemma: The coefficient of @ on the right hand side of
`equation (35) may be written as (-1)"?""(-x', +x',,0).
`Proof: The proof is by induction. First, note by inspection
`that the coefficient of «° onthe right hand side is (-1)"x',-
`(-1)"°x'. ‘This provides the PDE
`
`{Qh — x), +PRMYu = 0
`
`40
`
`(36)
`
`Equation (36) can then be substituted in equation (35) to
`eliminate terms of order 1. Assuming that this elimination
`scheme worksup to order @”, the right hand side of cquation
`(35) becomes
`
`3
`
`where
`
`Aya-848,44
`
`A has determinant zero and eigenvalues i, given by
`
`1+Ak=e7™"k=0, yn
`
`(45)
`
`(46)
`
`Here and for the remainder of this description, symbol 1
`is reserved for the square root of -1. By inspection, eigen-
`vectors z are constructed
`bs
`
`aee2mijkin‘pk=0
`
`_ nel
`
`(47)
`
`where the first index k labels the eigenvector, and the
`second j its components. Tet S be the matrix whose columns
`are the eigenvectors. Then S diagonalizes A:
`
`S
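The circulant relations above are easy to confirm numerically. The following sketch (ours, not the patent's; numpy assumed, with illustrative n and alpha) checks the determinant formula (33) and the spectrum (46):

```python
import numpy as np

n, alpha = 5, 0.3

# Cyclic superdiagonal permutation: P_ij = delta_{i, j-1 mod n}.
P = np.roll(np.eye(n), 1, axis=1)

# M of equations (31)-(32): (1 - alpha) on the diagonal, alpha above it.
M = (1 - alpha) * np.eye(n) + alpha * P
S_det = (1 - alpha) ** n - (-1) ** n * alpha ** n    # equation (33)
print(np.isclose(np.linalg.det(M), S_det))           # True

# A of equation (45): dx/dt = Ax with A = -I + P; its eigenvalues
# satisfy 1 + lambda_k = exp(2*pi*1j*k/n), equation (46).
lam = np.linalg.eigvals(-np.eye(n) + P)
expected = np.exp(2j * np.pi * np.arange(n) / n) - 1
print(all(np.isclose(lam, t).any() for t in expected))   # True
```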