`[19]
`[11] Patent Number:
`6,112,195
`
`Burges
`[45] Date of Patent:
`*Aug. 29, 2000
`
`USOO6112195A
`
[54] ELIMINATING INVARIANCES BY PREPROCESSING FOR KERNEL-BASED METHODS
`
`[75]
`
`Inventor: Christopher John Burges, Freehold,
`NJ.
`
[73] Assignee: Lucent Technologies Inc., Murray Hill, N.J.

[*] Notice: This patent issued on a continued prosecution application filed under 37 CFR 1.53(d), and is subject to the twenty year patent term provisions of 35 U.S.C. 154(a)(2).
`
[21] Appl. No.: 08/825,287

[22] Filed: Mar. 27, 1997
`
[51] Int. Cl.7: G06F 1/035; G06F 5/00
[52] U.S. Cl.: 706/20
[58] Field of Search: 382/159, 181, 276; 706/20
`
`[56]
`
`References Cited
`US. PATENT DOCUMENTS
`
`A. Khotanzad and J.—H. Lu, “Classification of Invariant
`Image Representations Using a Neural Network,” IEEE
`Trans. on Aecoustics, Speech, and Signal Processing, vol. 38
`(6), pp. 1028—1038, Jun. 1990.
`
`J.I. MinniX, et al., Multistaged Selinrganizing Neural
`Network with Biologically Inspired Preprocessing Features
`for Rotation and Scale Invariant Pattern Recognition, vol. 3,
`pp. 1605—1610, Oct. 1991.
`
`SJ. Perantonis and P.J.G. I.isboa, “Translation, Rotation,
`and Scale Invariant Pattern Recognition by High—Order
`Neural Networks and Moment Classifiers,” IEEE Trans. on
`Neural Networks, vol. 3(2), pp. 241—251, Mar. 1992.
`
`C. Yuceer and K. Ofiazer, “A Rotation, Scaling and Trans-
`lation Invariant Pattern Classification System,” Proc. llth
`IAPR Int’l Conf. on Pattern Recognition, vol.
`II, pp.
`
`422425, Sep- 1992-
`
`.
`“
`.
`JjA‘ Stirzyk and S: .Chal’ WOW contour prresimafw
`ior Object Recognition in Neural Networks, 199- IEEE
`Int’l Conf. on Systems, Man and Cybernetics, vol. 1, pp.
`3994104, Oct. 1992.
`
`6/1989 Ochoa et a1.
`4,838,644
`............................... 359/9
`5,054,094 10/1991 Barski
`..................................... 382/192
`
`. 382/157
`5,263,107
`11/1993 Ueda et al.
`.
`5,729,662
`3/1998 Rozmus ......
`706/20
`
`' 382/217
`529267568
`7/1999 Chewy et a"
`
`9/1999 Vapmk ............
`. 702/153
`5,950,146
`
`9/1999 Greenspan et a1.
`.
`..... 382/240
`5,956,427
`....................... 324/307
`5,969,524 10/1999 Pierpaoli et a1.
`OTHER PUBLICATIONS
`
`(”St confirmed on new page)
`
Primary Examiner—Tariq R. Hafiz
Assistant Examiner—Michael Pender
Attorney, Agent, or Firm—Joseph J. Opalach
`
`[57]
`
`ABSTRACT
`
`Gilbert Strang, Linear Algebra and Its Applications, Third
`edition 341—42, 1988.
`Microsoft Press, Computer Dictionary, Third Edition, 272,
`375, 1997.
`RF. Lippmann, “Pattern Classification Using Neural Net-
`works," IEEE Communications Magazine, vol. 27 (11), pp.
`47—64, Nov. 1989.
`
`A kernel-based method and apparatus includes a
`preprocessor, which operates on an input data in such a way
`as to provide invariance under some syrnrnetry transforma-
`tion.
`
`14 Claims, 2 Drawing Sheets
`
`
`
`
[Front-page drawing: block diagram of a kernel-based machine learning system 100, including a preprocessor and a processor (reference numerals 105, 120, 125); see FIG. 2.]
`
`
`
`
`6,112,195
`Page 2
`
`OTHER PUBLICATIONS
`
Y. Onodera, et al., "Translation and Rotation-Invariant Pattern Recognition Method Using Neural Network with Back-Propagation," Communications on the Move, ICCS/ISITA '92, vol. 2, pp. 548–552, Nov. 1992.

P.Y. Simard, et al., "Memory-Based Character Recognition Using a Transformation Invariant Metric," Proc. 12th Int'l Conf. on Pattern Recognition, pp. 262–267, Oct. 1994.

Q.-S. Chen, et al., "Symmetric Phase-Only Matched Filtering of Fourier–Mellin Transforms for Image Registration and Recognition," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 16(12), pp. 1156–1168, Dec. 1994.

J. Tanomaru and A. Inubushi, "A Compact Representation of Binary Patterns for Invariant Recognition," 1995 IEEE Int'l Conf. on Systems, Man and Cybernetics, vol. 2, pp. 1550–1555, Oct. 1995.

E.A. Heredia and G.R. Arce, "Piecewise Polynomial Kernel Networks," Proc. Int'l Conf. on Neural Networks, pp. 1558–1563, Jun. 1996.

C.J.C. Burges et al., "Improving the Accuracy and Speed of Support Vector Machines," Advances in Neural Information Processing Systems 9, Proceedings of the 1996 Conference, Denver, CO, USA, Dec. 2–5, 1996, pp. 375–381, XP002088166, ISBN 0-262-10065-7, MIT Press, London, UK, 1997. (The whole document; particularly relevant if taken alone.)

P.Y. Simard et al., "Memory-Based Character Recognition Using a Transformation Invariant Metric," Proceedings of the IAPR International Conference on Pattern Recognition, Jerusalem, Oct. 9–13, 1994, Conference B: Pattern Recognition and Neural Networks, vol. 2, No. Conf. 12, Oct. 9, 1994, pp. 262–267, XP000509894, Institute of Electrical and Electronics Engineers. (Abstract; technological background.)

Hidefumi Sawai, "Axially Symmetric Neural Network Architecture for Rotation-Invariant Pattern Recognition," International Conference on Neural Networks/World Congress on Computational Intelligence, Orlando, Jun. 27, 1994, vol. 7, pp. 4253–4258, XP000531357, Institute of Electrical and Electronics Engineers. (Abstract.)

Mas, J., et al., "Invariant Perception for Symmetry Related Patterns," International Joint Conference on Neural Networks (IJCNN), Washington, Jun. 19–22, 1989, vol. 1, Jun. 19, 1989, pp. 85–89, XP000088251, Institute of Electrical and Electronic Engineers. (Abstract.)

B. Scholkopf, et al., "Extracting Support Data for a Given Task," Proceedings, First International Conference on Knowledge Discovery & Data Mining, AAAI Press, Menlo Park, CA, 1995.

B. Scholkopf, et al., "Incorporating Invariances in Support Vector Learning Machines," Proceedings ICANN'96—International Conference on Artificial Neural Networks, Springer Verlag, Berlin, 1996.
`
`
`
`
FIG. 1

STEP 10: INPUT DATA (DIMENSION n)

STEP 20: PREPROCESS THE DATA (IMPOSE LOCAL SYMMETRIES ON THE DATA; GENERATE PROCESSED DATA OF DIMENSION m, WHERE m < n)

STEP 30: OPERATE ON THE PREPROCESSED DATA BY APPLYING A KERNEL-BASED METHOD

END

FIG. 2

[Block diagram of a kernel-based machine learning system 100: data capture element 115, a preprocessor, and a processor (reference numerals 105, 120, 125).]
`
`
`
FIG. 3

STEP 305: GENERATE INPUT DATA

STEP 310: PREPROCESS THE INPUT DATA

STEP 315: DETERMINE THE PARAMETERS OF KERNEL-BASED CLASSIFICATION

FIG. 4

STEP 405: GENERATE INPUT DATA

STEP 410: PREPROCESS THE INPUT DATA

STEP 415: OPERATE ON THE PREPROCESSED DATA BY APPLYING THE KERNEL-BASED METHOD TRAINING COEFFICIENTS DETERMINED DURING THE TRAINING PHASE OF FIG. 3
`
`
`
`6,112,195
`
`1
`ELIMINATING INVARIANCES BY
`PREPROCESSING FOR KERNEL-BASED
`METHODS
`
`2
`the SVM to operate more efficiently in terms of, e.g.,
`memory size, and training time, yet classify more patterns
`than in the prior art for a given-size SVM.
`
FIELD OF THE INVENTION

This invention relates generally to a class of problems falling within what is known as "kernel-based methods."

BACKGROUND OF THE INVENTION

Pattern recognition, regression estimates, and density estimation are a few examples of a class of problems that are analyzed using kernel-based methods. The latter are illustratively described herein in the context of pattern recognition. However, it should be noted that the inventive concept (described below) is not limited to pattern recognition and is applicable to kernel-based methods in general (of which support vector machines are an example).

In pattern recognition, it is known in the art to use a recognizer having a support vector machine (SVM) architecture. The SVM is viewed as mapping an input image onto a decision plane. The output of the SVM is typically a numerical result, the value of which is associated with whether, or not, the input image has been recognized as a particular type of image.

As a very general example, consider a 16 pixel by 16 pixel image of a tree. In this context, an SVM recognition system is first "trained" with a set of known images of a tree. For example, the SVM system could be trained on 1000 different tree images, each image represented by 256 pixels. Subsequently, during operation, or testing, the SVM system classifies input images using the training data generated from the 1000 known tree images. The SVM system indicates classification of an input image as the desired tree if, e.g., the output, or result, of the SVM is a positive number.

Unfortunately, in the above example, the recognizer may have to deal not only with a particular type of tree image, but also with translates of that tree image: for example, a tree image that is shifted in the vertical direction but is still the same tree. To some extent this kind of translation can be dealt with by using tree images that represent such a vertical shift. However, the SVM system is still trained to predefined images; it is just that some of these predefined images are used to represent translations of the image (as opposed to, e.g., different types of trees).

SUMMARY OF THE INVENTION

A kernel-based method and apparatus includes a preprocessor, which operates on input data in such a way as to provide invariance under some symmetry transformation.

In an embodiment of the invention, a pattern recognizer includes a preprocessor and a support vector machine (SVM). The latter is trained to recognize a particular set of images. The preprocessor operates on an input image in such a way as to provide local translation invariance. In particular, the preprocessor maps a particular input image, and its translate, to two points in the decision plane of the SVM, whose difference is independent of the original data. As a result, the recognizer has built-in local invariance and does not require training the SVM on translated versions of the images.

In accordance with a feature of the invention, the size of the preprocessed image is less than the size of the original image. In other words, the SVM operates on less data than required in the prior art. Thus, the inventive concept enables the SVM to operate more efficiently in terms of, e.g., memory size and training time, yet classify more patterns than in the prior art for a given-size SVM.
`BRIEF DESCRIPTION OF THE DRAWING
`
`FIG. 1 shows an illustrative flow chart in accordance with
`
`the principles of the invention;
`FIG. 2 shows a block diagram of a portion of a recognition
`system embodying the principles of the invention;
`FIG. 3 shows an illustrative method for training the
`system of FIG. 2 in accordance with the principles of the
`invention; and
FIG. 4 shows an illustrative method for operating the
`system of FIG. 2 in accordance with the principles of the
`invention.
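Before turning to the details, the following minimal sketch (illustrative Python, not part of the claimed subject matter) indicates the overall shape of a recognizer of the kind summarized above: a preprocessing map, here a placeholder named `phi`, followed by a kernel-based decision function. The RBF kernel and all names are illustrative assumptions rather than the patent's specific construction, which is developed below.

```python
import numpy as np

# Illustrative sketch only: a recognizer structured as a preprocessor
# followed by a kernel-based classifier. `phi`, the kernel choice, and
# all parameter names are hypothetical placeholders.

def phi(x):
    # Placeholder preprocessor: reduces an n-pixel image to n-1 numbers,
    # standing in for the invariant preprocessing derived later.
    return np.diff(x)

def rbf_kernel(u, v, gamma=0.1):
    return np.exp(-gamma * np.sum((u - v) ** 2))

def classify(x, support_vectors, alphas, b):
    # SVM-style decision: sign of a weighted sum of kernel evaluations,
    # computed on preprocessed data rather than on raw pixels.
    u = phi(x)
    score = sum(a * rbf_kernel(phi(s), u)
                for a, s in zip(alphas, support_vectors)) + b
    return np.sign(score)
```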
`
DETAILED DESCRIPTION

Before describing an illustrative embodiment of the invention, the inventive concept itself is described. (Other than the inventive concept, it is assumed that the reader is familiar with mathematical notation used to generally represent kernel-based methods as known in the art.) Also, the inventive concept is illustratively described in the context of pattern recognition. However, the inventive concept is applicable to all kernel-based methods. Some examples of the classes of problems covered by kernel-based methods are: regression estimates, density estimation, etc.

Introduction

As used herein, "kernel-based methods" means methods which approximate an unknown function G(x) by F(x), where F(x) has the form:

F(x) = Σ_q a_q K(p_q, x) + b,  a_q, b ∈ R^1, x ∈ R^n, p_q ∈ R^n   (1)

and where a_q, b, and p_q are parameters that are to be determined from empirical data by a training procedure, and K is a kernel function, whose form is usually chosen in advance. Additive models (e.g., see T. J. Hastie and R. J. Tibshirani, Generalized Additive Models, Chapman and Hall, 1st edition, 1990), Radial Basis Functions (e.g., see M. J. D. Powell, Radial basis functions for multivariable interpolation: A review, in Algorithms for Approximation, J. C. Mason and M. G. Cox (Eds.), pages 143–167, Oxford Clarendon Press, 1987; F. Girosi, M. Jones, and T. Poggio, Regularization theory and neural networks architectures, Neural Computation, 7(2):219–269, 1995; and C. M. Bishop, Neural Networks for Pattern Recognition, Clarendon Press, Oxford, 1995), and Support Vector Machines (e.g., see C. Cortes and V. Vapnik, Support vector networks, Machine Learning, 20:273–297, 1995; and V. Vapnik, The Nature of Statistical Learning Theory, Springer Verlag, New York, 1995) are examples of such methods. Pattern recognition, regression estimation, density estimation, and operator inversion are examples of problems tackled with these approaches (e.g., see A. Smola, V. Vapnik, and S. Golowich, Support vector method for function approximation, regression estimation, and signal processing, Advances in Neural Information Processing Systems, 9, 1996). Thus, for example, in the density estimation case, x is a point (in a vector space) at which the probability density is required, and F(x) is the approximation to that density; for the classification case, x is a test pattern to be classified, and sgn(F(x)) gives the corresponding label. Similarly, for pattern recognition, an SVM is first trained to recognize a target image, as described earlier.
`
`6,112,195
`
`4
`
Σ_q a_q Σ_i f_i(x) ∂_i K(p_q, x) ≡ O K(p_q, x) = 0   (6)

which defines the operator O. From now on, the parameter vector p_q in K is not explicitly written.
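The condition of equation (6) can be checked numerically. The sketch below (illustrative Python; both kernels are example choices, not prescribed by the text) estimates the directional derivative Σ_i f_i(x) ∂_i K(p, x) by finite differences, for the cyclic-shift direction f_i(x) = x_{i+1} − x_i used in the examples later in this description.

```python
import numpy as np

# Finite-difference check of the invariance condition (6) for the
# direction f_i(x) = x_{i+1} - x_i (cyclic). Both kernels below are
# illustrative choices, not mandated by the text.

def directional_derivative(K, p, x, eps=1e-6):
    f = np.roll(x, -1) - x                     # f_i(x) = x_{i+1} - x_i
    return (K(p, x + eps * f) - K(p, x - eps * f)) / (2 * eps)

rng = np.random.default_rng(0)
x, p = rng.random(4), rng.random(4)

K_rbf = lambda p, x: np.exp(-np.sum((p - x) ** 2))   # generic kernel
K_inv = lambda p, x: (np.sum(x) - np.sum(p)) ** 2    # depends on x only via sum(x)

print(directional_derivative(K_rbf, p, x))   # generally nonzero
print(directional_derivative(K_inv, p, x))   # ~0, since sum(x) is invariant
```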
Operator Invariance for the Linear Case

The following two simple theorems are proven for the case in which the transformation represented by equation (5) is both linear and invertible.

Theorem: For linear, invertible transformations (equation (5)), the operator O is itself invariant under the transformation.

Proof: Let U denote the unit matrix, "T" denote transpose, and ∂ denote the vector with components ∂_i ≡ ∂/∂x_i. Denote the transformation by

x' = (U + αM)x.

Then the operator O in equation (6) is given by

O = x^T M^T ∂   (7)

and

O' = x'^T M^T ∂' = Σ_{ijkm} M_{kj} (U + αM)_{ji} x_i (U + αM)^{-1}_{mk} ∂_m = x^T M^T ∂ = O   (8)

Theorem: For linear, invertible transformations (equation (5)), denoting the argument of K(·) by its components x_i, if K(x_i) satisfies equation (6), then so does K(x_i + αf_i(x)), for finite α.

Proof:

O K(x_i) = 0 ⟹ O K(x_i + αf_i(x)) = O' K(x_i + αf_i(x)) = 0   (9)

since O is an invariant by the above theorem.
Multiple Symmetries

For a set of M symmetries, there will be M simultaneous equations of the form of equation (3), which will result in a system of M equations of the form

O_m u(x_0, ..., x_{n-1}) = 0,  m = 1, ..., M   (10)

where {O_m} is a set of linear differential operators, each of which takes the general form

O_m = Σ_l f_{ml} ∂_l   (11)

analogous to equation (6).

The questions arise: when do non-trivial integrals of equation (10) exist, and how many such independent integrals are there? Following "Techniques in Partial Differential Equations," Clive R. Chester, McGraw-Hill, 1971, the following definitions are made:

Definition: A system of operators {O_i} is called complete if all commutators take the form

[O_i, O_j] = Σ_k c_{ijk} O_k,  c_{ijk} ∈ R^1   (12)

(Note that, from their definition, the c_{ijk} satisfy the known Jacobi identity, so they are in fact the structure constants of some Lie algebra (e.g., see Peter J. Olver, Applications of Lie Groups to Differential Equations, Springer-Verlag, 1986, New York).)
`
`3
`image, as described earlier. During operation, or testing, the
`SVM indicates classification of input data (representing an
`input image) according to equation (1). If the SVM classifies
`the input data as the target image, F(x) is, e.g., equal to a
`positive number.
`Clearly this approach is general. However the generality
`brings with it the problem of model selection, i.e., which
`family of functions to choose for the kernels. While some
`work has been done on this (e.g., for the support vector
`pattern recognition case see B. Scholkopf, C. Burges, and V.
`Vapnik, Extracting support data for a given task, In U. M.
`Fayyad and R. Uthurusamy, editors, Proceedings, First
`International Conference on Knowledge Discovery & Data
`Mining, AAAI Press, Menlo Park, Calif., 1995), it remains
`a significant unsolved problem.
`In accordance with the inventive concept, the description
`below explores how imposing constraints arising from
`domain knowledge can restrict the family of kernels. In the
`following, bold face is used for vector and matrix quantities,
`and light face for their components.
`Incorporating Local Invariances
`This section describes the general approach of the inven—
`tive concept. The object is to find those kernels in equation
`(1) such that, if any given test point x is transformed under
`a known symmetry of the problem,
`the result F is
`unchanged.
`Given two test points x and y, equation (1) defines an
`affine structure with distance function:
`
`‘7
`fltx. y) = Z (1qulpq. x) — Klpq. y»
`
`(2)
`
`(An affine structure is a mathematical term for a type of
`vector space as known in the art). For example, p can be
`thought of as counting contours between patterns in the
`pattern recognition case.
`In accordance with the inventive concept, it is preferable
`to choose a class of kernels so that p(x, y) is close to zero
`if y is a transform of x along some symmetry direction. If
`y=x+dx, then
`
`qt
`02/) = Z aquiat Ktpq, x)
`
`where we define 8128/ax, Requiring that this be zero for all
`aq gives:
`
`dei8;K(pq,x)=O
`
`(4)
`
`I\)m
`
`LA)LII
`
`40
`
`Note that for a particular problem, for which the a, are
`known to satisfy certain constraints, equation (4) may be
`more restrictive than is necessary to ensure that dp =0, but in
`this work no such assumption about aq is made.
`We will consider transformations that take the general
`form:
`
`60
`
`xf=xi+afi(x),wERl
`
`(5)
`
`for which, in the limit as (1—>0, equation (4) takes the
`form:
`
`Page 6 ofll
`
`Page 6 of 11
`
`
`
`6,112,195
`
`5
Definition: A system of equations (10) is called complete if the corresponding operators form a complete set.

Theorem: Any complete system of r < n independent equations, of the form of equation (10), has exactly n−r independent integrals.
Thus, non-trivial integrals will only exist if the number of operators r in the complete set is less than n, and if so, there will be n−r of them. In the latter case, the general solution of the system of equations (10) has the form

F(u_0(x_0, ..., x_{n-1}), ..., u_{n-r-1}(x_0, ..., x_{n-1}))   (13)

where u_0, ..., u_{n-r-1} are the set of independent integrals of the system of equations (10) and F is any C^1 (i.e., differentiable) function (for reasons that will become clear below, the vector components are indexed starting with index 0). Now, the recipe for finding the complete set is clear: one generates new operators by computing commutators of the existing set of operators, and stops when either one has found a complete set, or when r > n.
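The commutator computation in this recipe is mechanical and can be sketched symbolically, as below (illustrative Python using sympy; the two example operators are arbitrary and are not taken from the text). Each operator O = Σ_l f_l ∂_l is represented by its coefficient tuple, and the commutator's coefficients are O_f(g_l) − O_g(f_l).

```python
import sympy as sp

# Sketch of the completeness recipe: compute the commutator of two
# first-order operators O = sum_l f_l * d/dx_l, represented by their
# coefficient functions. The example operators are arbitrary.

x0, x1 = sp.symbols('x0 x1')
coords = (x0, x1)

def apply_op(f, u):
    # (O_f u) = sum_l f_l * du/dx_l
    return sum(fl * sp.diff(u, xl) for fl, xl in zip(f, coords))

def commutator(f, g):
    # [O_f, O_g] is again first order, with coefficients O_f(g_l) - O_g(f_l).
    return tuple(sp.simplify(apply_op(f, gl) - apply_op(g, fl))
                 for fl, gl in zip(f, g))

rotation = (x1, -x0)   # generator of rotations in the (x0, x1) plane
dilation = (x0, x1)    # generator of dilations

print(commutator(rotation, dilation))   # (0, 0): the set is already complete
```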
Note that this gives us capacity control exactly where we want it. For example, for a set of commuting symmetries (i.e., all c_{ijk} vanish), the number of degrees of freedom in the problem is reduced by exactly the number of symmetries imposed.

Thus, and in accordance with the inventive concept, generating an invariant kernel amounts to preprocessing the data, and then using any existing kernel. By imposing one symmetry, only one degree of freedom is lost (e.g., in a pattern recognition problem, a 256 pixel image becomes a 255 pixel, preprocessed image).
Building Locally Invariant Kernels

Given the general solution to the system of equations (10), construction of a kernel function remains. It may not be possible to simply substitute any F in equation (13) for K in equation (1), since the set of functions K may have further constraints placed on them by the particular method being used, for example, their dependence on the parameter set p_q. However, such constraints are easily satisfied, since the only requirement on the F's is that they are differentiable.

For example, for support vector machines, the kernels take the form K(s_q, x), where the s_q are the support vectors (which have dimension equal to that of x). There are two constraints on the form the kernels may take: they must be symmetric, K(x, y) = K(y, x) ∀ x, y ∈ R^n, and they must satisfy Mercer's positivity constraint (e.g., see B. E. Boser, I. M. Guyon, and V. Vapnik, Extracting support data for a given task, in U. M. Fayyad and R. Uthurusamy, editors, Proceedings, First International Conference on Knowledge Discovery & Data Mining, AAAI Press, Menlo Park, Calif., 1995; and R. Courant and D. Hilbert, Methods of Mathematical Physics, Interscience, 1953), namely

∫ K(x, y) g(x) g(y) dx dy > 0  ∀ g ∈ L_2, ∫ (g(x))^2 dx > 0   (14)

The data are points in the vector space R^n. Below, this space is referred to as I for "input." (Call the vector space for the preprocessed data P for "preprocessed.") Now, suppose that some number of symmetries have been imposed, resulting in a complete set of size r < n, so that the number of independent integrals (and hence, the dimension of P) is n−r. Thus, the solutions have the general form of equation (13) above. We can then, in accordance with the principles of the invention, simply choose F to have the form of a kernel function which is known to satisfy the constraints, but which takes n−r arguments instead of n. As a specific example, we might take degree p polynomial kernels, for which the F will have the form

F(u, v) = (u·v + 1)^p,  u, v ∈ R^{n−r}   (15)
`
`Since such kernels are symmetric and satisfy Mercer’s
`condition, all the desired constraints are satisfied. What was
`a polynomial support vector machine has, in accordance
`with the inventive concept, become a polynomial support
`vector machine acting on preprocessed data. However, the
`functional form that the overall kernels take, when consid-
`ered as functions of the input data x, will no longer in
`general be polynomial.
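In code, this construction amounts to composing a fixed preprocessing map with an ordinary kernel, as in the sketch below (illustrative Python; `Phi` is a hypothetical placeholder for the n−r independent integrals, such as those derived for the 4-pixel stack later in this description).

```python
import numpy as np

# Sketch of equation (15): a degree-p polynomial kernel applied to
# preprocessed points Phi(x), Phi(y). Phi is a hypothetical placeholder
# for the n-r independent integrals.

def poly_kernel(u, v, p=3):
    # Symmetric and Mercer-admissible on the preprocessed space.
    return (np.dot(u, v) + 1) ** p

def invariant_kernel(x, y, Phi, p=3):
    # As a function of the raw inputs x, y this kernel is generally
    # no longer polynomial, but it inherits the imposed invariance.
    return poly_kernel(Phi(x), Phi(y), p)
```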
An Example: Vertical Translation Invariance

In this section, a detailed example is provided in the context of pattern recognition, using an illustrative example of vertical translation invariance with cyclic boundary conditions. A cyclic boundary condition simply means that each image wraps around, e.g., if a column of pixels is "shifted up" one pixel, the top pixel "wraps around" to the bottom.

Consider an image of N rows and one column. Cyclic boundary conditions are imposed to simplify the analysis. Here, the local invariance is a shift of up to one pixel, i.e., α is between zero and one (larger shifts could be handled by extending the analysis below). In this case the transformations take the form:
`
x'_i = (1−α)x_i + αx_{i+1},  i = 0, ..., N−1   (16)
`
where the pixel values x_i are considered to be components of a vector x. (The pixel values are image dependent; e.g., a binary image pixel has a value of one (dark) or zero (light).) Note that here and below the following convention is adopted: all indices i, j, k, and sub-expressions involving them, are to be taken modulo n.

Note that solving the N by 1 problem amounts to solving the general N by M problem. An N by M image is converted to an NM by 1 image by pasting the top of each column of pixels to the bottom of the previous column. To make the cyclic boundary conditions a realistic assumption, a border of one blank pixel is added to the top and bottom of each image.
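A minimal sketch of this setup (illustrative Python; the function names are not from the text) implements the transformation of equation (16) and the column-pasting construction just described.

```python
import numpy as np

# Equation (16) and the N-by-M to NM-by-1 conversion described above.

def shift(x, alpha):
    # x'_i = (1 - alpha) * x_i + alpha * x_{i+1}, indices cyclic (mod N).
    return (1 - alpha) * x + alpha * np.roll(x, -1)

def to_stack(image):
    # Paste the top of each column to the bottom of the previous column,
    # after adding a blank pixel at the top and bottom of each column so
    # that the cyclic boundary condition is realistic.
    padded = np.pad(image, ((1, 1), (0, 0)))   # blank row above and below
    return padded.flatten(order='F')           # column after column
```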
The Relation with Group Action

We start by using the example of vertical translation invariance to make some general observations regarding the rather subtle relation of the transformations (5) to a group action. (As known in the art, if a group action is proven, then other mathematical transformations apply.) Since pixels are an approximation to a continuous function which can be considered as a representation of the group of translations, let us first make the connection between pixels and the continuous case. Let I(z) represent the original image field for which the N-pixel column is an approximation. Thus

x_i = I(i),  i = 0, ..., N−1   (17)
`
`
Translating by a distance α means replacing I(z) by I'(z) = I(z+α). The new pixel values become
`
x'_i = I'(i) = I(i+α) = I(i) + α(∂_z I)(i)   (18)

Approximating (∂_z I)(i) by I(i+1) − I(i) then gives equation (16).
The binning of the data into a vector has the consequence that equation (5), for finite α, is not necessarily a group action, although it does constitute the desired transformation, even for finite α. This point is illustrated by the following simple specific case, namely vertical translation invariance for a column of three pixels, whose values are labeled by x, y, z. Then equation (16) for x becomes:

x' = g_α x ≡ x(1−α) + αy   (19)
`
where g_α is the operator instantiating the transformation. Then

(g_β ∘ g_α)x = x(1−α)(1−β) + y(α+β−2αβ) + zαβ   (20)

so there exists no γ such that g_γ = g_β ∘ g_α. However, to first order in α, β, the above transformations do form a group, with g_α^{−1} = g_{−α}. Thus, the action may be viewed as that of a group only for infinitesimal values of the parameters (e.g., see the above-mentioned reference by Peter J. Olver). Despite this fact, the transformation represented by equation (19) does constitute a translation of the binned data for finite values of α (in fact for any α ∈ [0,1]).
But if the action is a group action for infinitesimal values of the parameters, then the corresponding differential operators are necessarily generators for some Lie group. Indeed, one may explicitly compute the exponentiation of the generator corresponding to equation (16) acting on a particular point, for example:

e^{α[(y−x)∂_x + (z−y)∂_y + (x−z)∂_z]} x = x + (y−x)h_1 + (x−2y+z)h_2 + (y−z)h_3 + (x+y−2z)h_4 + (x−z)h_5 + (−2x+y+z)h_6   (21)

where the h_i are functions of α alone. This only corresponds to the transformation (equation (19)) to first order in α. To summarize, the transformation (equation (19)) coincides with that of a Lie group only for infinitesimal values of the parameters; for finite values, it is no longer a group, but it is still the desired transformation.
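These composition properties are easy to confirm numerically, as in the sketch below (illustrative Python): the first component of g_β ∘ g_α reproduces equation (20), and no single shift g_γ matches it for generic data.

```python
import numpy as np

# Numerical illustration of equations (19)-(20) for a 3-pixel column.

def g(p, alpha):
    # Equation (19), applied cyclically to p = (x, y, z).
    return (1 - alpha) * p + alpha * np.roll(p, -1)

p = np.array([1.0, 2.0, 4.0])
alpha, beta = 0.3, 0.2
x, y, z = p

composed = g(g(p, alpha), beta)
eq20 = x*(1-alpha)*(1-beta) + y*(alpha+beta-2*alpha*beta) + z*alpha*beta
print(composed[0], eq20)   # identical: eq. (20) holds
# A single shift g(p, gamma) has no z-term in its first component, so no
# gamma reproduces `composed` for generic p: the finite shifts do not
# form a group.
```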
`
A Simple Example: The 4 Pixel Stack

To show how things work, the following is a simple example of an image which consists of a stack of 4 pixels. Equation (6) becomes:

{(x_1−x_0)∂_{x_0} + (x_2−x_1)∂_{x_1} + (x_3−x_2)∂_{x_2} + (x_0−x_3)∂_{x_3}} u(x) = 0   (22)
`
The general solution to this is:

u(x_0, x_1, x_2, x_3) = F(u_0, u_1, u_2)   (23)

where

u_0 = x_0 + x_1 + x_2 + x_3   (24)

u_1 = ln[(x_0 − x_1 + x_2 − x_3) / ((x_0 − x_2)^2 + (x_3 − x_1)^2)]   (25)

u_2 = arctan((x_3 − x_1)/(x_0 − x_2)) + (1/2) ln((x_0 − x_2)^2 + (x_3 − x_1)^2)   (26)

and where F is any C^1 function.
This solution has two properties to which attention should be drawn.

First, u_0, and only u_0, is "globally" invariant (invariant for any of the allowed values of α in equation (16)).

Second, all three independent integrals have a property referred to herein as "additive invariance."
Additive Invariance

Below, transformed coordinates are denoted with a prime:

x'_i = x_i(1−α) + x_{i+1}α,  i = 0, ..., 3   (27)

Definition: An "additive invariant" integral is defined to be one that has the property:

u_j(x') = u_j(x) + f_j(α),  j = 0, ..., 2   (28)

for some functions f_j.
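The three integrals above can be checked directly, as in the sketch below (illustrative Python; the absolute value inside the logarithm only guards the numerical check for α < 1/2 and is not part of the definition): for every input x, the difference u_j(x') − u_j(x) comes out the same, depending only on α, up to the branch convention of the arctangent.

```python
import numpy as np

# Check of equations (24)-(28) for the 4-pixel stack: u_j(x') - u_j(x)
# should be independent of x, depending only on alpha.

def u(x):
    a, b = x[0] - x[2], x[3] - x[1]
    u0 = np.sum(x)
    u1 = np.log(abs(x[0] - x[1] + x[2] - x[3])) - np.log(a**2 + b**2)
    u2 = np.arctan2(b, a) + 0.5 * np.log(a**2 + b**2)
    return np.array([u0, u1, u2])

def transform(x, alpha):
    # Equation (27): x'_i = x_i (1 - alpha) + x_{i+1} alpha, cyclic.
    return (1 - alpha) * x + alpha * np.roll(x, -1)

alpha = 0.4
rng = np.random.default_rng(1)
for _ in range(3):
    x = rng.random(4)
    # Equal across samples: (0, f1(alpha), f2(alpha)), with u2 agreeing
    # up to a possible 2*pi arctan branch shift.
    print(u(transform(x, alpha)) - u(x))
```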
`
`8
`Denote by (I) the mapping which takes points in I to points
`in P. Additive invariance is a desirable property of the
`mapping (1), It means that two points in I which are related
`by a vertical translation, map into two points in P, whose
`difference is independent of the original data, and depends
`only on the transformation parameter. Thus, and in accor-
`dance with the principles of the invention, to distinguish two
`translated images from each other, the learning machine has
`only to learn a vector valued function of one parameter,
`where that parameter is independent of the original data (i.e.
`the pixel values of the un—translated image). For example, as
`described above, by imposing (upwards) vertical translation
`invariance on a 4 pixel image, the preprocessed image has
`one less pixel. This, in effect, reduces the number of degrees
`of freedom in the problem. Thus, this can be viewed as a
`form of capacity control, where the number of variables
`decreases in a controlled way as the number of imposed
`symmetries increases.
`Note also that the original data does not have this prop—
`erty: for pixel data, the difference between two translated
`images depends on the original image.
`Note that the solution of a partial differential equation
`(PDE) has a large degree of arbitrariness in the independent
`integrals found. For the above example, the u].2 could have
`equally been taken as the integrals instead of the uj. The
`former do not have the additive invariance property. Thus
`for a general problem, one can only hope to find particular
`additive invariant integrals, rather than prove that all inte-
`grals will be additive invariant.
`Finally, it should be noted that uO and 111 above are also
`solutions of the corresponding differential equations for
`vertical
`translation invariance in the opposite direction.
`Since the two operators commute there will be a total of two
`independent
`integrals. Consequently,
`the full solution is
`found, and this solution is clearly additive invariant.
`The example given in this Section raises the following
`questions for the case of arbitrarily sized images: First, can
`the general solution be obtained? (After all, solving a PDE
`with an arbitrary number of dimensions could turn out to be
`analytically intractable). Second, in general, is there only
`one globally invariant independent integral analogous to uO
`above? If so, it will make looking for additive invariant
`solutions of considerable interest. Finally, can one find those
`special integrals which are also additive invariant? Answers
`to these questions are in the next Section.
The N-pixel Stack

As mentioned above, in order to answer the above questions for an arbitrarily sized image, we only need consider the case of an N-pixel stack. The PDE analogous to equation (22) is

Σ_{i=0}^{n−1} (x_{i+1} − x_i) ∂_i u(x) = 0   (29)
`
A No-Go Theorem

Consider the following:

Theorem: The general invariant solution of equation (29) has the form F(Σ_i x_i), where F ∈ C^1.

Proof: As used herein, the notation PRM denotes cyclic permutation of the indices. By definition, an invariant solution must satisfy ∂_α u(x') = 0, where x' is the linear function of α defined in equation (27). However,
`
`
`6014(1) = (x2 —x1)81 u(x') + PRM.
`
`(30)
`
`((xivri‘xo)ao+PRM)“=0
`
`(42>
`
`(43)
`
Here, the notation ∂_α ≡ ∂/∂α and ∂'_1 ≡ ∂/∂x'_1, etc., is introduced. A set of PDEs that u must satisfy can be generated by expressing the x_i in terms of the x'_i. Note first that the transformations (equation (16)) can be written

x' = Mx   (31)

where

M_{ij} ≡ δ_{ij}(1−α) + δ_{i,j−1}α,  i, j = 0, ..., n−1   (32)

Note also that

det M ≡ Δ = (1−α)^n − (−1)^n α^n   (33)

One can verify directly, by matrix multiplication, that

(M^{−1})_{ij} = Δ^{−1} (−1)^{j−i} α^{j−i} (1−α)^{n−1−(j−i)}   (34)

Consequently,

Σ_i ((M^{−1}x')_{i+1} − (M^{−1}x')_i) ∂'_i u(x') = 0   (35)

((x_1 − x_0)∂_0 + PRM) u = 0   (43)
`
These are independent equations, since the matrix of coefficients is of rank n−1 for some choices of the x_i. Finally, it is straightforward to check that all the operators appearing in equations (40)–(43) commute, so this set of n−1 PDEs forms a complete set. Thus, it has only one integral solution. By substitution it is easily checked that u(x) = Σ_i x_i is a solution; thus the general solution takes the form F(Σ_i x_i), where F ∈ C^1. This completes the proof of the theorem.

This theorem demonstrates that one cannot hope to find globally invariant solutions of equation (29) other than F(Σ_i x_i). Thus, one needs to search for "second best" solutions, such as additive invariant ones.
The General Solution

A general solution for the N-pixel stack is now derived. Equation (29) may be viewed as the dot product of the gradient of an unknown function u with a vector field in n dimensions, and to find the general solution one must solve the set of PDEs which describe the characteristic surfaces, which are themselves parameterized by some t ∈ R^1 (e.g., see E. C. Zachmanoglou and Dale W. Thoe, "Introduction to Partial Differential Equations with Applications," Dover, Mineola, N.Y., 1986):

dx/dt = Ax   (44)

where

A_{ij} = −δ_{ij} + δ_{i+1,j}   (45)

A has determinant zero and eigenvalues λ_k given by

1 + λ_k = e^{2πik/n},  k = 0, ..., n−1   (46)
`
Here and for the remainder of this description, the symbol i is reserved for the square root of −1. By inspection, eigenvectors Z are constructed:

Z_{k,j} = e^{2πijk/n},  j, k = 0, ..., n−1   (47)

where the first index k labels the eigenvector, and the second, j, its components. Let S be the matrix whose columns are the eigenvectors. Then S diagonalizes A:

S^{−1} A S = diag(λ_j)   (48)
`
Note that the following Lemma holds, despite the fact that A is neither symmetric nor Hermitian:

Lemma: The inverse of S is giv