United States Patent [19]
Burges

[11] Patent Number: 6,112,195
[45] Date of Patent: *Aug. 29, 2000

US006112195A
[54] ELIMINATING INVARIANCES BY PREPROCESSING FOR KERNEL-BASED METHODS

[75] Inventor: Christopher John Burges, Freehold, N.J.

[73] Assignee: Lucent Technologies Inc., Murray Hill, N.J.

[*] Notice: This patent issued on a continued prosecution application filed under 37 CFR 1.53(d), and is subject to the twenty year patent term provisions of 35 U.S.C. 154(a)(2).

[21] Appl. No.: 08/825,287

[22] Filed: Mar. 27, 1997

[51] Int. Cl.7 .............................. G06F 1/035; G06F 5/00
[52] U.S. Cl. ................................................. 706/20
[58] Field of Search ............... 382/159, 181, 276; 706/20
[56] References Cited

U.S. PATENT DOCUMENTS

4,838,644    6/1989  Ochoa et al. .......................... 359/9
5,054,094   10/1991  Barski ............................. 382/192
5,263,107   11/1993  Ueda et al. ........................ 382/157
5,729,662    3/1998  Rozmus .............................. 706/20
5,926,568    7/1999  Chaney et al. ...................... 382/217
5,950,146    9/1999  Vapnik ............................. 702/153
5,956,427    9/1999  Greenspan et al. ................... 382/240
5,969,524   10/1999  Pierpaoli et al. ................... 324/307

OTHER PUBLICATIONS

A. Khotanzad and J.-H. Lu, "Classification of Invariant Image Representations Using a Neural Network," IEEE Trans. on Acoustics, Speech, and Signal Processing, vol. 38(6), pp. 1028-1038, Jun. 1990.

J. I. Minnix et al., "Multistaged Self-Organizing Neural Network with Biologically Inspired Preprocessing Features for Rotation and Scale Invariant Pattern Recognition," vol. 3, pp. 1605-1610, Oct. 1991.

S. J. Perantonis and P. J. G. Lisboa, "Translation, Rotation, and Scale Invariant Pattern Recognition by High-Order Neural Networks and Moment Classifiers," IEEE Trans. on Neural Networks, vol. 3(2), pp. 241-251, Mar. 1992.

C. Yuceer and K. Oflazer, "A Rotation, Scaling and Translation Invariant Pattern Classification System," Proc. 11th IAPR Int'l Conf. on Pattern Recognition, vol. II, pp. 422-425, Sep. 1992.

J. A. Starzyk and S. Chai, "Vector Contour Representation for Object Recognition in Neural Networks," 1992 IEEE Int'l Conf. on Systems, Man and Cybernetics, vol. 1, pp. 399-404, Oct. 1992.

Gilbert Strang, Linear Algebra and Its Applications, Third Edition, pp. 341-342, 1988.

Microsoft Press, Computer Dictionary, Third Edition, pp. 272, 375, 1997.

R. P. Lippmann, "Pattern Classification Using Neural Networks," IEEE Communications Magazine, vol. 27(11), pp. 47-64, Nov. 1989.
Primary Examiner—Tariq R. Hafiz
Assistant Examiner—Michael Pender
Attorney, Agent, or Firm—Joseph J. Opalach

[57] ABSTRACT

A kernel-based method and apparatus includes a preprocessor, which operates on input data in such a way as to provide invariance under some symmetry transformation.

14 Claims, 2 Drawing Sheets
[Front-page drawing: block diagram of a recognition system 100 in which a preprocessor 120 is coupled to a kernel-based machine learning system 125, implemented on a processor 105.]
OTHER PUBLICATIONS
Y. Onodera et al., "Translation and Rotation-Invariant Pattern Recognition Method Using Neural Network with Back-Propagation," Communications on the Move, ICCS/ISITA '92, vol. 2, pp. 548-552, Nov. 1992.

P. Y. Simard et al., "Memory-Based Character Recognition Using a Transformation Invariant Metric," Proc. 12th Int'l Conf. on Pattern Recognition, pp. 262-267, Oct. 1994.

Q.-S. Chen et al., "Symmetric Phase-Only Matched Filtering of Fourier-Mellin Transforms for Image Registration and Recognition," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 16(12), pp. 1156-1168, Dec. 1994.

J. Tanomaru and A. Inubushi, "A Compact Representation of Binary Patterns for Invariant Recognition," 1995 IEEE Int'l Conf. on Systems, Man and Cybernetics, vol. 2, pp. 1550-1555, Oct. 1995.

E. A. Heredia and G. R. Arce, "Piecewise Polynomial Kernel Networks," Proc. Int'l Conf. on Neural Networks, pp. 1558-1563, Jun. 1996.

C. J. C. Burges et al., "Improving the Accuracy and Speed of Support Vector Machines," Advances in Neural Information Processing Systems 9, Proceedings of the 1996 Conference, Denver, CO, USA, Dec. 2-5, 1996, pp. 375-381, XP002088166, ISBN 0-262-10065-7, MIT Press, London, UK, 1997.

P. Y. Simard et al., "Memory-Based Character Recognition Using a Transformation Invariant Metric," Proceedings of the IAPR International Conference on Pattern Recognition, Jerusalem, Oct. 9-13, 1994, Conference B: Pattern Recognition and Neural Networks, vol. 2, pp. 262-267, XP000509894, Institute of Electrical and Electronics Engineers.

Hidefumi Sawai, "Axially Symmetric Neural Network Architecture for Rotation-Invariant Pattern Recognition," International Conference on Neural Networks/World Congress on Computational Intelligence, Orlando, Jun. 27-29, 1994, vol. 7, pp. 4253-4258, XP000531357, Institute of Electrical and Electronics Engineers.

J. Mas et al., "Invariant Perception for Symmetry Related Patterns," International Joint Conference on Neural Networks (IJCNN), Washington, Jun. 19-22, 1989, vol. 1, pp. 85-89, XP000088251, Institute of Electrical and Electronics Engineers.

B. Scholkopf et al., "Extracting Support Data for a Given Task," Proceedings, First International Conference on Knowledge Discovery & Data Mining, AAAI Press, Menlo Park, CA, 1995.

B. Scholkopf et al., "Incorporating Invariances in Support Vector Learning Machines," Proceedings ICANN '96, International Conference on Artificial Neural Networks, Springer Verlag, Berlin, 1996.
[FIG. 1 (Sheet 1 of 2): flow chart. STEP 10: input data (dimension n). STEP 20: preprocess the data (impose local symmetries on the data, generate processed data of dimension m, where m < n). STEP 30: operate on the preprocessed data by applying a kernel-based method. END.]

[FIG. 2 (Sheet 1 of 2): block diagram of recognition system 100. Data capture element 115 feeds preprocessor 120, which feeds kernel-based machine learning system 125; these elements are implemented on processor 105.]
[FIG. 3 (Sheet 2 of 2): training flow chart. STEP 305: generate input data. STEP 310: preprocess the input data. STEP 315: determine the parameters of kernel-based classification.]

[FIG. 4 (Sheet 2 of 2): operation flow chart. STEP 405: generate input data. STEP 410: preprocess the input data. STEP 415: operate on the preprocessed data by applying the kernel-based method training coefficients determined during the training phase of FIG. 3.]
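For orientation, here is a minimal end-to-end sketch of the training flow of FIG. 3 and the operating flow of FIG. 4 (Python with NumPy and scikit-learn; the synthetic data, the labels, the difference-based preprocessor, and the polynomial SVC are illustrative assumptions, not the patent's specific embodiment):

```python
import numpy as np
from sklearn.svm import SVC

def preprocess(X):
    # FIG. 3, STEP 310 / FIG. 4, STEP 410: map each n-pixel input to an
    # (n-1)-dimensional representation.  Adjacent-pixel differences
    # (invariant under a uniform shift of all pixel values) stand in for
    # the patent's translation-invariant integrals.
    return np.diff(X, axis=1)

# Training phase (FIG. 3): generate data, preprocess, fit the SVM.
rng = np.random.default_rng(0)
X_train = rng.random((100, 16))                       # STEP 305 (synthetic)
y_train = (X_train[:, -1] > X_train[:, 0]).astype(int)
svm = SVC(kernel='poly', degree=2).fit(preprocess(X_train), y_train)  # STEP 315

# Operating phase (FIG. 4): preprocess the test data, then classify with
# the coefficients determined during training (STEP 415).
X_test = rng.random((5, 16))
print(svm.predict(preprocess(X_test)))
```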
ELIMINATING INVARIANCES BY PREPROCESSING FOR KERNEL-BASED METHODS

FIELD OF THE INVENTION

This invention relates generally to a class of problems falling within what is known as "kernel-based methods."

BACKGROUND OF THE INVENTION

Pattern recognition, regression estimates, and density estimation are a few examples of a class of problems that are analyzed using kernel-based methods. The latter are illustratively described herein in the context of pattern recognition. However, it should be noted that the inventive concept (described below) is not limited to pattern recognition and is applicable to kernel-based methods in general (of which support-vector machines are an example).

In pattern recognition, it is known in the art to use a recognizer having a support-vector-machine (SVM) architecture. The SVM is viewed as mapping an input image onto a decision plane. The output of the SVM is typically a numerical result, the value of which is associated with whether, or not, the input image has been recognized as a particular type of image.

As a very general example, consider a 16 pixel by 16 pixel image of a tree. In this context, an SVM recognition system is first "trained" with a set of known images of a tree. For example, the SVM system could be trained on 1000 different tree images, each image represented by 256 pixels. Subsequently, during operation, or testing, the SVM system classifies input images using the training data generated from the 1000 known tree images. The SVM system indicates classification of an input image as the desired tree if, e.g., the output, or result, of the SVM is a positive number.

Unfortunately, in the above example, the recognizer may have to deal not only with a particular type of tree image, but also with translates of that tree image: for example, a tree image that is shifted in the vertical direction but is still the same tree. To some extent this kind of translation can be dealt with by using tree images that represent such a vertical shift. However, the SVM system is still trained to predefined images; it is just that some of these predefined images are used to represent translations of the image (as opposed to, e.g., different types of trees).

SUMMARY OF THE INVENTION

A kernel-based method and apparatus includes a preprocessor, which operates on input data in such a way as to provide invariance under some symmetry transformation.

In an embodiment of the invention, a pattern recognizer includes a preprocessor and a support vector machine (SVM). The latter is trained to recognize a particular set of images. The preprocessor operates on an input image in such a way as to provide local translation invariance. In particular, the preprocessor maps a particular input image, and its translate, to two points in the decision plane of the SVM, whose difference is independent of the original data. As a result, the recognizer has built-in local invariance and does not require training the SVM on translated versions of the images.

In accordance with a feature of the invention, the size of the preprocessed image is less than the size of the original image. In other words, the SVM operates on less data than required in the prior art. Thus, the inventive concept enables the SVM to operate more efficiently in terms of, e.g., memory size and training time, yet classify more patterns than in the prior art for a given-size SVM.
BRIEF DESCRIPTION OF THE DRAWING

FIG. 1 shows an illustrative flow chart in accordance with the principles of the invention;

FIG. 2 shows a block diagram of a portion of a recognition system embodying the principles of the invention;

FIG. 3 shows an illustrative method for training the system of FIG. 2 in accordance with the principles of the invention; and

FIG. 4 shows an illustrative method for operating the system of FIG. 2 in accordance with the principles of the invention.

DETAILED DESCRIPTION

Before describing an illustrative embodiment of the invention, the inventive concept itself is described. (Other than the inventive concept, it is assumed that the reader is familiar with the mathematical notation used to generally represent kernel-based methods as known in the art.) Also, the inventive concept is illustratively described in the context of pattern recognition. However, the inventive concept is applicable to all kernel-based methods. Some examples of the classes of problems covered by kernel-based methods are: regression estimates, density estimation, etc.

Introduction

As used herein, "kernel-based methods" means methods which approximate an unknown function G(x) by F(x), where F(x) has the form:

F(x) = \sum_q a_q K(p_q, x) + b, \qquad a_q, b \in R^1, \; x \in R^n, \; p_q \in R^n   (1)

and where a_q, b, and p_q are parameters that are to be determined from empirical data by a training procedure, and K is a kernel function, whose form is usually chosen in advance. Additive models (e.g., see T. J. Hastie and R. J. Tibshirani, Generalized Additive Models, Chapman and Hall, 1st edition, 1990), Radial Basis Functions (e.g., see M. J. D. Powell, Radial basis functions for multivariable interpolation: A review, in Algorithms for Approximation, J. C. Mason and M. G. Cox (Eds.), pages 143-167, Oxford Clarendon Press, 1987; F. Girosi, M. Jones, and T. Poggio, Regularization theory and neural networks architectures, Neural Computation, 7(2):219-269, 1995; and C. M. Bishop, Neural Networks for Pattern Recognition, Clarendon Press, Oxford, 1995), and Support Vector Machines (e.g., see C. Cortes and V. Vapnik, Support vector networks, Machine Learning, 20:273-297, 1995; and V. Vapnik, The Nature of Statistical Learning Theory, Springer Verlag, New York, 1995) are examples of such methods. Pattern recognition, regression estimation, density estimation, and operator inversion are examples of problems tackled with these approaches (e.g., see A. Smola, V. Vapnik, S. Golowich, Support vector method for function approximation, regression estimation, and signal processing, Advances in Neural Information Processing Systems, 9, 1996).
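As a concrete reading of equation (1), the sketch below (Python with NumPy; the Gaussian kernel choice, the parameter values, and all names are illustrative assumptions, not taken from the patent) evaluates F(x) from a given parameter set a_q, p_q, b:

```python
import numpy as np

def rbf_kernel(p, x, gamma=0.5):
    # Illustrative choice of kernel K(p, x); any kernel satisfying the
    # method's constraints could be substituted here.
    return np.exp(-gamma * np.sum((p - x) ** 2))

def F(x, a, P, b, kernel=rbf_kernel):
    # Equation (1): F(x) = sum_q a_q K(p_q, x) + b
    return sum(a_q * kernel(p_q, x) for a_q, p_q in zip(a, P)) + b

# Hypothetical trained parameters: three parameter vectors p_q in R^2.
a = np.array([0.7, -1.2, 0.4])                       # coefficients a_q
P = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.5]])   # vectors p_q
b = 0.1

x = np.array([0.9, 1.1])         # test point
print(F(x, a, P, b))             # approximation to G(x)
print(np.sign(F(x, a, P, b)))    # classification label, sgn(F(x))
```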
Thus, for example, in the density estimation case, x is a point (in a vector space) at which the probability density is required, and F(x) is the approximation to that density; for the classification case, x is a test pattern to be classified, and sgn(F(x)) gives the corresponding label. Similarly, for pattern recognition, an SVM is first trained to recognize a target image, as described earlier. During operation, or testing, the SVM indicates classification of input data (representing an input image) according to equation (1). If the SVM classifies the input data as the target image, F(x) is, e.g., equal to a positive number.

Clearly this approach is general. However, the generality brings with it the problem of model selection, i.e., which family of functions to choose for the kernels. While some work has been done on this (e.g., for the support vector pattern recognition case see B. Scholkopf, C. Burges, and V. Vapnik, Extracting support data for a given task, in U. M. Fayyad and R. Uthurusamy, editors, Proceedings, First International Conference on Knowledge Discovery & Data Mining, AAAI Press, Menlo Park, Calif., 1995), it remains a significant unsolved problem.

In accordance with the inventive concept, the description below explores how imposing constraints arising from domain knowledge can restrict the family of kernels. In the following, bold face is used for vector and matrix quantities, and light face for their components.

Incorporating Local Invariances

This section describes the general approach of the inventive concept. The object is to find those kernels in equation (1) such that, if any given test point x is transformed under a known symmetry of the problem, the result F is unchanged.

Given two test points x and y, equation (1) defines an affine structure with distance function:

\rho(x, y) = \sum_q a_q (K(p_q, x) - K(p_q, y))   (2)

(An affine structure is a mathematical term for a type of vector space as known in the art.) For example, \rho can be thought of as counting contours between patterns in the pattern recognition case.

In accordance with the inventive concept, it is preferable to choose a class of kernels so that \rho(x, y) is close to zero if y is a transform of x along some symmetry direction. If y = x + dx, then

d\rho = \sum_q a_q \sum_i dx_i \, \partial_i K(p_q, x)   (3)

where we define \partial_i \equiv \partial/\partial x_i. Requiring that this be zero for all a_q gives:

\sum_i dx_i \, \partial_i K(p_q, x) = 0   (4)

Note that, for a particular problem for which the a_q are known to satisfy certain constraints, equation (4) may be more restrictive than is necessary to ensure that d\rho = 0, but in this work no such assumption about the a_q is made.

We will consider transformations that take the general form:

x'_i = x_i + \alpha f_i(x), \qquad \alpha \in R^1   (5)

for which, in the limit as \alpha \to 0, equation (4) takes the form:
\sum_i f_i(x) \, \partial_i K(p_q, x) \equiv O \, K(p_q, x) = 0   (6)

which defines the operator O. From now on the parameter vector p_q in K is not explicitly written.

Operator Invariance for the Linear Case

The following two simple theorems are proven for the case in which the transformation represented by equation (5) is both linear and invertible.

Theorem: For linear, invertible transformations (equation (5)), the operator O is itself invariant under the transformation.

Proof: Let U denote the unit matrix, "T" denote transpose, and \partial denote the vector with components \partial_i \equiv \partial/\partial x_i. Denote the transformation by

x' = (U + \alpha M) x.

Then the operator O in equation (6) is given by

O = x^T M^T \partial   (7)

and

O' = x'^T M^T \partial' = x^T (U + \alpha M)^T M^T (U + \alpha M)^{-T} \partial = x^T M^T \partial = O   (8)

since M^T commutes with (U + \alpha M)^T.

Theorem: For linear, invertible transformations (equation (5)), denoting the argument of K(\cdot) by its components x_i, if K(x_i) satisfies equation (6), then so does K(x_i + \alpha f_i(x)), for finite \alpha.

Proof:

O K(x_i) = 0 = O' K(x_i + \alpha f_i(x)) = O K(x_i + \alpha f_i(x))   (9)

since O is an invariant by the above theorem.

Multiple Symmetries

For a set of M symmetries, there will be M simultaneous equations of the form of equation (3), which will result in a system of M equations of the form

O_m u(x_0, \ldots, x_{n-1}) = 0, \qquad m = 1, \ldots, M   (10)

where \{O_m\} is a set of linear differential operators, each of which takes the general form

O_m = \sum_l f_{ml} \, \partial_l   (11)

analogous to equation (6).

The questions arise: when do non-trivial integrals of equation (10) exist, and how many such independent integrals are there? Following "Techniques in Partial Differential Equations," Clive R. Chester, McGraw-Hill, 1971, the following definitions are made:

Definition: A system of operators \{O_i\} is called complete if all commutators take the form

[O_i, O_j] = \sum_k c_{ijk} O_k, \qquad c_{ijk} \in R^1   (12)

(Note that, from their definition, the c_{ijk} satisfy the known Jacobi identity, so they are in fact the structure constants of some Lie algebra (e.g., see Peter J. Olver, Applications of Lie Groups to Differential Equations, Springer-Verlag, 1986, New York).)
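The completeness test of equation (12) is mechanical enough to automate. The following sketch (Python with SymPy; the two rotation generators are an illustrative choice, not drawn from the patent) computes the commutator of two operators of the form (11) as a new first-order operator:

```python
import sympy as sp

x0, x1, x2 = sp.symbols('x0 x1 x2')
X = [x0, x1, x2]

def commutator(f, g):
    # [O_f, O_g] for O_f = sum_i f_i d/dx_i is again first order, with
    # coefficients h_j = sum_i (f_i * d g_j/dx_i - g_i * d f_j/dx_i).
    return [sp.simplify(sum(f[i] * sp.diff(g[j], X[i])
                            - g[i] * sp.diff(f[j], X[i])
                            for i in range(len(X))))
            for j in range(len(X))]

# Two illustrative vector fields: rotation generators in the x0-x1
# and x1-x2 planes.
f = [-x1, x0, sp.Integer(0)]
g = [sp.Integer(0), -x2, x1]

print(commutator(f, g))   # -> [-x2, 0, x0]: a rotation generator in the
                          # x0-x2 plane; adding it closes the set under
                          # commutation, giving a complete set (eq. (12)).
```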
Definition: A system of equations (10) is called complete if the corresponding operators form a complete set.

Theorem: Any complete system of r < n independent equations, of the form of equation (10), has exactly n - r independent integrals.

Thus, non-trivial integrals will only exist if the number of operators r in the complete set is less than n, and if so, there will be n - r of them. In the latter case, the general solution of the system of equations (10) has the form

F(u_0(x_0, \ldots, x_{n-1}), \ldots, u_{n-r-1}(x_0, \ldots, x_{n-1}))   (13)

where u_0, \ldots, u_{n-r-1} are the set of independent integrals of the system of equations (10) and F is any C^1 (i.e., differentiable) function (for reasons that will become clear below, the vector components are indexed starting with index 0). Now, the recipe for finding the complete set is clear: one generates new operators by computing commutators of the existing set of operators, and stops when either one has found a complete set, or when r > n.

Note that this gives us capacity control exactly where we want it. For example, for a set of commuting symmetries (i.e., all c_{ijk} vanish), the number of degrees of freedom in the problem is reduced by exactly the number of symmetries imposed.

Thus, and in accordance with the inventive concept, generating an invariant kernel amounts to preprocessing the data, and then using any existing kernel. By imposing one symmetry, only one degree of freedom is lost (e.g., in a pattern recognition problem, a 256 pixel image becomes a 255 pixel, preprocessed image).

Building Locally Invariant Kernels

Given the general solution to the system of equations (10), construction of a kernel function remains. It may not be possible to simply substitute any F in equation (13) for K in equation (1), since the set of functions K may have further constraints placed on them by the particular method being used, for example, their dependence on the parameter set p_q. However, such constraints are easily satisfied, since the only requirement on the F's is that they are differentiable.

For example, for support vector machines, the kernels take the form K(s_q, x), where the s_q are the support vectors (which have dimension equal to that of x). There are two constraints on the form the kernels may take: they must be symmetric, K(x, y) = K(y, x) \forall x, y \in R^n, and they must satisfy Mercer's positivity constraint (e.g., see B. E. Boser, I. M. Guyon, and V. Vapnik, Extracting support data for a given task, in U. M. Fayyad and R. Uthurusamy, editors, Proceedings, First International Conference on Knowledge Discovery & Data Mining, AAAI Press, Menlo Park, Calif., 1995; and R. Courant and D. Hilbert, Methods of Mathematical Physics, Interscience, 1953), namely

\int K(x, y) \, g(x) \, g(y) \, dx \, dy \ge 0 \quad \forall g \in L_2: \int (g(x))^2 dx < \infty   (14)

The data are points in the vector space R^n. Below, this space is referred to as I for "input." (Call the vector space for the preprocessed data P for "preprocessed.") Now, suppose that some number of symmetries have been imposed, resulting in a complete set of size r < n, so that the number of independent integrals (and hence, the dimension of P) is n - r. Thus, the solutions have the general form of equation (13) above. We can then, in accordance with the principles of the invention, simply choose F to have the form of a kernel function which is known to satisfy the constraints, but which takes n - r arguments instead of n. As a specific example, we might take degree p polynomial kernels, for which the F will have the form

F(u, v) = (u \cdot v + 1)^p, \qquad u, v \in R^{n-r}   (15)

Since such kernels are symmetric and satisfy Mercer's condition, all the desired constraints are satisfied. What was a polynomial support vector machine has, in accordance with the inventive concept, become a polynomial support vector machine acting on preprocessed data. However, the functional form that the overall kernels take, when considered as functions of the input data x, will no longer in general be polynomial.
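The composition just described is a one-liner in practice. In the sketch below (Python with NumPy; the uniform-shift symmetry and all names are illustrative assumptions, simpler than the patent's vertical-translation example), a polynomial kernel of equation (15) acts on preprocessed vectors of dimension n - 1, and the resulting kernel is unchanged when an argument is transformed along the symmetry direction:

```python
import numpy as np

def poly_kernel(u, v, p=2):
    # Equation (15): F(u, v) = (u . v + 1)^p, applied in the preprocessed space P.
    return (np.dot(u, v) + 1.0) ** p

def phi(x):
    # Illustrative preprocessor I -> P for ONE imposed symmetry (r = 1):
    # invariance under a uniform shift x_i -> x_i + alpha.  The n - 1
    # adjacent differences are a complete set of independent integrals.
    x = np.asarray(x, dtype=float)
    return x[1:] - x[:-1]

def invariant_kernel(x, y, p=2):
    # The composed kernel: a polynomial SVM kernel acting on preprocessed data.
    return poly_kernel(phi(x), phi(y), p)

x = np.array([0.2, 0.9, 0.4, 0.7])
y = np.array([0.5, 0.1, 0.8, 0.3])
print(invariant_kernel(x, y))
print(invariant_kernel(x + 0.25, y))   # unchanged: x + 0.25 is a symmetry transform
```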
An Example: Vertical Translation Invariance

In this section, a detailed example is provided in the context of pattern recognition, using an illustrative example of vertical translation invariance with cyclic boundary conditions. A cyclic boundary condition simply means that each image wraps around, e.g., if a column of pixels is "shifted up" one pixel, the top pixel "wraps around" to the bottom.

Consider an image of N rows and one column. Cyclic boundary conditions are imposed to simplify the analysis. Here, the local invariance is a shift of up to one pixel, i.e., \alpha is between zero and one (larger shifts could be handled by extending the analysis below). In this case the transformations take the form:

x'_i = (1 - \alpha) x_i + \alpha x_{i+1}, \qquad i = 0, \ldots, N - 1   (16)

where the pixel values x_i are considered to be components of a vector x. (The pixel values are image dependent, e.g., a binary image pixel has a value of one (dark) and zero (light).) Note that here and below the following convention is adopted: all indices i, j, k, and sub-expressions involving them, are to be taken modulo n.

Note that solving the N by 1 problem amounts to solving the general N by M problem. An N by M image is converted to an NM by 1 image by pasting the top of each column of pixels to the bottom of the previous column. To make the cyclic boundary conditions a realistic assumption, a border of one blank pixel is added to the top and bottom of each image.

The Relation with Group Action

We start by using the example of vertical translation invariance to make some general observations regarding the rather subtle relation of the transformations (5) to a group action. (As known in the art, if a group action is proven then other mathematical transformations apply.) Since pixels are an approximation to a continuous function which can be considered as a representation of the group of translations, let us first make the connection between pixels and the continuous case. Let I(z) represent the original image field for which the N-pixel column is an approximation. Thus

x_i = I(i), \qquad i = 0, \ldots, N - 1   (17)

Translating by a distance \alpha means replacing I(z) by I'(z) = I(z - \alpha). The new pixel values become

x'_i = I'(i) = I(i - \alpha) = I(i) - \alpha (\partial_z I)(i)   (18)

Approximating (\partial_z I)(i) by I(i) - I(i - 1) then gives equation (16).

The binning of the data into a vector has the consequence that equation (5), for finite \alpha, is not necessarily a group action, although it does constitute the desired transformation, even for finite \alpha. This point is illustrated by the following simple specific case, namely vertical translation invariance for a column of three pixels, whose values are labeled by x, y, z. Then equation (16) for x becomes:

x' = g_\alpha x = x(1 - \alpha) + \alpha y   (19)
where g_\alpha is the operator instantiating the transformation. Then

(g_\beta \circ g_\alpha) x = x(1 - \alpha)(1 - \beta) + y(\alpha + \beta - 2\alpha\beta) + z\alpha\beta   (20)

so there exists no \gamma such that g_\gamma = g_\beta \circ g_\alpha. However, to first order in \alpha, \beta, the above transformations do form a group, with g_\alpha^{-1} = g_{-\alpha}. Thus, the action may be viewed as that of a group only for infinitesimal values of the parameters (e.g., see the above-mentioned reference by Peter J. Olver). Despite this fact, the transformation represented by equation (19) does constitute a translation of the binned data for finite values of \alpha (in fact for any \alpha \in [0, 1]).

But if the action is a group action for infinitesimal values of the parameters, then the corresponding differential operators are necessarily generators for some Lie group. Indeed, one may explicitly compute the exponentiation of the generator corresponding to equation (16) acting on a particular point, for example:

e^{\alpha[(y-x)\partial_x + (z-y)\partial_y + (x-z)\partial_z]} \, x = x + (y - x)h_1 + (x - 2y + z)h_2 + \cdots   (21)

where the h_i are functions of \alpha alone. This only corresponds to the transformation (equation (19)) to first order in \alpha. To summarize, the transformation (equation (19)) coincides with that of a Lie group only for infinitesimal values of the parameters; for finite values, it is no longer a group, but it is still the desired transformation.
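Equation (20)'s failure to close into a group is easy to confirm numerically. This sketch (Python with NumPy; an illustrative check, not part of the patent) writes the cyclic transformation (16) as a matrix and composes two of them:

```python
import numpy as np

def M(alpha, n=3):
    # Matrix of the cyclic transformation (16): x'_i = (1-alpha) x_i + alpha x_{i+1}.
    S = np.roll(np.eye(n), -1, axis=0)   # cyclic shift: (S x)_i = x_{i+1}
    return (1 - alpha) * np.eye(n) + alpha * S

A = M(0.3) @ M(0.2)        # g_beta o g_alpha for alpha = 0.2, beta = 0.3

# If the composition were some g_gamma, its diagonal would fix gamma:
gamma = 1 - A[0, 0]
print(np.allclose(A, M(gamma)))   # False: no gamma reproduces the composition

# Yet each finite-alpha map is still the desired translation; e.g. alpha = 1
# shifts the column by exactly one pixel (cyclically).
x = np.array([1.0, 2.0, 3.0])
print(M(1.0) @ x)                 # [2. 3. 1.]
```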
A Simple Example: The 4 Pixel Stack

To show how things work, the following is a simple example of an image which consists of a stack of 4 pixels. Equation (6) becomes:

\{(x_1 - x_0)\partial_{x_0} + (x_2 - x_1)\partial_{x_1} + (x_3 - x_2)\partial_{x_2} + (x_0 - x_3)\partial_{x_3}\} \, u(x) = 0   (22)

The general solution to this is:

u(x_0, x_1, x_2, x_3) = F(u_0, u_1, u_2)   (23)

where

u_0 = x_0 + x_1 + x_2 + x_3   (24)

u_1 = \ln\left(\frac{x_0 - x_1 + x_2 - x_3}{(x_0 - x_2)^2 + (x_3 - x_1)^2}\right)   (25)

u_2 = \arctan\left(\frac{x_3 - x_1}{x_0 - x_2}\right) + \frac{1}{2}\ln\left((x_0 - x_2)^2 + (x_3 - x_1)^2\right)   (26)

and where F is any C^1 function.

This solution has two properties to which attention should be drawn.

First, u_0, and only u_0, is "globally" invariant (invariant for any of the allowed values of \alpha in equation (16)).

Second, all three independent integrals have a property referred to herein as "additive invariance."

Additive Invariance

Below, transformed coordinates are denoted with a prime:

x'_i = x_i(1 - \alpha) + x_{i+1}\alpha, \qquad i = 0, \ldots, 3   (27)

Definition: An "additive invariant" integral is defined to be one that has the property:

u_j(x') = u_j(x) + f_j(\alpha), \qquad j = 0, \ldots, 2   (28)

for some functions f_j.
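A quick numerical confirmation of equation (28) (Python with NumPy; an illustrative check using the integrals as reconstructed in equations (24)-(26)): the differences u_j(x') - u_j(x) agree across two unrelated images, so they are functions of \alpha alone:

```python
import numpy as np

def transform(x, alpha):
    # Equation (27): x'_i = (1 - alpha) x_i + alpha x_{i+1} (indices mod 4).
    return (1 - alpha) * x + alpha * np.roll(x, -1)

def integrals(x):
    x0, x1, x2, x3 = x
    u0 = x0 + x1 + x2 + x3                                   # eq. (24)
    u1 = np.log((x0 - x1 + x2 - x3) /
                ((x0 - x2) ** 2 + (x3 - x1) ** 2))           # eq. (25)
    u2 = np.arctan2(x3 - x1, x0 - x2) + 0.5 * np.log(
        (x0 - x2) ** 2 + (x3 - x1) ** 2)                     # eq. (26)
    return np.array([u0, u1, u2])

alpha = 0.3
for x in (np.array([0.9, 0.2, 0.4, 0.1]), np.array([0.8, 0.1, 0.5, 0.2])):
    print(integrals(transform(x, alpha)) - integrals(x))
# Both rows agree (u0 shifts by 0): the shift f_j depends only on alpha,
# not on the image, which is the additive invariance of equation (28).
```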
Denote by \Phi the mapping which takes points in I to points in P. Additive invariance is a desirable property of the mapping \Phi. It means that two points in I which are related by a vertical translation map into two points in P whose difference is independent of the original data, and depends only on the transformation parameter. Thus, and in accordance with the principles of the invention, to distinguish two translated images from each other, the learning machine has only to learn a vector valued function of one parameter, where that parameter is independent of the original data (i.e., the pixel values of the un-translated image). For example, as described above, by imposing (upwards) vertical translation invariance on a 4 pixel image, the preprocessed image has one less pixel. This, in effect, reduces the number of degrees of freedom in the problem. Thus, this can be viewed as a form of capacity control, where the number of variables decreases in a controlled way as the number of imposed symmetries increases.

Note also that the original data does not have this property: for pixel data, the difference between two translated images depends on the original image.

Note that the solution of a partial differential equation (PDE) has a large degree of arbitrariness in the independent integrals found. For the above example, the u_j^2 could have equally been taken as the integrals instead of the u_j. The former do not have the additive invariance property. Thus, for a general problem, one can only hope to find particular additive invariant integrals, rather than prove that all integrals will be additive invariant.

Finally, it should be noted that u_0 and u_1 above are also solutions of the corresponding differential equations for vertical translation invariance in the opposite direction. Since the two operators commute, there will be a total of two independent integrals. Consequently, the full solution is found, and this solution is clearly additive invariant.

The example given in this Section raises the following questions for the case of arbitrarily sized images: First, can the general solution be obtained? (After all, solving a PDE with an arbitrary number of dimensions could turn out to be analytically intractable.) Second, in general, is there only one globally invariant independent integral analogous to u_0 above? If so, it will make looking for additive invariant solutions of considerable interest. Finally, can one find those special integrals which are also additive invariant? Answers to these questions are in the next Section.

The N-pixel Stack

As mentioned above, in order to answer the above questions for an arbitrarily sized image, we only need consider the case of an N-pixel stack. The PDE analogous to equation (22) is

\sum_{i=0}^{n-1} (x_{i+1} - x_i) \, \partial_i \, u(x) = 0   (29)

A No-Go Theorem

Consider the following:

Theorem: The general invariant solution of equation (29) has the form F(\sum_i x_i), where F \in C^1.

Proof: As used herein, the notation PRM denotes cyclic permutation of the indices. By definition, an invariant solution must satisfy \partial_\alpha u(x') = 0, where x' is the linear function of \alpha defined in equation (27). However,
\partial_\alpha u(x') = (x_2 - x_1) \, \partial'_1 u(x') + \mathrm{PRM}.   (30)
Here, the notation \partial_\alpha \equiv \partial/\partial\alpha and \partial'_1 \equiv \partial/\partial x'_1, etc., is introduced. A set of PDEs that u must satisfy can be generated by expressing the x_i in terms of the x'_i. Note first that the transformations (equation (16)) can be written

x' = M x   (31)

where

M_{ij} \equiv \delta_{ij}(1 - \alpha) + \delta_{i,j-1}\,\alpha, \qquad i, j = 0, \ldots, n - 1   (32)

Note also that

\det M \equiv S = (1 - \alpha)^n - (-1)^n \alpha^n   (33)

One can verify directly, by matrix multiplication, that

(M^{-1})_{ij} = \frac{(-1)^{j-i} \, \alpha^{j-i} \, (1 - \alpha)^{n-(j-i)-1}}{S}, \qquad j - i \ \text{taken mod}\ n   (34)

Substituting the x_i, so expressed, into equation (30) and collecting terms in \alpha then yields a system of n - 1 PDEs in the unprimed coordinates (equations (35)-(43)), of the form

\left((x_{i+1} - x_0)\,\partial_0 + \mathrm{PRM}\right) u = 0   (42), (43)
These are independent equations, since the matrix of coefficients is of rank n - 1 for some choices of the x_i. Finally, it is straightforward to check that all the operators appearing in equations (40)-(43) commute, so this set of n - 1 PDEs forms a complete set. Thus, it has only one integral solution. By substitution it is easily checked that u(x) = \sum_i x_i is a solution; thus the general solution takes the form F(\sum_i x_i), where F \in C^1. This completes the proof of the theorem.

This theorem demonstrates that one cannot hope to find globally invariant solutions of equation (29) other than F(\sum_i x_i). Thus, one needs to search for "second best" solutions, such as additive invariant ones.

The General Solution

A general solution for the N-pixel stack is now derived. Equation (29) may be viewed as the dot product of the gradient of an unknown function u with a vector field in n dimensions, and to find the general solution one must solve the set of PDEs which describe the characteristic surfaces, which are themselves parameterized by some t \in R^1 (e.g., see E. C. Zachmanoglou and Dale W. Thoe, "Introduction to Partial Differential Equations with Applications," Dover, Mineola, N.Y., 1986):

\frac{dx}{dt} = A x   (44)

where

A_{ij} = -\delta_{ij} + \delta_{i,j-1}   (45)

A has determinant zero and eigenvalues \lambda_k given by

1 + \lambda_k = e^{2\pi i k/n}, \qquad k = 0, \ldots, n - 1   (46)

Here and for the remainder of this description, the symbol i in exponents is reserved for the square root of -1. By inspection, eigenvectors Z are constructed:

Z_{k,j} = e^{2\pi i jk/n}, \qquad j, k = 0, \ldots, n - 1   (47)

where the first index, k, labels the eigenvector, and the second, j, its components. Let S be the matrix whose columns are the eigenvectors. Then S diagonalizes A:

S^{-1} A S = \mathrm{diag}(\lambda_k)   (48)

Note that the following Lemma holds, despite the fact that A is neither symmetric nor hermitian:

Lemma: The inverse of S is given by

(S^{-1})_{kj} = \frac{1}{n} \, e^{-2\pi i jk/n}
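These spectral claims are easy to verify numerically. The following sketch (Python with NumPy; an illustrative check, not part of the patent) confirms the Lemma and equations (46)-(48) for n = 8:

```python
import numpy as np

n = 8
# Equation (45) (cyclic): (A x)_i = x_{i+1} - x_i.
A = -np.eye(n) + np.roll(np.eye(n), -1, axis=0)

# Equation (47): columns Z_k with components Z_{k,j} = e^{2*pi*i*j*k/n}.
jj, kk = np.meshgrid(np.arange(n), np.arange(n), indexing='ij')
S = np.exp(2j * np.pi * jj * kk / n)

# The Lemma: S^{-1} is (1/n) times the conjugate transpose of S.
S_inv = S.conj().T / n
print(np.allclose(S_inv @ S, np.eye(n)))                  # True

# Equation (48): S^{-1} A S = diag(lambda_k), with 1 + lambda_k = e^{2*pi*i*k/n}
# as in equation (46).
D = S_inv @ A @ S
print(np.allclose(D, np.diag(np.exp(2j * np.pi * np.arange(n) / n) - 1)))  # True
```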
