PROVISIONAL APPLICATION FOR PATENT COVER SHEET

Docket No. 02163-0120P

This is a request for filing a PROVISIONAL APPLICATION FOR PATENT under 37 CFR 1.53(b)(2).

INVENTOR(S)

LAST NAME: BARNHILL
FIRST NAME: STEPHEN
M.I.: D.
RESIDENCE (city and either state or foreign country):
19 Mad Turkey Crossing
Savannah, Georgia 31411
`
`TITLE OF THE INVENTION (280 characters max)
`
`A METHOD FOR DISCOVERING KNOWLEDGE USING SUPPORT VECTOR MACHINES
`
`CORRESPONDENCE ADDRESS
`
`Attn: James Dean Johnson
`
`JONES & ASKEW, LLP
`37th Floor
`191 Peachtree Street
Atlanta, Georgia 30303-1769
`
ENCLOSED APPLICATION PARTS (check all that apply)

□ Specification      Number of Pages: 6
□ Drawing(s)         Number of Sheets:
□ Small Entity Statement
□ Other (specify):

□ Provisional Application Filing Fee

METHOD OF PAYMENT

□ A check is enclosed to cover the Provisional Application filing fee.

FILING FEE: $
`
`The invention was not made by an agency of the U.S. Government nor under a contract with an agency of the U.S. Government.
`
Respectfully submitted,

SIGNATURE:

Date: May 1, 1998

TYPED OR PRINTED NAME: Marv Anthony Merchant, Ph.D.

Reg. No.: 39,771

□ Additional inventors are being named on separately numbered sheets attached hereto.

"Express Mail" Mailing Label Number: EMS15355193US

Date of Deposit: May 1, 1998
`
`INTEL-1086
`U.S. Patent 7,117,188
`
`
`
05/01/98  15:44  912 352 0959  HTI SAVANNAH  002
`
A Method for Discovering Knowledge
Using Support Vector Machines

Inventor - Stephen D. Barnhill, M.D.
`
Technical Field:
`
The present invention relates to methods for extracting desired data from databases. More particularly, the present invention relates to a method for extracting desired data from databases from generated and collected sets of data, large or small, relating to humans, animals, viruses, and bacteria, as well as accounting data, stock and commodity market data, and insurance data, in order to effectively classify subgroups by virtue of a computational index (CompuDex™). The present invention creates an effective method for multi-dimensional function estimation and a resulting CompuDex™ that can be applied to a wide range of problems including pattern recognition, function approximation, regression estimation, molecular patterning, proportionality estimations and signal processing.
`
The present invention further relates to a computer assisted method for classifying subgroups utilizing pre-processed IntelliData™ (Intelligent Data created by pre-processing techniques utilized specifically as part of the present invention). The IntelliData™ is then entered into one or more Support Vector Machines which generates an optimal hyperplane algorithm. This optimal hyperplane algorithm is then converted by one or more post-processing steps into a CompuDex™ (a single valued computationally derived numerical classifier) for interpretation by a human. In summary, the present invention begins with raw data and, using support vector machines, concludes with a single valued computationally derived numerical classifier ready for human interpretation.
`
In the preferred embodiment of the present invention, the method is used to classify individual subgroups, based on pattern recognition techniques, from any combination of raw data. Examples of the usefulness of this procedure could be demonstrated in 1.) genetics in general and the genome project specifically, 2.) diagnostics, 3.) evaluation of managed care efficiency, 4.) therapeutic decisions and follow up, 5.) appropriate therapeutic triage, 6.) pharmaceutical development techniques, 7.) discovery of molecular structures, 8.) prognostic evaluations, 9.) medical informatics, 10.) billing fraud, 11.) inventory control, 12.) stock evaluations and predictions, 13.) commodity evaluations and predictions, and 14.) insurance probability estimates.
`
In another preferred embodiment, the invention includes a system to receive data from a remote data transmitting station for processing through the invention and transmit the results to the same or some other remote data receiving station.
`
Background of the Invention:
`
Knowledge discovery is the most desirable end-product of data collection. The last decade has brought forward an explosive growth in our capabilities to both generate, collect, and store vast amounts of data. While database technology has provided the basic tools for the efficient storage and collection of large data sets, the issue of how to help humans understand and analyze huge bodies of data remains a difficult and unsolved problem.1 In order to deal with this data glut, a new generation of intelligent tools for automated knowledge discovery is needed.2
`
`
`
`
For example, there are huge scientific databases such as in the Human Genome Project which include gigabytes of data on the human genetic code, and much more is expected.3 Such volumes of data clearly overwhelm the traditional manual methods of data analysis, such as spreadsheets and ad hoc queries. Those methods can create informative reports from data, but do not have the capacity to discover the knowledge contained in the data. A significant need exists for a new generation of techniques and tools with the ability to intelligently and automatically assist humans in analyzing the mountains of data and finding patterns of useful knowledge.4
`
Likewise, using traditionally accepted reference ranges and standards for interpretation, it is often impossible for humans to identify patterns of useful knowledge even with very small amounts of data.
`
This invention in part utilizes Support Vector Machines. The Support Vector Machine implements the following idea: it maps the input vectors into high dimensional feature space through non-linear mapping, chosen a priori. In this high dimensional feature space, an optimal separating hyperplane is constructed. This Optimal Hyperplane Classifier Algorithm separates the various classes of interest.
`
The dimensionality of the feature space will be huge. For example, to construct a polynomial of degree 4 or 5 in a 200 dimensional space it is necessary to construct hyperplanes in a billion dimensional feature space. This curse of dimensionality can be solved by constructing the Optimal hyperplane. If it happens that in the high dimensional input space one can construct a separating hyperplane with a small norm value, the VC dimension of the corresponding element of the structure will be small, and therefore the generalization ability of the constructed hyperplane will be high.
`
If the training vectors are separated by the Optimal hyperplane (or generalized Optimal hyperplane), then the expectation value of the probability of committing an error on a test example is bounded by the ratio of the expected number of support vectors to the number of examples in the training set.
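In the notation of the statistical learning literature, this bound is commonly written as:

```latex
E\bigl[P(\mathrm{error})\bigr] \;\le\; \frac{E[\text{number of support vectors}]}{\text{number of training examples}}
```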
`
This bound depends neither on the dimensionality of the space, nor on the norm of the vector of coefficients, nor on the bound of the norm of the input vectors. Therefore, if the Optimal hyperplane can be constructed from a small number of support vectors relative to the training set size, the generalization ability will be high, even in infinite-dimensional space.
`
The problems with the Back-Propagation Neural Network approach include the following:

1.) The empirical risk functional has many local minima. Standard optimization procedures guarantee convergence to one of them. The quality of the obtained solution depends on many factors, in particular on the initialization of the weight matrices.
2.) The convergence of the gradient based method is rather slow.
3.) The sigmoid function has a scaling factor which affects the quality of the approximation.

These problems prevent neural networks from being well controlled learning machines.
`
These shortcomings of neural networks are overcome using Support Vector Machines by constructing the Optimal hyperplane. Support Vector Machines are described in detail in The Nature of Statistical Learning Theory by Vladimir Vapnik and are incorporated herein by reference in their entirety.
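The application names no programming language or library. As a rough, non-authoritative sketch (Python with NumPy is an assumption, and the data are invented), the construction of a separating hyperplane in the linear case can be approximated by subgradient descent on a regularized hinge loss:

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=500):
    """Approximate the optimal (maximal-margin) hyperplane w.x + b = 0
    by subgradient descent on the regularized hinge loss.
    X: (n, d) input vectors; y: class labels in {-1, +1}."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        violators = margins < 1            # points inside or beyond the margin
        w -= lr * (lam * w - (y[violators, None] * X[violators]).sum(axis=0) / n)
        b -= lr * (-y[violators].sum() / n)
    return w, b

# Two linearly separable classes in the plane (values invented for the example).
X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0],
              [4.0, 4.0], [5.0, 4.5], [4.5, 5.0]])
y = np.array([-1, -1, -1, 1, 1, 1])
w, b = train_linear_svm(X, y)
predictions = np.sign(X @ w + b)           # classes recovered by the hyperplane
```

The support vectors in this sketch are the points whose margins settle near 1; a full Vapnik-style machine would instead solve the dual quadratic program with a kernel.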
`
`
`
`
`Summary of Invention:
`
`The present invention is an appa.r:a:tus and a pi:ocess for classifying subgroups, based on pattcu\
`recognition techniques, utifuring hxtelliData ™ (Intelligent Data created by pre-processing
`tech.oiques utilized specifically as part of the present invention). The .lD.tclJi.Data™ is then entered.
`into one or more Support Vector Machines which generates an optimal hyperplane algorithm. TIU$
`optimal hyperplane algorithm is then converted by one or more poi."t-prncessmg steps into a
`CompuDex™ (a single valued computationally derived numerical classifier) for intetpretation by a
`bw:nao.
`
Generally, this objective is accomplished by performing the following steps:

1. Collect data in its original and/or natural form.
2. Optionally apply expert medical pre-processing techniques to derive MediData™ (Data derived from applying expert medical information to raw data).
3. Optionally apply mathematical (computational) pre-processing techniques to derive CompuData™ (Data derived from applying mathematical (computational) information to raw data).
4. Combine the results of MediData™ and CompuData™ with the original raw data to create IntelliData™.
5. Utilize the created IntelliData™ as input into one or more support vector machines.
6. Generate an optimal hyperplane classifier algorithm.
7. Apply mathematical post-processing techniques to create a CompuDex™ result.
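As a sketch of steps 1 through 4 (Python is assumed; the helper names echo the trademarks but are hypothetical, and the rule choices and data are invented), the combination of raw and pre-processed vectors might look like:

```python
import numpy as np

def medi_preprocess(raw):
    # Step 2 (hypothetical rule): flag values outside a known reference range.
    low, high = 0.0, 10.0
    return ((raw < low) | (raw > high)).astype(float)

def compu_preprocess(raw):
    # Step 3 (hypothetical rule): a logarithmic transformation of the raw values.
    return np.log1p(np.abs(raw))

def make_intellidata(raw):
    # Step 4: combine the original raw data with both derived vector sets.
    return np.hstack([raw, medi_preprocess(raw), compu_preprocess(raw)])

raw = np.array([[2.0, 12.0],          # Step 1: raw data, one row per subject
                [3.0, 1.0]])
intellidata = make_intellidata(raw)   # Step 5 would feed this into an SVM
```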
`
Step 1 - Creation of Intelligent Data - IntelliData™

Raw Data + MediData™ (Expert Medical Pre-processed Data) + CompuData™ (Computationally Pre-processed Data) ----•► IntelliData™
`
`
`
`
Step 2 - Determination of the Optimal Hyperplane Classifier Algorithm

IntelliData™ (Intelligent Data) ----•► Support Vector Machine ----•► Optimal Hyperplane Classifier Algorithm Determination
`
Step 3 - Creation of a Mathematical CompuDex™ (Computational Index) for Human Interpretation

Optimal Hyperplane Classifier Algorithm Determination ----•► Post-processing Techniques ----•► CompuDex™ (Computational Index)
`
More detail on performing the steps in the invention is as follows:

1. Collect data in its original and/or natural form.

This initial step involves generating and collecting any given set of data that may contain information which is not immediately apparent and needs to be evaluated to identify any patterns of useful knowledge.
`
2. Optionally apply expert medical pre-processing techniques to derive MediData™.

This step actually creates an additional new set of input data (input vectors).

This next step in the invention involves the option of application of expert medical pre-processing techniques to the raw data to create an additional set of input data known as MediData™. Examples of expert medical pre-processing steps include, but are not limited to, the following:

A. Association with known standard reference ranges
B. Physiologic Truncation
C. Physiologic Combinations
D. Biochemical Combinations
E. Application of Heuristic Rules
F. Diagnostic Criteria Determinations
G. Clinical Weighting Systems
H. Diagnostic Transformations
I. Clinical Transformations
J. Application of Expert Knowledge
K. Labeling Techniques
L. Application of other Domain Knowledge
M. Bayesian Network Knowledge
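Techniques A (association with reference ranges) and B (physiologic truncation) from the list above can be sketched as follows; the analyte range and cutoff values are invented for illustration, not taken from the application:

```python
# Hypothetical reference range for a lab analyte (units and limits invented).
REF_LOW, REF_HIGH = 4.0, 25.0
PHYS_MAX = 100.0   # values above this are treated as physiologically impossible

def medidata_features(value):
    """Derive MediData-style input vectors from one raw measurement."""
    v = min(value, PHYS_MAX)               # B: physiologic truncation
    below = 1.0 if v < REF_LOW else 0.0    # A: reference-range association,
    above = 1.0 if v > REF_HIGH else 0.0   #    encoding where the value falls
    return [v, below, above]

features = [medidata_features(x) for x in (2.0, 15.0, 250.0)]
```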
`
3. Optionally apply mathematical (computational) pre-processing techniques to derive CompuData™.

This step actually creates an additional new set of input data (input vectors).

This next step in the invention involves the option of application of mathematical (computational) pre-processing techniques to the raw data to create an additional set
`
`
`
of input data known as CompuData™. Examples of mathematical (computational) pre-processing steps include but are not limited to the following:
A. Labeling
B. Binary Conversion
C. Logarithmic Transformation
D. Sine Transformation
E. Cosine Transformation
F. Tangent Transformation
G. Cotangent Transformation
H. Clustering
I. Summarization
J. Scaling
K. Probabilistic Analysis
L. Significance Testing
M. Strength Testing
N. Search for 2-D Regularities
O. Identify Equivalence Relations
P. Apply Contingency Tables
Q. Apply Graph Theory Principles
R. Create Vectorizing Maps
S. Multiplication
T. Division
U. Addition
V. Subtraction
W. Application of Polynomial Equations
X. Application of Basic and Complex Statistics
Y. Identify Proportionalities
Z. Discriminatory Power Determination
AA. Apply Combinations of the Above Simultaneously
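A few of the listed transformations — logarithmic (C), sine (D), and scaling (J) — applied to one raw vector, as an illustrative sketch (not the application's own code; Python with NumPy is an assumption):

```python
import numpy as np

def compudata_features(raw):
    """Stack several mathematical pre-processing transforms of a raw vector."""
    raw = np.asarray(raw, dtype=float)
    log_t = np.log1p(np.abs(raw))          # C: logarithmic transformation
    sine_t = np.sin(raw)                   # D: sine transformation
    lo, hi = raw.min(), raw.max()          # J: scaling onto the [0, 1] interval
    scaled = (raw - lo) / (hi - lo) if hi > lo else np.zeros_like(raw)
    return np.column_stack([log_t, sine_t, scaled])

out = compudata_features([0.0, 5.0, 10.0])   # one derived column per transform
```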
`
4. Combine the results of MediData™ and CompuData™ with the original raw data to create IntelliData™.

This step actually creates an additional new set of input data (input vectors).

This step of the invention combines the attributes (vectors) of the raw data, the MediData™ and the CompuData™ to create an additional new set of input data (input vectors) called IntelliData™ to be fed into the Support Vector Machines for high dimensional computation and mapping.
`
5. Utilize the created IntelliData™ as input into one or more support vector machines.

This step of the invention utilizes the original raw data (vectors) along with the option of using newly created vectors MediData™, CompuData™, and IntelliData™ to assist in providing smarter data to the Support Vector Machine to allow for better computation and high dimensional mapping in the creation of the optimal hyperplane algorithm.
`
6. Generate an optimal hyperplane classifier algorithm.

This step of the invention uses one or more Support Vector Machines to determine the optimal hyperplane classifier algorithm. The kernel of the Support Vector Machine can
`
`
`
`
be a polynomial kernel, a radial basis classifier kernel, a neural network kernel, or any other type of kernel that satisfies the Mercer Condition.

To construct the Optimal Hyperplane, one has to separate the vectors of the training set belonging to two different classes using the hyperplane with the smallest norm of coefficients. The Support Vector Machine implements the following idea: it maps the input vectors into high dimensional feature space through some non-linear mapping, chosen a priori. In this space, an optimal separating hyperplane is constructed. This Optimal Hyperplane Classifier Algorithm separates the various classes of interest.
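The Mercer Condition mentioned above requires, on any finite sample, that the kernel's Gram matrix be positive semi-definite. A sketch checking this for a polynomial kernel and a radial basis kernel (the parameter values and sample points are arbitrary choices for illustration):

```python
import numpy as np

def poly_kernel(a, b, degree=3):
    # Polynomial kernel: K(a, b) = (a.b + 1)^degree
    return (a @ b + 1.0) ** degree

def rbf_kernel(a, b, gamma=0.5):
    # Radial basis function kernel: K(a, b) = exp(-gamma * ||a - b||^2)
    return np.exp(-gamma * np.sum((a - b) ** 2))

def gram_matrix(kernel, X):
    """Kernel (Gram) matrix K[i, j] = kernel(X[i], X[j]) on a finite sample."""
    n = len(X)
    return np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])

X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
# Mercer condition on this sample: no negative eigenvalues (up to round-off).
min_eigenvalues = {k.__name__: np.linalg.eigvalsh(gram_matrix(k, X)).min()
                   for k in (poly_kernel, rbf_kernel)}
```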
`
7. Apply mathematical post-processing techniques to create a CompuDex™ result.

This step of the invention takes the Optimal Hyperplane Classifier Algorithm and optionally applies post-processing techniques to create a CompuDex™ (a computational index) which can then be interpreted by a human. Examples of post-processing steps include but are not limited to the following:

A. Reference Range Determinations
B. Scaling Techniques (linear and non-linear)
C. Transformations (linear and non-linear)
D. Probability Estimations
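As a sketch of techniques B and D, the raw decision value produced by the hyperplane can be scaled non-linearly into a bounded, probability-style index; the 0-100 range and the logistic form are assumptions for illustration, not taken from the application:

```python
import math

def compudex(decision_value):
    """Map a raw hyperplane decision value onto a 0-100 computational index
    via a logistic (non-linear scaling / probability-style) transform."""
    probability = 1.0 / (1.0 + math.exp(-decision_value))
    return round(100.0 * probability, 1)

indices = [compudex(d) for d in (-3.0, 0.0, 3.0)]   # monotone in the decision value
```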
`
Conclusion:

This invention is a successful method for extracting and manipulating data in data sets, large or small, using data pre-processing steps to create IntelliData™ (intelligent data), which is then analyzed in high dimensional space using one or more support vector machines; the results of which are then subjected to post-processing steps to create a single valued numerical classifier, CompuDex™ (a computational index), which can then be easily interpreted by a human.
`
`