`
`The negative microfilm copy of this dissertation was prepared and inspected by the
`school granting the degree. We are using this film without further inspection or
`change. If there are any questions about the content. please write directly to the
`s~hool. The quality of this reproduction is heavily dependent upon the quality of
`the original material.
`
`The foiiowing explanation of techniques is provided to help clarify notations which
`may appear on this reproduction.
`
`1. Manuscripts may not always be ccmp!!;'te. \Vhen it is not possible to obtain
`missing pages. a note appears to indicate this.
`
`2. When copyrighted materials are removed from the manuscript. a note ap(cid:173)
`pears to indicate this.
`
`3. Oversize materials (maps. draWings. and charts) are photographed by sec(cid:173)
`tioning the original. beginning at the upper left han.d corner and continu(cid:173)
`ing from left to right in equal sections with small overlaps.
`
`4. Most photographs reproduce acceptably on positive microfilm or micro(cid:173)
`fiche but lack clarity on xerographic copies made from the microfilm. For
`any illustrations that cannot be reproduced satisfactorily by xerography.
`photographic prints can be purchased at additional cost and tipped into
`your xerographic copy. Requests can be made to the Dissertations Cus(cid:173)
`tomer Services Department.
`
`UMI Dissertation
`
`I nformation Service
`•
`•
`University Microfilms International
`A Bell & Howell Information Company
`300 N. Zeeb Road, Ann Arbor, Michigan 48106
`
`001
`
`Facebook Inc. Ex. 1209
`
`
`
`.,
`
`"
`
`,
`\
`
`'.
`
`\
`\
`
`‘\
`
`002
`Facebook Inc. EX. 1209
`
`002
`
`Facebook Inc. Ex. 1209
`
`
`
`Order Number 8328584
`
`EXTENDING TII..E BOOLEAN AND VECTOR SPACE MODELS OF
`INFORMATION RETRIEVAL WITH P·NORM QUERIES A_ND
`MULTIPLE CONCEPT TYPES
`
`Fox, Edward Alan, Ph.D.
`
`Com::!!! University, 1983
`
`Copyright ©1983 by Fox, Edward A!an All rights reserved.
`
`1J·1\Il ·1
`
`300 N. Zeeb Rd.
`Ann Arbor, MI48106
`
`003
`
`Facebook Inc. Ex. 1209
`
`
`
`004
`
`Facebook Inc. EX. 1209
`
`004
`
`Facebook Inc. Ex. 1209
`
`
`
`NOTE TO USERS
`
`THE ORIGINAL DOCIJMENT RECEIVED BY V.M.I. CONTAINED PAGES WITH
`BLACK:MARKS AND POOR PRINT. PAGES WERE FILMED AS RECEIVED.
`
`TH!S REPRODUCTION IS THE BEST AVATLARLE COPY.
`
`- - - - - - - - - - - - -- - - - - - - - - - - - - - - - - - - - - - - - - -
`
`005
`
`Facebook Inc. Ex. 1209
`
`
`
`006
`
`Facebook Inc. EX. 1209
`
`006
`
`Facebook Inc. Ex. 1209
`
`
`
`EXTENDING THE BOOLEAN AND VECTOR SPACE MODELS
`
`OF INFORMATJO~ RETRIEVAL W!TH
`
`?-NORM QUERIES AND MULTIPLE CONCEPT TYPES
`
`A Thesis
`
`Presented to the Faculty or t.he Graduate School
`or Cornell UniYer'5ity
`
`in Partial Fulfillment oi the Requirements ror the Degree or
`
`Doctor or Philosophy
`
`by
`
`Edward Alan Fox
`
`August 1983
`
`007
`
`Facebook Inc. Ex. 1209
`
`
`
`008
`
`Facebook Inc. EX. 1209
`
`008
`
`Facebook Inc. Ex. 1209
`
`
`
`Biosr&pblcal Sketch
`
`Ed.ard Alu Fox ,as Dons CD Mal 14, 19S0 o~ Lo=s Wed. New York ud eom(cid:173)
`
`p!E:~ bis bi,h sebool edaeatiOD tbere a\ George W. He<;;,gr. Higla School. He reeeiTed t.he
`
`ID5tit.ate cr TeebDoIOQ. He e!!~e:e: eh~ :;raduate sehool of CoraeD UDinrsit.1 iza Sep(cid:173)
`
`tember. 1978 cd !"eCeived tblr d~~ or,·M::.5w of ScieDce iD Compuw Scieuc:e iD 19s1. II!
`
`August 1982 he begaD wor:k as the Macager of InformatioD S.rsteJDI at ~he International
`
`Icstit.ute of Tropical Asricult.are. IbadaD, NigeriL ReturDiD~ to t.he U.S. ill 1983, he
`
`aceepted ~he post. cf ~is~t. Proressor in the DepanmeD~ elf Computer ScieDce a& Vlfgicia
`?olyteebic Inst.itute and State Uaiversit.y. He w a mem~r of t.he Ass~iatioD lor Compu~
`
`iDe Mac:biDel'1, t.he AQer1CID Societ.y ror IDformat.auD Science, ud t.b Society or t.he Sigma
`
`Xi.
`
`iii
`
`009
`
`Facebook Inc. Ex. 1209
`
`
`
`TO CAROL, JEFFREY AND GREGORY
`
`010
`
`Facebook Inc. Ex. 1209
`
`
`
`Gcrard SaltOD, my thesis ac:IYisor, helped pioneer tbe deftIopmestt. of ntolDalie index ..
`
`iog a::d iororm~tioD retrre,,:aI systems. M.M. Kessler, ~ 1UtCie?padu&e ~is :admC&'.
`
`dCl'eJoped
`
`thl" ElOtioD of bibliop-zphic eoupliDI bet"em doeumcllb, .fUela vas JsI.er
`
`cxteaded ~1 Heary Small to make ~ of c~i~atioDl u a lDe=13re 01 similarity. Han'J Wv.
`propo5ed the ~DOI1D medel aa a ce~i=ioD or the BcoIeaa aDd ftdor space modeb cf
`
`iaform2tioD retrieval.
`
`Thls thesis is e attempt ~ iDt.egrUe thee contributioDs in~ It. wo,.!table framework
`toa&. cu be or practical value to QSerI or future icform~!c= mriev1d tr,1!Jt.e::a. Ae 'GpdateG
`version of the SMART ",tell': was ~ ror experimel!tai nflCbtio= or ideas propcseci, and
`hopeful!y .i11 cootiDue to be 3 ~ bed for imp!'Cyeci lDethodi of iDtormatioa reCrieval.
`
`v
`
`--- ----- - - - - - -
`
`011
`
`Facebook Inc. Ex. 1209
`
`
`
`Ack!3ow1edsmeat.
`
`I would like to upr~ my profound uatitude to IDY thesis adyi"r, pror~ Gerard
`
`Saltoo, for years of support. ~idance, kindness, and p:.f.:e:t =sistaDce. i abo wish to espeo(cid:173)
`
`cially tbank tbe (\tber two members of my special committee: Professor Joseph E. Grimes
`
`(or mao! discussions about lin~isties and information retrieY3i. and Professor Fred
`
`Scbot'ider for teachin~ me the nlue of theoretical models in the s~te!!!! ~ of computer
`
`sci~ot.e. All tbrt'e were very understandic!; about complications lrel~tin~ to c:y mOTe to ::ODd
`
`returo (rom Nig~ria, and about the burden o~ tbeir helpi~g finalize this Yay IO:l~ tbesis. 10
`
`addit;on, I would :ike to exp~ ·m,. appreei:tiu;:; to pror~r 5:ally McC~nllelI-Ci:let wbo
`bl'!lpcod lotroduce me to the complexity or Ih;guistie ~maDti~ Md rel'~nted Professor
`
`Grimt-S duriog his sabbatical.
`
`Maay people and organizatio!lS provided dat,2l..for ex~i:ne~tal stuc;e. Document col(cid:173)
`!~ct:ons w'!re supplied by Jeff Pac:he or INSf>EC - The Institution of E!ectrical En~ineers ia
`
`England. Robert Dat.tola at. XerOl: ·Corporation. and Henry Small of ISI@. the Institute for
`
`Scieot.ific lorormatino@. Tbe !NSPEC query collection was proTided by William FraJ::e5 at
`
`Syracuse tJoivt'r.;ity Many ina~viduals contributed qt~erje5. My wire Carol helped identiry
`
`the citatioD"! among CACM ariicio. Jiil Warner finished loo'~ing up ~~d !.h~!! typed the
`CAC~i citations data as well as most or the !SI abstracts.
`
`Past and present members or the SMART project assisted with program developmeot,
`
`coll('ctioo prt'paralion, ht'lprul discussions, aDd witb the conduct. or some of my eXPf'ri~
`
`ments. Class projects by Bob IhJ'~r, Andy IbDUeheyslcy aDd Alison Brown hel~ ,..itb
`
`dat3 and prograM prepal'3tion. Particular thanks :;0 to Alex Yeh for work OD earl,. prot~
`
`type prObf3ms, George Bennet and Conrad Cady ror data aC3!ysis, SU5~C MaDDing (or
`
`VI
`
`-~- - - - - - - - . ~------ - - - -
`
`012
`
`Facebook Inc. Ex. 1209
`
`
`
`preparing computer graphs, Bruno Wolt ror prootreaciiI:g, Pauline Tz'o for relevance judg(cid:173)
`
`ments, ~ork on SMART statistical programs, and the fanning or experiments. Elena
`
`Siefrid, SMART stair programmer, deserves thanks ror her pat.ience as well as praise tor
`
`dedicated aid oi aii aspects oi the research work. My grateful 2,ppreciat!on goes to Chris
`
`Buckley and Ellen Voorhees tor several years of thoughttul coIiaboration and especially ror
`
`their suggestions and aid during the la.st year of my thesis preparation.
`
`The final typing and production of this thesis would not have been possible without
`
`the assistance of anum her or people in the Cornell Department of Comp~lter Science. Geri
`
`Pinkham, SMART project secretary, gave ongoing assistance to the group on many
`
`matters. Aid from outside was also valuable; Jeff Kunze formatted Appendix Band
`
`Richard Kinch helped with d=ota transfer problems.
`
`The National Science Foundation, through grant 1ST 80-17589, provided support for
`
`these studies, and through grants to Cornell Unive.;-sit;r supplied the computer facilities
`
`which made ali the experiments possible.
`
`Finally, I wish to express my heart felt thanks to my wife and children, ror encourag(cid:173)
`
`ing me and sacrificing in so many ways.
`
`vii
`
`013
`
`Facebook Inc. Ex. 1209
`
`
`
`Table 01 Contents
`
`Preface ..........................................................................................................................
`
`Acknowledgment!: ..........................................................................................................
`
`1. INTRODUCTION ......••.....••.•...•..•••..•.•..•..•..••.•.• ,.......................................................
`
`1.1. Approaches to Information Retrieval .........•••.......•..•..••••...•••..•••.....••.•••...•..•.•••.......
`
`1.1.1. Database Systems as Background ..........•..•.....•.•......•••.•.•••.•..•......•.....•.•..••.••..••..•
`
`1.1.2. Question Answering Systems ...........•..•.•....••.............••.•.••••.•.••......•••.•..••...•.....••...
`
`1.1.3. Textual Information Retrieval Systems .....•..........•...••... <00..................................
`
`1.1.4. Boolean Logic Implementations .................•................•.•..•...•...•.•.•...••....•........•..•.
`
`1.1.5. Vector Sp2.ce Model .............................................................................................
`
`1.1.6. Need ror General!zation ...•.......•.•••...•..•..•...•••........•....••..•••••..•.•••.•.•.•....•...•.....•....•
`
`y
`
`vi
`
`1
`
`3
`
`3
`
`4
`
`5
`
`6
`
`7
`
`7
`
`1.2. Current Approaches and Limitations ............•...................................•....................
`
`. 7
`
`1.2.1. Boolean Systems
`
`1.2.1.1. Ayailability ........................................................................................................
`
`1.2.1.2. Indexing .............................................................................................................
`
`1.2.1.3. inverted Files ....................................................................................................
`
`1.2.1.4. Query Formulation ..................................................•....•.•.....................•...•...•...
`
`1.2.1.5. Presentation or Results .....................................................................................
`
`1.2.1.6. Boolean Logic ........................................................................... ,........................
`
`1.2.2. Vector Mo:iels ...............................................................•.....•................................
`
`1.2.2.1. Automatic Ill<1exing ..........................•.....................•.........................................
`
`1.2.2.2. Ranked Retrieval ..............................................................................................
`
`7
`
`7
`
`8
`
`9
`
`10
`
`10
`
`11
`
`12
`
`12
`
`15
`
`viii
`
`014
`
`Facebook Inc. Ex. 1209
`
`
`
`1.2.2.3. Limitations .........................................................................................................
`
`1.3. Value or Bibl:cgraphic and Factug} In!oo-matioD •..••..•. ,..........................................
`
`1.4. Extensions to Earlier Approaches ••.•.•....•.•..•••.•••....•.••••••..••.•••......•••••..•••••••••••••..••.•
`
`1.5. Experimental Approach .....•..•.••...••.••.•...••.•....•••..••••••..•••••..•••.•••••.•.•••••••••••••••.•.••••. _
`
`1.6. Thesis Outline .........................................................................................................
`
`2. P-NORM QUERIES: THEORY ...•.•.....•...••....••......•.•..• "'...........................................
`
`2.1. Wu's Early Development ........•.............................•......••.•..•••.•.........•..•••••...•.••........
`
`2.1.1. Background ..........................................................................................................
`
`2.1.2. Origins ..............................•.••.......•..•..........•......•.•..•••..•.•••.........•....•••••.••..•....••....••
`
`2.1.3. Definitions ............................................................................................................
`
`2.2. Equal Similarity Contours ......................................................................................
`
`2.2.1. Ex am pIes ............................•.•••......•....•.....•........••.•.....•.•.••.....•..•..•....•.••..•...•.........
`
`2.2.2. Comments on P-Norm Contours ...................... 0 . . . . " ' · . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
`
`2.3. Ranking Behavior of the P-Norm Similarity Function ..........................................
`
`2.3.1. Introduct:on .........................................................................................................
`
`2.3.2. Binary Query Weights in Two Term Queries .....................................................
`
`2.3.3. Binary Document Weights ...................................................................................
`
`2.3.4. General Case ........................................................................................................
`
`2.3.5. Key Result for Binary Query Weights ................................................................
`
`2.3.6. ~1ulti-Level Vectors .............................................................................................
`
`3. P-NORM IMPROVE~1ENTS OF EXISTING QUERIES ........................................
`
`3.1. Preliminaries ............. ..............................................................................................
`
`is
`
`16
`
`18
`
`19
`
`21
`
`22
`
`22
`
`22
`
`24
`
`25
`
`T1
`
`2:1
`
`32
`
`34
`
`34
`
`34
`
`35
`
`38
`
`39
`
`41
`
`43
`
`43
`
`015
`
`Facebook Inc. Ex. 1209
`
`
`
`3.! .1. Docu ment. Weighting Method ............................ ', .... ,............................................
`
`3.1.2. Query Weights .....................................................................................................
`
`3.1.3. Weighting Cases of Interest ....................................... ,.........................................
`
`3.2. Medlars Ex periments ............................... ,..............................................................
`
`3.2.1. ColI,··. tion ...............................................................................................................
`
`3.2.2. Recall·PrecisioIl Charts ........................................................................................
`
`3.2.3. Com parisons .........................................................................................................
`
`3.2.4. Detailed Analys:'~ of Weighting aJ:d PuNorm Variations .....................................
`
`3.3. INSPEC Experiments
`
`3.3.1. Collection .............................................................................................................
`
`3.3.2. Com parisons .........................................................................................................
`
`3.4. ACM Ex periments ............................................... "..................................................
`
`3.4.1. Collection .............................................................................................................
`
`3.4.2. Comparisons
`
`3.5. lSI Experiments
`
`3.5.1. Collection ...................................... ", ....... ",..........................................................
`
`3.5.2. Com parisons ................................................. ............... .........................................
`
`3.5.3. Lexicai Reiation Experiment ...............................................................................
`
`3.6. Summary and Conclusions ......................................................................................
`
`3.6.1. Principal Results (or All Collectio!!S ....................................................................
`
`3.6.2. Discussion .............................................................................................................
`
`4. AUTOMATIC P-NORM QUERY CONSTRUCTION ............................................
`
`x
`
`44
`
`46
`
`48
`
`49
`
`49
`
`50
`
`54
`
`58
`
`63
`
`64
`
`66
`
`66
`
`67
`
`67
`
`67
`
`68
`
`68
`
`7S
`
`7S
`
`78
`
`81
`
`016
`
`Facebook Inc. Ex. 1209
`
`
`
`4.1. Introduction ...................................•...•.....••.....•....••.•.•••..••.•...•........••••••.••.•.•..........••
`
`4.2. Background - Strategies tor Searching ...........•.........•....•..•.••.••..•.••..•••••••.•••.•.••.••..••
`
`4.3. Frequency Range Based Construction
`
`4.3.1. Term Discrimination Theory Application ...........................................................
`
`4.3.2. Frequency Range M(;dlars Experimeuts ..............................................................
`
`4.3.2.1. Example ............................................................................................................
`
`4.3.2.2. Ex peri men tal Design .......................•......•.......•..•.•....••..........•....•...•.••......•.••....••
`
`4.3.2.3. Results ....................................................•.....•.••.......•.•...............•..•.•••................
`
`4.3.2.4. Conclusions
`
`4.3.3. Frequency Range Summary
`
`4.4. Disjunctive Normal Form Automatic Queries
`
`4.4.1. Theory and Method
`
`4.4.1.1. Sort .......................................................••...•.•..••.••.•••.••...........•...••...............•......
`
`4.4.1.2. Narrow
`
`4.4.2. Sort Method Resuits
`
`4.4.2.1. Medlars Collection ............................................................................................ .
`
`4.4.2.2. INSPEC Collection .......................................................................................... ..
`
`4.5. Summary and Conclusions
`
`5. AUTOMATIC BOOLEAN AND P-NORM FEEDBACK
`
`5.1. Introduction ........................................................................................................... .
`
`5.2. Previous Research .................................................................................................. .
`
`5.2.1. Mixing Vectors and Boolean Queries .................................................................. .
`
`xi
`
`&1
`
`82
`
`8S
`
`85
`
`91
`
`91
`
`92
`
`9S
`
`103
`
`104
`
`106
`
`108
`
`112
`
`114
`
`114
`
`114
`
`119
`
`12i
`
`125
`
`126
`
`126
`
`017
`
`Facebook Inc. Ex. 1209
`
`
`
`-_.
`
`5.2.2. Feedback with Single Terms Only ......................••.•....••.•.....•......•..•..•••...•..••.•..•..
`
`127
`
`5.2.3. Automatic Query Adjustment ............................................................................ .
`
`129
`
`5.2.4. Improving the Set or Terms Selected .........•.. "' ............................... ,. ..•.••••.••..•.•..••
`
`132
`
`5.2.5. Prevalence \Veights and Threshold Based Clauses ............................................ .
`
`5.3. Theory and Method ...............•..•..........•........•....•............•....•.........••......•••••••••...•.•..
`
`5.3.1. Obtaining an Initial Query ..•..............•..•.....•..••...•.•..••.•••......•.•......•.....•.•••......•....
`
`5.3.1.1. Problem ............................................................................................................ .
`
`134
`
`138
`
`139
`
`139
`
`5.3.1.2. Solution ............................................................................................................ .
`
`139
`
`5.3.2. Utilization or the Initial Query in ~e'!dback
`
`W.2.1. Problem
`
`5.3.2.2. Solution
`
`I
`5.3.3. Selection of Terms for Feedback Querie5
`
`139
`
`139
`
`140
`
`141
`
`5.3.3.1. Problem ...............................................••....••..•....•.......••........................•............
`
`141
`
`5.3.3.2. Solutioil .............................................................................................................
`
`141
`
`5.3.4. Grouping of Terms into Elementai Clauses
`
`144
`
`5.3.4.1. Problem .............................................................................................................
`
`144
`
`5.3.4.2. Solution
`
`6.3.5. Construction oi Derived Query Qi'
`
`5.3.5.1. Problem ........................................................................................................... ..
`
`5.3.5.2. Solution .........................................................•...................................................
`
`5.3.6. Evaluation of Results ............................................................................ ,. ............ .
`
`5.3.6.1. Problem
`
`xii
`
`145
`
`145
`
`145
`
`145
`
`146
`
`146
`
`018
`
`Facebook Inc. Ex. 1209
`
`
`
`5.3.6.2. Solution .............................................................................................................
`
`146
`
`5.3.7. Basing Retrieved Set on User Wishes ........... ~:.....................................................
`
`147
`
`5.3.7.1. Problem
`
`5.3.7.2. Solution
`
`147
`
`147
`
`5.4. Experimental Resuits ..............................................................................................
`
`148
`
`5.5. Summary and Conclusions .....................................................................................
`
`149
`
`6. EXTENDED VECTORS: MODEL
`
`6.1. 'rerms Only .............................................................................................................
`
`153
`
`i53
`
`6.2. Additional Information - Authors ..........................................................................
`
`154
`
`6.3. Additional Information Based on References ...................................................... '"
`
`159
`
`6.3.1. Previous \Vork with Bibliographic Information ..................................................
`
`159
`
`6.3.2. Example and Definitions ......................................................................................
`
`164
`
`6.4. Submatrices for Reference Information ..................................................................
`
`169
`
`6.4.1. Definitions ............................................................................................................
`
`169
`
`6.4.2. Example: .~ .......................................................................................................... ".
`
`171
`
`6.4.3. Com posite Similarity ...........................................................................................
`
`174
`
`0.5. l"luitipie Concept Types .........................................................................................
`
`177
`
`6.5.1. Rationale ..............................................................................................................
`
`177
`
`6.5.1.1. Similar Method in Other Fields ........................................................................
`
`177
`
`6.5.1.2. Searching r.1any Fields ......................................................................................
`
`178
`
`6.5.1.3. Clarity ...............................................................................................................
`
`178
`
`6.5.1.4. Updating ............................................................................................................
`
`179
`
`xiii
`
`019
`
`Facebook Inc. Ex. 1209
`
`
`
`'-
`
`6.5.1.5. Indexing .............................................................................................................
`
`179
`
`6.5.1.6. \Veighting ..........................................................................................................
`
`179
`
`6.5.1.7. Solution to Vector Problems .............................................................................
`
`180
`
`6.5.2. Model Appiied to lSI and CACM Collections .....................................................
`
`181
`
`' .. CLUSTERING OF EXTENDED VECTORS ..........................................................
`
`183
`
`7.1. Previous Work ........................................................................................................
`
`184
`
`7.1.1. General Clustering ...............................................................................................
`
`184
`
`7.1.2. Clustering for Information Retrieyal ...................................................................
`
`185
`
`7.1.2.1. Number of Attributes .......................................................................................
`
`185
`
`7.1.2.2. Keyword Classifications ....................................................................................
`
`185
`
`7.1.2.3. Single Link Document Classifications ...............................................................
`
`186
`
`7.1.2.4. Other Clustering Approaches ............................................................................
`
`190
`
`7.1.3. Clustering with Bibliographic Data .....................................................................
`
`191
`
`7.2. Algorithms ..............................................................................................................
`
`192
`
`7.2.1. Clustering .............................................................................................................
`
`193
`
`7.2.2. Searching ..............................................................................................................
`
`194
`
`7.3. Small Collection Clustering Examples ...................................................................
`
`195
`
`7.3.1. Clustering the Last 55 Documents ......................................................................
`
`195
`
`7.3.1.1. Input Data .........................................................................................................
`
`195
`
`7.3.1.2. Parameters ........................................................................................................
`
`199
`
`7.3.1:3. Clustering Process .............................................................................................
`
`199
`
`7.3.1.4. Labelling of Clusters .........................................................................................
`
`201
`
`XIV
`
`020
`
`Facebook Inc. Ex. 1209
`
`
`
`7.3.1.5. Discussion of Results .........................................................................................
`
`201
`
`7.3.2. Clustering of Highly Cited Articles .....................................................................
`
`202
`
`7.3.2.1. Choosing Test Set .............................................................................................
`
`202
`
`7.3.2.2. Clustering using CR Categories ........................................................................
`
`205
`
`7.3.2.3. Clustering Using Various Sub vectors ................................................................
`
`213
`
`7 A. CACM Full Collection Clustering ..........................................................................
`
`218
`
`7.4.1. Search Efficiency ..................................................................................................
`
`218
`
`7.4.2. Clustering Experiments .......................................................................................
`
`222
`
`7.4.2.1. Fixed Pa.rameters ...............................................................................................
`
`223
`
`7.4.2.2. Variables ............................................................................................................
`
`224
`
`7.4.2.3. Contrasts ...........................................................................................................
`
`226
`
`7.4.2.4. Test Results ......................................................................................................
`
`227
`
`7.4.2.5. Cluster Run Conclusions ..................................................................................
`
`230
`
`8. FEEDBACK \VITH EXTENDED VECTORS ..........................................................
`
`232
`
`8.1. Previous Work ........................................................................................................
`
`233
`
`8.1.1. Basic Feedback ?l.1odel ..........................................................................................
`
`233
`
`8.1.2. Feedback Evaluation ...........................................................................................
`
`234
`
`8.1.3. 'rerm Relevance F'eedback ...................................................................................
`
`235
`
`8.1.4. Feedback of Extended Vectors ............................................................................
`
`236
`
`8.2. Singie Document Feedb?.ck -.... ", ......... _...................................................................
`
`237
`
`8.2.1. lSI Experiments ...................................................................................................
`
`238
`
`8.2.l.1. Subvectors Used
`
`238
`
`xv
`
`021
`
`Facebook Inc. Ex. 1209
`
`
`
`8.2.1.2. Retrieval Results ...............................................................................................
`
`240
`
`8.2.1.3. Regrc~5ion Based Coefficients ...........................................................................
`
`244
`
`8.2.1.4. Conclu~ions for lSI Document Feedback ..........................................................
`
`247
`
`8.2.2. CAC!l.1 Experiments .............................................................................................
`
`248
`
`8.2.2.1. Single Subyectors ..............................................................................................
`
`248
`
`8.2.2.2. Regression Tests ................................................................................................
`
`249
`
`8.2.2.3. Feedback Results fo. Combinations .................................................................
`
`251
`
`8.2.2.4. Conclusions for CACM Docum~nt Feedback ...................................................
`
`253
`
`8.3. Term Releyance Feedback ......................................................................................
`
`254
`
`8.3.1. Term \\Teight Computation .................................................................................
`
`254
`
`8.3.2. CACM Single Subvector Experiments .................................................................
`
`255
`
`8.3.3. CACM Feedback Results for Combinations
`
`257
`
`8.4. Conclusions ....... ......................................................................................................
`
`259
`
`9. SUMMARY AND CONCLUSIONS ............................................ :.............................
`
`260
`
`9.1. Boolean and P-Norm ~1ethods ...............................................................................
`
`262
`
`9.1.1. Anaiyticai and Graphical Insights .......................................................................
`
`263
`
`9.1.2. P-Norm Validation ...............................................................................................
`
`263
`
`9.1.3. Automatic Boolean and P-Norm Query Construction ........................................
`
`284
`
`9.1.4. Boolean and P-Norm Feedback ............................................................... ~...........
`
`266
`
`9.2. Extended Vectors ....................................................................................................
`
`267
`
`9.2.1. Extended Vector !-'lodel .......................................................................................
`
`268
`
`9.2.2. Extended Vector Clustering
`
`268
`
`XVI
`
`022
`
`Facebook Inc. Ex. 1209
`
`
`
`9.2.3. Ex tended Vector Feedback ..................................................................................
`
`270
`
`9.3. Generalizations and Appiications ...........................................................................
`
`272
`
`9.4. Conclusion ...............................................................................................................
`
`273
`
`A. P-NORM CONTOURS ............................................................................................
`
`274
`
`A.l. Introduction to Figures ..........................................................................................
`
`274
`
`A.2. List of Figures ........................................................................................................
`
`276
`
`A.3. Actual Figures ........................................................................................................
`
`277
`
`B. P-NORM RANKING BEHAVIOR ...........................................................................
`
`289
`
`B.l. Two Terms, Binary Query Weights .......................................................................
`
`290
`
`B.2. Document "'rerms with Constant \Veight ...............................................................
`
`292
`
`B.3. Similarity Computations for Three Peculiar Cases ...............................................
`
`294
`
`BA. Short Proof of OR Query Similarity Ranking .......................................................
`
`297
`
`B.S. Geometric Proof of OR Query Similarity Ranking ...............................................
`
`300
`
`c. COLLECTION CHARACTERISTICS ....................................................................
`
`314
`
`C.l. Overview ........................................................................................ :.......................
`
`314
`
`C.2. Document Collections ..............................................