The Evaluation of Automatic Retrieval Procedures—Selected Test Results Using the SMART System*

The generation of effective methods for the evaluation of information retrieval systems and techniques is becoming increasingly important as more and more systems are designed and implemented. The present report deals with the evaluation of a variety of automatic indexing and retrieval procedures incorporated into the SMART automatic document retrieval system. The design of the SMART system is first briefly reviewed. The document file, search requests, and other parameters affecting the evaluation system are then examined in detail, and the measures used to assess the effectiveness of the retrieval performance are described. The main test results are given and tentative conclusions are reached concerning the design of fully automatic information systems.

GERARD SALTON

The Computation Laboratory of Harvard University
Cambridge, Massachusetts
• Introduction

The evaluation of information retrieval systems and of techniques for indexing, storing, searching and retrieving information has become of increasing importance in recent years. The interest in evaluation procedures stems from two main causes: first, more and more retrieval systems are being designed, thus raising an immediate question concerning the performance and efficacy of these systems; and, second, evaluation methods are of interest in themselves, in that they lead to many complicated problems in test design and performance, and in the interpretation of test results.

The present study differs from other reports on systems evaluation in that it deals with the evaluation of automatic rather than conventional information retrieval. More specifically, it is desired to compare the effectiveness of a large variety of fully automatic procedures for information analysis (indexing) and retrieval. Since such an evaluation must of necessity take place in an experimental situation rather than in an operational environment, it becomes possible to eliminate from consideration such important system parameters as cost of retrieval, response time, influence of physical lay-out, personnel problems and so on, and to concentrate fully on the evaluation of retrieval techniques. Furthermore, a number of human problems which complicate matters in a conventional evaluation procedure, including, for example, the difficulties due to inconsistency among indexers or to the presence of search errors, need not be considered. Other problems, including those which have to do with the identification of information relevant to a given search request, and those concerning themselves with the interpretation of test results, must, of course, be faced in an automatic system just as in a conventional one.

The design of the SMART automatic document retrieval system is first briefly reviewed. The test environment is then described in detail, including in particular a description of the document file and of the search requests used. Parameters are introduced to measure the effectiveness of the retrieval performance; these parameters are similar to the standard recall and precision measures, but do not require that a distinction be made between retrieved and nonretrieved documents. The main test results are then given, and some tentative conclusions are reached concerning the design of fully automatic retrieval systems.

* This study was supported by the National Science Foundation under Grant GN-245.
American Documentation, Vol. 16, No. 3, July 1965

• The SMART Retrieval System

SMART is a fully automatic document retrieval system operating on the IBM 7094. Unlike other computer-based retrieval systems, the SMART system does
not rely on manually assigned keywords or index terms for the identification of documents and search requests, nor does it use primarily the frequency of occurrence of certain words or phrases included in the texts of documents. Instead, an attempt is made to go beyond simple word-matching procedures by using a variety of intellectual aids in the form of synonym dictionaries, hierarchical arrangements of subject identifiers, statistical and syntactic phrase-generating methods and the like, in order to obtain the content identifications useful for the retrieval process.

Stored documents and search requests are then processed without any prior manual analysis by one of several hundred automatic content analysis methods, and those documents which most nearly match a given search request are extracted from the document file in answer to the request. The system may be controlled by the user in that a search request can be processed first in a standard mode; the user can then analyze the output obtained and, depending on his further requirements, order a reprocessing of the request under new conditions. The new output can again be examined and the process iterated until the right kind and amount of information are retrieved.
SMART is thus designed to correct many of the shortcomings of presently available automatic retrieval systems, and it may serve as a reasonable prototype for fully automatic document retrieval. The following facilities incorporated into the SMART system for purposes of document analysis may be of principal interest:*
(a) a system for separating English words into stems and affixes (the so-called "null thesaurus" method) which can be used to construct document identifications consisting of the word stems contained in the documents;

(b) a synonym dictionary, or thesaurus, which can be used to recognize synonyms by replacing each word stem by one or more "concept" numbers (the thesaurus is a manually constructed dictionary including about 600 concepts in the computer literature, corresponding to about 3000 English word stems); these concept numbers can serve as content identifiers instead of the original word stems;

(c) a hierarchical arrangement of the concepts included in the thesaurus which makes it possible, given any concept number, to find its "parent" in the hierarchy, its "sons," its "brothers," and any of a set of possible cross-references; the hierarchy can be used to obtain more general content identifiers than the ones originally given by going "up" in the hierarchy, more specific ones by going "down" in the structure, and a set of related ones by picking up brothers and cross-references;

* More detailed descriptions of the systems organization are included in Refs. … Programming aspects and complete flowcharts are presented in Ref. 4.
(d) statistical procedures to compute similarity coefficients based on co-occurrences of concepts within the sentences of a given document, or within the documents of a given collection; association factors between documents can also be determined, as can clusters (rather than only pairs) of related documents, or related concepts; the related concepts, determined by statistical association, can then be added to the originally available concepts to identify the various documents;

(e) syntactic analysis and matching methods which make it possible to compare the syntactically analyzed sentences of documents and search requests with a pre-coded dictionary of "criterion" phrases in such a way that the same concept number is assigned to a large number of semantically equivalent, but syntactically quite different constructions (e.g. "information retrieval," "the retrieval of information," "the retrieval of documents," "text processing," and so on);

(f) statistical phrase matching methods which operate like the preceding syntactic phrase procedures, that is, by using a preconstructed dictionary to identify phrases used as content identifiers; however, no syntactic analysis is performed in this case, and phrases are defined as equivalent if the concept numbers of all components match, regardless of the syntactic relationships between components;

(g) a dictionary updating system, designed to revise the five principal dictionaries included in the system (stem thesaurus, suffix dictionary, concept hierarchy, statistical phrases, and syntactic "criterion" phrases).
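Facilities (a) and (b), suffix stripping followed by thesaurus lookup, can be sketched in modern terms as follows. The suffix list and the stem-to-concept table below are invented for illustration; SMART's actual dictionaries (about 600 concepts and 3000 word stems) are far larger and were constructed by hand.

```python
# Illustrative sketch of facilities (a) and (b): crude suffix stripping
# ("null thesaurus") followed by a thesaurus lookup that replaces each
# word stem by one or more concept numbers. The suffix list and concept
# table are hypothetical, not SMART's actual dictionaries.

SUFFIXES = ["ation", "ing", "al", "ed", "s"]  # hypothetical suffix dictionary

def stem(word):
    """Strip the longest matching suffix to obtain a word stem."""
    word = word.lower()
    for suf in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suf) and len(word) > len(suf) + 2:
            return word[: -len(suf)]
    return word

# Hypothetical thesaurus: word stem -> concept numbers.
# Synonymous stems map to the same concept number.
THESAURUS = {
    "comput": [92],
    "calcul": [92],
    "differenti": [274],
    "equation": [181],
}

def concept_vector(text):
    """Identify a text by the set of thesaurus concepts its stems map to."""
    concepts = set()
    for word in text.split():
        concepts.update(THESAURUS.get(stem(word), []))
    return concepts
```

Note how the two synonymous stems "comput" and "calcul" collapse onto the same concept number, which is the whole point of replacing stems by concepts.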
`
The operations of the system are built around a supervisory system which decodes the input instructions and arranges the processing sequence in accordance with the instructions received. At the present time, about 35 different processing options are available, in addition to a number of variable parameter settings. The latter are used to specify the type of correlation function which measures the similarity between documents and search requests, the cut-off value which determines the number of documents to be extracted as answers to search requests, and the thesaurus size.

The SMART systems organization makes it possible to evaluate the effectiveness of the various processing methods by comparing the outputs obtained from a variety of different runs. This is achieved by processing the same search requests against the same document collection several times, and making judicious changes in the analysis procedures between runs. It is this use of the SMART system, as an evaluation tool, which is of particular interest in the present context, and is therefore treated in more detail in the remaining parts of the present report.
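The correlation functions themselves are not specified in this section; as one plausible sketch, the familiar cosine coefficient between two weighted concept vectors could be computed as follows. The dict-of-weights representation is an assumption made for illustration, not SMART's internal format.

```python
from math import sqrt

def cosine_correlation(request, document):
    """Cosine coefficient between two weighted concept vectors, each
    represented as a dict mapping concept number -> weight. For
    nonnegative weights the result lies between 0 and 1."""
    common = set(request) & set(document)
    dot = sum(request[c] * document[c] for c in common)
    norm_r = sqrt(sum(w * w for w in request.values()))
    norm_d = sqrt(sum(w * w for w in document.values()))
    if norm_r == 0.0 or norm_d == 0.0:
        return 0.0
    return dot / (norm_r * norm_d)
```

A coefficient of this kind, computed between a request vector and every document vector, yields exactly the ranked output the later sections evaluate.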
`
`

`

Characteristic | Comment | Count
Number of documents in collection | Document abstracts in the computer field | 405
Number of search requests: (a) specific, (b) general | 1–9 relevant documents; 10–30 relevant documents | 10; 7
User population (requester also makes relevance judgments) | Technical people and students | about 10
Number of indexing and search programs used | All search and indexing operations are automatic | 15
Number of index terms per document | Varies greatly depending on indexing procedure and document | (average) 35
Number of relevant documents per request: (a) specific, (b) general | | (average) 5; (average) 15
Number of retrieved documents per request | No cut-off is used to separate retrieved from nonretrieved | 405

Fig. 1. Test Environment.
• The Test Environment

The parameters which control the testing procedures about to be described are summarized in Fig. 1. The data collection used consists of a set of 405 abstracts* of documents in the computer literature published during 1959 in the IRE Transactions on Electronic Computers. The results reported are based on the processing of about 20 search requests, each of which is analyzed by approximately 15 different indexing procedures. The search requests are somewhat arbitrarily separated into two groups, called respectively "general" and "specific" requests, depending on whether the number of documents believed to be relevant to each request is equal to at least ten (for the general requests) or is less than ten (for the specific ones). Results are reported separately for each of these two request groups; cumulative results are also reported for the complete set of requests.
The user population responsible for the search requests consists of about ten technical people with background in the computer field. Requests are formulated without study of the document collection, and no document already included in the collection is normally used as a source for any given search request. On the other hand, in view of the experimental nature of the system, it cannot be stated unequivocally that an actual user need in fact exists which requires fulfilment.
An excerpt from the document collection, as it is originally introduced into computer storage, is reproduced in Fig. 2. It may be noted that the full abstracts are stored together with the bibliographic citations. A typical search request, dealing with the numerical solution of differential equations, is shown at the top of Fig. 3. Any search request expressed in English words is acceptable, and no particular format restrictions exist. Also shown in Fig. 3 is a set of documents found in answer to the request on differential equations by using one of the available processing methods. The documents are listed in decreasing order of the correlation coefficient with the search request; a short 12-character identifier is shown for each document under the heading "answer," and full bibliographic citations are shown under "identification."

* Practical considerations dictated the use of abstracts rather than full documents; the SMART system as such is not restricted to the manipulation of abstracts only.
The average number of index terms used to identify each document is sometimes believed to be an important factor affecting retrieval performance. In the SMART system, this parameter is a difficult one to present and interpret, since the many procedures which exist for analyzing the documents and search requests generate indexing products with widely differing characteristics. A typical example is shown in Fig. 4, consisting of the index "vectors" generated by three different processing methods for the request on differential equations (short form "DIFFERNTL EQ"), and for document number 1 of the collection (short form "1A COMPUTER"). It may be seen from Fig. 4 that the number of terms identifying a document can change drastically from one method to another: for example, document number 1 is identified by 35 different word stems using the word stem analysis (labelled "null thesaurus" in Fig. 4); these 35 stems, however, give rise to 50 different concept numbers using the regular thesaurus, and to 55 concepts for the statistical phrase method. The number of index terms per document shown in the summary of Fig. 1 (35) must therefore be taken as an indication at best, and does not properly reflect the true situation.

In Fig. 4, each concept number is followed by some mnemonic characters to identify the concept and by a
[Fig. 2 reproduces two typical document prints as stored in the computer: document 2, "Micro-Programming," R. J. Mercer (University of California), and document 3, "The Role of Large Memories in Scientific Communications," M. M. Astrahan (IBM Corp.); each entry carries the full abstract together with its bibliographic citation.]

Fig. 2. Typical Document Prints.
[Fig. 3 shows a typical request ("DIFFERNTL EQ": give algorithms useful for the numerical solution of ordinary and partial differential equations on digital computers, and evaluate the various integration procedures (e.g. Runge-Kutta, Milne's method) with respect to accuracy, stability, and speed), together with the answers retrieved in decreasing correlation order, e.g. 384 STABILITY (0.6675), 360 SIMULATIN (0.5758), 200 SOLUTION (0.5663), 392 ON COMPUT (0.5506), 386 ELIMINATI (0.5403); each answer carries its full bibliographic citation.]

Fig. 3. Typical Search Request and Corresponding Answers.
[Fig. 4 lists, for the request "DIFFERNTL EQ" and for document "1A COMPUTER," the index vectors produced by three analysis procedures (null thesaurus word stems, regular thesaurus concepts, and statistical phrases); each entry consists of a concept number or word stem, a mnemonic, and a weight.]

Fig. 4. Typical Indexing Products for Three Analysis Procedures.
weight. The weights assigned to the concept numbers also change from method to method. Since no distinction is made in the evaluation procedure between retrieved and nonretrieved documents, the last indicator included in Fig. 1 (the number of retrieved documents per request) must also be put into the proper perspective. A discussion of this point is postponed until after the evaluation measures are introduced in the next few paragraphs.
• Evaluation Measures

1. Recall and Precision

One of the most crucial tasks in the evaluation of retrieval systems is the choice of measures which reflect systems performance. In the present context, such a measurement must of necessity depend primarily on the system's ability to retrieve wanted information and to reject nonwanted material, to the exclusion of operational criteria such as retrieval cost, waiting time, input preparation time, and so on. The last mentioned factors may be of great practical importance in an operational situation, but do not enter, at least initially, into the evaluation of experimental procedures.

A large number of measures have been proposed in the past for the evaluation of retrieval performance. Perhaps the best known of these are, respectively, recall and precision; recall is defined as the proportion of relevant material actually retrieved, and precision as the proportion of retrieved material actually relevant.* A system with high recall is one which rejects very little that is relevant but may also retrieve a large proportion of irrelevant material, thereby depressing precision. High precision, on the other hand, implies that very little irrelevant information is produced but much relevant information may be missed at the same time, thus depressing recall.† Ideally, one would of course hope for both high recall and high precision.

* Precision has also been called "relevance," notably in the literature of the ASLIB-Cranfield project.
† It has, however, been conjectured that an inverse relationship exists between recall and precision, such that high recall automatically implies low precision and vice versa.
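In set terms, the two standard measures just defined can be computed directly; a minimal sketch, with retrieved and relevant given as collections of document identifiers:

```python
def recall_precision(retrieved, relevant):
    """Standard measures: recall is the proportion of relevant material
    actually retrieved; precision is the proportion of retrieved material
    actually relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision
```

The trade-off described above is visible here: enlarging the retrieved set can only raise `hits` (and hence recall) while it raises the denominator of precision.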
Measures such as recall and precision are particularly attractive when it comes to evaluating automatic retrieval procedures, because a large number of extraneous factors which cause uncertainty in the evaluation of conventional (manual) systems are automatically absent. The following characteristics of the present system are particularly important in this connection:

(a) input errors in the conventional sense, due to faulty indexing or encoding, are eliminated since all indexing operations are automatic;

(b) for the same reason, conventional search errors arising from the absence of needed search terms are also excluded;

(c) errors cannot be introduced in any transition between original search request and final machine query, since this transition is now handled automatically and becomes indistinguishable from the main analysis operation;

(d) inconsistencies introduced by a large number of different indexers and by the passage of time in the course of an experiment cannot arise; and

(e) the role of human memory as a disturbance in the generation of retrieval measurements is eliminated (this factor can be particularly troublesome when source documents are to be retrieved in a conventional system by persons who originally perform the indexing task).
In order to calculate the standard recall and precision measures the following important tasks must be undertaken:

(a) relevance judgments must be made by hand in order to decide, for each document and for each search request, whether the given document is relevant to the given request;

(b) the relevance judgments are usually all or nothing decisions, so that a given document is assumed either wholly relevant or wholly irrelevant (in case of doubt relevance is assumed); and

(c) a cut-off in the correlation between documents and search requests is normally chosen, such that documents whose correlation exceeds the cut-off value are retrieved, while the others are not retrieved.
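Task (c) amounts to a simple threshold on the correlation coefficients; a sketch, with the correlations given as a mapping from document identifier to coefficient (the representation is assumed for illustration):

```python
def retrieve_with_cutoff(correlations, cutoff):
    """Task (c): treat as retrieved exactly those documents whose
    correlation with the request exceeds the cut-off value.
    `correlations` maps document id -> correlation coefficient."""
    return {doc for doc, corr in correlations.items() if corr > cutoff}
```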
`
2. The Generation of Relevance Judgments

A great deal has been written concerning the difficulties and the appropriateness of the various operations listed in part 1. The first task, in particular, which may require the performance of hundreds of thousands of human relevance judgments for document collections of reasonable size, is extremely difficult to satisfy and to control.

Two solutions have been suggested, each of which would base the relevance decisions on less than the whole document collection. The first one consists in using sampling techniques to isolate a suitable document subset, and in making relevance judgments only for documents included in that subset. If the results obtained for the subset, however, are to be applicable to the total collection, it becomes necessary to choose a sample representative of the whole. For most document collections, this turns out to be a difficult task.

The other solution consists in formulating search requests based on specific source documents included in the collection, and in measuring retrieval performance for a given search request as a function of the retrieval of the respective source documents. This procedure suffers from the fact that search requests based on source documents are often claimed to be nontypical, thus introducing a bias into the measurements which does not exist for requests reflecting actual user needs.

Since the document collection used in connection with the present experiments is small enough to permit an exhaustive determination of relevance, the possible pitfalls inherent in the sampling procedure and in the use of source documents were avoided to a great extent. Many of the problems connected with the rendering of relevance judgments are, however, unresolved for general document collections.
3. The Cut-off Problem

The other major problem is caused by the requirement to pick a correlation cut-off value to distinguish retrieved documents from those not retrieved. Such a cut-off introduces a new variable which seems to be extraneous to the principal task of measuring retrieval performance. Furthermore, in the SMART system, a different cut-off would have to be picked for each of the many processing methods if it were desired to retrieve approximately the same number of documents in each case.

Because of these added complications, it was felt that the standard recall and precision measures should be redefined so as to remove the necessary distinction between retrieved and nonretrieved information. Fortunately, this is not difficult in computer-based information systems, because in such systems numeric coefficients expressing the similarity between each document and each search request are obtained as output of the search process. Documents may then be arranged in decreasing order of these similarity coefficients, as shown, for example, for the previously used request on differential equations in the center section of Fig. 5. It may be seen in the figure that document 384 exhibits the largest correlation with the search request, followed by documents 360, 200, 392, and so on.

An ordered document list of the kind shown in Fig. 5 suggests that a suitable criterion for recall and precision measures would be the set of rank-orders of the relevant documents, when these documents are arranged in decreasing correlation order. A function of the rank-order list which penalizes high ranks for relevant documents (and therefore low correlation coefficients) can be used to express recall, while a function penalizing low ranks of nonrelevant documents is indicative of precision.
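Extracting the rank-order list of the relevant documents, on which such functions would operate, might look like the following; the document identifiers and coefficients are illustrative only:

```python
def relevant_ranks(correlations, relevant):
    """Arrange the whole collection in decreasing correlation order and
    return the ranks (1 = highest correlation) of the relevant documents.
    `correlations` maps document id -> correlation coefficient."""
    ordered = sorted(correlations, key=correlations.get, reverse=True)
    rank_of = {doc: i + 1 for i, doc in enumerate(ordered)}
    return sorted(rank_of[d] for d in relevant)
```

No cut-off appears anywhere: the full ordering of the collection is retained, which is precisely what the normalized measures of the next part require.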
`
4. Normalized Recall and Normalized Precision*

It is desired to use as a measure of retrieval effectiveness a set of parameters which reflects the standard recall and the standard precision, and does not depend on a distinction between retrieved and nonretrieved documents. This suggests that one might take the average of the recall and the average of the precision obtained for

* The measures described in this part were suggested by J. Rocchio.
[Fig. 5 tabulates, for the request "DIFFERNTL EQ," the correlations with all documents of the collection in three panels: (a) increasing document order, (b) decreasing correlation order, and (c) a histogram of the correlation values.]

Fig. 5. Correlations Between Search Request and Document Collection.
all possible retrieval levels to define a new pair of measures, termed respectively normalized recall and normalized precision. Specifically, if R_(j) is the standard recall after retrieving j documents from the collection (that is, if R_(j) is equal to the number of relevant documents retrieved divided by the total relevant in the collection, assuming j documents retrieved in all), then the normalized recall can be defined as

    R_norm = (1/N) Σ_{j=1}^{N} R_(j),

where N is the total number of documents in the collection. The normalized measures are thus obtained by retrieving one document at a time until all documents in the whole collection have been retrieved, and then averaging all R's and P's.

In practice, this manner of proceeding would be extremely tedious for large document collections, even if the calculations were done by computer. It may, however, be shown by reasonably straightforward algebra that the normalized recall and normalized precision may be rewritten, respectively, as

    R_norm = 1 - [ Σ_{i=1}^{n} r_i - Σ_{i=1}^{n} i ] / [ n(N - n) ]        (1)

and

    P_norm = 1 - [ Σ_{i=1}^{n} ln r_i - Σ_{i=1}^{n} ln i ] / ln[ N! / (n!(N - n)!) ]        (2)

where

    r_i is the rank (in decreasing correlation order with the search request) of the ith relevant document in the collection, and

    n is the total number of relevant documents in the collection.

American Documentation - July 1965        215
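The closed forms for normalized recall and normalized precision can be computed directly from the ranks of the relevant documents, without stepping through all N retrieval levels. The following sketch illustrates this; the function and variable names are illustrative, not part of the SMART system:

```python
import math

def normalized_measures(relevant_ranks, collection_size):
    """Normalized recall and normalized precision from relevant-document ranks.

    relevant_ranks: 1-based ranks (in decreasing correlation order with the
    search request) of the n relevant documents.
    collection_size: N, the total number of documents in the collection.
    """
    n, N = len(relevant_ranks), collection_size
    ranks = sorted(relevant_ranks)
    ideal = range(1, n + 1)  # ideal case: relevant documents occupy ranks 1..n
    # Normalized recall: rank differences scaled by the worst-case spread n(N - n)
    r_norm = 1 - (sum(ranks) - sum(ideal)) / (n * (N - n))
    # Normalized precision: log-rank differences scaled by ln C(N, n)
    p_norm = 1 - ((sum(math.log(r) for r in ranks)
                   - sum(math.log(i) for i in ideal))
                  / math.log(math.comb(N, n)))
    return r_norm, p_norm

# Perfect retrieval (relevant documents ranked first) gives both measures = 1:
print(normalized_measures([1, 2, 3], 100))  # (1.0, 1.0)
```

Both measures equal 1 when the n relevant documents occupy the top n ranks, and 0 when they occupy the bottom n ranks, matching the intent of the averaged definition.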
