Selected Test Results Using the SMART System*
The generation of effective methods for the evaluation of information retrieval systems and techniques is becoming increasingly important as more and more systems are designed and implemented. The present report deals with the evaluation of a variety of automatic indexing and retrieval procedures incorporated into the SMART automatic document retrieval system. The design of the SMART system is first briefly reviewed. The document file, search requests, and other parameters affecting the evaluation system are then examined in detail, and the measures used to assess the effectiveness of the retrieval performance are described. The main test results are given and tentative conclusions are reached concerning the design of fully automatic information systems.
`
GERARD SALTON

The Computation Laboratory of Harvard University
Cambridge, Massachusetts
`
• Introduction
`
The evaluation of information retrieval systems and of techniques for indexing, storing, searching and retrieving information has become of increasing importance in recent years. The interest in evaluation procedures stems from two main causes: first, more and more retrieval systems are being designed, thus raising an immediate question concerning performance and efficacy of these systems; and, second, evaluation methods are of interest in themselves, in that they lead to many complicated problems in test design and performance, and in the interpretation of test results.
`
The present study differs from other reports on systems evaluation in that it deals with the evaluation of automatic rather than conventional information retrieval.
`
More specifically, it is desired to compare the effectiveness of a large variety of fully automatic procedures for information analysis (indexing) and retrieval. Since such an evaluation must of necessity take place in an experimental situation rather than in an operational environment, it becomes possible to eliminate from consideration such important system parameters as cost of retrieval, response time, influence of physical lay-out, personnel problems and so on, and to concentrate fully
`
* This study was supported by the National Science Foundation under Grant GN-245.
`
on the evaluation of retrieval techniques. Furthermore, a number of human problems which complicate matters in a conventional evaluation procedure, including, for example, the difficulties due to inconsistency among indexers or to the presence of search errors, need not be considered. Other problems, including those which have to do with the identification of information relevant to a given search request, and those concerning themselves with the interpretation of test results, must, of course, be faced in an automatic system just as in a conventional one.
`
The design of the SMART automatic document retrieval system is first briefly reviewed. The test environment is then described in detail, including in particular a description of the document file and of the search requests used. Parameters are introduced to measure the effectiveness of the retrieval performance; these parameters are similar to the standard recall and precision measures, but do not require that a distinction be made between retrieved and nonretrieved documents. The main test results are then given, and some tentative conclusions are reached concerning the design of fully automatic retrieval systems.
`
• The SMART Retrieval System
`
SMART is a fully automatic document retrieval system operating on the IBM 7094.
Unlike other computer-based retrieval systems, the SMART system does not rely on manually assigned keywords or index terms for the identification of documents and search requests, nor does it use primarily the frequency of occurrence of certain words or phrases included in the texts of documents. Instead, an attempt is made to go beyond simple word-matching procedures by using a variety of intellectual aids in the form of synonym dictionaries, hierarchical arrangements of subject identifiers, statistical and syntactic phrase-generating methods and the like, in order to obtain the content identifications useful for the retrieval process.

Stored documents and search requests are then processed without any prior manual analysis by one of several hundred automatic content analysis methods, and those documents which most nearly match a given search request are extracted from the document file in answer to the request. The system may be controlled by the user in that a search request can be processed first in a standard mode; the user can then analyze the output obtained and, depending on his further requirements, order a reprocessing of the request under new conditions. The new output can again be examined and the process iterated until the right kind and amount of information are retrieved.
`
SMART is thus designed to correct many of the shortcomings of presently available automatic retrieval systems, and it may serve as a reasonable prototype for fully automatic document retrieval. The following facilities incorporated into the SMART system for purposes of document analysis may be of principal interest*:
`
(a) a system for separating English words into stems and affixes (the so-called “null thesaurus” method) which can be used to construct document identifications consisting of the word stems contained in the documents;

(b) a synonym dictionary, or thesaurus, which can be used to recognize synonyms by replacing each word stem by one or more “concept” numbers (the thesaurus is a manually constructed dictionary including about 600 concepts in the computer literature, corresponding to about 3000 English word stems); these concept numbers can serve as content identifiers instead of the original word stems (a schematic sketch of this look-up is given after the list);

(c) a hierarchical arrangement of the concepts included in the thesaurus which makes it possible, given any concept number, to find its “parent” in the hierarchy, its “sons,” its “brothers,” and any of a set of possible cross-references; the hierarchy can be used to obtain more general content identifiers than the ones originally given by going “up” in the hierarchy, more specific ones by going “down” in the structure, and a set of related ones by picking up brothers and cross-references;
`
* More detailed descriptions of the system's organization are included in Refs. 1 and 2. Programming aspects and complete flowcharts are presented in Ref. 3.
`
`
(d) statistical procedures to compute similarity coefficients based on co-occurrences of concepts within the sentences of a given document, or within the documents of a given collection; association factors between documents can also be determined, as can clusters (rather than only pairs) of related documents, or related concepts; the related concepts, determined by statistical association, can then be added to the originally available concepts to identify the various documents;

(e) syntactic analysis and matching methods which make it possible to compare the syntactically analyzed sentences of documents and search requests with a pre-coded dictionary of “criterion” phrases in such a way that the same concept number is assigned to a large number of semantically equivalent, but syntactically quite different constructions (e.g., “information retrieval,” “the retrieval of information,” “the retrieval of documents,” “text processing,” and so on);

(f) statistical phrase matching methods which operate like the preceding syntactic phrase procedures, that is, by using a preconstructed dictionary to identify phrases used as content identifiers; however, no syntactic analysis is performed in this case, and phrases are defined as equivalent if the concept numbers of all components match, regardless of the syntactic relationships between components;

(g) a dictionary updating system, designed to revise the five principal dictionaries included in the system (stem thesaurus, suffix dictionary, concept hierarchy, statistical phrases, and syntactic “criterion” phrases).
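The dictionary look-ups described in facilities (a), (b), and (f) can be illustrated with a small present-day sketch. The suffix list, stem-to-concept table, phrase dictionary, and all concept numbers below are invented for the example and are not taken from the actual SMART dictionaries (which ran on the IBM 7094); the sketch merely shows the kind of mapping the text describes.

```python
# Illustration only: data values, names, and concept numbers are invented.

SUFFIXES = ["ations", "ation", "ical", "ial", "ions", "ion", "s"]  # toy suffix list

# Facility (b): word stem -> one or more "concept" numbers.
THESAURUS = {
    "numer": [120],
    "solut": [176],
    "different": [274],
    "equ": [269],
}

# Facility (f): a "statistical phrase" is present when the concepts of all
# of its components occur, regardless of the syntactic relationships.
PHRASE_DICTIONARY = {900: {274, 269}}   # e.g., a "differential equations" phrase

def stem(word):
    """Facility (a): strip the first matching suffix to obtain a word stem."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) > len(suffix) + 2:
            return w[: -len(suffix)]
    return w

def concept_vector(text):
    """Replace each word stem by its concept numbers; weights are occurrence counts."""
    vector = {}
    for word in text.split():
        for c in THESAURUS.get(stem(word), []):
            vector[c] = vector.get(c, 0) + 1
    return vector

def phrase_concepts(vector):
    """Return phrase concepts whose component concepts all appear in the vector."""
    return [p for p, parts in PHRASE_DICTIONARY.items() if parts <= set(vector)]

vector = concept_vector("numerical solution of differential equations")
print(vector)                   # {120: 1, 176: 1, 274: 1, 269: 1}
print(phrase_concepts(vector))  # [900]
```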
`
The operations of the system are built around a supervisory system which decodes the input instructions and arranges the processing sequence in accordance with the instructions received. At the present time, about 35 different processing options are available, in addition to a number of variable parameter settings. The latter are used to specify the type of correlation function which measures the similarity between documents and search requests, the cut-off value which determines the number of documents to be extracted as answers to search requests, and the thesaurus size.
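The paper does not specify the correlation functions at this point. As a hedged illustration only, one coefficient of the kind referred to here is a cosine correlation between weighted concept vectors; the vectors and weights below are invented.

```python
# Sketch of one possible correlation function between a request and a
# document, each represented as a weighted concept vector (concept -> weight).
# The cosine form is an assumption for illustration; the text only states
# that several correlation functions and cut-off values are available.
import math

def correlation(request, document):
    """Cosine-type similarity coefficient between 0 and 1."""
    common = set(request) & set(document)
    dot = sum(request[c] * document[c] for c in common)
    norm_r = math.sqrt(sum(w * w for w in request.values()))
    norm_d = math.sqrt(sum(w * w for w in document.values()))
    return dot / (norm_r * norm_d) if norm_r and norm_d else 0.0

request_vector = {274: 12, 269: 12, 120: 6}            # invented weights
document_vector = {274: 24, 269: 12, 176: 6, 59: 4}
print(round(correlation(request_vector, document_vector), 4))
```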
`
The SMART system's organization makes it possible to evaluate the effectiveness of the various processing methods by comparing the outputs obtained from a variety of different runs. This is achieved by processing the same search requests against the same document collection several times, and making judicious changes in the analysis procedures between runs. It is this use of the SMART system, as an evaluation tool, which is of particular interest in the present context, and is therefore treated in more detail in the remaining parts of the present report.
`
`
`
Characteristic                                      Count             Comment

Number of documents in collection                   405               Document abstracts in the computer field
Number of search requests
  (a) specific                                      10                0-9 relevant documents
  (b) general                                       7                 10-30 relevant documents
User population (requester also makes
  relevance judgments)                              about 10          Technical people and students
Number of indexing and search programs used         15                All search and indexing operations are automatic
Number of index terms per document                  (average) 35      Varies greatly depending on indexing procedure and document
Number of relevant documents per request
  (a) specific                                      (average) 5
  (b) general                                       (average) 15
Number of retrieved documents per request           405               No cut-off is used to separate retrieved from nonretrieved

FIG. 1. Test Environment.
`
• The Test Environment
`
The parameters which control the testing procedures about to be described are summarized in Fig. 1. The data collection used consists of a set of 405 abstracts* of documents in the computer literature published during 1959 in the IRE Transactions on Electronic Computers. The results reported are based on the processing of about 20 search requests, each of which is analyzed by approximately 15 different indexing procedures. The search requests are somewhat arbitrarily separated into two groups, called respectively “general” and “specific” requests, depending on whether the number of documents believed to be relevant to each request is equal to at least ten (for the general requests) or is less than ten (for the specific ones). Results are reported separately for each of these two request groups; cumulative results are also reported for the complete set of requests.

The user population responsible for the search requests consists of about ten technical people with background in the computer field. Requests are formulated without study of the document collection, and no document already included in the collection is normally used as a source for any given search request. On the other hand, in view of the experimental nature of the system, it cannot be stated unequivocally that an actual user need in fact exists which requires fulfilment.
An excerpt from the document collection, as it is originally introduced into computer storage, is reproduced in Fig. 2. It may be noted that the full abstracts are stored together with the bibliographic citations. A typical search request, dealing with the numerical solution of differential equations, is shown at the top of Fig. 3. Any search request expressed in English words is acceptable, and no particular format restrictions exist. Also shown in Fig. 3 is a set of documents found in answer to the request on differential equations by using one of the available processing methods. The documents are listed in decreasing order of the correlation coefficient with the search request; a short 12-character identifier is shown for each document under the heading “answer,” and full bibliographic citations are shown under “identification.”
`
The average number of index terms used to identify each document is sometimes believed to be an important factor affecting retrieval performance. In the SMART system, this parameter is a difficult one to present and interpret, since the many procedures which exist for analyzing the documents and search requests generate indexing products with widely differing characteristics. A typical example is shown in Fig. 4, consisting of the index “vectors” generated by three different processing methods for the request on differential equations (short form “DIFFERNTL EQ”), and for document number 1 of the collection (short form “1A COMPUTER”).

It may be seen from Fig. 4 that the number of terms identifying a document can change drastically from one method to another: for example, document number 1 is identified by 35 different word stems using the word stem analysis (labelled “null thesaurus” in Fig. 4); these 35 stems, however, give rise to 50 different concept numbers using the regular thesaurus, and to 55 concepts for the statistical phrase method. The number of index terms per document shown in the summary of Fig. 1 (35) must therefore be taken as an indication at best, and does not properly reflect the true situation.
`
* Practical considerations dictated the use of abstracts rather than full documents; the SMART system as such is not restricted to the manipulation of abstracts only.
`
In Fig. 4, each concept number is followed by some mnemonic characters to identify the concept and by a weight.
`
`
FIG. 2. Typical Document Prints.
`
FIG. 3. Typical Search Request and Corresponding Answers.
`
`
`
`
FIG. 4. Typical Indexing Products for Three Analysis Procedures.
`
The weights assigned to the concept numbers also change from method to method. Since no distinction is made in the evaluation procedure between retrieved and nonretrieved documents, the last indicator included in Fig. 1 (the number of retrieved documents per request) must also be put into the proper perspective. A discussion of this point is postponed until after the evaluation measures are introduced in the next few paragraphs.
`
• Evaluation Measures
`
`1. Recall and Precision
`
One of the most crucial tasks in the evaluation of retrieval systems is the choice of measures which reflect system performance. In the present context, such a measurement must of necessity depend primarily on the system's ability to retrieve wanted information and to reject nonwanted material, to the exclusion of operational criteria such as retrieval cost, waiting time, input preparation time, and so on.
`
* Precision has also been called “relevance,” notably in the literature of the ASLIB-Cranfield project.5
† It has, however, been conjectured that an inverse relationship exists between recall and precision, such that high recall automatically implies low precision and vice versa.
`
The last-mentioned factors may be of great practical importance in an operational situation, but do not enter, at least initially, into the evaluation of experimental procedures.

A large number of measures have been proposed in the past for the evaluation of retrieval performance.4 Perhaps the best known of these are, respectively, recall and precision; recall is defined as the proportion of relevant material actually retrieved, and precision as the proportion of retrieved material actually relevant.* A system with high recall is one which rejects very little that is relevant but may also retrieve a large proportion of irrelevant material, thereby depressing precision. High precision, on the other hand, implies that very little irrelevant information is produced, but much relevant information may be missed at the same time, thus depressing recall. Ideally, one would of course hope for both high recall and high precision.†
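As a small concrete illustration (not part of the original paper), the two standard measures can be computed from the set of retrieved documents and the set of documents judged relevant; the document numbers used below are invented.

```python
# Standard recall and precision for a single search request, computed from
# the set of retrieved documents and the set of relevant documents.
# The document numbers are invented for the example.
def recall(retrieved, relevant):
    """Proportion of relevant material actually retrieved."""
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0

def precision(retrieved, relevant):
    """Proportion of retrieved material actually relevant."""
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0

retrieved = {384, 360, 200, 392, 103}
relevant = {384, 360, 200, 41, 17}
print(recall(retrieved, relevant), precision(retrieved, relevant))   # 0.6 0.6
```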
Measures such as recall and precision are particularly attractive when it comes to evaluating automatic retrieval procedures, because a large number of extraneous factors which cause uncertainty in the evaluation of conventional (manual) systems are automatically absent. The following characteristics of the present system are particularly important in this connection:
`
(a) input errors in the conventional sense, due to faulty indexing or encoding, are eliminated since all indexing operations are automatic;
`
`
(b) for the same reason, conventional search errors arising from the absence of needed search terms are also excluded;

(c) errors cannot be introduced in any transition between the original search request and the final machine query, since this transition is now handled automatically and becomes indistinguishable from the main analysis operation;

(d) inconsistencies introduced by a large number of different indexers and by the passage of time in the course of an experiment cannot arise; and

(e) the role of human memory as a disturbance in the generation of retrieval measurements is eliminated (this factor can be particularly troublesome when source documents are to be retrieved in a conventional system by persons who originally perform the indexing task).
`
In order to calculate the standard recall and precision measures, the following important tasks must be undertaken:

(a) relevance judgments must be made by hand in order to decide, for each document and for each search request, whether the given document is relevant to the given request;

(b) the relevance judgments are usually all-or-nothing decisions, so that a given document is assumed either wholly relevant or wholly irrelevant (in case of doubt, relevance is assumed); and

(c) a cut-off in the correlation between documents and search requests is normally chosen, such that documents whose correlation exceeds the cut-off value are retrieved, while the others are not retrieved (a small sketch of this step follows the list).
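As an illustration of task (c) only, retrieval by cut-off can be expressed as follows; the correlation values and the cut-off are invented for the example.

```python
# Task (c): given the correlation coefficients between one request and every
# document in the collection, retrieve the documents whose correlation
# exceeds a chosen cut-off value. All numbers are invented for the example.
correlations = {384: 0.66, 360: 0.58, 200: 0.57, 392: 0.55, 103: 0.21, 61: 0.05}

def retrieve(correlations, cutoff):
    """Return the document numbers whose correlation exceeds the cut-off."""
    return {doc for doc, c in correlations.items() if c > cutoff}

print(sorted(retrieve(correlations, cutoff=0.50)))   # [200, 360, 384, 392]
```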
`
`2. The Generation of Relevance Judgments
`
A great deal has been written concerning the difficulties and the appropriateness of the various operations listed in part 1.5-8 The first task, in particular, which may require the performance of hundreds of thousands of human relevance judgments for document collections of reasonable size, is extremely difficult to satisfy and to control.

Two solutions have been suggested, each of which would base the relevance decisions on less than the whole document collection. The first one consists in using sampling techniques to isolate a suitable document subset, and in making relevance judgments only for documents included in that subset. If the results obtained for the subset, however, are to be applicable to the total collection, it becomes necessary to choose a sample representative of the whole. For most document collections, this turns out to be a difficult task.
`
The other solution consists in formulating search requests based on specific source documents included in the collection, and in measuring retrieval performance for a given search request as a function of the retrieval of the respective source documents. This procedure suffers from the fact that search requests based on source documents are often claimed to be nontypical, thus introducing a bias into the measurements which does not exist for requests reflecting actual user needs.
`
Since the document collection used in connection with the present experiments is small enough to permit an exhaustive determination of relevance, the possible pitfalls inherent in the sampling procedure and in the use of source documents were avoided to a great extent. Many of the problems connected with the rendering of relevance judgments are, however, unresolved for general document collections.
`
3. The Cut-off Problem
`
The other major problem is caused by the requirement to pick a correlation cut-off value to distinguish retrieved documents from those not retrieved. Such a cut-off introduces a new variable which seems to be extraneous to the principal task of measuring retrieval performance. Furthermore, in the SMART system, a different cut-off would have to be picked for each of the many processing methods if it were desired to retrieve approximately the same number of documents in each case.
`
Because of these added complications, it was felt that the standard recall and precision measures should be redefined so as to remove the necessary distinction between retrieved and nonretrieved information. Fortunately, this is not difficult in computer-based information systems, because in such systems numeric coefficients expressing the similarity between each document and each search request are obtained as output of the search process. Documents may then be arranged in decreasing order of these similarity coefficients, as shown, for example, for the previously used request on differential equations in the center section of Fig. 5. It may be seen in the figure that document 384 exhibits the largest correlation with the search request, followed by documents 360, 200, 392, and so on.
An ordered document list of the kind shown in Fig. 5 suggests that a suitable criterion for recall and precision measures would be the set of rank-orders of the relevant documents, when these documents are arranged in decreasing correlation order. A function of the rank-order list which penalizes high ranks for relevant documents (and therefore low correlation coefficients) can be used to express recall, while a function penalizing low ranks of nonrelevant documents is indicative of precision.
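To make the rank-order view concrete, the following present-day sketch reads recall and precision off at every position of the ranked list, so that no cut-off is needed; the correlations and relevance judgments are invented.

```python
# Documents sorted by decreasing correlation with the request; recall and
# precision are evaluated at every rank of the ordered list, so no cut-off
# between retrieved and nonretrieved documents is needed. Data are invented.
correlations = {384: 0.66, 360: 0.58, 200: 0.57, 103: 0.21, 61: 0.05}
relevant = {384, 200, 61}

ranked = sorted(correlations, key=correlations.get, reverse=True)

found = 0
for rank, doc in enumerate(ranked, start=1):
    if doc in relevant:
        found += 1
    print(rank, doc,
          "recall=%.2f" % (found / len(relevant)),
          "precision=%.2f" % (found / rank))
```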
`
`4. Normalized Recall and Normalized Precision*
`
It is desired to use as a measure of retrieval effectiveness a set of parameters which reflects the standard recall and the standard precision, and does not depend on a distinction between retrieved and nonretrieved documents. This suggests that one might take the average of the recall and the average of the precision obtained for all possible retrieval levels to define a new pair of measures, termed respectively normalized recall and normalized precision.
* The measures described in this part were suggested by J. Rocchio.
`
`
`
FIG. 5. Correlations Between Search Request and Document Collection: (a) increasing document order; (b) decreasing correlation order; (c) histogram.
`
Specifically, if R(j) is the standard recall after retrieving j documents from the collection (that is, if R(j) is equal to the number of relevant documents retrieved divided by the total relevant in the collection, assuming j documents retrieved in all), then the normalized recall can be defined as

    R_norm = (1/N) Σ_{j=1}^{N} R(j),

where N is the total number of documents in the collection.
`
Similarly, if P(j) is the standard precision after retrieving j documents from the collection, then a normalized precision can be defined in the same manner.
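A present-day sketch of the normalized recall just defined follows; it simply averages R(j) over all N retrieval levels. The collection size and the ranks of the relevant documents below are invented for the example.

```python
# Normalized recall computed as the average of the standard recall R(j)
# over all retrieval levels j = 1 .. N, following the definition above.
# The collection size N and the relevant-document ranks are invented.
def normalized_recall(relevant_ranks, N):
    n = len(relevant_ranks)
    ranks = set(relevant_ranks)
    total, found = 0.0, 0
    for j in range(1, N + 1):
        if j in ranks:
            found += 1
        total += found / n          # R(j): recall after retrieving j documents
    return total / N

print(round(normalized_recall([1, 3, 8], N=405), 4))   # 0.9926
```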