Selected Test Results Using the SMART System*

The generation of effective methods for the evaluation of information retrieval systems and techniques is becoming increasingly important as more and more systems are designed and implemented. The present report deals with the evaluation of a variety of automatic indexing and retrieval procedures incorporated into the SMART automatic document retrieval system. The design of the SMART system is first briefly reviewed. The document file, search requests, and other parameters affecting the evaluation system are then examined in detail, and the measures used to assess the effectiveness of the retrieval performance are described. The main test results are given and tentative conclusions are reached concerning the design of fully automatic information systems.

GERARD SALTON

The Computation Laboratory of Harvard University
Cambridge, Massachusetts

• Introduction

The evaluation of information retrieval systems and of techniques for indexing, storing, searching and retrieving information has become of increasing importance in recent years. The interest in evaluation procedures stems from two main causes: first, more and more retrieval systems are being designed, thus raising an immediate question concerning the performance and efficacy of these systems; and, second, evaluation methods are of interest in themselves, in that they lead to many complicated problems in test design and performance, and in the interpretation of test results.

The present study differs from other reports on systems evaluation in that it deals with the evaluation of automatic rather than conventional information retrieval. More specifically, it is desired to compare the effectiveness of a large variety of fully automatic procedures for information analysis (indexing) and retrieval. Since such an evaluation must of necessity take place in an experimental situation rather than in an operational environment, it becomes possible to eliminate from consideration such important system parameters as cost of retrieval, response time, influence of physical lay-out, personnel problems and so on, and to concentrate fully

* This study was supported by the National Science Foundation under Grant GN-245.

`
on the evaluation of retrieval techniques. Furthermore, a number of human problems which complicate matters in a conventional evaluation procedure, including, for example, the difficulties due to inconsistency among indexers or to the presence of search errors, need not be considered. Other problems, including those which have to do with the identification of information relevant to a given search request, and those concerning themselves with the interpretation of test results, must, of course, be faced in an automatic system just as in a conventional one.

The design of the SMART automatic document retrieval system is first briefly reviewed. The test environment is then described in detail, including in particular a description of the document file and of the search requests used. Parameters are introduced to measure the effectiveness of the retrieval performance; these parameters are similar to the standard recall and precision measures, but do not require that a distinction be made between retrieved and nonretrieved documents. The main test results are then given, and some tentative conclusions are reached concerning the design of fully automatic retrieval systems.

• The SMART Retrieval System

American Documentation, Vol. 16, No. 3 — July 1965
209

ELASTIC - EXHIBIT 1029

SMART is a fully automatic document retrieval system operating on the IBM 7094. Unlike other computer-based retrieval systems, the SMART system does
not rely on manually assigned keywords or index terms for the identification of documents and search requests, nor does it use primarily the frequency of occurrence of certain words or phrases included in the texts of documents. Instead, an attempt is made to go beyond simple word-matching procedures by using a variety of intellectual aids in the form of synonym dictionaries, hierarchical arrangements of subject identifiers, statistical and syntactic phrase-generating methods and the like, in order to obtain the content identifications useful for the retrieval process.

Stored documents and search requests are then processed without any prior manual analysis by one of several hundred automatic content analysis methods, and those documents which most nearly match a given search request are extracted from the document file in answer to the request. The system may be controlled by the user in that a search request can be processed first in a standard mode; the user can then analyze the output obtained and, depending on his further requirements, order a reprocessing of the request under new conditions. The new output can again be examined and the process iterated until the right kind and amount of information are retrieved.

SMART is thus designed to correct many of the shortcomings of presently available automatic retrieval systems, and it may serve as a reasonable prototype for fully automatic document retrieval. The following facilities incorporated into the SMART system for purposes of document analysis may be of principal interest:*

(a) a system for separating English words into stems and affixes (the so-called "null thesaurus" method) which can be used to construct document identifications consisting of the word stems contained in the documents;

(b) a synonym dictionary, or thesaurus, which can be used to recognize synonyms by replacing each word stem by one or more "concept" numbers (the thesaurus is a manually constructed dictionary including about 600 concepts in the computer literature, corresponding to about 3000 English word stems); these concept numbers can serve as content identifiers instead of the original word stems;

(c) a hierarchical arrangement of the concepts included in the thesaurus which makes it possible, given any concept number, to find its "parent" in the hierarchy, its "sons," its "brothers," and any of a set of possible cross-references; the hierarchy can be used to obtain more general content identifiers than the ones originally given by going "up" in the hierarchy, more specific ones by going "down" in the structure, and a set of related ones by picking up brothers and cross-references;

* More detailed descriptions of the systems organization are included in Refs. 1 and 2. Programming aspects and complete flowcharts are presented in Ref. 3.

(d) statistical procedures to compute similarity coefficients based on co-occurrences of concepts within the sentences of a given document, or within the documents of a given collection; association factors between documents can also be determined, as can clusters (rather than only pairs) of related documents, or related concepts; the related concepts, determined by statistical association, can then be added to the originally available concepts to identify the various documents;

(e) syntactic analysis and matching methods which make it possible to compare the syntactically analyzed sentences of documents and search requests with a pre-coded dictionary of "criterion" phrases in such a way that the same concept number is assigned to a large number of semantically equivalent, but syntactically quite different constructions (e.g., "information retrieval," "the retrieval of information," "the retrieval of documents," "text processing," and so on);

(f) statistical phrase matching methods which operate like the preceding syntactic phrase procedures, that is, by using a preconstructed dictionary to identify phrases used as content identifiers; however, no syntactic analysis is performed in this case, and phrases are defined as equivalent if the concept numbers of all components match, regardless of the syntactic relationships between components;

(g) a dictionary updating system, designed to revise the five principal dictionaries included in the system (stem thesaurus, suffix dictionary, concept hierarchy, statistical phrases, and syntactic "criterion" phrases).
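In present-day notation, the suffix-stripping and thesaurus look-up steps of facilities (a) and (b) might be sketched as follows; the stemming rules and the small stem-to-concept dictionary are invented illustrations, not the actual SMART dictionaries (which comprised about 600 concepts and 3000 word stems):

```python
# A rough sketch of steps (a) and (b): suffix stripping followed by
# thesaurus look-up. The rules and dictionary below are invented
# stand-ins for the manually constructed SMART dictionaries.

def stem(word):
    """Crude suffix stripping (a stand-in for the suffix dictionary)."""
    for suffix in ("ation", "ing", "ed", "al", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

# Hypothetical stem -> concept-number mapping (the "regular thesaurus").
THESAURUS = {
    "comput": [105],        # computer, computing, computation, ...
    "retriev": [233],       # retrieval, retrieve, retrieved, ...
    "index": [233, 410],    # a stem may map to several concepts
}

def index_document(text):
    """Produce the two document identifications described in (a) and (b):
    the word-stem ("null thesaurus") list and the concept-number list."""
    stems = [stem(w.lower()) for w in text.split()]
    null_thesaurus = sorted(set(stems))
    concepts = sorted({c for s in stems for c in THESAURUS.get(s, [])})
    return null_thesaurus, concepts

stems, concepts = index_document("Computing an index for information retrieval")
print(stems)     # word stems after suffix stripping
print(concepts)  # concept numbers replacing the stems
```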

The operations of the system are built around a supervisory system which decodes the input instructions and arranges the processing sequence in accordance with the instructions received. At the present time, about 35 different processing options are available, in addition to a number of variable parameter settings. The latter are used to specify the type of correlation function which measures the similarity between documents and search requests, the cut-off value which determines the number of documents to be extracted as answers to search requests, and the thesaurus size.
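The correlation function is itself a parameter. As an illustration, a cosine-type correlation between weighted concept vectors, one common choice for vector-matching systems of this kind, might be computed as follows; the concept numbers and weights are invented:

```python
from math import sqrt

def cosine_correlation(doc, query):
    """Cosine correlation between two weighted concept vectors,
    represented as {concept_number: weight} dictionaries. This is
    a sketch of one common correlation function, not necessarily
    the exact function used by SMART."""
    shared = set(doc) & set(query)
    dot = sum(doc[c] * query[c] for c in shared)
    norm = (sqrt(sum(w * w for w in doc.values()))
            * sqrt(sum(w * w for w in query.values())))
    return dot / norm if norm else 0.0

# Illustrative concept -> weight vectors for a document and a request.
doc = {27: 36, 105: 12, 233: 24}
query = {27: 12, 233: 12}
print(round(cosine_correlation(doc, query), 4))
```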
The SMART systems organization makes it possible to evaluate the effectiveness of the various processing methods by comparing the outputs obtained from a variety of different runs. This is achieved by processing the same search requests against the same document collection several times, and making judicious changes in the analysis procedures between runs. It is this use of the SMART system, as an evaluation tool, which is of particular interest in the present context, and is therefore treated in more detail in the remaining parts of the present report.

Characteristic                                 Comment                                        Count
Number of documents in collection              Document abstracts in the computer field       405
Number of search requests
  (a) specific                                 0-9 relevant documents                         10
  (b) general                                  10-30 relevant documents                       7
User population (requester also                Technical people and students                  about 10
  makes relevance judgments)
Number of indexing and search                  All search and indexing operations             15
  programs used                                are automatic
Number of index terms per document             Varies greatly depending on indexing           (average) 35
                                               procedure and document
Number of relevant documents per request
  (a) specific                                                                                (average) 5
  (b) general                                                                                 (average) 15
Number of retrieved documents per request      No cut-off is used to separate retrieved       405
                                               from nonretrieved

Fig. 1. Test Environment.

• The Test Environment

The parameters which control the testing procedures about to be described are summarized in Fig. 1. The data collection used consists of a set of 405 abstracts* of documents in the computer literature published during 1959 in the IRE Transactions on Electronic Computers. The results reported are based on the processing of about 20 search requests, each of which is analyzed by approximately 15 different indexing procedures. The search requests are somewhat arbitrarily separated into two groups, called respectively "general" and "specific" requests, depending on whether the number of documents believed to be relevant to each request is equal to at least ten (for the general requests) or is less than ten (for the specific ones). Results are reported separately for each of these two request groups; cumulative results are also reported for the complete set of requests.

The user population responsible for the search requests consists of about ten technical people with background in the computer field. Requests are formulated without study of the document collection, and no document already included in the collection is normally used as a source for any given search request. On the other hand, in view of the experimental nature of the system, it cannot be stated unequivocally that an actual user need in fact exists which requires fulfilment.
An excerpt from the document collection, as it is originally introduced into computer storage, is reproduced in Fig. 2. It may be noted that the full abstracts are stored together with the bibliographic citations. A typical search request, dealing with the numerical solution of differential equations, is shown at the top of Fig. 3. Any search request expressed in English words is acceptable, and no particular format restrictions exist. Also shown in Fig. 3 is a set of documents found in answer to the request on differential equations by using one of the available processing methods. The documents are listed in decreasing order of the correlation coefficient with the search request; a short 12-character identifier is shown for each document under the heading "answer," and full bibliographic citations are shown under "identification."

* Practical considerations dictated the use of abstracts rather than full documents; the SMART system as such is not restricted to the manipulation of abstracts only.

The average number of index terms used to identify each document is sometimes believed to be an important factor affecting retrieval performance. In the SMART system, this parameter is a difficult one to present and interpret, since the many procedures which exist for analyzing the documents and search requests generate indexing products with widely differing characteristics. A typical example is shown in Fig. 4, consisting of the index "vectors" generated by three different processing methods for the request on differential equations (short form "DIFFERNTL EQ"), and for document number 1 of the collection (short form "1A COMPUTER").

It may be seen from Fig. 4 that the number of terms identifying a document can change drastically from one method to another: for example, document number 1 is identified by 35 different word stems using the word stem analysis (labelled "null thesaurus" in Fig. 4); these 35 stems, however, give rise to 50 different concept numbers using the regular thesaurus, and to 55 concepts for the statistical phrase method. The number of index terms per document shown in the summary of Fig. 1 (35) must therefore be taken as an indication at best, and does not properly reflect the true situation.

In Fig. 4, each concept number is followed by some mnemonic characters to identify the concept and by a
`
`
[Fig. 2 reproduces two stored document texts as originally introduced into computer storage: "Micro-Programming" (R. J. Mercer, University of California) and "The Role of Large Memories in Scientific Communications" (M. M. Astrahan, IBM Corp.), each stored with its full abstract and bibliographic citation.]

Fig. 2. Typical Document Prints.
`
[Fig. 3 reproduces a typical search request ("DIFFERNTL EQ": algorithms for the numerical solution of ordinary and partial differential equations on digital computers, with an evaluation of integration procedures such as Runge-Kutta and Milne's method with respect to accuracy, stability, and speed) together with the documents retrieved in answer, listed in decreasing correlation order with 12-character identifiers and full bibliographic citations; the top answer, "Stability of Numerical Solution of Differential Equations" (W. E. Milne and R. R. Reynolds), correlates at 0.6675 with the request.]

Fig. 3. Typical Search Request and Corresponding Answers.
`
`
`
`
[Fig. 4 reproduces the index vectors produced for the request "DIFFERNTL EQ" and for document 1 ("1A COMPUTER") by three analysis procedures: the regular thesaurus, the null thesaurus (word stems), and the statistical phrase look-up; each entry consists of a concept number or word stem, a mnemonic, and a weight.]

Fig. 4. Typical Indexing Products for Three Analysis Procedures.
`
weight. The weights assigned to the concept numbers also change from method to method. Since no distinction is made in the evaluation procedure between retrieved and nonretrieved documents, the last indicator included in Fig. 1 (the number of retrieved documents per request) must also be put into the proper perspective. A discussion of this point is postponed until after the evaluation measures are introduced in the next few paragraphs.

• Evaluation Measures

1. Recall and Precision

One of the most crucial tasks in the evaluation of retrieval systems is the choice of measures which reflect systems performance. In the present context, such a measurement must of necessity depend primarily on the system's ability to retrieve wanted information and to reject nonwanted material, to the exclusion of operational criteria such as retrieval cost, waiting time, input preparation time, and so on. The last mentioned factors may be of great practical importance in an operational situation, but do not enter, at least initially, into the evaluation of experimental procedures.

A large number of measures have been proposed in the past for the evaluation of retrieval performance. Perhaps the best known of these are, respectively, recall and precision; recall is defined as the proportion of relevant material actually retrieved, and precision as the proportion of retrieved material actually relevant.* A system with high recall is one which rejects very little that is relevant but may also retrieve a large proportion of irrelevant material, thereby depressing precision. High precision, on the other hand, implies that very little irrelevant information is produced, but much relevant information may be missed at the same time, thus depressing recall. Ideally, one would of course hope for both high recall and high precision.†

* Precision has also been called "relevance," notably in the literature of the ASLIB-Cranfield project (Ref. 6).
† It has, however, been conjectured that an inverse relationship exists between recall and precision, such that high recall automatically implies low precision and vice versa.
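In modern terms, the two measures just defined may be stated compactly as follows; the document identifiers are hypothetical:

```python
def recall_precision(retrieved, relevant):
    """Standard recall and precision for one search request.
    `retrieved` and `relevant` are sets of document identifiers."""
    hits = len(retrieved & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision

# Hypothetical request: 4 relevant documents in the collection,
# 5 documents retrieved, 3 of them relevant.
r, p = recall_precision({1, 2, 3, 8, 9}, {1, 2, 3, 4})
print(r, p)   # 0.75 0.6
```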
Measures such as recall and precision are particularly attractive when it comes to evaluating automatic retrieval procedures, because a large number of extraneous factors which cause uncertainty in the evaluation of conventional (manual) systems are automatically absent. The following characteristics of the present system are particularly important in this connection:

(a) input errors in the conventional sense, due to faulty indexing or encoding, are eliminated since all indexing operations are automatic;
`
`
`
`
(b) for the same reason, conventional search errors arising from the absence of needed search terms are also excluded;

(c) errors cannot be introduced in any transition between original search request and final machine query, since this transition is now handled automatically and becomes indistinguishable from the main analysis operation;

(d) inconsistencies introduced by a large number of different indexers and by the passage of time in the course of an experiment cannot arise; and

(e) the role of human memory as a disturbance in the generation of retrieval measurements is eliminated (this factor can be particularly troublesome when source documents are to be retrieved in a conventional system by persons who originally perform the indexing task).

In order to calculate the standard recall and precision measures, the following important tasks must be undertaken:

(a) relevance judgments must be made by hand in order to decide, for each document and for each search request, whether the given document is relevant to the given request;

(b) the relevance judgments are usually all-or-nothing decisions, so that a given document is assumed either wholly relevant or wholly irrelevant (in case of doubt relevance is assumed); and

(c) a cut-off in the correlation between documents and search requests is normally chosen, such that documents whose correlation exceeds the cut-off value are retrieved, while the others are not retrieved.
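Task (c) can be sketched as follows; the correlation values are modeled on those of Fig. 3, and the cut-off of 0.30 is an arbitrary illustration:

```python
def retrieve_by_cutoff(correlations, cutoff):
    """Split a request's document correlations at a cut-off value, as in
    task (c): documents whose correlation exceeds the cut-off count as
    retrieved. `correlations` maps document number -> correlation."""
    retrieved = {d for d, c in correlations.items() if c > cutoff}
    not_retrieved = set(correlations) - retrieved
    return retrieved, not_retrieved

# Correlations for five documents (values modeled on Fig. 3).
corrs = {384: 0.6675, 380: 0.5758, 200: 0.5663, 17: 0.0587, 22: 0.0057}
retrieved, rest = retrieve_by_cutoff(corrs, 0.30)
print(sorted(retrieved))   # [200, 380, 384]
```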
`
2. The Generation of Relevance Judgments

A great deal has been written concerning the difficulties and the appropriateness of the various operations listed in part 1 (Refs. 5-8). The first task, in particular, which may require the performance of hundreds of thousands of human relevance judgments for document collections of reasonable size, is extremely difficult to satisfy and to control.

Two solutions have been suggested, each of which would base the relevance decisions on less than the whole document collection. The first one consists in using sampling techniques to isolate a suitable document subset, and in making relevance judgments only for documents included in that subset. If the results obtained for the subset, however, are to be applicable to the total collection, it becomes necessary to choose a sample representative of the whole. For most document collections, this turns out to be a difficult task.
The other solution consists in formulating search requests based on specific source documents included in the collection, and in measuring retrieval performance for a given search request as a function of the retrieval of the respective source documents. This procedure suffers from the fact that search requests based on source documents are often claimed to be nontypical, thus introducing a bias into the measurements which does not exist for requests reflecting actual user needs.

Since the document collection used in connection with the present experiments is small enough to permit an exhaustive determination of relevance, the possible pitfalls inherent in the sampling procedure and in the use of source documents were avoided to a great extent. Many of the problems connected with the rendering of relevance judgments are, however, unresolved for general document collections.
`
3. The Cut-off Problem

The other major problem is caused by the requirement to pick a correlation cut-off value to distinguish retrieved documents from those not retrieved. Such a cut-off introduces a new variable which seems to be extraneous to the principal task of measuring retrieval performance. Furthermore, in the SMART system, a different cut-off would have to be picked for each of the many processing methods if it were desired to retrieve approximately the same number of documents in each case.

Because of these added complications, it was felt that the standard recall and precision measures should be redefined so as to remove the necessary distinction between retrieved and nonretrieved information. Fortunately, this is not difficult in computer-based information systems, because in such systems numeric coefficients expressing the similarity between each document and each search request are obtained as output of the search process. Documents may then be arranged in decreasing order of these similarity coefficients, as shown, for example, for the previously used request on differential equations in the center section of Fig. 5. It may be seen in the figure that document 384 exhibits the largest correlation with the search request, followed by documents 360, 200, 392, and so on.

An ordered document list of the kind shown in Fig. 5 suggests that a suitable criterion for recall and precision measures would be the set of rank-orders of the relevant documents, when these documents are arranged in decreasing correlation order. A function of the rank-order list which penalizes high ranks for relevant documents (and therefore low correlation coefficients) can be used to express recall, while a function penalizing low ranks of nonrelevant documents is indicative of precision.
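The rank-order criterion just described can be sketched as follows; the correlation values are modeled on those of Fig. 5, and the set of relevant documents is hypothetical:

```python
def relevant_ranks(correlations, relevant):
    """Rank positions (1 = highest correlation) of the relevant documents
    when all documents are arranged in decreasing correlation order."""
    ordered = sorted(correlations, key=correlations.get, reverse=True)
    return sorted(ordered.index(d) + 1 for d in relevant)

# Correlations modeled on Fig. 5; assume documents 384, 200 and 17 relevant.
corrs = {384: 0.6675, 380: 0.5758, 200: 0.5663, 392: 0.5508, 17: 0.0587}
print(relevant_ranks(corrs, {384, 200, 17}))   # [1, 3, 5]
```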
`
4. Normalized Recall and Normalized Precision*

It is desired to use as a measure of retrieval effectiveness a set of parameters which reflects the standard recall and the standard precision, and does not depend on a distinction between retrieved and nonretrieved documents. This suggests that one might take the average of the recall and the average of the precision obtained for

* The measures described in this part were suggested by J. Rocchio.
`
`
[Fig. 5 reproduces the correlations between the request "DIFFERNTL EQ" and the full document collection in three forms: (a) in increasing document order, (b) in decreasing correlation order (beginning with documents 384, 0.6675; 360, 0.5758; 200, 0.5663; 392, 0.5508; ...), and (c) as a histogram of the correlation values.]

Fig. 5. Correlations Between Search Request and Document Collection.
`
all possible retrieval levels to define a new pair of measures, termed respectively normalized recall and normalized precision. Specifically, if R_(j) is the standard recall after retrieving j documents from the collection (that is, if R_(j) is equal to the number of relevant documents retrieved divided by the total relevant in the collection, assuming j documents retrieved in all), then the normalized recall can be defined as

    R_norm = (1/N) * sum from j=1 to N of R_(j),

where N is the total number of documents in the collection.
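A sketch of the averaged measure described in this part: normalized recall computed as the mean of the standard recall R_(j) over all retrieval levels j = 1, ..., N, given the rank positions of the relevant documents in the decreasing-correlation ordering. This is an illustration of the verbal definition above, not the original SMART code:

```python
def normalized_recall(ranks, n_docs):
    """Average of the standard recall R_(j) over all retrieval levels
    j = 1..N. `ranks` is the set of rank positions (1 = best) of the
    relevant documents; `n_docs` is N, the collection size."""
    n_rel = len(ranks)
    total = 0.0
    retrieved_relevant = 0
    for j in range(1, n_docs + 1):
        if j in ranks:                      # a relevant document at rank j
            retrieved_relevant += 1
        total += retrieved_relevant / n_rel  # R_(j) at this level
    return total / n_docs

# Relevant documents at ranks 1, 3 and 5 in a 10-document collection.
print(round(normalized_recall({1, 3, 5}, 10), 4))   # 0.8
```

Relevant documents concentrated at the top of the ordering drive the measure toward 1, while relevant documents pushed to the bottom drive it toward 0, matching the penalty on high ranks described in part 3.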
Similarly, if P_(j) is the standard precision after retrieving j documents from the collection,