`
`Selected Test Results Using the SMART System*
`
The generation of effective methods for the evaluation of information retrieval systems and techniques is becoming increasingly important as more and more systems are designed and implemented. The present report deals with the evaluation of a variety of automatic indexing and retrieval procedures incorporated into the SMART automatic document retrieval system. The design of the SMART system is first briefly reviewed. The document file, search requests, and other parameters affecting the evaluation system are then examined in detail, and the measures used to assess the effectiveness of the retrieval performance are described. The main test results are given and tentative conclusions are reached concerning the design of fully automatic information systems.
`
`GERARD SALTON
`
`The Computation Laboratory of Harvard University
`Cambridge, Massachusetts
`
• Introduction
`
The evaluation of information retrieval systems and of techniques for indexing, storing, searching and retrieving information has become of increasing importance in recent years. The interest in evaluation procedures stems from two main causes: first, more and more retrieval systems are being designed, thus raising an immediate question concerning performance and efficacy of these systems; and, second, evaluation methods are of interest in themselves, in that they lead to many complicated problems in test design and performance, and in the interpretation of test results.

The present study differs from other reports on systems evaluation in that it deals with the evaluation of automatic rather than conventional information retrieval. More specifically, it is desired to compare the effectiveness of a large variety of fully automatic procedures for information analysis (indexing) and retrieval. Since such an evaluation must of necessity take place in an experimental situation rather than in an operational environment, it becomes possible to eliminate from consideration such important system parameters as cost of retrieval, response time, influence of physical lay-out, personnel problems and so on, and to concentrate fully on the evaluation of retrieval techniques. Furthermore, a number of human problems which complicate matters in a conventional evaluation procedure, including, for example, the difficulties due to inconsistency among indexers or to the presence of search errors, need not be considered. Other problems, including those which have to do with the identification of information relevant to a given search request, and those concerning themselves with the interpretation of test results, must, of course, be faced in an automatic system just as in a conventional one.

* This study was supported by the National Science Foundation under Grant GN-245.
`
The design of the SMART automatic document retrieval system is first briefly reviewed. The test environment is then described in detail, including in particular a description of the document file and of the search requests used. Parameters are introduced to measure the effectiveness of the retrieval performance; these parameters are similar to the standard recall and precision measures, but do not require that a distinction be made between retrieved and nonretrieved documents. The main test results are then given, and some tentative conclusions are reached concerning the design of fully automatic retrieval systems.
`
• The SMART Retrieval System
`
SMART is a fully automatic document retrieval system operating on the IBM 7094. Unlike other computer-based retrieval systems, the SMART system does not rely on manually assigned keywords or index terms for the identification of documents and search requests, nor does it use primarily the frequency of occurrence of certain words or phrases included in the texts of documents. Instead, an attempt is made to go beyond simple word-matching procedures by using a variety of intellectual aids in the form of synonym dictionaries, hierarchical arrangements of subject identifiers, statistical and syntactic phrase-generating methods and the like, in order to obtain the content identifications useful for the retrieval process. Stored documents and search requests are then processed without any prior manual analysis by one of several hundred automatic content analysis methods, and those documents which most nearly match a given search request are extracted from the document file in answer to the request. The system may be controlled by the user in that a search request can be processed first in a standard mode; the user can then analyze the output obtained and, depending on his further requirements, order a reprocessing of the request under new conditions. The new output can again be examined and the process iterated until the right kind and amount of information are retrieved.

American Documentation, Vol. 16, No. 3, July 1965
`
SMART is thus designed to correct many of the shortcomings of presently available automatic retrieval systems, and it may serve as a reasonable prototype for fully automatic document retrieval. The following facilities incorporated into the SMART system for purposes of document analysis may be of principal interest*:

(a) a system for separating English words into stems and affixes (the so-called “null thesaurus” method) which can be used to construct document identifications consisting of the word stems contained in the documents;

(b) a synonym dictionary, or thesaurus, which can be used to recognize synonyms by replacing each word stem by one or more “concept” numbers (the thesaurus is a manually constructed dictionary including about 600 concepts in the computer literature, corresponding to about 3000 English word stems); these concept numbers can serve as content identifiers instead of the original word stems;

(c) a hierarchical arrangement of the concepts included in the thesaurus which makes it possible, given any concept number, to find its “parent” in the hierarchy, its “sons,” its “brothers,” and any of a set of possible cross-references; the hierarchy can be used to obtain more general content identifiers than the ones originally given by going “up” in the hierarchy, more specific ones by going “down” in the structure, and a set of related ones by picking up brothers and cross-references;
`
* More detailed descriptions of the systems organization are included in Refs. 1 and 2. Programming aspects and complete flowcharts are presented in Ref. 3.
`
`210
`
`American Documentation — July 1965
`
(d) statistical procedures to compute similarity coefficients based on co-occurrences of concepts within the sentences of a given document, or within the documents of a given collection; association factors between documents can also be determined, as can clusters (rather than only pairs) of related documents, or related concepts; the related concepts, determined by statistical association, can then be added to the originally available concepts to identify the various documents;

(e) syntactic analysis and matching methods which make it possible to compare the syntactically analyzed sentences of documents and search requests with a pre-coded dictionary of “criterion” phrases in such a way that the same concept number is assigned to a large number of semantically equivalent, but syntactically quite different constructions (e.g. “information retrieval,” “the retrieval of information,” “the retrieval of documents,” “text processing,” and so on);

(f) statistical phrase matching methods which operate like the preceding syntactic phrase procedures, that is, by using a preconstructed dictionary to identify phrases used as content identifiers; however, no syntactic analysis is performed in this case, and phrases are defined as equivalent if the concept numbers of all components match, regardless of the syntactic relationships between components;

(g) a dictionary updating system, designed to revise the five principal dictionaries included in the system (stem thesaurus, suffix dictionary, concept hierarchy, statistical phrases, and syntactic “criterion” phrases).
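The thesaurus look-up of item (b) and the hierarchy operations of item (c) may be pictured with a minimal sketch. The stems, concept numbers, and parent links below are invented for illustration only and are not taken from the SMART dictionaries; the function names are likewise hypothetical.

```python
# Hypothetical miniature thesaurus: word stem -> one or more concept numbers.
THESAURUS = {
    "comput": [101],
    "calcul": [101],
    "equat": [205],
    "differ": [205, 310],
}

# Hypothetical hierarchy: concept number -> parent concept number.
PARENT = {101: 10, 205: 20, 310: 20}


def stems_to_concepts(stems):
    """Replace each known stem by its concept numbers, counting occurrences."""
    counts = {}
    for stem in stems:
        for concept in THESAURUS.get(stem, []):
            counts[concept] = counts.get(concept, 0) + 1
    return counts


def sons(concept):
    """Concepts whose parent in the hierarchy is `concept`."""
    return sorted(c for c, p in PARENT.items() if p == concept)


def brothers(concept):
    """Concepts sharing a parent with `concept`, excluding the concept itself."""
    parent = PARENT.get(concept)
    if parent is None:
        return []
    return [c for c in sons(parent) if c != concept]


def generalize(counts):
    """Go "up" one level: credit each concept's count to its parent."""
    result = {}
    for concept, n in counts.items():
        target = PARENT.get(concept, concept)
        result[target] = result.get(target, 0) + n
    return result


concepts = stems_to_concepts(["comput", "calcul", "differ", "equat"])
print(concepts)              # {101: 2, 205: 2, 310: 1}
print(generalize(concepts))  # {10: 2, 20: 3}
print(brothers(205))         # [310]
```

Two synonymous stems ("comput" and "calcul" in this invented table) collapse onto one concept number, which is the effect the synonym dictionary is meant to achieve; going "up" then merges sibling concepts under a common parent.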
`
The operations of the system are built around a supervisory system which decodes the input instructions and arranges the processing sequence in accordance with the instructions received. At the present time, about 35 different processing options are available, in addition to a number of variable parameter settings. The latter are used to specify the type of correlation function which measures the similarity between documents and search requests, the cut-off value which determines the number of documents to be extracted as answers to search requests, and the thesaurus size.
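The roles of the correlation function and of the cut-off value can be illustrated by a short sketch. The cosine coefficient is used here purely as an assumed example of a correlation function, since the text does not name the functions actually available; the concept numbers, weights, and document labels are hypothetical.

```python
import math


def cosine_correlation(query, document):
    """Cosine of the angle between two sparse concept-weight vectors."""
    dot = sum(w * document.get(c, 0) for c, w in query.items())
    norm_q = math.sqrt(sum(w * w for w in query.values()))
    norm_d = math.sqrt(sum(w * w for w in document.values()))
    if norm_q == 0 or norm_d == 0:
        return 0.0
    return dot / (norm_q * norm_d)


def retrieve(query, collection, cutoff):
    """Return (correlation, document) pairs whose correlation exceeds the cut-off,
    in decreasing correlation order."""
    scored = [(cosine_correlation(query, vec), doc) for doc, vec in collection.items()]
    kept = [(score, doc) for score, doc in scored if score > cutoff]
    return sorted(kept, key=lambda pair: (-pair[0], pair[1]))


# Hypothetical vectors: concept number -> weight.
query = {205: 12, 101: 12}
collection = {
    "A": {205: 12, 101: 12},  # same concepts as the request
    "B": {205: 12},           # shares one concept
    "C": {999: 6},            # shares none
}

ranked = retrieve(query, collection, cutoff=0.5)
for score, doc in ranked:
    print(doc, round(score, 4))
```

Raising or lowering `cutoff` changes only how many documents are extracted, not their order, which is why the cut-off can be treated as a parameter separate from the analysis procedure itself.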
`
The SMART systems organization makes it possible to evaluate the effectiveness of the various processing methods by comparing the outputs obtained from a variety of different runs. This is achieved by processing the same search requests against the same document collection several times, and making judicious changes in the analysis procedures between runs. It is this use of the SMART system, as an evaluation tool, which is of particular interest in the present context, and is therefore treated in more detail in the remaining parts of the present report.
`
`2
`
`
`
Characteristic                               Comment                                     Count

Number of documents in collection            Document abstracts in the computer field    405

Number of search requests
  (a) specific                               0-9 relevant documents                      10
  (b) general                                10-30 relevant documents                    7

User population (requester also makes        Technical people and students               about 10
  relevance judgments)

Number of indexing and search                All search and indexing operations          15
  programs used                                are automatic

Number of index terms per document           Varies greatly depending on indexing        (average) 35
                                               procedure and document

Number of relevant documents per request
  (a) specific                                                                           (average) 5
  (b) general                                                                            (average) 15

Number of retrieved documents per request    No cut-off is used to separate retrieved    405
                                               from nonretrieved

FIG. 1. Test Environment.
`
• The Test Environment
`
The parameters which control the testing procedures about to be described are summarized in Fig. 1. The data collection used consists of a set of 405 abstracts* of documents in the computer literature published during 1959 in the IRE Transactions on Electronic Computers. The results reported are based on the processing of about 20 search requests, each of which is analyzed by approximately 15 different indexing procedures. The search requests are somewhat arbitrarily separated into two groups, called respectively “general” and “specific” requests, depending on whether the number of documents believed to be relevant to each request is equal to at least ten (for the general requests) or is less than ten (for the specific ones). Results are reported separately for each of these two request groups; cumulative results are also reported for the complete set of requests.

The user population responsible for the search requests consists of about ten technical people with background in the computer field. Requests are formulated without study of the document collection, and no document already included in the collection is normally used as a source for any given search request. On the other hand, in view of the experimental nature of the system it cannot be stated unequivocally that an actual user need in fact exists which requires fulfilment.

An excerpt from the document collection, as it is originally introduced into computer storage, is reproduced in Fig. 2. It may be noted that the full abstracts are stored together with the bibliographic citations. A typical search request, dealing with the numerical solution of differential equations, is shown at the top of Fig. 3. Any search request expressed in English words is acceptable, and no particular format restrictions exist. Also shown in Fig. 3 is a set of documents found in answer to the request on differential equations by using one of the available processing methods. The documents are listed in decreasing order of the correlation coefficient with the search request; a short 12-character identifier is shown for each document under the heading “answer,” and full bibliographic citations are shown under “identification.”

* Practical considerations dictated the use of abstracts rather than full documents; the SMART system as such is not restricted to the manipulation of abstracts only.
`
The average number of index terms used to identify each document is sometimes believed to be an important factor affecting retrieval performance. In the SMART system, this parameter is a difficult one to present and interpret, since the many procedures which exist for analyzing the documents and search requests generate indexing products with widely differing characteristics. A typical example is shown in Fig. 4, consisting of the index “vectors” generated by three different processing methods for the request on differential equations (short form “DIFFERNTL EQ”), and for document number 1 of the collection (short form “1A COMPUTER”).

It may be seen from Fig. 4 that the number of terms identifying a document can change drastically from one method to another: for example, document number 1 is identified by 35 different word stems using the word stem analysis (labelled “null thesaurus” in Fig. 4); these 35 stems, however, give rise to 50 different concept numbers using the regular thesaurus, and to 55 concepts for the statistical phrase method. The number of index terms per document shown in the summary of Fig. 1 (35) must therefore be taken as an indication at best, and does not properly reflect the true situation.

In Fig. 4, each concept number is followed by some mnemonic characters to identify the concept and by a
`
`
`
*TEXT 2MICRO-PROGRAMMING .

$MICRO-PROGRAMMING
$R. J. MERCER (UNIVERSITY OF CALIFORNIA)
$U.S. GOV. RES. REPTS. VOL 30 PP 71-72(A) (AUGUST 15, 1958) PB 126893

MICRO-PROGRAMMING . THE MICRO-PROGRAMMING TECHNIQUE OF DESIGNING THE
CONTROL CIRCUITS OF AN ELECTRONIC DIGITAL COMPUTER TO FORMALLY INTERPRET
AND EXECUTE A GIVEN SET OF MACHINE OPERATIONS AS AN EQUIVALENT SET
OF SEQUENCES OF ELEMENTARY OPERATIONS THAT CAN BE EXECUTED IN ONE
PULSE TIME IS DESCRIBED .

*TEXT 3THE ROLE OF LARGE MEMORIES IN SCIENTIFIC COMMUNICATIONS

$THE ROLE OF LARGE MEMORIES IN SCIENTIFIC COMMUNICATIONS
$M. M. ASTRAHAN (IBM CORP.)
$IBM J. RES. AND DEV. VOL 2 PP 310-313 (OCTOBER 1958)

THE ROLE OF LARGE MEMORIES IN SCIENTIFIC COMMUNICATIONS . THE ROLE
OF LARGE MEMORIES IN SCIENTIFIC COMMUNICATIONS IS DISCUSSED . LARGE
MEMORIES PROVIDE AUTOMATIC REFERENCE TO MILLIONS OF WORDS OF MACHINE-RE-
ADABLE CODED INFORMATION OR TO MILLIONS OF IMAGES OF DOCUMENT PAGES
. HIGHER DENSITIES OF STORAGE WILL MAKE POSSIBLE LOW-COST MEMORIES
OF BILLIONS OF WORDS WITH ACCESS TO ANY PART IN A FEW SECONDS OR COMPLE-
TE SEARCHES IN MINUTES . THESE MEMORIES WILL SERVE AS INDEXES TO THE
DELUGE OF TECHNICAL LITERATURE WHEN THE PROBLEMS OF INPUT AND OF THE
AUTOMATIC GENERATION OF CLASSIFICATION INFORMATION ARE SOLVED . DOCUMENT
FILES WILL MAKE THE INDEXED LITERATURE RAPIDLY AVAILABLE TO THE SEARCHER
. MACHINE TRANSLATION OF LANGUAGE AND RECOGNITION OF SPOKEN INFORMATION
ARE TWO OTHER AREAS WHICH WILL REQUIRE FAST, LARGE MEMORIES .
`
`FIG. 2. Typical Document Prints.
`
ANSWERS TO REQUESTS FOR DOCUMENTS ON SPECIFIED TOPICS      SEPTEMBER 28, 1964      PAGE 83

CURRENT REQUEST - *LIST DIFFERNTL EQ NUMERICAL DIGITAL SOLN OF DIFFERENTIAL EQUATIONS

REQUEST

*LIST DIFFERNTL EQ NUMERICAL DIGITAL SOLN OF DIFFERENTIAL EQUATIONS
GIVE ALGORITHMS USEFUL FOR THE NUMERICAL SOLUTION OF ORDINARY
DIFFERENTIAL EQUATIONS AND PARTIAL DIFFERENTIAL EQUATIONS ON DIGITAL
COMPUTERS . EVALUATE THE VARIOUS INTEGRATION PROCEDURES (E.G. RUNGE-KUTTA,
MILNE-S METHOD) WITH RESPECT TO ACCURACY, STABILITY, AND SPEED

ANSWER         CORRELATION   IDENTIFICATION

384STABILITY   0.6615        STABILITY OF NUMERICAL SOLUTION OF DIFFERENTIAL EQUATIONS
                             W. E. MILNE AND R. R. REYNOLDS (OREGON STATE COLLEGE)
                             J. ASSOC. FOR COMPUTING MACH. VOL 6 PP 196-203 (APRIL, 1959)

360SIMULATIN   0.5758        SIMULATING SECOND-ORDER EQUATIONS
                             D. G. CHADWICK (UTAH STATE UNIV.)
                             ELECTRONICS VOL 32 P 64 (MARCH 6, 1959)

200SOLUTION    0.5663        SOLUTION OF ALGEBRAIC AND TRANSCENDENTAL EQUATIONS ON AN AUTOMATIC
                             DIGITAL COMPUTER
                             G. N. LANCE (UNIV. OF SOUTHAMPTON)
                             J. ASSOC. FOR COMPUTING MACH., VOL 6, PP 97-101, JAN., 1959

392ON COMPUT   0.5508        ON COMPUTING RADIATION INTEGRALS
                             R. C. HANSEN (HUGHES AIRCRAFT CO.), L. L. BAILIN (UNIV. OF SOUTHERN
                             CALIFORNIA), AND R. M. RUTISHAUSER (LITTON INDUSTRIES, INC.)
                             COMMUN. ASSOC. FOR COMPUTING MACH. VOL 2 PP 28-31 (FEBRUARY, 1959)

386ELIMINATI   0.5483        ELIMINATION OF SPECIAL FUNCTIONS FROM DIFFERENTIAL EQUATIONS
                             J. E. POWERS (UNIV. OF OKLAHOMA)
                             COMMUN. ASSOC. FOR COMPUTING MACH. VOL 2 PP 3-4 (MARCH, 1959)
`
`FIG. 3. Typical Search Request and Corresponding Answers.
`
`212
`
`American Documentation — July 1965
`
`4
`
`
`
[Fig. 4 is a computer listing headed OCCURRENCES OF CONCEPTS AND PHRASES IN DOCUMENTS (columns DOCUMENT, CONCEPT, OCCURS). It gives the index vectors of the request DIFFERNTL EQ and of the document 1A COMPUTER under three analysis procedures, labelled NULL THESAURUS, REGULAR THESAURUS, and STAT. PHRASE LOOK-UP; each entry is a word stem or a concept number with mnemonic characters, followed by a weight.]
`FIG. 4. Typical Indexing Products for Three Analysis Procedures.
`
weight. The weights assigned to the concept numbers also change from method to method. Since no distinction is made in the evaluation procedure between retrieved and nonretrieved documents, the last indicator included in Fig. 1 (the number of retrieved documents per request) must also be put into the proper perspective. A discussion of this point is postponed until after the evaluation measures are introduced in the next few paragraphs.
`
• Evaluation Measures
`
`1. Recall and Precision
`
One of the most crucial tasks in the evaluation of retrieval systems is the choice of measures which reflect systems performance. In the present context, such a measurement must of necessity depend primarily on the system’s ability to retrieve wanted information and to reject nonwanted material, to the exclusion of operational criteria such as retrieval cost, waiting time, input preparation time, and so on. The last mentioned factors may be of great practical importance in an operational situation, but do not enter, at least initially, into the evaluation of experimental procedures.

A large number of measures have been proposed in the past for the evaluation of retrieval performance (Ref. 4). Perhaps the best known of these are, respectively, recall and precision; recall is defined as the proportion of relevant material actually retrieved, and precision as the proportion of retrieved material actually relevant.* A system with high recall is one which rejects very little that is relevant but may also retrieve a large proportion of irrelevant material, thereby depressing precision. High precision, on the other hand, implies that very little irrelevant information is produced but much relevant information may be missed at the same time, thus depressing recall. Ideally, one would of course hope for both high recall and high precision.†

* Precision has also been called “relevance,” notably in the literature of the ASLIB-Cranfield project (Ref. 5).
† It has, however, been conjectured that an inverse relationship exists between recall and precision, such that high recall automatically implies low precision and vice versa.

Measures such as recall and precision are particularly attractive when it comes to evaluating automatic retrieval procedures, because a large number of extraneous factors which cause uncertainty in the evaluation of conventional (manual) systems are automatically absent. The following characteristics of the present system are particularly important in this connection:

(a) input errors in the conventional sense, due to faulty indexing or encoding, are eliminated since all indexing operations are automatic;
`
`
`
`
(b) for the same reason, conventional search errors arising from the absence of needed search terms are also excluded;

(c) errors cannot be introduced in any transition between original search request and final machine query, since this transition is now handled automatically and becomes indistinguishable from the main analysis operation;

(d) inconsistencies introduced by a large number of different indexers and by the passage of time in the course of an experiment cannot arise; and

(e) the role of human memory as a disturbance in the generation of retrieval measurements is eliminated (this factor can be particularly troublesome when source documents are to be retrieved in a conventional system by persons who originally perform the indexing task).
`
In order to calculate the standard recall and precision measures the following important tasks must be undertaken:

(a) relevance judgments must be made by hand in order to decide, for each document and for each search request, whether the given document is relevant to the given request;

(b) the relevance judgments are usually all or nothing decisions so that a given document is assumed either wholly relevant or wholly irrelevant (in case of doubt relevance is assumed); and

(c) a cut-off in the correlation between documents and search requests is normally chosen, such that documents whose correlation exceeds the cut-off value are retrieved, while the others are not retrieved.
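The standard measures, with binary relevance judgments and a correlation cut-off as listed above, can be sketched as follows. The document labels, correlation values, and relevance set are hypothetical and serve only to show how the choice of cut-off moves recall and precision.

```python
def recall_precision(correlations, relevant, cutoff):
    """Standard recall and precision for one request.

    correlations: document -> correlation with the request
    relevant:     set of documents judged relevant (all-or-nothing judgments)
    cutoff:       documents whose correlation exceeds this value are retrieved
    """
    retrieved = {doc for doc, corr in correlations.items() if corr > cutoff}
    hits = retrieved & relevant
    recall = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical data for a single request.
correlations = {"d1": 0.66, "d2": 0.58, "d3": 0.57, "d4": 0.20, "d5": 0.10}
relevant = {"d1", "d3", "d4"}

print(recall_precision(correlations, relevant, cutoff=0.50))  # 2 of 3 relevant found, 2 of 3 retrieved relevant
print(recall_precision(correlations, relevant, cutoff=0.15))  # (1.0, 0.75)
```

Lowering the cut-off from 0.50 to 0.15 raises recall to 1.0 in this example, so the two measures depend on a parameter that is not itself part of the analysis procedure being compared.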
`
`2. The Generation of Relevance Judgments
`
A great deal has been written concerning the difficulties and the appropriateness of the various operations listed in part 1 (Refs. 5-8). The first task, in particular, which may require the performance of hundreds of thousands of human relevance judgments for document collections of reasonable size, is extremely difficult to satisfy and to control.

Two solutions have been suggested, each of which would base the relevance decisions on less than the whole document collection. The first one consists in using sampling techniques to isolate a suitable document subset, and in making relevance judgments only for documents included in that subset. If the results obtained for the subset, however, are to be applicable to the total collection, it becomes necessary to choose a sample representative of the whole. For most document collections, this turns out to be a difficult task.

The other solution consists in formulating search requests based on specific source documents included in the collection, and in measuring retrieval performance for a given search request as a function of the retrieval of the respective source documents. This procedure suffers from the fact that search requests based on source documents are often claimed to be nontypical, thus introducing a bias into the measurements which does not exist for requests reflecting actual user needs.

Since the document collection used in connection with the present experiments is small enough to permit an exhaustive determination of relevance, the possible pitfalls inherent in the sampling procedure and in the use of source documents were avoided to a great extent. Many of the problems connected with the rendering of relevance judgments are, however, unresolved for general document collections.
`
3. The Cut-Off Problem
`
The other major problem is caused by the requirement to pick a correlation cut-off value to distinguish retrieved documents from those not retrieved. Such a cut-off introduces a new variable which seems to be extraneous to the principal task of measuring retrieval performance. Furthermore, in the SMART system, a different cut-off would have to be picked for each of the many processing methods if it were desired to retrieve approximately the same number of documents in each case.

Because of these added complications, it was felt that the standard recall and precision measures should be redefined so as to remove the necessary distinction between retrieved and nonretrieved information. Fortunately, this is not difficult in computer-based information systems, because in such systems numeric coefficients expressing the similarity between each document and each search request are obtained as output of the search process. Documents may then be arranged in decreasing order of these similarity coefficients, as shown, for example, for the previously used request on differential equations in the center section of Fig. 5. It may be seen in the figure that document 384 exhibits the largest correlation with the search request, followed by documents 360, 200, 392, and so on.

An ordered document list of the kind shown in Fig. 5 suggests that a suitable criterion for recall and precision measures would be the set of rank-orders of the relevant documents, when these documents are arranged in decreasing correlation order. A function of the rank-order list which penalizes high ranks for relevant documents (and therefore low correlation coefficients) can be used to express recall, while a function penalizing low ranks of nonrelevant documents is indicative of precision.
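The rank-order list on which such functions operate can be obtained without any cut-off, as the following minimal sketch suggests. The correlation values and relevance judgments are hypothetical, and ties are broken by document label purely as an assumption of this example.

```python
def relevant_ranks(correlations, relevant):
    """Ranks (1 = highest correlation) at which the relevant documents appear."""
    ordered = sorted(correlations, key=lambda doc: (-correlations[doc], doc))
    return [rank for rank, doc in enumerate(ordered, start=1) if doc in relevant]


# Hypothetical data for a single request.
correlations = {"d1": 0.66, "d2": 0.58, "d3": 0.57, "d4": 0.20, "d5": 0.10}
relevant = {"d1", "d3", "d4"}

ranks = relevant_ranks(correlations, relevant)
print(ranks)  # [1, 3, 4]
```

An ideal ordering would place the three relevant documents at ranks 1, 2, and 3; the further the observed ranks depart from that, the poorer the performance of the analysis procedure for this request.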
`
4. Normalized Recall and Normalized Precision*
`
It is desired to use as a measure of retrieval effectiveness a set of parameters which reflects the standard recall and the standard precision, and does not depend on a distinction between retrieved and nonretrieved documents. This suggests that one might take the average of the recall and the average of the precision obtained for all possible retrieval levels to define a new pair of measures, termed respectively normalized recall and normalized precision.

* The measures described in this part were suggested by J. Rocchio.

[Fig. 5 is a computer listing of the row correlations of the request DIFFERNTL EQ with the documents of the collection, in three panels: (a) increasing document order, (b) decreasing correlation order, (c) histogram.]

FIG. 5. Correlations Between Search Request and Document Collection.

Specifically, if $R_{(j)}$ is the standard recall after retrieving $j$ documents from the collection (that is, if $R_{(j)}$ is equal to the number of relevant documents retrieved divided by the total relevant in the collection, assuming $j$ documents retrieved in all), then the normalized recall can be defined as

\[ R_{norm} = \frac{1}{N} \sum_{j=1}^{N} R_{(j)} \]

where $N$ is the total number of documents in the collection.

Similarly, if $P_{(j)}$ is the standard precision after retrieving $j$ documents from the collection, then a normalized precision measure is defined as

\[ P_{norm} = \frac{1}{N} \sum_{j=1}^{N} P_{(j)} \]
`
`Rnorm and I’