`Selected Test Results Using the SMART System*
`
The generation of effective methods for the evaluation of information retrieval systems and techniques is becoming increasingly important as more and more systems are designed and implemented. The present report deals with the evaluation of a variety of automatic indexing and retrieval procedures incorporated into the SMART automatic document retrieval system. The design of the SMART system is first briefly reviewed. The document file, search requests, and other parameters affecting the evaluation system are then examined in detail, and the measures used to assess the effectiveness of the retrieval performance are described. The main test results are given and tentative conclusions are reached concerning the design of fully automatic information systems.
`
`GERARD SALTON
`
`The Computation Laboratory of Harvard University
Cambridge, Massachusetts
`
• Introduction
`
`The evaluation of information retrieval systems and
`of techniques for indexing, storing, searching and retriev-
`ing information has become of increasing importance in
`recent years. The interest in evaluation procedures stems
`from two main causes:
`first, more and more retrieval
`systems are being designed, thus raising an immediate
`question concerning performance and efficacy of these
`systems; and, second, evaluation methods are of interest
`in themselves,
`in that they lead to many complicated
`problems in test design and performance, and in the
`interpretation of test results.
`The present study differs from other reports on systems
`evaluation in that it deals with the evaluation of auto-
`matic rather than conventional
`information retrieval.
`More specifically, it is desired to compare the effective-
`ness of a large variety of fully automatic procedures for
`information analysis
`(indexing)
`and retrieval. Since
`such an evaluation must of necessity take place in an
`experimental
`situation rather than in an operational
`environment, it becomes possible to eliminate from con-
`sideration such important system parameters as cost of
`retrieval,
`response time,
`influence of physical
`lay-out,
`personnel problems and so on, and to concentrate fully
`
* This study was supported by the National Science Foundation under Grant GN-245.
`
`on the evaluation of retrieval techniques. Furthermore,
`a number of human problems which complicate matters in
`a conventional evaluation procedure,
`including, for ex-
`ample, the difficulties due to inconsistency among indexers
`or to the presence of search errors, need not be considered.
`Other problems, including those which have to do with
`the identification of information relevant to a given search
request, and those concerning themselves with the interpretation of test results, must, of course, be faced in an automatic system just as in a conventional one.
The design of the SMART automatic document retrieval system is first briefly reviewed. The test environment is then described in detail, including in particular a description of the document file and of the search requests used. Parameters are introduced to measure the effectiveness of the retrieval performance; these parameters are similar to the standard recall and precision measures, but do not require that a distinction be made between retrieved and nonretrieved documents. The main test results are then given, and some tentative conclusions are reached concerning the design of fully automatic retrieval systems.
`
• The SMART Retrieval System

SMART is a fully automatic document retrieval system operating on the IBM 7094. Unlike other computer-based retrieval systems, the SMART system does
`
American Documentation, Vol. 16, No. 3, July 1965
`
`
not rely on manually assigned keywords or index terms
`for the identification of documents and search requests,
`nor does it use primarily the frequency of occurrence of
`certain words or phrases included in the texts of docu-
`ments. Instead, an attempt is made to go beyond simple
`word-matching procedures by using a variety of intel-
`lectual aids in the form of synonym dictionaries, hier-
`archical arrangements of subject identifiers, statistical and
`syntactic phrase-generating methods and the like,
`in
`order to obtain the content identifications useful for the
`retrieval process.
Stored documents and search requests are then
`processed without any prior manual analysis by one of
`several hundred automatic content analysis methods, and
`those documents which most nearly match a given search
`request are extracted from the document file in answer
`to the request. The system may be controlled by the
`user in that a search request can be processed first in a
`standard mode;
`the user can then analyze the output
`obtained and, depending on his further requirements,
order a reprocessing of the request under new conditions. The new output can again be examined and the process iterated until the right kind and amount of information are retrieved.
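The request-reprocessing loop just described can be sketched as follows. This is a minimal illustration only, not SMART's actual interface: the overlap-based scoring, the option dictionaries, and the `is_satisfied` callback are all invented for the example.

```python
def search(request, collection, options):
    """Score every document against the request and return
    (doc_id, score) pairs, best first.  The scoring here is a
    stand-in (simple word overlap); `options` stands for the
    analysis conditions chosen for the run and is ignored."""
    req_words = set(request.lower().split())
    scored = []
    for doc_id, text in collection.items():
        doc_words = set(text.lower().split())
        scored.append((doc_id, len(req_words & doc_words)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

def iterate_search(request, collection, option_sequence, is_satisfied):
    """Process the request first in a 'standard mode' (the first
    entry of option_sequence), let the user examine the output,
    and reprocess under new conditions until satisfied."""
    results = None
    for options in option_sequence:
        results = search(request, collection, options)
        if is_satisfied(results):   # user judges kind/amount retrieved
            break
    return results
```

The point of the loop is that the analysis conditions, not the request text, change between iterations.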
`
SMART is thus designed to correct many of the shortcomings of presently available automatic retrieval systems, and it may serve as a reasonable prototype for fully automatic document retrieval. The following facilities incorporated into the SMART system for purposes of document analysis may be of principal interest:*
`
(a) a system for separating English words into stems and affixes (the so-called "null thesaurus" method) which can be used to construct document identifications consisting of the word stems contained in the documents;
`
(b) a synonym dictionary, or thesaurus, which can be used to recognize synonyms by replacing each word stem by one or more "concept" numbers (the thesaurus is a manually constructed dictionary including about 600 concepts in the computer literature, corresponding to about 3000 English word stems); these concept numbers can serve as content identifiers instead of the original word stems;
`
(c) a hierarchical arrangement of the concepts included in the thesaurus which makes it possible, given any concept number, to find its "parent" in the hierarchy, its "sons," its "brothers," and any of a set of possible cross-references; the hierarchy can be used to obtain more general content identifiers than the ones originally given by going "up" in the hierarchy, more specific ones by going "down" in the structure, and a set of related ones by picking up brothers and cross-references;
`
* More detailed descriptions of the system's organization are included in Refs. [ ] and [ ]. Programming aspects and complete flowcharts are presented in Ref. 4.
`
`
`(d) statistical procedures to compute similarity
`coefficients based on co-occurrences of con-
`cepts within the sentences of a given docu-
`ment, or within the documents of a given
`collection; association factors between docu-
`ments can also be determined, as can clusters
`(rather than only pairs) of related documents,
`or related concepts;
`the related concepts, de-
`termined by statistical association, can then
`be added to the originally available concepts
`to identify the various documents;
`
`(e) syntactic analysis and matching methods which
`make it possible to compare the syntactically
`analyzed sentences of documents and search
`requests with a pre-coded dictionary of ‘“cri-
`terion” phrases in such a way that the same
concept number is assigned to a large number
`of semantically equivalent, but syntactically
`quite different constructions
`(e.g. “informa-
`tion retrieval,” “the retrieval of information,”
`“the retrieval of documents,” “text process-
`ing,” and so on);
`
`(f) statistical phrase matching methods which
`operate like the preceding syntactic phrase
`procedures, that is, by using a preconstructed
`dictionary to identify phrases used as content
`identifiers; however, no syntactic analysis is
`performed in this case, and phrases are de-
`fined as equivalent
`if the concept numbers
`of all components match,
`regardless of
`the
`syntactic relationships between components;
`
`(g) a dictionary updating system, designed to re-
`vise the five principal dictionaries included in
`the system (stem thesaurus, suffix dictionary,
concept hierarchy, statistical phrases, and syntactic "criterion" phrases).
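The dictionary lookups of items (a)–(c) above can be illustrated schematically. The suffix list, word stems, concept numbers, and hierarchy entries below are invented toy data, not entries from SMART's actual dictionaries.

```python
SUFFIXES = ["ation", "ing", "al", "s"]   # toy suffix dictionary

THESAURUS = {                 # word stem -> concept numbers (invented)
    "comput": [105],
    "calcul": [105],          # synonym: maps to the same concept
    "differenti": [274],
}

HIERARCHY = {                 # concept -> neighbors (invented)
    105: {"parent": 100, "sons": [106, 107], "brothers": [108]},
}

def stem(word):
    """'Null thesaurus' method, item (a): strip the first matching
    suffix, leaving the bare word stem."""
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) > len(suf) + 2:
            return word[: -len(suf)]
    return word

def concepts(text):
    """Item (b): replace each recognized stem by its concept numbers."""
    out = []
    for word in text.lower().split():
        out.extend(THESAURUS.get(stem(word), []))
    return out

def broaden(concept):
    """Item (c): go 'up' in the hierarchy for a more general
    identifier; unknown concepts are returned unchanged."""
    return HIERARCHY.get(concept, {}).get("parent", concept)
```

In the same spirit, "going down" or picking up brothers would read the `sons` and `brothers` entries instead of `parent`.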
`
The operations of the system are built around a supervisory system which decodes the input instructions and arranges the processing sequence in accordance with the instructions received. At the present time, about 35 different processing options are available, in addition to a number of variable parameter settings. The latter are used to specify the type of correlation function which measures the similarity between documents and search requests, the cut-off value which determines the number of documents to be extracted as answers to search requests, and the thesaurus size.
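A correlation function of the kind described can be illustrated with the cosine coefficient between weighted concept vectors. This is shown only as a plausible stand-in; the paper does not commit to a particular function at this point.

```python
from math import sqrt

def cosine(doc, req):
    """Cosine correlation between two weighted concept vectors,
    each a dict mapping concept number -> weight.  For nonnegative
    weights the result lies between 0 and 1."""
    dot = sum(w * req[c] for c, w in doc.items() if c in req)
    norm_d = sqrt(sum(w * w for w in doc.values()))
    norm_r = sqrt(sum(w * w for w in req.values()))
    if norm_d == 0 or norm_r == 0:
        return 0.0
    return dot / (norm_d * norm_r)
```

A cut-off value would then simply be compared against this coefficient to decide which documents count as answers.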
The SMART system's organization makes it possible to evaluate the effectiveness of the various processing methods by comparing the outputs obtained from a variety of different runs. This is achieved by processing the same search requests against the same document collection several times, and making judicious changes in the analysis procedures between runs. It is this use of the SMART system, as an evaluation tool, which is of particular interest in the present context, and it is therefore treated in more detail in the remaining parts of the present report.
`
`
`
`Characteristic
`
`Comment
`
`Number of documents in collection.
`
`Document abstracts in the computer field.
`
`Number of search requests
`(a) specific
`(b) general.
`
`User population
`(requester also makes
`relevance judgments).
`
`}-— 9 relevant documents
`10 = 30 relevant documents
`
`Technical people and studenta
`
`Number of indexing and search
`programs used.
`
`All search and indexing operations
`are automatic.
`
`Numberof index terms per document.
`
`Varies greatly depending on indexing
`procedure and document,
`
`Numberof relevant. documents per request
`(a) specific
`(b) general.
`
`Numberof retrieved decuments per request. No cut-off is used to separate retrieved fram
`nonretrieved.
`
`Fie, 1, Test Enyironmen|,
`
`Count
`
`405
`
`10
`7
`
`about 10
`
`15
`
`(avernge) 35
`
`(average) 5
`(uverage) 15
`
`405
`
• The Test Environment

The parameters which control the testing procedures about to be described are summarized in Fig. 1. The data collection used consists of a set of 405 abstracts* of documents in the computer literature published during 1959 in the IRE Transactions on Electronic Computers. The results reported are based on the processing of about 20 search requests, each of which is analyzed by approximately 15 different indexing procedures. The search requests are somewhat arbitrarily separated into two groups, called respectively "general" and "specific" requests, depending on whether the number of documents believed to be relevant to each request is equal to at least ten (for the general requests) or is less than ten (for the specific ones). Results are reported separately for each of these two request groups; cumulative results are also reported for the complete set of requests.
`The user population responsible for the search requests
`consists of about ten technical people with background in
`the computer field. Requests are formulated without
`study of
`the document collection, and no document
`already included in the collection is normally used as
`a source for any given search request. On the other
`hand, in view of the experimental nature of the system
`it cannot be stated unequivocally that an actual user
`need in fact exists which requires fulfilment.
`An excerpt
`from the document collection, as it is
`originally introduced into computer storage,
`is
`repro-
`duced in Fig. 2. It may be noted that the full abstracts
`are stored together with the bibliographic citations. A
`typical search request, dealing with the numerical solu-
`tion of differential equations,
`is shown at
`the top of
`
* Practical considerations dictated the use of abstracts rather than full documents; the SMART system as such is not restricted to the manipulation of abstracts only.
`
`Fig. 3. Any search request expressed in English words
`is acceptable, and no particular format restrictions exist.
`Also shown in Fig. 3 is a set of documents found in answer
`to the request on differential equations by using one
`of the available processing methods. The documents are
listed in decreasing order of the correlation coefficient
`with the search request; a short 12-character identifier
`is shown for each document under the heading “answer,”
`and full bibliographic citations are shown under “identi-
`fieation.”
`
`The average number of index terms used to identify
`each document is sometimes believed to be an important
`factor affecting retrieval performance.
`In the SMART
`system, this parameter is a difficult one to present and
`interpret, since the many procedures which exist
`for
`analyzing the documents and search requests generate
`indexing products with widely differing characteristics.
`A typical example is shown in Fig. 4, consisting of the
index "vectors" generated by three different processing
`methods for the request on differential equations (short
`form “DIFFERNTL EQ”), and for document number 1
`of the collection (short form “1A COMPUTER”).
`It may be seen from Fig. 4 that the number of terms
`identifying a document can change drastically from one
`method to another:
for example, document number 1 is identified by 35 different word stems using the word stem analysis (labelled "null thesaurus" in Fig. 4); these 35 stems, however, give rise to 50 different concept numbers using the regular thesaurus, and to 55 concepts for the statistical phrase method. The number of index terms
`per document shown in the summary of Fig. 1 (35) must
`therefore be taken as an indication at best, and does not
`properly reflect the true situation.
`In Fig. 4, each concept number is followed by some
`mnemonic characters to identify the concept and by a
`
`
[Fig. 2. Typical Document Prints: facsimile of two stored abstracts with their bibliographic citations — "Micro-Programming" (R. J. Mercer, University of California) and "The Role of Large Memories in Scientific Communications" (M. M. Astrahan, IBM Corp.).]
`
[Fig. 3. Typical Search Request and Corresponding Answers: the request "DIFFERNTL EQ — numerical digital solution of differential equations" asks for algorithms useful for the numerical solution of ordinary and partial differential equations on digital computers, evaluating integration procedures (e.g., Runge-Kutta, Milne's method) with respect to accuracy, stability, and speed. The ranked answers list a short identifier, the correlation with the request, and a full bibliographic citation for each document, headed by "Stability of Numerical Solution of Differential Equations" (correlation 0.6675).]
`
`
`
`
[Fig. 4. Typical Indexing Products for Three Analysis Procedures: the weighted concept "vectors" produced by the null thesaurus (word stem), regular thesaurus, and statistical phrase methods for the request "DIFFERNTL EQ" and for document number 1 of the collection ("1A COMPUTER").]
`
`weight. The weights assigned to the concept numbers
`also change from method to method. Since no distinction
`is made in the evaluation procedure between retrieved
and nonretrieved documents, the last indicator included in Fig. 1 (the number of retrieved documents per request) must also be put into the proper perspective. A
`discussion of
`this point
`is postponed until after the
`evaluation measures are introduced in the next
`few
`paragraphs.
`
• Evaluation Measures
`
`1. Recall and Precision
`
`One of the most crucial tasks in the evaluation of re-
`trieval systems is the choice of measures which reflect
`systems performance.
`In the present context, such a
`measurement must of necessity depend primarily on the
`system's ability to retrieve wanted information and
`to reject nonwanted material, to the exclusion of opera-
`tional criteria such as retrieval cost, waiting time, input
preparation time, and so on. The last-mentioned factors
`
* Precision has also been called "relevance," notably in the literature of the ASLIB-Cranfield project.
† It has, however, been conjectured that an inverse relationship exists between recall and precision, such that high recall automatically implies low precision and vice versa.
`
`may be of great practical importance in an operational
`situation, but do not enter, at least
`initially,
`into the
`evaluation of experimental procedures.
A large number of measures have been proposed in the past for the evaluation of retrieval performance. Perhaps the best known of these are, respectively, recall and precision; recall is defined as the proportion of relevant material actually retrieved, and precision as the proportion of retrieved material actually relevant.* A system with high recall is one which rejects very little that is relevant but may also retrieve a large proportion of irrelevant material, thereby depressing precision. High precision, on the other hand, implies that very little irrelevant information is produced, but much relevant information may be missed at the same time, thus depressing recall. Ideally, one would of course hope for both high recall and high precision.
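These two definitions translate directly into a short computation; a minimal sketch over sets of document identifiers:

```python
def recall_precision(retrieved, relevant):
    """Standard recall and precision:
       recall    = proportion of relevant material actually retrieved,
       precision = proportion of retrieved material actually relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)          # relevant AND retrieved
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision
```

Note how the two measures can move in opposite directions: retrieving more documents can only raise the hit count (helping recall) while diluting the retrieved set (hurting precision).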
`Measures such as recall and precision are particularly
`attractive when it comes to evaluating automatic re-
`trieval procedures, because a large number of extraneous
`factors which cause uncertainty in the evaluation of con-
`ventional
`(manual)
`systems are automatically absent.
`The following characteristics of the present system are
`particularly important in this connection:
`
(a) input errors in the conventional sense, due to faulty indexing or encoding, are eliminated since all indexing operations are automatic;
`
`
`
(b) for the same reason, conventional search errors arising from the absence of needed search terms are also excluded;
`(c) errors cannot be introduced in any transition
`between original search request and final ma-
`chine query,
`since this
`transition is now
`handled automatically and becomes
`indis-
`tinguishable from the main analysis operation;
`(d) inconsistencies introduced by a large number
`of different indexers and by the passage of
`time in the course of an experiment cannot
`arise; and
`(e) the role of human memory as a disturbance
`in the generation of retrieval measurements
`is eliminated (this factor can be particularly
`troublesome when source documents are to be
`retrieved in a conventional system by persons
`who originally perform the indexing task).
`
`In order to calculate the standard recall and precision
`measures the following important tasks must be under-
`taken:
`
`(a) relevance judgments must be made by hand
`in order to decide, for each document and for
`each search request, whether the given docu-
`ment is relevant to the given request;
(b) the relevance judgments are usually all-or-nothing decisions, so that a given document is assumed either wholly relevant or wholly irrelevant (in case of doubt, relevance is assumed); and
`(c) a cut-off in the correlation between documents
`and search requests is normally chosen, such
`that documents whose correlation exceeds the
`cut-off value are retrieved, while the others
are not retrieved.
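Steps (a)–(c) can be carried out for a single request as follows. The input format, a list of (document, correlation) pairs plus a hand-made relevance list, is assumed here for illustration:

```python
def evaluate_at_cutoff(ranked, relevant, cutoff):
    """Given (doc, correlation) pairs for one request, the hand-made
    all-or-nothing relevance judgments (steps (a)-(b)), and a
    correlation cut-off (step (c)), compute standard recall and
    precision.  Documents whose correlation exceeds the cut-off
    count as retrieved; the others do not."""
    retrieved = {doc for doc, corr in ranked if corr > cutoff}
    relevant = set(relevant)
    hits = len(retrieved & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision
```

Moving the cut-off down retrieves more documents, trading precision for recall, which is exactly why the choice of cut-off becomes a problem in its own right.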
`
`2. The Generation of Relevance Judgments
`
A great deal has been written concerning the difficulties and the appropriateness of the various operations listed in part 1.6-8 The first task, in particular, which may require the performance of hundreds of thousands of human relevance judgments for document collections of reasonable size, is extremely difficult to satisfy and to control.
Two solutions have been suggested, each of which would base the relevance decisions on less than the whole document collection. The first one consists in using sampling techniques to isolate a suitable document subset, and in making relevance judgments only for documents included in that subset. If the results obtained for the subset, however, are to be applicable to the total collection, it becomes necessary to choose a sample representative of the whole. For most document collections, this turns out to be a difficult task.
`The other solution consists in formulating search re-
`quests based on specific source documents included in
`the collection, and in measuring retrieval performance
`for a given search request as a function of the retrieval
`of the respective source documents. This procedure suf-
`fers from the fact that search requests based on source
`
`documents are often claimed to be nontypical, thus intro-
`ducing a bias into the measurements which does not exist
`for requests reflecting actual user needs.
`Since the document collection used in connection with
`the present experiments is small enough to permit an
`exhaustive determination of relevance, the possible pit-
`falls inherent in the sampling procedure and in the use
`of source documents were avoided to a great extent.
`Many of the problems connected with the rendering of
`relevance judgments are, however, unresolved for gen-
`eral document collections.
`
`3. The Cut-off Problem
`
The other major problem is caused by the requirement to pick a correlation cut-off value to distinguish re-
`trieved documents from those not retrieved. Such a cut-
`off introduces a new variable which seems to be extraneous
`to the principal task of measuring retrieval performance.
`Furthermore, in the SMART system, a different cut-off
`would have to be picked for each of the many process-
`ing methods if it were desired to retrieve approximately
`the same number of documents in each case.
`Because of these added complications, it was felt that
`the standard recall and precision measures should be
`redefined so as to remove the necessary distinction be-
`tween retrieved and nonretrieved information. For-
`tunately, this is not difficult in computer-based informa-
`tion systems, because in such systems numeric coefficients
`expressing the similarity between each document and
`each search request are obtained as output of the search
process. Documents may then be arranged in decreasing order of these similarity coefficients, as shown, for example, for the previously used request on differential
`equations in the center section of Fig. 5. It may be seen
`in the figure that document 384 exhibits the largest corre-
`lation with the search request, followed by documents
`360, 200, 392, and so on.
An ordered document list of the kind shown in Fig. 5 suggests that a suitable criterion for recall and precision measures would be the set of rank-orders of the relevant documents, when these documents are arranged in decreasing correlation order. A function of the rank-order list which penalizes high ranks for relevant documents (and therefore low correlation coefficients) can be used to express recall, while a function penalizing low ranks of nonrelevant documents is indicative of precision.
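Obtaining the rank-order list of the relevant documents is straightforward; a minimal sketch, using illustrative correlation values patterned on Fig. 5:

```python
def relevant_ranks(correlations, relevant):
    """Arrange documents in decreasing order of their correlation
    with the request and return the ranks (1 = best) at which the
    relevant documents appear."""
    ordered = sorted(correlations, key=correlations.get, reverse=True)
    return sorted(ordered.index(doc) + 1 for doc in relevant)
```

The resulting rank list is all that the normalized measures of the next part require; the raw correlation values themselves drop out.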
`
4. Normalized Recall and Normalized Precision*
`
`It is desired to use as a measure of retrieval effective-
`ness a set of parameters which reflects the standard recall
`and the standard precision, and does not depend on a
`distinction between retrieved and nonretrieved docu-
`ments. This suggests that one might take the average of
`the recall and the average of the precision obtained for
`
* The measures described in this part were suggested by J. Rocchio.
`
`
`
`
[Fig. 5. Correlations Between Search Request and Document Collection: the correlations of the "DIFFERNTL EQ" request with all documents of the collection, shown (a) in increasing document order, (b) in decreasing correlation order, and (c) as a histogram. Document 384 exhibits the largest correlation with the request, followed by documents 360, 200, 392, and so on.]
`
all possible retrieval levels to define a new pair of measures, termed respectively normalized recall and normalized precision. Specifically, if R(j) is the standard recall after retrieving j documents from the collection (that is, if R(j) is equal to the number of relevant documents retrieved divided by the total relevant in the collection, assuming j documents retrieved in all), then the normalized recall can be defined as

    R_norm = (1/N) * sum_{j=1}^{N} R(j)

where N is the total number of documents in the collection; the normalized precision P_norm is defined analogously from the standard precision P(j). In principle, these averages are generated by retrieving one document at a time until all documents in the whole collection have been retrieved. Finally, all R's and P's are averaged to obtain the normalized measures.

In practice, this manner of proceeding would be extremely tedious for large document collections, even if the calculations were done by computer. It may, however, be shown by reasonably straightforward algebra that the normalized recall and normalized precision may be rewritten, respectively, as

    R_norm = 1 - ( sum_{i=1}^{n} r_i - sum_{i=1}^{n} i ) / ( n(N - n) )                        (1)

and

    P_norm = 1 - ( sum_{i=1}^{n} log r_i - sum_{i=1}^{n} log i ) / log( N! / (n!(N - n)!) )    (2)

where

    r_i is the rank (in decreasing correlation order with the search request) of the ith relevant document in the collection, and

    n is the total number of relevant documents in the collection.
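Both normalized measures can then be computed directly from the ranks of the relevant documents. The sketch below assumes equations of the form (1) and (2) given for normalized recall and normalized precision, with ranks counted from 1 and natural logarithms:

```python
from math import log, factorial

def normalized_recall(ranks, N):
    """Equation (1): ranks are the positions r_i (1 = best) of the
    n relevant documents among all N documents of the collection."""
    n = len(ranks)
    ideal = sum(range(1, n + 1))              # best possible ranks 1..n
    return 1 - (sum(ranks) - ideal) / (n * (N - n))

def normalized_precision(ranks, N):
    """Equation (2): the same quantities with logarithms of the
    ranks; the denominator is the log of the binomial coefficient
    N! / (n!(N - n)!)."""
    n = len(ranks)
    num = sum(log(r) for r in ranks) - sum(log(i) for i in range(1, n + 1))
    den = log(factorial(N) // (factorial(n) * factorial(N - n)))
    return 1 - num / den
```

Both measures equal 1 when the relevant documents occupy the top n ranks, and normalized recall reaches 0 when they occupy the bottom n ranks, matching the intent of the derivation above.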