The Evaluation of Automatic Retrieval Procedures—
Selected Test Results Using the SMART System*

The generation of effective methods for the evaluation of information retrieval systems and techniques is becoming increasingly important as more and more systems are designed and implemented. The present report deals with the evaluation of a variety of automatic indexing and retrieval procedures incorporated into the SMART automatic document retrieval system. The design of the SMART system is first briefly reviewed. The document file, search requests, and other parameters affecting the evaluation system are then examined in detail, and the measures used to assess the effectiveness of the retrieval performance are described. The main test results are given, and tentative conclusions are reached concerning the design of fully automatic information systems.

GERARD SALTON

The Computation Laboratory of Harvard University
Cambridge, Massachusetts

* This study was supported by the National Science Foundation under Grant GN-245.

• Introduction

The evaluation of information retrieval systems and of techniques for indexing, storing, searching and retrieving information has become of increasing importance in recent years. The interest in evaluation procedures stems from two main causes: first, more and more retrieval systems are being designed, thus raising an immediate question concerning the performance and efficacy of these systems; and, second, evaluation methods are of interest in themselves, in that they lead to many complicated problems in test design and performance, and in the interpretation of test results.

The present study differs from other reports on systems evaluation in that it deals with the evaluation of automatic rather than conventional information retrieval. More specifically, it is desired to compare the effectiveness of a large variety of fully automatic procedures for information analysis (indexing) and retrieval. Since such an evaluation must of necessity take place in an experimental situation rather than in an operational environment, it becomes possible to eliminate from consideration such important system parameters as cost of retrieval, response time, influence of physical layout, personnel problems and so on, and to concentrate fully on the evaluation of retrieval techniques. Furthermore, a number of human problems which complicate matters in a conventional evaluation procedure, including, for example, the difficulties due to inconsistency among indexers or to the presence of search errors, need not be considered. Other problems, including those which have to do with the identification of information relevant to a given search request, and those concerning themselves with the interpretation of test results, must, of course, be faced in an automatic system just as in a conventional one.

The design of the SMART automatic document retrieval system is first briefly reviewed. The test environment is then described in detail, including in particular a description of the document file and of the search requests used. Parameters are introduced to measure the effectiveness of the retrieval performance; these parameters are similar to the standard recall and precision measures, but do not require that a distinction be made between retrieved and nonretrieved documents. The main test results are then given, and some tentative conclusions are reached concerning the design of fully automatic retrieval systems.

• The SMART Retrieval System

SMART is a fully automatic document retrieval system operating on the IBM 7094. Unlike other computer-based retrieval systems, the SMART system does not rely on manually assigned keywords or index terms for the identification of documents and search requests, nor does it use primarily the frequency of occurrence of certain words or phrases included in the texts of documents. Instead, an attempt is made to go beyond simple word-matching procedures by using a variety of intellectual aids in the form of synonym dictionaries, hierarchical arrangements of subject identifiers, statistical and syntactic phrase-generating methods and the like, in order to obtain the content identifications useful for the retrieval process.

Stored documents and search requests are then processed without any prior manual analysis by one of several hundred automatic content analysis methods, and those documents which most nearly match a given search request are extracted from the document file in answer to the request. The system may be controlled by the user in that a search request can be processed first in a standard mode; the user can then analyze the output obtained and, depending on his further requirements, order a reprocessing of the request under new conditions. The new output can again be examined and the process iterated until the right kind and amount of information are retrieved.

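The user-controlled iteration just described amounts to a simple feedback loop: analyze, search, inspect, reprocess. The following sketch is purely illustrative; analyze, search, satisfied, and refine are hypothetical stand-ins for SMART processing options and user actions, not routines of the actual system.

    # Illustrative sketch (not SMART code) of the user-controlled search
    # iteration described above. All callables are hypothetical stand-ins
    # supplied by the caller.
    def iterative_search(request, collection, analyze, search, satisfied, refine):
        """Reprocess a request under new conditions until the user is
        satisfied with the kind and amount of information retrieved."""
        options = "standard mode"                # first pass uses a standard mode
        while True:
            query = analyze(request, options)    # automatic content analysis
            results = search(query, collection)  # ranked document list
            if satisfied(results):               # user inspects the output
                return results
            options = refine(results, options)   # user orders new conditions
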
SMART is thus designed to correct many of the shortcomings of presently available automatic retrieval systems, and it may serve as a reasonable prototype for fully automatic document retrieval. The following facilities incorporated into the SMART system for purposes of document analysis may be of principal interest*:

(a) a system for separating English words into stems and affixes (the so-called "null thesaurus" method) which can be used to construct document identifications consisting of the word stems contained in the documents;

(b) a synonym dictionary, or thesaurus, which can be used to recognize synonyms by replacing each word stem by one or more "concept" numbers (the thesaurus is a manually constructed dictionary including about 600 concepts in the computer literature, corresponding to about 3000 English word stems); these concept numbers can serve as content identifiers instead of the original word stems (a toy sketch of this stem-to-concept lookup follows the list);

(c) a hierarchical arrangement of the concepts included in the thesaurus which makes it possible, given any concept number, to find its "parent" in the hierarchy, its "sons," its "brothers," and any of a set of possible cross-references; the hierarchy can be used to obtain more general content identifiers than the ones originally given by going "up" in the hierarchy, more specific ones by going "down" in the structure, and a set of related ones by picking up brothers and cross-references;

* More detailed descriptions of the systems organization are included in Refs. 1 and 2. Programming aspects and complete flowcharts are presented in Ref. 3.

(d) statistical procedures to compute similarity coefficients based on co-occurrences of concepts within the sentences of a given document, or within the documents of a given collection; association factors between documents can also be determined, as can clusters (rather than only pairs) of related documents, or related concepts; the related concepts, determined by statistical association, can then be added to the originally available concepts to identify the various documents;

(e) syntactic analysis and matching methods which make it possible to compare the syntactically analyzed sentences of documents and search requests with a pre-coded dictionary of "criterion" phrases in such a way that the same concept number is assigned to a large number of semantically equivalent, but syntactically quite different constructions (e.g., "information retrieval," "the retrieval of information," "the retrieval of documents," "text processing," and so on);

(f) statistical phrase matching methods which operate like the preceding syntactic phrase procedures, that is, by using a preconstructed dictionary to identify phrases used as content identifiers; however, no syntactic analysis is performed in this case, and phrases are defined as equivalent if the concept numbers of all components match, regardless of the syntactic relationships between components;

(g) a dictionary updating system, designed to revise the five principal dictionaries included in the system (stem thesaurus, suffix dictionary, concept hierarchy, statistical phrases, and syntactic "criterion" phrases).

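The stem-and-thesaurus pipeline of facilities (a) and (b) can be made concrete with a toy example. In the sketch below the suffix list, the stems, and the concept numbers are all invented for illustration; they are not entries of the actual SMART dictionaries, which hold about 600 concepts for some 3000 word stems.

    # Toy version of facilities (a) and (b): strip suffixes to obtain word
    # stems, then replace each stem by "concept" numbers through a small
    # thesaurus. All entries here are invented for illustration.
    SUFFIXES = ("ation", "ions", "ing", "ed", "es", "al", "s")  # hypothetical suffix dictionary

    def stem(word):
        """Crude suffix stripping, standing in for the 'null thesaurus' method."""
        for suffix in SUFFIXES:
            if word.endswith(suffix) and len(word) > len(suffix) + 2:
                return word[: -len(suffix)]
        return word

    # Hypothetical thesaurus: a stem may map to one or more concept numbers.
    THESAURUS = {"comput": [110], "program": [149], "differenti": [274], "equat": [274]}

    def concept_identifiers(text):
        """Return the concept numbers identifying a text, as in method (b)."""
        stems = (stem(word) for word in text.lower().split())
        return sorted({c for s in stems for c in THESAURUS.get(s, [])})

    print(concept_identifiers("computing differential equations"))  # -> [110, 274]
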
The operations of the system are built around a supervisory system which decodes the input instructions and arranges the processing sequence in accordance with the instructions received. At the present time, about 35 different processing options are available, in addition to a number of variable parameter settings. The latter are used to specify the type of correlation function which measures the similarity between documents and search requests, the cut-off value which determines the number of documents to be extracted as answers to search requests, and the thesaurus size.
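
The text leaves the correlation function unspecified at this point, noting only that it is a selectable parameter. One standard choice for comparing weighted concept vectors is the cosine coefficient; the sketch below assumes that choice, and the concept numbers and weights in the example are invented.

    # One plausible correlation function between a request and a document,
    # each represented as a sparse concept vector (concept number -> weight).
    # The cosine coefficient is an assumption; SMART treats the correlation
    # function as a selectable parameter.
    import math

    def correlation(request, document):
        """Cosine of the angle between two weighted concept vectors."""
        dot = sum(w * document[c] for c, w in request.items() if c in document)
        norm_r = math.sqrt(sum(w * w for w in request.values()))
        norm_d = math.sqrt(sum(w * w for w in document.values()))
        return dot / (norm_r * norm_d) if norm_r and norm_d else 0.0

    # Invented vectors: the request and the document share concepts 110 and 274.
    print(correlation({274: 36, 110: 12}, {274: 24, 110: 12, 149: 36}))
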
The SMART systems organization makes it possible to evaluate the effectiveness of the various processing methods by comparing the outputs obtained from a variety of different runs. This is achieved by processing the same search requests against the same document collection several times, and making judicious changes in the analysis procedures between runs. It is this use of the SMART system, as an evaluation tool, which is of particular interest in the present context, and is therefore treated in more detail in the remaining parts of the present report.

Characteristic                                 Comment                                        Count

Number of documents in collection              Document abstracts in the computer field       405
Number of search requests
    (a) specific                               0-9 relevant documents                         10
    (b) general                                10-30 relevant documents                       7
User population (requester also
    makes relevance judgments)                 Technical people and students                  about 10
Number of indexing and search
    programs used                              All search and indexing operations
                                               are automatic                                  15
Number of index terms per document             Varies greatly depending on indexing
                                               procedure and document                         (average) 35
Number of relevant documents per request
    (a) specific                                                                              (average) 5
    (b) general                                                                               (average) 15
Number of retrieved documents per request      No cut-off is used to separate retrieved
                                               from nonretrieved                              405

Fig. 1. Test Environment.

• The Test Environment

The parameters which control the testing procedures about to be described are summarized in Fig. 1. The data collection used consists of a set of 405 abstracts* of documents in the computer literature published during 1959 in the IRE Transactions on Electronic Computers. The results reported are based on the processing of about 20 search requests, each of which is analyzed by approximately 15 different indexing procedures. The search requests are somewhat arbitrarily separated into two groups, called respectively "general" and "specific" requests, depending on whether the number of documents believed to be relevant to each request is at least ten (for the general requests) or is less than ten (for the specific ones). Results are reported separately for each of these two request groups; cumulative results are also reported for the complete set of requests.

* Practical considerations dictated the use of abstracts rather than full documents; the SMART system as such is not restricted to the manipulation of abstracts only.

The user population responsible for the search requests consists of about ten technical people with background in the computer field. Requests are formulated without study of the document collection, and no document already included in the collection is normally used as a source for any given search request. On the other hand, in view of the experimental nature of the system, it cannot be stated unequivocally that an actual user need in fact exists which requires fulfilment.

An excerpt from the document collection, as it is originally introduced into computer storage, is reproduced in Fig. 2. It may be noted that the full abstracts are stored together with the bibliographic citations. A typical search request, dealing with the numerical solution of differential equations, is shown at the top of Fig. 3. Any search request expressed in English words is acceptable, and no particular format restrictions exist. Also shown in Fig. 3 is a set of documents found in answer to the request on differential equations by using one of the available processing methods. The documents are listed in decreasing order of the correlation coefficient with the search request; a short 12-character identifier is shown for each document under the heading "answer," and full bibliographic citations are shown under "identification."

The average number of index terms used to identify each document is sometimes believed to be an important factor affecting retrieval performance. In the SMART system, this parameter is a difficult one to present and interpret, since the many procedures which exist for analyzing the documents and search requests generate indexing products with widely differing characteristics. A typical example is shown in Fig. 4, consisting of the index "vectors" generated by three different processing methods for the request on differential equations (short form "DIFFERNTL EQ"), and for document number 1 of the collection (short form "1A COMPUTER").

It may be seen from Fig. 4 that the number of terms identifying a document can change drastically from one method to another: for example, document number 1 is identified by 35 different word stems using the word stem analysis (labelled "null thesaurus" in Fig. 4); these 35 stems, however, give rise to 50 different concept numbers using the regular thesaurus, and to 55 concepts for the statistical phrase method. The number of index terms per document shown in the summary of Fig. 1 (35) must therefore be taken as an indication at best, and does not properly reflect the true situation.

In Fig. 4, each concept number is followed by some mnemonic characters to identify the concept and by a weight.

*TEXT 2 MICRO-PROGRAMMING
$MICRO-PROGRAMMING
$R. J. MERCER (UNIVERSITY OF CALIFORNIA)
$U.S. GOV. RES. REPTS. VOL 30 PP 71-72(A) (AUGUST 15, 1958) PB 126893
MICRO-PROGRAMMING. THE MICRO-PROGRAMMING TECHNIQUE OF DESIGNING THE CONTROL CIRCUITS OF AN ELECTRONIC DIGITAL COMPUTER TO FORMALLY INTERPRET AND EXECUTE A GIVEN SET OF MACHINE OPERATIONS AS AN EQUIVALENT SET OF SEQUENCES OF ELEMENTARY OPERATIONS THAT CAN BE EXECUTED IN ONE PULSE TIME IS DESCRIBED.

*TEXT 3 THE ROLE OF LARGE MEMORIES IN SCIENTIFIC COMMUNICATIONS
$THE ROLE OF LARGE MEMORIES IN SCIENTIFIC COMMUNICATIONS
$M. M. ASTRAHAN (IBM CORP.)
$IBM J. RES. AND DEV. VOL 2 PP 310-313 (OCTOBER 1958)
THE ROLE OF LARGE MEMORIES IN SCIENTIFIC COMMUNICATIONS. THE ROLE OF LARGE MEMORIES IN SCIENTIFIC COMMUNICATIONS IS DISCUSSED. LARGE MEMORIES PROVIDE AUTOMATIC REFERENCE TO MILLIONS OF WORDS OF MACHINE-READABLE CODED INFORMATION OR TO MILLIONS OF IMAGES OF DOCUMENT PAGES. HIGHER DENSITIES OF STORAGE WILL MAKE POSSIBLE LOW-COST MEMORIES OF BILLIONS OF WORDS WITH ACCESS TO ANY PART IN A FEW SECONDS OR COMPLETE SEARCHES IN MINUTES. THESE MEMORIES WILL SERVE AS INDEXES TO THE DELUGE OF TECHNICAL LITERATURE WHEN THE PROBLEMS OF INPUT AND OF THE AUTOMATIC GENERATION OF CLASSIFICATION INFORMATION ARE SOLVED. DOCUMENT FILES WILL MAKE THE INDEXED LITERATURE RAPIDLY AVAILABLE TO THE SEARCHER. MACHINE TRANSLATION OF LANGUAGE AND RECOGNITION OF SPOKEN INFORMATION ARE TWO OTHER AREAS WHICH WILL REQUIRE FAST, LARGE MEMORIES.

Fig. 2. Typical Document Prints.

ANSWERS TO REQUESTS FOR DOCUMENTS ON SPECIFIED TOPICS

REQUEST
*LIST DIFFERNTL EQ  NUMERICAL DIGITAL SOLN OF DIFFERENTIAL EQUATIONS
GIVE ALGORITHMS USEFUL FOR THE NUMERICAL SOLUTION OF ORDINARY DIFFERENTIAL EQUATIONS AND PARTIAL DIFFERENTIAL EQUATIONS ON DIGITAL COMPUTERS. EVALUATE THE VARIOUS INTEGRATION PROCEDURES (E.G. RUNGE-KUTTA, MILNE'S METHOD) WITH RESPECT TO ACCURACY, STABILITY, AND SPEED.

ANSWER: 384STABILITY   CORRELATION: 0.6675
IDENTIFICATION: STABILITY OF NUMERICAL SOLUTION OF DIFFERENTIAL EQUATIONS. W. E. MILNE AND R. R. REYNOLDS (OREGON STATE COLLEGE). J. ASSOC. FOR COMPUTING MACH. VOL 6 PP 196-203 (APRIL, 1959)

ANSWER: 360SIMULATIN   CORRELATION: 0.5758
IDENTIFICATION: SIMULATING SECOND-ORDER EQUATIONS. D. G. CHADWICK (UTAH STATE UNIV.). ELECTRONICS VOL 32 P 64 (MARCH 6, 1959)

ANSWER: 200SOLUTION    CORRELATION: 0.5663
IDENTIFICATION: SOLUTION OF ALGEBRAIC AND TRANSCENDENTAL EQUATIONS ON AN AUTOMATIC DIGITAL COMPUTER. G. N. LANCE (UNIV. OF SOUTHAMPTON). J. ASSOC. FOR COMPUTING MACH. VOL 6 PP 97-101 (JAN., 1959)

ANSWER: 392ON COMPUT   CORRELATION: 0.5508
IDENTIFICATION: ON COMPUTING RADIATION INTEGRALS. R. C. HANSEN (HUGHES AIRCRAFT CO.), L. L. BAILIN (UNIV. OF SOUTHERN CALIFORNIA), AND R. W. RUTISHAUSER (LITTON INDUSTRIES, INC.). COMMUN. ASSOC. FOR COMPUTING MACH. VOL 2 PP 28-31 (FEBRUARY, 1959)

ANSWER: 386ELIMINATI   CORRELATION: 0.5483
IDENTIFICATION: ELIMINATION OF SPECIAL FUNCTIONS FROM DIFFERENTIAL EQUATIONS. J. E. POWERS (UNIV. OF OKLAHOMA). COMMUN. ASSOC. FOR COMPUTING MACH. VOL 2 PP 3-4 (MARCH, 1959)

Fig. 3. Typical Search Request and Corresponding Answers.

[Fig. 4 reproduces the index "vectors" generated for the request "DIFFERNTL EQ" and for document 1 of the collection ("1A COMPUTER") by three analysis procedures: the null thesaurus (word stem) method, the regular thesaurus, and the statistical phrase method. Each vector entry consists of a concept number, a short mnemonic identifying the concept, and a weight.]

Fig. 4. Typical Indexing Products for Three Analysis Procedures.

The weights assigned to the concept numbers also change from method to method. Since no distinction is made in the evaluation procedure between retrieved and nonretrieved documents, the last indicator included in Fig. 1 (the number of retrieved documents per request) must also be put into the proper perspective. A discussion of this point is postponed until after the evaluation measures are introduced in the next few paragraphs.

• Evaluation Measures

1. Recall and Precision

One of the most crucial tasks in the evaluation of retrieval systems is the choice of measures which reflect systems performance. In the present context, such a measurement must of necessity depend primarily on the system's ability to retrieve wanted information and to reject nonwanted material, to the exclusion of operational criteria such as retrieval cost, waiting time, input preparation time, and so on. The last-mentioned factors may be of great practical importance in an operational situation, but do not enter, at least initially, into the evaluation of experimental procedures.

A large number of measures have been proposed in the past for the evaluation of retrieval performance. Perhaps the best known of these are, respectively, recall and precision; recall is defined as the proportion of relevant material actually retrieved, and precision as the proportion of retrieved material actually relevant.* A system with high recall is one which rejects very little that is relevant but may also retrieve a large proportion of irrelevant material, thereby depressing precision. High precision, on the other hand, implies that very little irrelevant information is produced but much relevant information may be missed at the same time, thus depressing recall. Ideally, one would of course hope for both high recall and high precision.†

* Precision has also been called "relevance," notably in the literature of the ASLIB-Cranfield project.
† It has, however, been conjectured that an inverse relationship exists between recall and precision, such that high recall automatically implies low precision and vice versa.
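
In set terms, the two definitions just given reduce to a pair of ratios. The sketch below computes both for a single request; the document numbers and relevance judgments are invented for illustration.

    # Standard recall and precision exactly as defined above:
    #   recall    = relevant material actually retrieved / all relevant material
    #   precision = retrieved material actually relevant / all retrieved material
    def recall(retrieved, relevant):
        return len(retrieved & relevant) / len(relevant)

    def precision(retrieved, relevant):
        return len(retrieved & relevant) / len(retrieved)

    # Invented example: 3 of 5 relevant documents appear among 6 retrieved.
    retrieved = {384, 360, 200, 392, 386, 101}
    relevant = {384, 200, 386, 229, 102}
    print(recall(retrieved, relevant), precision(retrieved, relevant))  # 0.6 0.5
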
Measures such as recall and precision are particularly attractive when it comes to evaluating automatic retrieval procedures, because a large number of extraneous factors which cause uncertainty in the evaluation of conventional (manual) systems are automatically absent. The following characteristics of the present system are particularly important in this connection:

(a) input errors in the conventional sense, due to faulty indexing or encoding, are eliminated since all indexing operations are automatic;

(b) for the same reason, conventional search errors arising from the absence of needed search terms are also excluded;

(c) errors cannot be introduced in any transition between original search request and final machine query, since this transition is now handled automatically and becomes indistinguishable from the main analysis operation;

(d) inconsistencies introduced by a large number of different indexers and by the passage of time in the course of an experiment cannot arise; and

(e) the role of human memory as a disturbance in the generation of retrieval measurements is eliminated (this factor can be particularly troublesome when source documents are to be retrieved in a conventional system by persons who originally perform the indexing task).

In order to calculate the standard recall and precision measures, the following important tasks must be undertaken:

(a) relevance judgments must be made by hand in order to decide, for each document and for each search request, whether the given document is relevant to the given request;

(b) the relevance judgments are usually all or nothing decisions, so that a given document is assumed either wholly relevant or wholly irrelevant (in case of doubt, relevance is assumed); and

(c) a cut-off in the correlation between documents and search requests is normally chosen, such that documents whose correlation exceeds the cut-off value are retrieved, while the others are not retrieved.

2. The Generation of Relevance Judgments

A great deal has been written concerning the difficulties and the appropriateness of the various operations listed in part 1 (Refs. 5-8). The first task, in particular, which may require the performance of hundreds of thousands of human relevance judgments for document collections of reasonable size, is extremely difficult to satisfy and to control.

Two solutions have been suggested, each of which would base the relevance decisions on less than the whole document collection. The first one consists in using sampling techniques to isolate a suitable document subset, and in making relevance judgments only for documents included in that subset. If the results obtained for the subset, however, are to be applicable to the total collection, it becomes necessary to choose a sample representative of the whole. For most document collections, this turns out to be a difficult task.

The other solution consists in formulating search requests based on specific source documents included in the collection, and in measuring retrieval performance for a given search request as a function of the retrieval of the respective source documents. This procedure suffers from the fact that search requests based on source documents are often claimed to be nontypical, thus introducing a bias into the measurements which does not exist for requests reflecting actual user needs.

Since the document collection used in connection with the present experiments is small enough to permit an exhaustive determination of relevance, the possible pitfalls inherent in the sampling procedure and in the use of source documents were avoided to a great extent. Many of the problems connected with the rendering of relevance judgments are, however, unresolved for general document collections.

3. The Cut-off Problem

The other major problem is caused by the requirement to pick a correlation cut-off value to distinguish retrieved documents from those not retrieved. Such a cut-off introduces a new variable which seems to be extraneous to the principal task of measuring retrieval performance. Furthermore, in the SMART system, a different cut-off would have to be picked for each of the many processing methods if it were desired to retrieve approximately the same number of documents in each case.

Because of these added complications, it was felt that the standard recall and precision measures should be redefined so as to remove the necessary distinction between retrieved and nonretrieved information. Fortunately, this is not difficult in computer-based information systems, because in such systems numeric coefficients expressing the similarity between each document and each search request are obtained as output of the search process. Documents may then be arranged in decreasing order of these similarity coefficients, as shown, for example, for the previously used request on differential equations in the center section of Fig. 5. It may be seen in the figure that document 384 exhibits the largest correlation with the search request, followed by documents 360, 200, 392, and so on.

An ordered document list of the kind shown in Fig. 5 suggests that a suitable criterion for recall and precision measures would be the set of rank-orders of the relevant documents, when these documents are arranged in decreasing correlation order. A function of the rank-order list which penalizes high ranks for relevant documents (and therefore low correlation coefficients) can be used to express recall, while a function penalizing low ranks of nonrelevant documents is indicative of precision.

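The rank-order list just described can be read directly off the correlation output. The sketch below uses a few of the correlation values from Fig. 5; which documents are relevant is invented for the example.

    # Arrange documents in decreasing order of their correlation with the
    # request, then read off the rank-orders of the relevant documents.
    # Correlations follow Fig. 5; the relevance labels are invented.
    correlations = {384: 0.6675, 360: 0.5758, 200: 0.5663, 392: 0.5508, 386: 0.5483}
    relevant = {384, 200, 386}  # hypothetical relevance judgments

    ranked = sorted(correlations, key=correlations.get, reverse=True)
    relevant_ranks = sorted(ranked.index(doc) + 1 for doc in relevant)
    print(ranked)          # [384, 360, 200, 392, 386]
    print(relevant_ranks)  # ranks of the relevant documents: [1, 3, 5]
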
4. Normalized Recall and Normalized Precision*

It is desired to use as a measure of retrieval effectiveness a set of parameters which reflects the standard recall and the standard precision, and does not depend on a distinction between retrieved and nonretrieved documents. This suggests that one might take the average of the recall and the average of the precision obtained for all possible retrieval levels to define a new pair of measures, termed respectively normalized recall and normalized precision.

* The measures described in this part were suggested by J. Rocchio.

[Fig. 5 lists the correlation coefficients between the request "DIFFERNTL EQ" and each of the 405 documents of the collection in three panels: (a) the correlations in increasing document order; (b) the same correlations in decreasing correlation order, headed by documents 384 (0.6675), 360 (0.5758), 200 (0.5663), and 392 (0.5508); and (c) a histogram giving, for each correlation level from 0.98 down to 0.02, the number of documents whose correlation exceeds that level.]

Fig. 5. Correlations Between Search Request and Document Collection.

Specifically, if R_{(j)} is the standard recall after retrieving j documents from the collection (that is, if R_{(j)} is equal to the number of relevant documents retrieved divided by the total relevant in the collection, assuming j documents retrieved in all), then the normalized recall can be defined as

    R_{\mathrm{norm}} = \frac{1}{N} \sum_{j=1}^{N} R_{(j)},

where N is the total number of documents in the collection.

Similarly, if P_{(j)} is the standard precision after retrieving j documents from the collection, then a normalized precision can be defined as the corresponding average over all N retrieval levels:

    P_{\mathrm{norm}} = \frac{1}{N} \sum_{j=1}^{N} P_{(j)}.
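
Both normalized measures are thus plain averages over all N retrieval levels. The sketch below computes them from the rank positions of the relevant documents; the ranks used are the invented ones from the previous sketch, and the averaging form of P_norm follows the definition just given.

    # Normalized recall and precision as averages of R(j) and P(j) over all
    # retrieval levels j = 1..N, following the definitions above.
    def normalized_measures(relevant_ranks, N):
        n = len(relevant_ranks)
        rank_set = set(relevant_ranks)
        found = 0
        r_sum = p_sum = 0.0
        for j in range(1, N + 1):
            if j in rank_set:
                found += 1
            r_sum += found / n  # R(j): fraction of all relevant material retrieved
            p_sum += found / j  # P(j): fraction of retrieved material that is relevant
        return r_sum / N, p_sum / N

    # Invented example: relevant documents at ranks 1, 3, and 5 among N = 405.
    r_norm, p_norm = normalized_measures([1, 3, 5], 405)
    print(round(r_norm, 4), round(p_norm, 4))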
