The Evaluation of Automatic Retrieval Procedures—
Selected Test Results Using the SMART System*

The generation of effective methods for the evaluation of information retrieval systems and techniques is becoming increasingly important as more and more systems are designed and implemented. The present report deals with the evaluation of a variety of automatic indexing and retrieval procedures incorporated into the SMART automatic document retrieval system. The design of the SMART system is first briefly reviewed. The document file, search requests, and other parameters affecting the evaluation system are then examined in detail, and the measures used to assess the effectiveness of the retrieval performance are described. The main test results are given and tentative conclusions are reached concerning the design of fully automatic information systems.

GERARD SALTON

The Computation Laboratory of Harvard University
Cambridge, Massachusetts

• Introduction

The evaluation of information retrieval systems and of techniques for indexing, storing, searching and retrieving information has become of increasing importance in recent years. The interest in evaluation procedures stems from two main causes: first, more and more retrieval systems are being designed, thus raising an immediate question concerning performance and efficacy of these systems; and, second, evaluation methods are of interest in themselves, in that they lead to many complicated problems in test design and performance, and in the interpretation of test results.
The present study differs from other reports on systems evaluation in that it deals with the evaluation of automatic rather than conventional information retrieval. More specifically, it is desired to compare the effectiveness of a large variety of fully automatic procedures for information analysis (indexing) and retrieval. Since such an evaluation must of necessity take place in an experimental situation rather than in an operational environment, it becomes possible to eliminate from consideration such important system parameters as cost of retrieval, response time, influence of physical layout, personnel problems and so on, and to concentrate fully

* This study was supported by the National Science Foundation under Grant GN-245.

on the evaluation of retrieval techniques. Furthermore, a number of human problems which complicate matters in a conventional evaluation procedure, including, for example, the difficulties due to inconsistency among indexers or to the presence of search errors, need not be considered. Other problems, including those which have to do with the identification of information relevant to a given search request, and those concerning themselves with the interpretation of test results, must, of course, be faced in an automatic system just as in a conventional one.

The design of the SMART automatic document retrieval system is first briefly reviewed. The test environment is then described in detail, including in particular a description of the document file and of the search requests used. Parameters are introduced to measure the effectiveness of the retrieval performance; these parameters are similar to the standard recall and precision measures, but do not require that a distinction be made between retrieved and nonretrieved documents. The main test results are then given, and some tentative conclusions are reached concerning the design of fully automatic retrieval systems.

• The SMART Retrieval System

SMART is a fully automatic document retrieval system operating on the IBM 7094. Unlike other computer-based retrieval systems, the SMART system does

American Documentation, Vol. 16, No. 3 — July 1965, p. 209

not rely on manually assigned keywords or index terms for the identification of documents and search requests, nor does it use primarily the frequency of occurrence of certain words or phrases included in the texts of documents. Instead, an attempt is made to go beyond simple word-matching procedures by using a variety of intellectual aids in the form of synonym dictionaries, hierarchical arrangements of subject identifiers, statistical and syntactic phrase-generating methods and the like, in order to obtain the content identifications useful for the retrieval process.
Stored documents and search requests are then processed without any prior manual analysis by one of several hundred automatic content analysis methods, and those documents which most nearly match a given search request are extracted from the document file in answer to the request. The system may be controlled by the user in that a search request can be processed first in a standard mode; the user can then analyze the output obtained and, depending on his further requirements, order a reprocessing of the request under new conditions. The new output can again be examined and the process iterated until the right kind and amount of information are retrieved.

SMART is thus designed to correct many of the shortcomings of presently available automatic retrieval systems, and it may serve as a reasonable prototype for fully automatic document retrieval. The following facilities incorporated into the SMART system for purposes of document analysis may be of principal interest*:

(a) a system for separating English words into stems and affixes (the so-called "null thesaurus" method) which can be used to construct document identifications consisting of the word stems contained in the documents;

(b) a synonym dictionary, or thesaurus, which can be used to recognize synonyms by replacing each word stem by one or more "concept" numbers (the thesaurus is a manually constructed dictionary including about 600 concepts in the computer literature, corresponding to about 3000 English word stems); these concept numbers can serve as content identifiers instead of the original word stems;

(c) a hierarchical arrangement of the concepts included in the thesaurus which makes it possible, given any concept number, to find its "parent" in the hierarchy, its "sons," its "brothers," and any of a set of possible cross-references; the hierarchy can be used to obtain more general content identifiers than the ones originally given by going "up" in the hierarchy, more specific ones by going "down" in the structure, and a set of related ones by picking up brothers and cross-references;

* More detailed descriptions of the systems organization are included in Refs. 1 and 2. Programming aspects and complete flowcharts are presented in Ref. 3.

(d) statistical procedures to compute similarity coefficients based on co-occurrences of concepts within the sentences of a given document, or within the documents of a given collection; association factors between documents can also be determined, as can clusters (rather than only pairs) of related documents, or related concepts; the related concepts, determined by statistical association, can then be added to the originally available concepts to identify the various documents;

(e) syntactic analysis and matching methods which make it possible to compare the syntactically analyzed sentences of documents and search requests with a pre-coded dictionary of "criterion" phrases in such a way that the same concept number is assigned to a large number of semantically equivalent, but syntactically quite different constructions (e.g. "information retrieval," "the retrieval of information," "the retrieval of documents," "text processing," and so on);

(f) statistical phrase matching methods which operate like the preceding syntactic phrase procedures, that is, by using a preconstructed dictionary to identify phrases used as content identifiers; however, no syntactic analysis is performed in this case, and phrases are defined as equivalent if the concept numbers of all components match, regardless of the syntactic relationships between components;

(g) a dictionary updating system, designed to revise the five principal dictionaries included in the system (stem thesaurus, suffix dictionary, concept hierarchy, statistical phrases, and syntactic "criterion" phrases).

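As a rough illustration of how a thesaurus of the kind described in (b) operates, the sketch below replaces word stems by concept numbers; the stems, concept numbers, and dictionary entries are invented for the example and are not SMART's actual data:

```python
# Hypothetical miniature thesaurus: each word stem maps to one or more
# "concept" numbers (all entries invented for illustration).
THESAURUS = {
    "differ": [279],
    "equat": [279],
    "integr": [384],
    "solut": [176],
}

def stems_to_concepts(stems):
    # Replace each stem by its concept number(s); a stem absent from the
    # thesaurus is kept unchanged (the "null thesaurus" behavior).
    concepts = []
    for stem in stems:
        concepts.extend(THESAURUS.get(stem, [stem]))
    return concepts

print(stems_to_concepts(["differ", "equat", "runge"]))  # -> [279, 279, 'runge']
```

Note that two distinct stems ("differ," "equat") can collapse onto the same concept number, which is how synonym recognition arises.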
The operations of the system are built around a supervisory system which decodes the input instructions and arranges the processing sequence in accordance with the instructions received. At the present time, about 35 different processing options are available, in addition to a number of variable parameter settings. The latter are used to specify the type of correlation function which measures the similarity between documents and search requests, the cut-off value which determines the number of documents to be extracted as answers to search requests, and the thesaurus size.

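One plausible form for such a correlation function is the cosine coefficient between weighted concept vectors; the sketch below is an assumed illustration of this general idea, not the system's actual code, and the vectors and weights are invented:

```python
import math

def cosine_correlation(doc, req):
    """Cosine coefficient between two weighted concept vectors,
    each given as a {concept: weight} dictionary."""
    shared = set(doc) & set(req)
    numerator = sum(doc[c] * req[c] for c in shared)
    denominator = (math.sqrt(sum(w * w for w in doc.values()))
                   * math.sqrt(sum(w * w for w in req.values())))
    return numerator / denominator if denominator else 0.0

doc = {279: 12, 176: 6, 384: 4}   # document vector (invented weights)
req = {279: 12, 176: 12}          # request vector
print(round(cosine_correlation(doc, req), 4))  # -> 0.9091
```

The coefficient is 1.0 for identical vectors and 0.0 for vectors sharing no concepts, so documents can be ranked against a request without any prior cut-off.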
The SMART systems organization makes it possible to evaluate the effectiveness of the various processing methods by comparing the outputs obtained from a variety of different runs. This is achieved by processing the same search requests against the same document collection several times, and making judicious changes in the analysis procedures between runs. It is this use of the SMART system, as an evaluation tool, which is of particular interest in the present context, and is therefore treated in more detail in the remaining parts of the present report.
Characteristic                                 Comment                                        Count

Number of documents in collection              Document abstracts in the computer field       405

Number of search requests
  (a) specific                                 0-9 relevant documents                         10
  (b) general                                  10-30 relevant documents                       7

User population (requester also                Technical people and students                  about 10
makes relevance judgments)

Number of indexing and search                  All search and indexing operations             15
programs used                                  are automatic

Number of index terms per document             Varies greatly depending on indexing           (average) 35
                                               procedure and document

Number of relevant documents per request
  (a) specific                                                                                (average) 5
  (b) general                                                                                 (average) 15

Number of retrieved documents per request      No cut-off is used to separate retrieved       405
                                               from nonretrieved

FIG. 1. Test Environment.
• The Test Environment

The parameters which control the testing procedures to be described are summarized in Fig. 1. The data collection used consists of a set of 405 abstracts* of documents in the computer literature published during 1959 in the IRE Transactions on Electronic Computers. The results reported are based on the processing of about 20 search requests, each of which is analyzed by approximately 15 different indexing procedures. The search requests are somewhat arbitrarily separated into two groups, called respectively "general" and "specific" requests, depending on whether the number of documents believed to be relevant to each request is equal to at least ten (for the general requests) or is less than ten (for the specific ones). Results are reported separately for each of these two request groups; cumulative results are also reported for the complete set of requests.
The user population responsible for the search requests consists of about ten technical people with background in the computer field. Requests are formulated without study of the document collection, and no document already included in the collection is normally used as a source for any given search request. On the other hand, in view of the experimental nature of the system, it cannot be stated unequivocally that an actual user need in fact exists which requires fulfilment.
An excerpt from the document collection, as it is originally introduced into computer storage, is reproduced in Fig. 2. It may be noted that the full abstracts are stored together with the bibliographic citations. A typical search request, dealing with the numerical solution of differential equations, is shown at the top of
* Practical considerations dictated the use of abstracts rather than full documents; the SMART system as such is not restricted to the manipulation of abstracts only.

Fig. 3. Any search request expressed in English words is acceptable, and no particular format restrictions exist. Also shown in Fig. 3 is a set of documents found in answer to the request on differential equations by using one of the available processing methods. The documents are listed in decreasing order of the correlation coefficient with the search request; a short 12-character identifier is shown for each document under the heading "answer," and full bibliographic citations are shown under "identification."
The average number of index terms used to identify each document is sometimes believed to be an important factor affecting retrieval performance. In the SMART system, this parameter is a difficult one to present and interpret, since the many procedures which exist for analyzing the documents and search requests generate indexing products with widely differing characteristics. A typical example is shown in Fig. 4, consisting of the index "vectors" generated by three different processing methods for the request on differential equations (short form "DIFFERNTL EQ"), and for document number 1 of the collection (short form "1A COMPUTER").

It may be seen from Fig. 4 that the number of terms identifying a document can change drastically from one method to another: for example, document number 1 is identified by 35 different word stems using the word stem analysis (labelled "null thesaurus" in Fig. 4); these 35 stems, however, give rise to 50 different concept numbers using the regular thesaurus, and to 55 concepts for the statistical phrase method. The number of index terms per document shown in the summary of Fig. 1 (35) must therefore be taken as an indication at best, and does not properly reflect the true situation.

In Fig. 4, each concept number is followed by some mnemonic characters to identify the concept and by a
*TEXT 2MICRO-PROGRAMMING

$MICRO-PROGRAMMING
$R. J. MERCER (UNIVERSITY OF CALIFORNIA)
$U.S. GOV. RES. REPTS. VOL 30 PP 71-72(A) (AUGUST 15, 1958) PB 126893

MICRO-PROGRAMMING . THE MICRO-PROGRAMMING TECHNIQUE OF DESIGNING THE
CONTROL CIRCUITS OF AN ELECTRONIC DIGITAL COMPUTER TO FORMALLY INTERPRET
AND EXECUTE A GIVEN SET OF MACHINE OPERATIONS AS AN EQUIVALENT SET
OF SEQUENCES OF ELEMENTARY OPERATIONS THAT CAN BE EXECUTED IN ONE
PULSE TIME IS DESCRIBED .

*TEXT 3THE ROLE OF LARGE MEMORIES IN SCIENTIFIC COMMUNICATIONS

$THE ROLE OF LARGE MEMORIES IN SCIENTIFIC COMMUNICATIONS
$M. M. ASTRAHAN (IBM CORP.)
$IBM J. RES. AND DEV. VOL 2 PP 310-313 (OCTOBER 1958)

THE ROLE OF LARGE MEMORIES IN SCIENTIFIC COMMUNICATIONS . THE ROLE
OF LARGE MEMORIES IN SCIENTIFIC COMMUNICATIONS IS DISCUSSED . LARGE
MEMORIES PROVIDE AUTOMATIC REFERENCE TO MILLIONS OF WORDS OF MACHINE-READABLE
CODED INFORMATION OR TO MILLIONS OF IMAGES OF DOCUMENT PAGES .
HIGHER DENSITIES OF STORAGE WILL MAKE POSSIBLE LOW-COST MEMORIES
OF BILLIONS OF WORDS WITH ACCESS TO ANY PART IN A FEW SECONDS OR COMPLETE
SEARCHES IN MINUTES . THESE MEMORIES WILL SERVE AS INDEXES TO THE
DELUGE OF TECHNICAL LITERATURE WHEN THE PROBLEMS OF INPUT AND OF THE
AUTOMATIC GENERATION OF CLASSIFICATION INFORMATION ARE SOLVED . DOCUMENT
FILES WILL MAKE THE INDEXED LITERATURE RAPIDLY AVAILABLE TO THE SEARCHER .
MACHINE TRANSLATION OF LANGUAGE AND RECOGNITION OF SPOKEN INFORMATION
ARE TWO OTHER AREAS WHICH WILL REQUIRE FAST, LARGE MEMORIES .

FIG. 2. Typical Document Prints.

SEPTEMBER 28, 1964                                              PAGE 83
ANSWERS TO REQUESTS FOR DOCUMENTS ON SPECIFIED TOPICS
CURRENT REQUEST - .LIST DIFFERNTL EQ NUMERICAL DIGITAL SOLN OF DIFFERENTIAL EQUATIONS

REQUEST

.LIST DIFFERNTL EQ NUMERICAL DIGITAL SOLN OF DIFFERENTIAL EQUATIONS
GIVE ALGORITHMS USEFUL FOR THE NUMERICAL SOLUTION OF ORDINARY
DIFFERENTIAL EQUATIONS AND PARTIAL DIFFERENTIAL EQUATIONS ON DIGITAL
COMPUTERS . EVALUATE THE VARIOUS INTEGRATION PROCEDURES (E.G. RUNGE-
KUTTA, MILNE'S METHOD) WITH RESPECT TO ACCURACY, STABILITY, AND SPEED

ANSWER         CORRELATION   IDENTIFICATION

384STABILITY   0.6615        STABILITY OF NUMERICAL SOLUTION OF DIFFERENTIAL EQUATIONS
                             W. E. MILNE AND R. R. REYNOLDS (OREGON STATE COLLEGE)
                             J. ASSOC. FOR COMPUTING MACH. VOL 6 PP 196-203 (APRIL 1959)

360SIMULATIN   0.5758        SIMULATING SECOND-ORDER EQUATIONS
                             D. G. CHADWICK (UTAH STATE UNIV.)
                             ELECTRONICS VOL 32 P 64 (MARCH 6, 1959)

200SOLUTION    0.5663        SOLUTION OF ALGEBRAIC AND TRANSCENDENTAL EQUATIONS ON AN
                             AUTOMATIC DIGITAL COMPUTER
                             G. N. LANCE (UNIV. OF SOUTHAMPTON)
                             J. ASSOC. FOR COMPUTING MACH. VOL 6 PP 97-101 (JAN. 1959)

392ON COMPUT   0.5508        ON COMPUTING RADIATION INTEGRALS
                             R. C. HANSEN (HUGHES AIRCRAFT CO.), L. L. BAILIN (UNIV. OF
                             SOUTHERN CALIFORNIA), AND R. M. RUTISHAUSER (LITTON
                             INDUSTRIES, INC.)
                             COMMUN. ASSOC. FOR COMPUTING MACH. VOL 2 PP 28-31 (FEBRUARY 1959)

386ELIMINATI   0.5483        ELIMINATION OF SPECIAL FUNCTIONS FROM DIFFERENTIAL EQUATIONS
                             J. E. POWERS (UNIV. OF OKLAHOMA)
                             COMMUN. ASSOC. FOR COMPUTING MACH. VOL 2 PP 3-4 (MARCH 1959)

FIG. 3. Typical Search Request and Corresponding Answers.

OCCURRENCES OF CONCEPTS AND PHRASES IN DOCUMENTS

[Figure: the index vectors produced for the request "DIFFERNTL EQ" and for document "1A COMPUTER" by three analysis methods (null thesaurus, i.e., word stems; regular thesaurus, i.e., concept numbers; and statistical phrase look-up); each entry pairs a concept number and mnemonic with a weight. The detailed listing is not legible in this copy.]

FIG. 4. Typical Indexing Products for Three Analysis Procedures.

weight. The weights assigned to the concept numbers also change from method to method. Since no distinction is made in the evaluation procedure between retrieved and nonretrieved documents, the last indicator included in Fig. 1 (the number of retrieved documents per request) must also be put into the proper perspective. A discussion of this point is postponed until after the evaluation measures are introduced in the next few paragraphs.

• Evaluation Measures

1. Recall and Precision

One of the most crucial tasks in the evaluation of retrieval systems is the choice of measures which reflect systems performance. In the present context, such a measurement must of necessity depend primarily on the system's ability to retrieve wanted information and to reject nonwanted material, to the exclusion of operational criteria such as retrieval cost, waiting time, input preparation time, and so on. The last mentioned factors
* Precision has also been called "relevance," notably in the literature of the ASLIB-Cranfield project.5
† It has, however, been conjectured that an inverse relationship exists between recall and precision, such that high recall automatically implies low precision and vice versa.

may be of great practical importance in an operational situation, but do not enter, at least initially, into the evaluation of experimental procedures.

A large number of measures have been proposed in the past for the evaluation of retrieval performance.4 Perhaps the best known of these are, respectively, recall and precision; recall is defined as the proportion of relevant material actually retrieved, and precision as the proportion of retrieved material actually relevant.* A system with high recall is one which rejects very little that is relevant but may also retrieve a large proportion of irrelevant material, thereby depressing precision. High precision, on the other hand, implies that very little irrelevant information is produced but much relevant information may be missed at the same time, thus depressing recall. Ideally, one would of course hope for both high recall and high precision.†
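
In set terms, the two definitions just given can be written down directly; a minimal sketch, with arbitrary document identifiers:

```python
def recall_precision(retrieved, relevant):
    # recall: proportion of relevant material actually retrieved;
    # precision: proportion of retrieved material actually relevant.
    hits = len(retrieved & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision

# 3 of the 4 relevant documents appear among the 6 retrieved:
print(recall_precision({1, 2, 3, 9, 10, 11}, {1, 2, 3, 4}))  # -> (0.75, 0.5)
```

The example also shows the tension noted above: retrieving more documents can only raise recall, while it tends to lower precision.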
Measures such as recall and precision are particularly attractive when it comes to evaluating automatic retrieval procedures, because a large number of extraneous factors which cause uncertainty in the evaluation of conventional (manual) systems are automatically absent. The following characteristics of the present system are particularly important in this connection:

(a) input errors in the conventional sense, due to faulty indexing or encoding, are eliminated since all indexing operations are automatic;
(b) for the same reason, conventional search errors arising from the absence of needed search terms are also excluded;

(c) errors cannot be introduced in any transition between original search request and final machine query, since this transition is now handled automatically and becomes indistinguishable from the main analysis operation;

(d) inconsistencies introduced by a large number of different indexers and by the passage of time in the course of an experiment cannot arise; and

(e) the role of human memory as a disturbance in the generation of retrieval measurements is eliminated (this factor can be particularly troublesome when source documents are to be retrieved in a conventional system by persons who originally perform the indexing task).

In order to calculate the standard recall and precision measures the following important tasks must be undertaken:

(a) relevance judgments must be made by hand in order to decide, for each document and for each search request, whether the given document is relevant to the given request;

(b) the relevance judgments are usually all or nothing decisions, so that a given document is assumed either wholly relevant or wholly irrelevant (in case of doubt relevance is assumed); and

(c) a cut-off in the correlation between documents and search requests is normally chosen, such that documents whose correlation exceeds the cut-off value are retrieved, while the others are not retrieved.

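Task (c) amounts to thresholding the correlation coefficients; a minimal sketch, in which the document identifiers and coefficients are illustrative rather than actual test data:

```python
def retrieve_above_cutoff(correlations, cutoff):
    # Documents whose correlation with the request exceeds the cut-off
    # count as retrieved; all others count as nonretrieved.
    return {doc for doc, c in correlations.items() if c > cutoff}

correlations = {"384": 0.66, "360": 0.58, "200": 0.57, "45": 0.12}
print(sorted(retrieve_above_cutoff(correlations, 0.55)))  # -> ['200', '360', '384']
```

Moving the cut-off up or down changes how many documents are "retrieved," which is precisely the extraneous variable the next part seeks to eliminate.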
2. The Generation of Relevance Judgments

A great deal has been written concerning the difficulties and the appropriateness of the various operations listed in part 1.5-8 The first task, in particular, which may require the performance of hundreds of thousands of human relevance judgments for document collections of reasonable size, is extremely difficult to satisfy and to control.

Two solutions have been suggested, each of which would base the relevance decisions on less than the whole document collection. The first one consists in using sampling techniques to isolate a suitable document subset, and in making relevance judgments only for documents included in that subset. If the results obtained for the subset, however, are to be applicable to the total collection, it becomes necessary to choose a sample representative of the whole. For most document collections, this turns out to be a difficult task.

The other solution consists in formulating search requests based on specific source documents included in the collection, and in measuring retrieval performance for a given search request as a function of the retrieval of the respective source documents. This procedure suffers from the fact that search requests based on source documents are often claimed to be nontypical, thus introducing a bias into the measurements which does not exist for requests reflecting actual user needs.

Since the document collection used in connection with the present experiments is small enough to permit an exhaustive determination of relevance, the possible pitfalls inherent in the sampling procedure and in the use of source documents were avoided to a great extent. Many of the problems connected with the rendering of relevance judgments are, however, unresolved for general document collections.

3. The Cut-Off Problem

The other major problem is caused by the requirement to pick a correlation cut-off value to distinguish retrieved documents from those not retrieved. Such a cut-off introduces a new variable which seems to be extraneous to the principal task of measuring retrieval performance. Furthermore, in the SMART system, a different cut-off would have to be picked for each of the many processing methods if it were desired to retrieve approximately the same number of documents in each case.

Because of these added complications, it was felt that the standard recall and precision measures should be redefined so as to remove the necessary distinction between retrieved and nonretrieved information. Fortunately, this is not difficult in computer-based information systems, because in such systems numeric coefficients expressing the similarity between each document and each search request are obtained as output of the search process. Documents may then be arranged in decreasing order of these similarity coefficients, as shown, for example, for the previously used request on differential equations in the center section of Fig. 5. It may be seen in the figure that document 384 exhibits the largest correlation with the search request, followed by documents 360, 200, 392, and so on.
An ordered document list of the kind shown in Fig. 5 suggests that a suitable criterion for recall and precision measures would be the set of rank-orders of the relevant documents, when these documents are arranged in decreasing correlation order. A function of the rank-order list which penalizes high ranks for relevant documents (and therefore low correlation coefficients) can be used to express recall, while a function penalizing low ranks of nonrelevant documents is indicative of precision.

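The rank-order criterion can be made concrete as follows, in a sketch with invented identifiers and coefficients: sort the collection by decreasing correlation with the request and record the 1-based positions at which the relevant documents appear.

```python
def relevant_ranks(correlations, relevant):
    # Sort all documents by decreasing correlation with the request,
    # then record the (1-based) rank of each relevant document.
    ranked = sorted(correlations, key=correlations.get, reverse=True)
    return sorted(i for i, doc in enumerate(ranked, start=1) if doc in relevant)

correlations = {"384": 0.66, "360": 0.58, "200": 0.57, "45": 0.12, "62": 0.05}
print(relevant_ranks(correlations, {"384", "200", "62"}))  # -> [1, 3, 5]
```

A run in which the relevant documents occupy ranks [1, 2, 3] is ideal; the further the relevant ranks drift toward the bottom of the list, the worse the analysis method performed for that request.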
4. Normalized Recall and Normalized Precision*

It is desired to use as a measure of retrieval effectiveness a set of parameters which reflects the standard recall and the standard precision, and does not depend on a distinction between retrieved and nonretrieved documents. This suggests that one might take the average of the recall and the average of the precision obtained for
* The measures described in this part were suggested by J. Rocchio.

[Figure: row correlations of the phrase-document matrix for the request "DIFFERNTL EQ," shown (a) in increasing document order, (b) in decreasing correlation order, and (c) as a histogram. The detailed listing is not legible in this copy.]

FIG. 5. Correlations Between Search Request and Document Collection.

over all possible retrieval levels to define a new pair of measures, termed respectively normalized recall and normalized precision. Specifically, if R(j) is the standard recall after retrieving j documents from the collection (that is, if R(j) is equal to the number of relevant documents retrieved divided by the total relevant in the collection, assuming j documents retrieved in all), then the normalized recall can be defined as

    R_norm = (1/N) \sum_{j=1}^{N} R(j)

where N is the total number of documents in the collection.

Similarly, if P(j) is the standard precision after retrieving j documents from the collection, then a normalized precision measure is defined as

    P_norm = (1/N) \sum_{j=1}^{N} P(j)

R_norm and P_norm
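As a concrete illustration, both normalized measures can be computed directly from a ranked list of binary relevance judgements. The sketch below is not part of the SMART system; function and variable names are illustrative, and it assumes the collection contains at least one relevant document.

```python
def standard_recall_precision(ranked_relevance, j):
    """R(j) and P(j) after retrieving the first j documents of a ranked list.

    ranked_relevance is a list of 0/1 relevance judgements in retrieval order;
    at least one entry must be 1 (otherwise recall is undefined).
    """
    retrieved = ranked_relevance[:j]
    relevant_retrieved = sum(retrieved)
    total_relevant = sum(ranked_relevance)
    recall = relevant_retrieved / total_relevant
    precision = relevant_retrieved / j
    return recall, precision


def normalized_measures(ranked_relevance):
    """R_norm and P_norm: averages of R(j) and P(j) over j = 1..N."""
    n = len(ranked_relevance)
    recalls, precisions = zip(*(standard_recall_precision(ranked_relevance, j)
                                for j in range(1, n + 1)))
    return sum(recalls) / n, sum(precisions) / n
```

For example, with four documents of which the first and third are relevant, R(j) takes the values 0.5, 0.5, 1, 1, so R_norm = 0.75; a ranking that places all relevant documents first maximizes both measures.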
