Director, Aslib Cranfield Research Project
Deputy Director, Aslib Cranfield Research Project
One-day conference, London, 5th February 1963
The evaluation of information retrieval systems has recently become an important matter. In the past, however, most reports or proposals on this type of work appear largely to have ignored the efficiency of operation of the central core of an IR system, namely those operations concerned in the compilation and use of the index. The only aspects to receive consideration are the physical form of the index and the design of thesauri or classifications. The former activity has been slanted towards the use of computers and has tended to assume that this type of equipment will, ipso facto, give an improved performance, but has made no attempt to justify cost factors which may be one hundred times those of conventional techniques. Work on thesauri and classifications, where it has been practical in nature, appears to consist of compiling lists of terms which go out of favour as quickly as any list of subject headings in the past; the more popular theoretical approach is the setting up of models or the use of increasingly abstruse and complex algebras. From the results and conclusions of the experimental work at Cranfield, it would seem that many of these investigations are comparatively trivial.
In this paper we set out the fundamental operations involved in compiling and using an index, show how the various factors can influence the operating efficiency, and consider the methods to be used in the present Aslib Cranfield project.
As the analysis of indexing has become more detailed, there has been an increasing requirement for more precise definition of the various operations. We have endeavoured to use terms in their conventional meanings wherever possible, but it has frequently been necessary to modify them, or to find new terms.
An information retrieval system is the complete organization for obtaining, storing and making available information. This could be a definition of a conventional library, but an IR system would be expected to exploit the information in a positive manner and to have extra facilities, such as people on the staff capable of evaluating information before it is passed to the inquirer. It would also be expected to have a subject index to the items in the store, the index being the physical equipment which permits the retrieval of references in searches. The index may be in the form of a card catalogue, a printed list, a set of peek-a-boo cards, a computer, or any other convenient equipment. The arrangement within the index will depend upon the index language*, and this may be a straightforward alphabetical arrangement of terms, a classified arrangement of terms, or any variation of these methods. The index language may be used in a pre-co-ordinate or a post-co-ordinate manner. The former implies that the co-ordination of separate concepts is done at the time of indexing, and the entries in the subject index will show this co-ordination. The latter implies that the co-ordination of concepts is done at the time of searching, so the entries in the subject index will refer only to single elements.
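The difference between pre-co-ordinate and post-co-ordinate use can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the miniature collection, the headings and the functions `build_post_coordinate_index`, `post_coordinate_search` and `build_pre_coordinate_index` are all hypothetical, and a dictionary of sets stands in for the physical index (cards, peek-a-boo sheets and so on).

```python
# Hypothetical miniature collection: document number -> concepts chosen by the indexer.
DOCS = {
    1: {"lead", "coating", "copper", "pipes"},
    2: {"copper", "coating", "steel"},
    3: {"lead", "pipes"},
}


def build_post_coordinate_index(docs):
    """Post-co-ordinate: one entry per single element, listing the documents it was assigned to."""
    index = {}
    for doc_id, terms in docs.items():
        for term in terms:
            index.setdefault(term, set()).add(doc_id)
    return index


def post_coordinate_search(index, *terms):
    """Co-ordination is done at search time by intersecting the entries for each term."""
    if not terms:
        return set()
    return set.intersection(*(index.get(term, set()) for term in terms))


def build_pre_coordinate_index(entries):
    """Pre-co-ordinate: each entry is a heading already combined by the indexer."""
    index = {}
    for doc_id, heading in entries:
        index.setdefault(heading, set()).add(doc_id)
    return index


post = build_post_coordinate_index(DOCS)
print(sorted(post_coordinate_search(post, "lead", "pipes")))      # [1, 3]
print(sorted(post_coordinate_search(post, "copper", "coating")))  # [1, 2]

pre = build_pre_coordinate_index([(1, "Pipes, copper: coating, lead"), (3, "Pipes, lead")])
print(sorted(pre["Pipes, lead"]))  # [3]
```

Note that the post-co-ordinate search for copper and coating also brings out document 1, which concerns the lead coating of copper pipes: single-element entries cannot show how the concepts were related to one another.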
The vocabulary of the index language is the complete collection of sought terms in the natural language, including all necessary synonyms, that are used in the set of documents and are therefore required as entry points to the index language. An index term, on the other hand, is an actual term or heading used in the index language, and may be a word or words, as with alphabetical subject indexes, uniterm indexes or zatocoding, or may be notational elements, such as a group of numbers in the Universal Decimal Classification, or may be non-meaningful groups of letters, as in the Western Reserve University Metallurgical
Concept indexing is the intellectual process of deciding which concepts in a particular document are of sufficient importance to be included in the subject index. Conventionally, this involves a 'Yes' or 'No' assessment, for a concept either is, or is not, considered worthy of inclusion in the subject index. It is possible for the indexer to indicate the relative importance of different concepts in a document by weighted indexing, which involves the assignment of a weighting number to each concept.
The exhaustivity of the concept indexing is a comparative term: at a high level it implies that an entry is made for every possible concept in a document; at a low level it implies that a selection has been made and a smaller number of concepts have been used. Specificity is also a comparative term. A concept can be translated into an indexing language in such a way that the index term is co-extensive with the concept. This is a high level of specificity, and implies that the index term covers the concept but nothing else besides. Alternatively, the translation can be to a less specific (often called 'broader') index term which includes the concept being indexed as well as other concepts.
Syntactic indexing implies the use of headings which display the relationships between the various elements, as distinct from those which merely show the existence of several attributes relevant to the subject indexed.
* In recent papers we have been using the term 'descriptor language', following the usage of Mr B. C. Vickery. However, Mr Calvin Mooers has pointed out that the word 'descriptor', although now somewhat debased in common usage, originally had a precise meaning. We have agreed to restrict our use of the word to this precise meaning and have therefore decided upon the term 'index language'.
© Emerald Backfiles 2007


VOL. 15, NO. 4
A search programme is the formalization of the search request, and it can show the same characteristics as outlined above for indexing, i.e. it entails a statement of the concepts, which can be at varying levels of exhaustivity, and these can be translated into index terms of varying specificity.
The operating efficiency of an index language will depend upon its performance as regards recall and relevance. The recall ratio equals $100R/C$, where $C$ is the total number of documents in the collection which have an agreed standard of relevance to a given question, and $R$ is the number of those relevant documents retrieved in a single search. The relevance ratio, on the other hand, equals $100R/L$, where $L$ is the total number of documents retrieved in a single search. Operating efficiency is affected by the exhaustivity and specificity of the indexing, as well as by the search programme, and by varying any of these factors one will obtain a performance curve which plots recall ratio against relevance ratio. Economic efficiency deals with the performance of the complete index.
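The two ratios translate directly into code. In this minimal sketch the function names, and the choice to raise an error when $C$ or $L$ is zero, are assumptions made for illustration; the paper itself only defines the ratios.

```python
def recall_ratio(relevant_retrieved: int, relevant_in_collection: int) -> float:
    """Recall ratio = 100R/C."""
    if relevant_in_collection <= 0:
        raise ValueError("C must be positive: the question needs at least one relevant document")
    return 100.0 * relevant_retrieved / relevant_in_collection


def relevance_ratio(relevant_retrieved: int, total_retrieved: int) -> float:
    """Relevance ratio = 100R/L."""
    if total_retrieved <= 0:
        raise ValueError("L must be positive: the ratio is undefined when nothing is retrieved")
    return 100.0 * relevant_retrieved / total_retrieved


def search_performance(retrieved: set, relevant: set) -> tuple:
    """Return (recall ratio, relevance ratio) for one search against agreed relevance judgments."""
    r = len(retrieved & relevant)
    return recall_ratio(r, len(relevant)), relevance_ratio(r, len(retrieved))


# Hypothetical figures: C = 20 relevant documents; a search retrieves L = 50, of which R = 15 are relevant.
print(recall_ratio(15, 20), relevance_ratio(15, 50))  # 75.0 30.0
```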
A set of documents is any collection of documents which are, or will be, used as the basis of a single subject index. The set can be large or small, restricted to an organization's research papers, or be a heterogeneous collection of journal articles, research reports, patents, etc., in many different languages, but homogeneous in that they will be used as the raw material of a single index. A set of questions is a collection of questions to be put to a single subject index, either at present, or at any time in the future when the index is still intended to be used.
Assuming there is agreement concerning the set of documents to be indexed, the following operations have to be carried out in compiling and using an index:
1. Assess the subject matter of each document in relation to the requirements of the users, and decide which subjects should be included in the index. This is concept indexing, and is at present, and for the foreseeable future, an intellectual operation. With a pre-co-ordinate index, it is also necessary to decide on the appropriate combinations of concepts and, if the index language is to show relationships, the syntax.
2. Translate the subject concepts into the index language. This is a clerical task, except in those cases where a new term has to be added to the vocabulary of the index language.
3. Place the indexing decisions into the index, which may involve preparing and filing catalogue cards, punching holes in a card or cards, or making marks on tape. This again is a clerical process.
4. Make a concept analysis of the question and decide on the priority of alternative search programmes. As with concept indexing, this is an intellectual process.
5. Translate the search concepts into the index language, a purely clerical task.
6. Operate the physical retrieval mechanism of the index.
APRIL 1963
The Aslib Cranfield Project has been primarily concerned with operations 1, 2, 4 and 5, and it is only because of the necessity of having the index in physical form that we have been involved with 3 and 6. It is certain that these two latter operations play no part in deciding the operating efficiency, except in so far as one technique might be more, or less, prone to clerical errors than another. They can, however, significantly affect the economic efficiency of an index.
Involved in these operations is the variable of the index language. Whichever type of index language is used, it is certain that all the stages 1 to 6 have to be carried out. It is more important to note that the only two operations which have a true intellectual content are completely divorced from any consideration of the index language.* The basic concept analysis of the document and the basic concept analysis of the question, with the auxiliary decision of which concepts should be included in the index or the search programme, will be the same irrespective of which index language is used. It is probably the case that many indexers tend to think in terms of the index language, and their concept indexing decisions may thereby be influenced, but fundamentally it is true that concept indexing is a separate process which should not be affected by the index language.
* It is, of course, true that the compilation and maintenance of the index language can fairly be said to be an intellectual task. Its use within the context of an indexing operation, however, is a separate matter which requires only clerical operations.
The common basic requirement of all index languages is a complete vocabulary of all the sought terms, including all necessary synonyms, that are used in the indexing of a set of documents. This may be likened to an uncontrolled set of uniterms, and must be the basic structure for all index languages; whatever ultimate form an index language may take, it can only operate at maximum efficiency by having such a vocabulary. To this basic structure can be added a number of devices which are intended to improve the recall ratio or the relevance ratio. These devices (see Vickery¹) can be listed as follows:
A. Devices which, when introduced into an uncontrolled vocabulary of simple terms, tend to broaden the class definition and so increase recall
1. Confounding of true synonyms.
2. Confounding of near synonyms; usually terms in the same hierarchy.
3. Confounding of different word forms; usually terms from different categories.
4. Fixed vocabulary; usually takes the form of generic terms, but may use 'metonymy', for example representing a number of attributes by the thing possessing them.
5. Generic terms.
6. Drawing terms from categories and, within these, facets; this controls the generic level of terms, and to a certain degree controls synonyms.
7. Representing terms by analytical definitions (semantic factors), in which inter-relations are conveyed by relational affixes or modulants; the generic level will usually be more specific than when control is by
8. Hierarchical linkage of generic and specific terms and, possibly, of co-ordinate terms.
9. Multiple hierarchical linkage, i.e. linking each term to a number of different generic heads.
It should be noted that devices 8 and 9 are not usually (as the others are) methods of class definition determining the structure or constituents of individual subject descriptions; they are ancillary devices (manifested as systematic sequence or classified arrangement, as a thesaurus, as a network of see also references, etc.) indicating the existence of classes wider than these individual descriptions.
10. Bibliographical coupling, and citation indexes; these, also, are ancillary devices which indicate the existence of wider classes, the latter reflecting the use made of the documents and a probability of relevance arising from this.
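Two of the recall devices in group A, confounding of true synonyms (1) and hierarchical linkage (8), can be illustrated together. This is a minimal sketch: the synonym table, the hierarchy, the documents and all function names are hypothetical. Each device widens the class that a single search term stands for, so more documents are brought out.

```python
# Hypothetical vocabulary-control tables; the terms are illustrative only.
SYNONYMS = {"aeroplane": "aircraft", "airplane": "aircraft"}  # device 1: confounding of true synonyms
NARROWER = {"aircraft": {"helicopters", "gliders"}}           # device 8: generic -> specific links

# Hypothetical documents as indexed with an uncontrolled vocabulary of simple terms.
RAW = {1: {"aircraft"}, 2: {"airplane"}, 3: {"helicopters"}, 4: {"ships"}}


def normalize(term):
    """Replace a synonym by its preferred term."""
    return SYNONYMS.get(term, term)


def expand(term):
    """Return the preferred term plus every term linked below it in the hierarchy."""
    start = normalize(term)
    found, stack = {start}, [start]
    while stack:
        for child in NARROWER.get(stack.pop(), ()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def build_index(docs, confound):
    """Inverted file; with confound=True, synonyms are merged at indexing time."""
    index = {}
    for doc_id, terms in docs.items():
        for term in terms:
            key = normalize(term) if confound else term
            index.setdefault(key, set()).add(doc_id)
    return index


def search(index, term, use_devices):
    """Look up one term, optionally widened by synonym control and hierarchical linkage."""
    if not use_devices:
        return set(index.get(term, set()))
    hits = set()
    for t in expand(term):
        hits |= index.get(t, set())
    return hits


plain = build_index(RAW, confound=False)
controlled = build_index(RAW, confound=True)
print(sorted(search(plain, "aeroplane", use_devices=False)))      # []
print(sorted(search(controlled, "aeroplane", use_devices=True)))  # [1, 2, 3]
```

The devices only widen the class; they say nothing about whether the extra documents are relevant, which is why broadening of this kind tends to be paid for in relevance.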
B. Devices which tend to narrow the class definition and so increase relevance
11. Correlation of terms: although implicit in some form in all practical indexing, this device is not inevitable; i.e. the use of a single term to define a class may retrieve quickly and economically, if the term is sufficiently rare in the context of the system.
12. Weighting, i.e. attempts to express the particular relevance of each concept used in indexing a document to the whole document. It may take two forms:
i. An attempt to assess subjectively the relative 'information content' of each term within the context of the system;
ii. An objective measure, based on statistical counting of word frequencies, etc.
13. Indicating connections between terms (interlocking):
a. Without explicit expression of particular relations (interfixing); this may take at least three forms:
i. Partitioning of the document; e.g. if the same document deals with the Conductivity of titanium and the Hardness of copper at a particular temperature, partitioning makes it clear that the document has at least two separate 'themes'.
ii. Interfixing within a theme (or 'information item'); e.g. Lead (1) Coating (1) Copper (2) Pipes (2) makes it clear that the subject is the Lead coating of copper pipes and not the Copper coating of lead pipes.
iii. If terms are recorded physically in a linear sequence, a citation order (a regulated sequence in which terms from different categories are cited) will convey relations.
b. With explicit expression of particular relations; this is necessary in cases where simple interfixing cannot cope with the possible ambiguities, e.g. to distinguish which particle is the projectile, which the target, and which the product, in a report in nuclear physics. Two forms are usually recognized:
i. Role indicators: these are usually limited in number and express the basic or most common relations found in the subject field concerned; e.g. Product, Starting material.
ii. Relational terms: these are more freely developed, the name of a relation being used in the same way as any other term. It may, however, be cited only in the framework of a limited number of fundamental relations, as in Farradane's system.
A third form, sometimes found in conjunction with a. iii above, is that of distinctive facet indicators, which convey clearly the exact category to which the next term belongs.
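Interfixing (13 a. ii) can be sketched by storing a link number with each term. In this minimal sketch the list-of-pairs representation and the function names are assumptions for illustration. Simple correlation (device 11) cannot separate the two documents about lead, copper, coating and pipes, whereas requiring each group of search terms to share a link number can.

```python
# Each document is a list of (term, link) pairs, as in "Lead (1) Coating (1) Copper (2) Pipes (2)".
LEAD_COATED_COPPER_PIPES = [("lead", 1), ("coating", 1), ("copper", 2), ("pipes", 2)]
COPPER_COATED_LEAD_PIPES = [("copper", 1), ("coating", 1), ("lead", 2), ("pipes", 2)]

# Search: lead coating (one theme) applied to copper pipes (another theme).
QUERY = [("lead", "coating"), ("copper", "pipes")]


def matches_correlation(doc, query):
    """Simple correlation: every search term occurs somewhere in the document's entry."""
    terms = {term for term, _ in doc}
    return all(term in terms for group in query for term in group)


def matches_interfixed(doc, query):
    """Interfixing: each group of search terms must occur under a single link number."""
    linked = {}
    for term, link in doc:
        linked.setdefault(link, set()).add(term)
    return all(any(set(group) <= members for members in linked.values()) for group in query)


for name, doc in [("lead-coated copper pipes", LEAD_COATED_COPPER_PIPES),
                  ("copper-coated lead pipes", COPPER_COATED_LEAD_PIPES)]:
    print(name, matches_correlation(doc, QUERY), matches_interfixed(doc, QUERY))
# lead-coated copper pipes True True
# copper-coated lead pipes True False
```

Compared with plain correlation, the interfixed search narrows the class and removes a false drop, which is the sense in which these devices increase relevance.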
Any operating index language is an amalgam of some of these devices. For example, a modern faceted classification uses hierarchical linkage within its facets, combines (correlates) terms from different categories (according to a strict citation order so as to maximize the mutual exclusiveness of its classes), provides a degree of multiple hierarchical linkage through its relative alphabetical index, controls synonyms, may control near synonyms, and may confound word forms to a mild degree.
A machine-orientated retrieval system may use correlation as its basic device, but supplement this with confounding of synonyms, of near synonyms and of word forms, show hierarchical and multiple hierarchical linkage to some degree (often via a thesaurus), and use links and roles.
A simple, manual Uniterm system will use correlation as its basic device, but may add to this the confounding of true synonyms, of near synonyms and (possibly) of word forms, a modest degree of hierarchical linkage, and roles.
The technique evolved in the Aslib Cranfield tests for the measurement of operating efficiency depends on two factors: the percentage of relevant documents retrieved as against the total of relevant documents in the collection (recall ratio), and the percentage of relevant documents among those actually retrieved (relevance ratio). The original Aslib Cranfield project established the former by conducting searches for questions which had been based on documents known to be in the collection, and the result showed that, on average, about 80 per cent of the source documents were being retrieved. It could be said that this figure did not have particular significance, since it is obviously possible to obtain 100 per cent recall by looking at every document in the collection. Quite apart from going to this extreme, it would have been possible to improve the figure of 80 per cent by 'broadening' the search programme or, as we should say, by making the search programme less specific and/or less exhaustive. The only control on this figure of 80 per cent was that a limit was put on the level of exhaustivity of the search programme (see Cleverdon², page 14), but this was rather crude, and (as discussed in chapter 6 of the same reference) an attempt was made to assess the relevance ratio. This proved to be exceedingly difficult, but it was obvious that it was an essential part of performance measurement, and in the test of the WRU metallurgical literature³ the work was done more thoroughly, as is discussed later in this report. A complete analysis of recall and relevance makes it possible to plot a series of points, depending, for instance, on the search programme, and to produce a performance curve as shown in Fig. 1a.
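How a performance curve is built up from individual searches can be sketched as follows. The relevance judgments and the three search programmes are invented for illustration, and `performance_point` is a hypothetical helper; each programme contributes one (recall, relevance) point, and the points together trace the curve.

```python
def performance_point(retrieved, relevant):
    """(recall ratio, relevance ratio), in per cent, for one search programme."""
    hits = len(retrieved & relevant)
    recall = 100.0 * hits / len(relevant)
    relevance = 100.0 * hits / len(retrieved) if retrieved else 0.0  # convention for an empty search
    return recall, relevance


# Hypothetical relevance judgments and three hypothetical search programmes for one question,
# each broader than the last.
RELEVANT = {1, 2, 3, 4}
PROGRAMMES = {
    "narrow": {1, 2},
    "intermediate": {1, 2, 3, 5, 6, 7},
    "broad": set(range(1, 21)),
}

curve = sorted(performance_point(r, RELEVANT) for r in PROGRAMMES.values())
for recall, relevance in curve:
    print(f"recall {recall:5.1f}%  relevance {relevance:5.1f}%")
```

Sorted by recall, the points run from (50, 100) through (75, 50) to (100, 20): each broadening of the programme buys recall at the cost of relevance.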
A basic premise of the following arguments is that, for a given set of documents and a given set of questions, there will be a maximum retrieval performance; whatever type of index language is used, the performance curves will not be materially altered, and, in fact, the only reason for any major variation from this curve will be inadequacies of the intellectual performance in decisions concerning subject concepts, either in indexing or in searching. For sets of documents in different subject fields the resulting performance curves would probably differ, and if the same set of documents were used in two separate situations, there could be two different sets of questions, which in turn could result in different performance figures. Tests at Cranfield have been concerned mainly with the subject fields of science, engineering and metallurgy, and with these types of document sets and question sets it appears that the maximum retrieval performance would be as in Fig. 1a. If the subject matter of the collection were organic chemistry, the resulting performance curve might be as in Fig. 1b; for sociology it might be as in Fig. 1c.
It has to be assumed in this discussion that we are considering idealized conditions, and do not have to take into account losses due to human error. This was investigated in the first project, and the general allowance to be made for it is known or, alternatively, can easily be ascertained in any given situation by using the Cranfield test procedure.
[Fig. 1. Performance curves: recall ratio plotted against relevance ratio, in per cent (panels a, b and c)]
We have postulated that there is a fixed maximum performance curve for any given set of documents and questions, and have suggested that, with the type of document and question sets tested at Cranfield, this might range from 100 per cent recall at less than 1 per cent relevance to 50 per cent recall at jo per cent relevance. The important problem to investigate is how an index can be operated most efficiently at any particular point or within any particular range. As an example, it might be a requirement of an index that it should have a recall level of not less than 95 per cent. Alternatively, the requirement may be that the index should normally operate at a level of 2.5 per cent relevance ratio, but that it should, when required, be capable of giving a recall figure of 90 per cent.
In such an investigation there are a number of factors to be considered, and of major importance is the concept indexing. Some groups working in this area seem to place more emphasis on search analysis, in finding how variations in search programmes will affect the efficiency, but this seems to be rather a cart-before-the-horse approach. It is at the stage of concept indexing that the future potential performance of the resulting index is determined; if a concept is not included, it will not be possible to retrieve the particular document by that concept, and vice versa. Concept indexing is, it must be emphasized, an intellectual process that cannot be avoided. Current literature sometimes implies that keyword-in-context title indexing, for instance, is automatic indexing. It is automatic in its preparation of the physical index, but the intellectual stage of concept indexing is still there; the only difference is that in this case the concept indexer is the person who wrote the title.
It is the decisions of concept indexing which determine the level of exhaustivity achieved. The extremes of exhaustivity range from the inclusion of the whole text to the indexing of only a single concept for each document. The effect of high or low exhaustivity on the basic performance curve can be readily shown. 100 per cent recall can always be obtained by doing no indexing but having the text of every document in the collection available for searching. On the other hand, if only a single concept from each document is included in an index, then it is apparent that there will be a considerable drop in recall. We can therefore say that, if all other factors are held constant, the effect of lowering the level of exhaustivity will be to drop the recall figure. This will in turn result in an improved relevance figure, and briefly it can be said that:
a high level of exhaustivity of indexing results in high recall and low relevance;
a low level of exhaustivity of indexing results in low recall and high relevance.
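The effect of exhaustivity can be imitated by indexing the same documents at several levels. This is a minimal sketch under loudly stated assumptions: each hypothetical document lists its concepts from most to least central, an index at exhaustivity k keeps only the first k, and the relevance judgments for the question "titanium" are invented. It is not a reconstruction of the Western Reserve University test.

```python
# Hypothetical documents: concepts listed from most to least central (an assumption).
DOCS = {
    1: ["titanium", "conductivity", "temperature"],
    2: ["copper", "titanium", "hardness"],
    3: ["copper", "hardness", "titanium"],
    4: ["titanium", "alloys"],
    5: ["steel", "welding", "titanium"],
    6: ["nickel", "titanium"],
}
RELEVANT = {1, 2, 3, 4}  # hypothetical relevance judgments for the question "titanium"


def index_at_exhaustivity(docs, k):
    """Keep only the k most central concepts of each document."""
    return {doc_id: set(concepts[:k]) for doc_id, concepts in docs.items()}


def evaluate(index, concept, relevant):
    """Search for one concept and return (recall ratio, relevance ratio) in per cent."""
    retrieved = {d for d, concepts in index.items() if concept in concepts}
    hits = len(retrieved & relevant)
    recall = 100.0 * hits / len(relevant)
    relevance = 100.0 * hits / len(retrieved) if retrieved else 0.0
    return recall, relevance


for k in (3, 2, 1):
    recall, relevance = evaluate(index_at_exhaustivity(DOCS, k), "titanium", RELEVANT)
    print(f"k = {k}: recall {recall:.1f}%, relevance {relevance:.1f}%")
# k = 3: recall 100.0%, relevance 66.7%
# k = 2: recall 75.0%, relevance 75.0%
# k = 1: recall 50.0%, relevance 100.0%
```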
Little work has been done on the effect of exhaustivity of indexing, and practice varies widely (see chapter 7 of reference 4) without the appearance of any positive reasoning behind the decisions. Inevitably a high level of exhaustive indexing results in higher input costs, and this could be a determining factor with certain physical forms of index. In the Western Reserve University test an attempt was made to ascertain the effect of varying the exhaustivity with the facet index that was prepared at Cranfield. In this index the average number of entries originally included for each document was I 2!, and the effect of reducing this by stages to an average of three entries per document was investigated. Table I shows the effect that this had on the recall and relevance of the documents.
Table I
No. of entries    No. of documents
Relevance 2
Relevance 2 and 3
16.3
17.1
73.1
This effect will operate irrespective of which index language is used and, in passing, it should be noted that the same situation prevails in reverse with the decisions concerning concepts to be used in the search programme. However, before going into the matter of search programmes, we will consider the translation of the concept indexing into the index language.
The aspect of the input stage which is concerned with index languages is specificity, and we suggest that, in so far as their operating performance is concerned, a fundamental difference (and probably the most important difference) between index languages is their hospitality for specificity; this we now consider in its simpler form, showing its direct relationship to the index language. It is assumed that an index language includes, as a minimum, all the categories of the subject field of the documents being indexed, whether given in detail or not.
The first stage, i.e. concept indexing, with its decision concerning the level of exhaustivity of indexing, is assumed to have been completed for a given document, and we have concepts $A_1$, $B_1$, $C_1$, $D_1$, $E_1$ and $F_1$. The next stage is translating these into the index language. Assume three hypothetical index languages, I, II and III, containing respectively 10,000, 1,000 and 100 index terms. In index language I there is a straight translation of the concepts into $A_1$(I), $B_1$(I), $C_1$(I), $D_1$(I), $E_1$(I), $F_1$(I). Index language II translates them into A(II), B(II), C(II), D(II), E(II), F(II) (where A is the containing head for $A_1$, $A_2$, $A_3$, and B the containing head for $B_1$, $B_2$, etc.). Index language III translates them into ABC(III) and DEF(III) (where ABC is the containing head for A, B, C, $A_1$, $A_2$, etc.).
The level of specificity which can be given to the indexing is obviously very different in each of these three index languages. The result is that in a search where the requirements are for $A_1$, $B_1$, $C_1$, although index language III will retrieve all the relevant documents that are retrieved by index language I, it will also bring out, for example, documents coded $A_2$, $B_2$, $C_2$, resulting in a larger number of irrelevant documents and therefore a lower relevance ratio. Conversely, though the relevance ratio of a search in index language I will be relatively high, it is likely to miss some of the relevant references which would be found by the same search programme in index language III. It might, therefore, be the case that the normal operating levels of index languages I, II and III, with all other conditions fixed, would be as shown by points 1, 2 and 3 in Fig. 2. Again, one notes that a lowering of relevance results in higher recall, and of specificity it can be said that:
a high level of specificity in the index language results in high relevance and low recall;
a low level of specificity in the index language results in low relevance and high recall.
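The three hypothetical index languages can be imitated by a translation function. In this minimal sketch the documents, the relevance judgments (document 3 is assumed relevant although its indexer chose $B_3$ rather than $B_1$) and the helpers `translate`, `search` and `evaluate` are all assumptions for illustration. The same question is searched in languages I, II and III, showing recall rising and relevance falling as specificity decreases.

```python
def translate(concept, language):
    """Translate a concept such as "A1" into hypothetical index language I, II or III."""
    head = concept[0]  # the containing head, e.g. "A" for A1, A2, A3
    if language == "I":
        return concept  # co-extensive term: A1 stays A1
    if language == "II":
        return head  # less specific: A1 becomes A
    if language == "III":
        return "ABC" if head in "ABC" else "DEF"  # least specific: A1 becomes ABC
    raise ValueError(f"unknown index language {language!r}")


# Hypothetical documents after concept indexing, and hypothetical relevance judgments
# for a question requiring A1, B1 and C1.
DOCS = {
    1: ["A1", "B1", "C1", "D1"],
    2: ["A2", "B2", "C2"],
    3: ["A1", "B3", "C1"],
    4: ["D1", "E1", "F1"],
    5: ["A3", "E2"],
}
RELEVANT = {1, 3}
QUESTION = ["A1", "B1", "C1"]


def search(language):
    """Retrieve documents whose translated terms include every translated search term."""
    wanted = {translate(c, language) for c in QUESTION}
    return {d for d, concepts in DOCS.items()
            if wanted <= {translate(c, language) for c in concepts}}


def evaluate(language):
    retrieved = search(language)
    hits = len(retrieved & RELEVANT)
    recall = 100.0 * hits / len(RELEVANT)
    relevance = 100.0 * hits / len(retrieved) if retrieved else 0.0
    return sorted(retrieved), recall, relevance


for language in ("I", "II", "III"):
    docs, recall, relevance = evaluate(language)
    print(f"language {language}: documents {docs}, recall {recall:.0f}%, relevance {relevance:.0f}%")
```

In this toy collection language I retrieves only document 1 (recall 50 per cent, relevance 100 per cent), while language III also brings out documents 2, 3 and 5 (recall 100 per cent, relevance 50 per cent).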
Considering again the basic performance curve (Fig. 3), we have shown that the maximum recall figure which can be obtained is limited by the level of exhaustive indexing, and the maximum relevance figure is limited by the specificity possible in the index language. Therefore, a system which combines a high level of exhaustive indexing with the use of a highly specific index language would have the potentiality, dependent upon the search programme, of operating over the wide range of the curve between points A and B. However, a system which combines a low level of exhaustivity and a less specific index language might not be able to operate beyond the narrower limits shown by points C and D, whatever type of search programme was used. Nevertheless, if this limited range gives a satisfactory performance figure, then it will obviously be more economical to operate.
It is now possible to consider the effects of exhaustivity and specificity in the search programme. Assume that the concept programme of a question requires that a search should be made for $A_1$, $B_1$, $C_1$ and $D_1$. If it is decided to 'broaden' the search, this can be done in two ways. On the one hand, it is possible to drop one of the concepts, so that the search is now for $A_1$, $B_1$ and $C_1$; this we would describe as making the search less exhaustive. Alternatively, it is possible to make the search less specific, by substituting A for $A_1$ or B for $B_1$ (where A is taken to be an inclusive term covering $A_1$, $A_2$, $A_3$, etc.). It is obvious that by broadening the search in either of these ways the recall ratio will be improved. In particular, however, we should consider the matter of specificity in indexing and specificity in the search programme. To return to the index languages considered earlier, if one has coded according to index language III, then it is useless to attempt a programme that has a high level of specificity. In fact, it is impossible to do this, since the index language does not include index terms of the required specificity. However, with index language I it is possible to have a highly specific search programme, which would give a high relevance ratio but a low recall ratio (i.e. point 1 in Fig. 2). If it is desired to improve the recall ratio, then the search programme can be made less specific, until it reaches the level of specificity of the term codes of index language III. By this time it can be assumed that the recall ratio will have risen to the same level as with index language III (i.e. point 3 in Fig. 2). It is, therefore, shown that, as far as operating efficiency is concerned, there will be for every index language (depending upon its hospitality for specific indexing) a maximum relevance ratio which cannot be exceeded. It will, however, always be possible (by varying the search programme) to improve the recall ratio along the fixed performance curve up to its maximum level, this being dependent on the exhaustivity of the indexing.
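The two ways of broadening a search can be compared on a small, specifically indexed collection. The documents, the convention that a one-letter search term such as "A" stands for the inclusive class $A_1$, $A_2$, $A_3$, and the helpers `term_matches` and `run` are assumptions for illustration. The point is only that both broadened programmes retrieve a superset of what the full programme retrieves.

```python
# Hypothetical documents indexed in a highly specific language (like index language I).
DOCS = {
    1: {"A1", "B1", "C1", "D1"},
    2: {"A1", "B1", "C1"},
    3: {"A2", "B1", "C1", "D1"},
    4: {"A2", "B2"},
}


def term_matches(search_term, index_term):
    """A one-letter search term such as "A" stands for the inclusive class A1, A2, A3, ..."""
    if len(search_term) == 1:
        return index_term.startswith(search_term)
    return index_term == search_term


def run(programme):
    """Retrieve documents that satisfy every term of the search programme."""
    return {d for d, terms in DOCS.items()
            if all(any(term_matches(s, t) for t in terms) for s in programme)}


full = ["A1", "B1", "C1", "D1"]
less_exhaustive = ["A1", "B1", "C1"]     # one concept dropped
less_specific = ["A", "B1", "C1", "D1"]  # inclusive term A substituted for A1

for label, programme in [("full", full), ("less exhaustive", less_exhaustive),
                         ("less specific", less_specific)]:
    print(label, sorted(run(programme)))
# full [1]
# less exhaustive [1, 2]
# less specific [1, 3]
```

Had the collection been coded in something like index language III, the full specific programme could not even be expressed, which is the ceiling on relevance the argument above describes.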
If, as is not wholly unreasonable, we consider as a median in the range of indexing languages one which consists of nothing except an uncontrolled vocabulary of sought terms, then many of the indexing devices mentioned in an earlier section of this paper can be seen to be working in opposing directions. Any device which reduces the number of index terms is working towards improved recall, with the inevitable result that there will be a fall in relevance. Other devices, such as role indicators, are in effect increasing the number of index terms, and are thereby improving relevance but decreasing recall. These devices have been discussed in an earlier section and the part they will play in the further work will be considered later, but we would now consider the effect of one type of device which is found in many operating index languages.
In the introduction, we said that much of the present theoretical work being done on index languages was trivial. This comment is based on what appear to be logical deductions from the experimental work undertaken at Cranfield. The theoretical work which we have in mind is all concerned with what might be described as the grouping of the terms in the index language. This work is proceeding in many directions: facet classification, clumps, logical associations, word pairs, etc. To describe all this work as trivial shows that we have moved away from a main stream of development in information retrieval, and it may appear difficult to justify the statement. We would first make the point that we say that this work is trivial and do not say that it is useless. This could be translated into more precise figu

