THE TESTING OF INDEX LANGUAGE DEVICES
CYRIL W. CLEVERDON
Director, Aslib Cranfield Research Project
and
J. MILLS
Deputy Director, Aslib Cranfield Research Project

One-day conference, London, 5th February 1963

INTRODUCTION
`
THE evaluation of information retrieval systems has recently become an
important matter. In the past, however, most reports or proposals on this
type of work appear largely to have ignored the efficiency of operation of the
central core of an IR system, namely those operations concerned in the
compilation and use of the index. The only aspects to receive consideration are the
physical form of the index and the design of thesauri or classifications. The
former activity has been slanted towards the use of computers and has tended to
assume that this type of equipment will, ipso facto, give an improved performance
but has made no attempt to justify cost factors which may be one hundred times
those of conventional techniques. Work on thesauri and classifications, where it
has been practical in nature, appears to consist of compiling lists of terms which
go out of favour as quickly as any list of subject headings in the past; the more
popular theoretical approach is the setting up of models or the use of increasingly
abstruse and complex algebras. From the results and conclusions of the
experimental work at Cranfield, it would seem that many of these investigations are
comparatively trivial.
`In this paper we set out the fundamental operations involved in compiling
`and using an index, show how the various factors can influence the operating
`efficiency, and consider the methods to be used in the present Aslib Cranfield
`investigation.
`
`DEFINITIONS
As the analysis of indexing has become more detailed, there has been an
increasing requirement for the more precise definition of the various operations.
We have endeavoured to use terms in their conventional meanings wherever
possible, but it has frequently been necessary to modify, or to find new terms.
An information retrieval system is the complete organization for obtaining,
storing and making available information. This could be a definition of a
`© Emerald Backfiles 2007
`
`EXHIBIT 2037
`Facebook, Inc. et al.
`v.
`Software Rights Archive, LLC
`CASE IPR2013-00479
conventional library, but an IR system would be expected to exploit the
information in a positive manner and to have extra facilities such as people on the
staff capable of evaluating information before it is passed to the inquirer. It
would also be expected to have a subject index to the items in the store, the index
being the physical equipment which permits of the retrieval of references in the
searches. The index may be in the form of a card catalogue, a printed list, a set
of peek-a-boo cards, a computer, or any other convenient equipment. The
arrangement within the index will depend upon the index language* and this may
be a straightforward alphabetical arrangement of terms, or a classified
arrangement of terms, or any variation of these methods. The index language may be
`used in a pre-co-ordinate or post-co-ordinate manner. The former implies that the
`co-ordination of separate concepts is done at the time of indexing and the
`entries in the subject index will show this co-ordination. The latter implies that
`the co-ordination of concepts is done at the time of searching, so the entries in
`the subject index will refer only to single elements.
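
The distinction between pre-co-ordinate and post-co-ordinate use can be illustrated with a minimal sketch. The document descriptions and function names below are hypothetical and serve only as an illustration; the post-co-ordinate index holds one entry per single term and combines terms at search time, while the pre-co-ordinate index files each document under a heading that already combines its terms.

```python
from collections import defaultdict

# Hypothetical document descriptions (illustrative data, not from the paper).
DOCUMENTS = {
    "doc1": ["lead", "coating", "pipes"],
    "doc2": ["copper", "pipes"],
    "doc3": ["lead", "pipes"],
}

def build_post_coordinate_index(documents):
    """One entry per single term; co-ordination is deferred to search time."""
    index = defaultdict(set)
    for doc_id, terms in documents.items():
        for term in terms:
            index[term].add(doc_id)
    return index

def build_pre_coordinate_index(documents):
    """One entry per combined heading; co-ordination is done at indexing time."""
    index = defaultdict(set)
    for doc_id, terms in documents.items():
        index[tuple(terms)].add(doc_id)
    return index

def post_coordinate_search(index, terms):
    """Co-ordinate at search time by intersecting the entries for each term."""
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()

post_index = build_post_coordinate_index(DOCUMENTS)
pre_index = build_pre_coordinate_index(DOCUMENTS)
print(post_coordinate_search(post_index, ["lead", "pipes"]))
print(sorted(pre_index))
```

In this sketch the post-co-ordinate search for "lead" and "pipes" finds both documents carrying those two terms, whereas the pre-co-ordinate index can only be entered through the combined headings fixed by the indexer.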
The vocabulary of the index language is the complete collection of sought
terms in the natural language, including all necessary synonyms, that are used
in the set of documents and are therefore required for entry points to the index
language. An index term, on the other hand, is an actual term or heading used
in the index language, and may be a word or words, as with alphabetical subject
indexes, Uniterm indexes or Zatocoding, or may be notational elements, such as
a group of numbers in the Universal Decimal Classification, or may be
non-meaningful groups of letters, as in the Western Reserve University Metallurgical
Index.
`Concept indexing is the intellectual process of deciding which are the concepts
in a particular document that are of sufficient importance to be included in the
`subject index. Conventionally, this involves a 'Yes' or 'No' assessment, for a
`concept either is, or is not, considered worthy of inclusion in the subject index.
`It is possible for the indexer to indicate the relative importance of different
`concepts in a document by weighted indexing, which involves the assignment to
`each concept of a weighting number.
The exhaustivity of the concept indexing is a comparative term; at a high level
`it implies that an entry is made for every possible concept in a document. At a
`low level it implies that a selection has been made and a smaller number of
`concepts have been used. Specificity is also a comparative term. A concept can
`be translated into an indexing language in such a way that the index term is
`co-extensive with a concept. This is a high level of specificity and implies that
the index term covers the concept but nothing else besides the concept.
Alternatively, the translation can be to a less specific (often called 'broader') index
`term which includes the concept being indexed as well as other concepts.
Syntactic indexing implies the use of headings which display the relationship
`between the various elements, as distinct from those which merely show the
`existence of several attributes relevant to the subject indexed.
* In recent papers we have been using the term 'descriptor language' following on the usage
of Mr B. C. Vickery. However, Mr Calvin Mooers has pointed out that the word 'descriptor',
`although now somewhat debased in common usage, originally had a precise meaning. We have
`agreed to restrict our use of the word to this precise meaning and have therefore decided upon
`the term 'index language'.
`ASLIB PROCEEDINGS
`
VOL. 15, NO. 4
`
A search programme is the formalization of the search request and it can show
the same characteristics as outlined above for indexing, i.e. it entails a statement
of the concepts, which can be at varying levels of exhaustivity, and can be
translated into indexing terms of varying specificity.
The operating efficiency of an index language will depend upon its performance
as regards recall and relevance. Recall ratio equals $\frac{100R}{C}$, where C equals the total
number of documents in the collection which have an agreed standard of
relevance to a given question, while R equals the number of those relevant
documents retrieved in a single search. On the other hand, relevance ratio equals
$\frac{100R}{L}$, where L equals the total number of documents retrieved in a single
search. Operating efficiency is affected by the exhaustivity and specificity of the
indexing, as well as the search programme, and by varying any of these factors,
one will obtain a performance curve which plots recall ratio against relevance
ratio. Economic efficiency deals with the performance of the complete index.
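
The two ratios just defined can be computed directly from R, C and L. The following is a minimal sketch; the function names and the example counts are assumptions chosen for illustration and are not figures from the Cranfield tests.

```python
def recall_ratio(relevant_retrieved, relevant_in_collection):
    """Recall ratio 100R/C: share of the relevant documents that were retrieved."""
    return 100.0 * relevant_retrieved / relevant_in_collection

def relevance_ratio(relevant_retrieved, total_retrieved):
    """Relevance ratio 100R/L: share of the retrieved documents that are relevant."""
    return 100.0 * relevant_retrieved / total_retrieved

# Hypothetical search: C = 20 relevant documents in the collection,
# L = 64 documents retrieved, of which R = 16 are relevant.
R, C, L = 16, 20, 64
print(recall_ratio(R, C), relevance_ratio(R, L))
```

With these assumed counts the search recalls four fifths of the relevant documents, while only a quarter of what it retrieves is relevant.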
A set of documents is any collection of documents which are, or will be, used as
the basis of a single subject index. The set can be large or small, restricted to an
organization's research papers, or be a heterogeneous collection of journal
articles, research reports, patents, etc., in many different languages, but
homogeneous in that they will be used as the raw material of a single index. A set of
questions is a collection of questions to be put to a single subject index, either at
present, or at any time in the future when the index is still intended to be
operating.
`
`THE PREPARATION OF AN INDEX
`Assuming there is agreement concerning the set of documents to be indexed,
`the following operations have to be carried out in compiling and using an index:
1. Assess the subject matter of each document in relation to the requirements
   of the users, and decide which subjects should be included in the index.
   This is concept indexing, and is at present, and for the foreseeable future,
   an intellectual operation. With a pre-co-ordinate index, it is also necessary
   to decide on the appropriate combinations of concepts and, if the index
   language is to show relationships, the syntax.
2. Translate the subject concepts into the index language. This is a clerical
   task, except in those cases where a new term has to be added to the
   vocabulary of the index language.
3. Place the indexing decisions into the index, which may involve preparing
   and filing catalogue cards, punching holes in a card or cards, or making
   marks on tape. This again is a clerical process.
4. Make a concept analysis of the question and decide on the priority of
   alternative search programmes. As with concept indexing, this is an
   intellectual process.
5. Translate the search concepts into the index language, a purely clerical
   task.
6. Operate the physical retrieval mechanism of the index.
APRIL 1963
`The Aslib Cranfield Project has been primarily concerned with operations
1, 2, 4 and 5, and it has only been due to the necessity of having the index in
physical form that we have been involved with 3 and 6. It is certain that these
two latter points play no part in deciding on the operating efficiency, except in
so far as one technique might be more, or less, prone to clerical errors than
`another. They can, however, significantly affect the economic efficiency of an
`index.
`Involved in these operations is the variable of the index language. Whichever
type of index language is used, it is certain that all the stages 1 to 6 have to be
`carried out. More important is it to note that the only two operations which
`have a true intellectual content are completely divorced from any consideration
`of the index language.* The basic concept analysis of the document and the
`basic concept analysis of the question, with the auxiliary decision of which
`concepts should be included in the index or the search programme, will be the
`same irrespective of which index language is used. It is probably the case that
`many indexers tend to think in the terms of the index language and their concept
`indexing decisions may be thereby influenced, but fundamentally it is true that
`concept indexing is a separate process which should not be affected by the index
`language.
`
`INDEX LANGUAGES
The common basic requirement of all index languages is a complete
vocabulary of all the sought terms, including all necessary synonyms, that are used in
`the indexing of a set of documents. This may be likened to an uncontrolled set
`of uniterms, and must be the basic structure for all index languages; and,
`whatever ultimate form an index language may take, it can only operate at
`maximum efficiency by having such a vocabulary. To this basic structure can be
`added a number of devices which are intended to improve the recall ratio or the
relevance ratio. These devices (see Vickery, reference 1) can be listed as follows:
`
A. Devices which, when introduced into an uncontrolled vocabulary of simple terms, tend
to broaden the class definition and so increase recall
1. Confounding of true synonyms.
2. Confounding of near synonyms; usually terms in the same hierarchy.
3. Confounding of different word forms; usually terms from different
   categories.
4. Fixed vocabulary; usually takes the form of generic terms, but may use
   'metonymy', for example representing a number of attributes by the
   thing possessing them.
5. Generic terms.
6. Drawing terms from categories and, within these, facets; this controls
   the generic level of terms, and to a certain degree controls synonyms.
`* It is, of course, true that the compilation and maintenance of the index language can fairly
`be said to be an intellectual task. Its use, however, within the context of an indexing operation
`is a separate matter which requires only clerical operations.
7. Representing terms by analytical definitions (semantic factors), in which
   inter-relations are conveyed by relational affixes or modulants; the
   generic level will usually be more specific than when control is by
   categories.
8. Hierarchical linkage of generic and specific terms, and, possibly, of
   co-ordinate terms.
9. Multiple hierarchical linkage, i.e. linking each term to a number of
   different generic heads.
   It should be noted that devices 8 and 9 are not usually (as the others are)
   methods of class definition determining the structure or constituents of
   individual subject descriptions; they are ancillary devices (manifested as
   systematic sequence, or classified arrangement, as a thesaurus, as a
   network of see also references, etc.) indicating the existence of classes
   wider than these individual descriptions.
10. Bibliographical coupling, and citation indexes; these, also, are ancillary
    devices which indicate the existence of wider classes, the latter reflecting
    the use made of the documents and a probability of relevance arising
    from this.

B. Devices which tend to narrow the class definition and so increase relevance
11. Correlation of terms: although implicit in some form in all practical
    indexing, this device is not inevitable; i.e. the use of a single term to
    define a class may retrieve quickly and economically, if the term is
    sufficiently rare in the context of the system.
12. Weighting, i.e. attempts to express the particular relevance of each
    concept used in indexing a document to the whole document. It may
    take two forms:
    i. An attempt to assess subjectively the relative 'information content'
       of each term within the context of the system;
    ii. An objective measure, based on statistical counting of the word
       frequencies, etc.
13. Indicating connections between terms (interlocking):
    a. Without explicit expression of particular relations (interfixing); this
       may take at least three forms:
       i. Partitioning of the document; e.g. if the same document deals
          with the Conductivity of titanium and the Hardness of copper at
          a particular temperature, partitioning makes it clear that the
          document has at least two separate 'themes'.
       ii. Interfixing within a theme (or 'information item'); e.g. Lead (1)
          Coating (1) Copper (2) Pipes (2) makes it clear that the subject is
          the Lead coating of copper pipes and not the Copper coating of
          lead pipes.
       iii. If terms are recorded physically in a linear sequence, a citation
          order (a regulated sequence in which terms from different
          categories are cited), will convey relations.
    b. With explicit expression of particular relations; this is necessary in
       cases where simple interfixing cannot cope with the possible
       ambiguities; e.g. to distinguish which particle is the projectile, which
       the target, and which the product, in a report in nuclear physics.
       Two forms are usually recognized:
       i. Role indicators: these are usually limited in number and express
          the basic or most common relations found in the subject field
          concerned; e.g. Product, Starting material.
       ii. Relational terms: these are more freely developed, the name of a
          relation being used in the same way as any other term. It may,
          however, be cited only in the framework of a limited number of
          fundamental relations, as in Farradane's system.
       A third form sometimes to be found in conjunction with a, iii above
       is that of distinctive facet indicators which convey clearly the exact
       category to which the next term belongs.
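
The interfixing example given under device 13 (Lead, Coating, Copper, Pipes with link numbers) can be sketched in a few lines. The data structure and function names below are assumptions made for illustration; the sketch only shows how link numbers suppress a false co-ordination that plain correlation of terms would accept.

```python
# Each index term carries a link number; terms sharing a number belong together.
DOCUMENT = [("lead", 1), ("coating", 1), ("copper", 2), ("pipes", 2)]

def terms_by_link(indexed_terms):
    """Group the terms of one document by their link number."""
    groups = {}
    for term, link in indexed_terms:
        groups.setdefault(link, set()).add(term)
    return groups

def matches_without_links(indexed_terms, query):
    """Plain correlation: every query term occurs somewhere in the document."""
    return set(query) <= {term for term, _ in indexed_terms}

def matches_with_links(indexed_terms, query):
    """Interfixed correlation: all query terms must share one link number."""
    return any(set(query) <= group for group in terms_by_link(indexed_terms).values())

print(matches_without_links(DOCUMENT, ["copper", "coating"]))
print(matches_with_links(DOCUMENT, ["copper", "coating"]))
print(matches_with_links(DOCUMENT, ["lead", "coating"]))
```

A search for Copper and Coating is satisfied by plain correlation but rejected once links are respected, which is the narrowing of the class definition that this group of devices is intended to give.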
`
`Any operating index language is an amalgam of some of these devices; for
`example a modern faceted classification uses hierarchical linkage within its
`facets, combines (correlates) terms from different categories (according to a
`strict citation order so as to maximize the mutual exclusiveness of its classes),
`provides a degree of multiple hierarchical linkage through its relative alphabetical
`index, controls synonyms, may control near synonyms, and may confound word
`forms to a mild degree.
`A machine-orientated retrieval system may use correlation as its basic device,
`but supplement this with confounding of synonyms, confounding of near
`synonyms, of word forms, show hierarchical and multiple hierarchical linkage
`to some degree (often via a thesaurus), and use links and roles.
`A simple, manual Uniterm system will use correlation as its basic device, but
`may add to this the confounding of true synonyms, of near synonyms and
`(possibly) of word forms, a modest degree of hierarchical linkage, and roles.
`
`PERFORMANCE MEASUREMENT
`The technique evolved in the Aslib Cranfield tests for the measurement of
operating efficiency depends on two factors: the percentage of relevant
`documents retrieved as against the total of relevant documents in the collection
`(recall ratio), and the percentage of relevant documents among those actually
`retrieved (relevance ratio). The original Aslib Cranfield project established the
`former by conducting searches for questions which had been based on documents
`known to be in the collection, and the result of this showed that, on an average,
about 80 per cent of the source documents were being retrieved. It could be
said that this figure did not have particular significance, since it is obviously
possible to obtain 100 per cent recall by looking at every document in the
collection. Quite apart from going to this extreme, it would have been possible to
improve the figure of 80 per cent by 'broadening' the search programme or, as
we should say, by making the search programme less specific and/or less
exhaustive. The only control on this figure of 80 per cent was that a limit was
put on the level of exhaustivity of the search programme (see Cleverdon, reference 2,
page 14), but this was rather crude, and (as discussed in chapter 6 of the same
reference) an attempt was made to assess the relevance ratio. This proved to be
exceedingly difficult, but it was obvious that it was an essential part of
performance measurement, and, in the test of the WRU metallurgical literature (reference 3), the
work was done more thoroughly, as is discussed later in this report. The result
`of a complete analysis of recall and relevance means that it is possible to plot a
`series of points, depending, for instance, on the search programme, and produce
`a performance curve as shown in Fig. 1a.
`
`ASSUMPTION FOR FUTURE WORK
`A basic premise of the following arguments is that, for a given set of documents
`and a given set of questions, there will be a maximum retrieval performance;
`whatever type of index language is used, the performance curves will not be
`materially altered, and, in fact, the only reason for any major variation from this
curve will be inadequacies of the intellectual performance in decisions
concerning subject concepts, either in indexing or in searching. For sets of documents
in different subject fields, the resulting performance curves would probably
differ, and if the same set of documents were used in two separate situations,
since there could be two different sets of questions, this in turn could result in
different performance figures. Tests at Cranfield have been concerned mainly
with the subject fields of science, engineering, and metallurgy, and with these
types of document sets and question sets, it appears that the maximum retrieval
performance would be as in Fig. 1a. If the subject matter of the collection were
organic chemistry, the resulting performance curve might be as Fig. 1b; for
sociology it might be as Fig. 1c.
`It has to be assumed in this discussion that we are considering idealized
`conditions, and do not have to take into account losses due to human error.
`This has been investigated in the first project, and the general allowance to be
`made for this is known or, alternatively, can easily be ascertained in any given
`

[Figures 1a, 1b and 1c: performance curves plotting recall (vertical axis, 0 to 100) against relevance (horizontal axis, 0 to 100).]

`situation, by using the Cranfield test procedure. We have postulated that there
`is a fixed maximum performance curve for any given set of documents and
`questions, and have suggested that, with the type of document and question
`sets tested at Cranfield, this might range from 100 per cent recall at less than
`1 per cent relevance to 50 per cent recall at jo per cent relevance. The important
`problem to investigate is how an index can be operated most efficiently at any
`particular point or within any particular range. As an example, it might be a
`requirement of an index that it should have a recall level of not less than 95 per
`cent. Alternatively, the requirement may be that the index should normally
`operate at a level of 2. 5 per cent relevance ratio, but that it should, when required,
`be capable of giving a recall figure of 90 per cent.
`In such an investigation, there are a number of factors to be considered, and
`of major importance is the concept indexing. Some groups working in this area
`seem to place more emphasis on search analysis, in finding how variations in
`search programmes will affect the efficiency, but this seems to be rather a
`cart-before-the-horse approach. It is at the stage of concept indexing that the
`future potential performance of the resulting indexes is determined; if a concept
`is not included, it will not be possible to recall the particular document by that
concept, and vice versa. Concept indexing is, it must be emphasized, an
intellectual process that cannot be avoided. Current literature sometimes implies
`that keyword-in-context title indexing, for instance, is automatic indexing. It is
`automatic in its preparation of the physical index, but the intellectual stage of
`concept indexing is still there; the only difference is that in this case the concept
`indexer is the person who wrote the title.
`It is the decisions of concept indexing which result in the level of exhaustivity
`achieved. The extremes of exhaustivity range from where the whole text is
`included to where only a single concept is indexed for each document. The
`effect of high or low exhaustivity on the basic performance curve can be readily
shown. 100 per cent recall can always be obtained by doing no indexing but by
`having available for searching the text of every document in the collection. On
`the other hand, if only a single concept from each document is included in an
`index, then it is apparent that there will be a considerable drop in recall. We
can, therefore, say that, if all other factors are held constant, the effect of lowering
the level of exhaustivity will be to drop the recall figure. This will in turn
`result in an improved relevance figure, and briefly it can be said that
`
`a high level of exhaustivity of indexing results in high recall and low relevance;
`a low level of exhaustivity of indexing results in low recall and high relevance.
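
The effect of exhaustivity can be illustrated with a toy collection. Everything in the following sketch is an assumption made for illustration: the concept lists (most important concept first), the relevance judgements, and the device of lowering exhaustivity by keeping only the first few concepts of each document.

```python
# Hypothetical concept lists, most important concept first (illustrative data only).
DOCUMENTS = {
    "doc1": ["fatigue", "titanium"],
    "doc2": ["titanium", "fatigue", "welding"],
    "doc3": ["welding", "copper", "fatigue"],
    "doc4": ["copper", "hardness", "welding", "fatigue"],
}
# Assumed relevance judgements for a question about "fatigue".
RELEVANT = {"doc1", "doc2", "doc3"}

def search(concept, max_entries):
    """Retrieve documents whose first `max_entries` concepts include `concept`."""
    return {doc for doc, concepts in DOCUMENTS.items()
            if concept in concepts[:max_entries]}

def ratios(retrieved):
    """Return (recall ratio, relevance ratio) as percentages."""
    hits = len(retrieved & RELEVANT)
    recall = 100.0 * hits / len(RELEVANT)
    relevance = 100.0 * hits / len(retrieved) if retrieved else 0.0
    return recall, relevance

for max_entries in (4, 2, 1):
    recall, relevance = ratios(search("fatigue", max_entries))
    print(f"{max_entries} entries: recall {recall:.0f}%, relevance {relevance:.0f}%")
```

In this toy collection, cutting the entries per document from four to one lowers recall at each step, while relevance rises once the marginal mention in doc4 is no longer indexed.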
`
`Little work has been done on the effect of exhaustivity of indexing, and
`practice varies widely (see chapter 7 of reference 4) without the appearance of
any positive reasoning behind the decisions. Inevitably a high level of
exhaustive indexing results in higher input costs, and this could be a determining factor
with certain physical forms of index. In the Western Reserve University test an
attempt was made to ascertain the effect of varying the exhaustivity with the
facet index that was prepared at Cranfield. In this index the average number of
entries originally included for each document was 12½, and the effect of reducing
this by stages to an average of three entries per document was investigated.
Table 1 shows the effect that this had on recall and relevance of the documents.
`
TABLE 1

RECALL AND RELEVANCE RATIO IN FACET CATALOGUE OF WRU TEST
AT VARYING INDEX ENTRIES

                                  Relevance 2             Relevance 2 and 3
No. of     No. of documents   Recall     Relevance     Recall     Relevance
entries    retrieved          ratio %    ratio %       ratio %    ratio %
12½        891                83         16.3          64.1       34.6
8          824                80.6       17.1          60.6       35.5
5          643                73.1       19.9          50.7       38
3          491                64.6       23            42.1       41.3

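
As a check on the reading of Table 1, the definitions of the two ratios allow the number of relevant documents in the collection, C, to be backed out of each row: R = L x (relevance ratio) / 100 and C = 100R / (recall ratio). The sketch below assumes the column assignment given in its comments, which is an interpretation of the printed figures and not a statement by the authors; the function name is hypothetical.

```python
# Rows as read from Table 1 (assumed column order): entries per document,
# documents retrieved, then recall % and relevance % at relevance 2,
# then recall % and relevance % at relevance 2 and 3.
TABLE_1 = [
    (12.5, 891, 83.0, 16.3, 64.1, 34.6),
    (8, 824, 80.6, 17.1, 60.6, 35.5),
    (5, 643, 73.1, 19.9, 50.7, 38.0),
    (3, 491, 64.6, 23.0, 42.1, 41.3),
]

def implied_relevant_total(retrieved, recall_pct, relevance_pct):
    """Back out C from L, 100R/C and 100R/L."""
    relevant_retrieved = retrieved * relevance_pct / 100.0   # R
    return 100.0 * relevant_retrieved / recall_pct           # C

totals_2 = [implied_relevant_total(l, rec, rel) for _, l, rec, rel, _, _ in TABLE_1]
totals_23 = [implied_relevant_total(l, rec, rel) for _, l, _, _, rec, rel in TABLE_1]
for row, c2, c23 in zip(TABLE_1, totals_2, totals_23):
    print(f"{row[0]} entries: C about {c2:.0f} (relevance 2), {c23:.0f} (relevance 2 and 3)")
```

Under this reading every row implies nearly the same value of C for each relevance standard, as it should if the four rows describe the same questions put to the same collection.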
`This effect will operate irrespective of which index language is used, and, in
`passing, it should be noted that the same situation prevails in reverse with the
`decisions concerning concepts to be used in the search programme. However,
`before going into the matter of search programmes, we will consider the
`translation of the concept indexing into the index language.
`The aspect of the input stage which is concerned with index languages is
specificity, and we suggest that, in so far as their operating performance is
concerned, a fundamental difference (and probably the most important
difference) between index languages is their hospitality for specificity; this we now
`consider in its simpler form and show its direct relationship to the index
`language. It is assumed that an index language includes, as a minimum, all the
`categories of the subject field, whether given in detail or not, of the documents
`being indexed.
`The first stage, i.e. concept indexing, with its decision concerning the level
`of exhaustivity of indexing, is assumed to have been completed for a given
document and we have concepts A1, B1, C1, D1, E1 and F1. The next stage is
translating these into the index language. Assume three hypothetical index
languages, I, II and III, containing respectively 10,000, 1,000 and 100 index
terms. In index language I, there is a straight translation of the concepts into
A1(I), B1(I), C1(I), D1(I), E1(I), F1(I). Index language II translates into A(II),
B(II), C(II), D(II), E(II), F(II) (where A is the containing head for A1, A2, A3
and B the containing head for B1, B2, etc.). Index language III translates
into ABC(III) and DEF(III) (where ABC is the containing head for A, B, C,
A1, A2, etc.).
`The level of specificity which can be given to the indexing is obviously very
`different in each of these three index languages; the result is that in a search
where the requirements are for A1, B1, C1, although index language III will
retrieve all the relevant documents that are retrieved by index language I, it will
also bring out, for example, documents coded A2, B2, C2, resulting in a larger
number of irrelevant documents and therefore a lower relevance ratio. In
reverse, though the relevance ratio of a search in index language I will be
relatively high, it is likely to miss some of the relevant references which would
be found by the same search programme by index language III. It might,
therefore, be the case that the normal operating level of index languages I, II
and III, with all other conditions fixed, would be as shown by points 1, 2 and 3
in Fig. 2. Again, one notes that a lowering of relevance results in higher recall,
`and of specificity it can be said that
`a high level of specificity in the index language results in high relevance and
`low recall;
`a low level of specificity in the index language results in low relevance and
`high recall.
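
The three hypothetical index languages can be sketched as three translation rules applied to the same concept indexing. The documents and function names below are assumptions for illustration; the notation (A1 under A, and A, B, C under ABC) follows the text.

```python
def translate(concept, language):
    """Translate a concept such as 'A1' into index language 'I', 'II' or 'III'."""
    letter = concept[0]
    if language == "I":
        return concept                                # co-extensive term, e.g. A1
    if language == "II":
        return letter                                 # containing head, e.g. A
    if language == "III":
        return "ABC" if letter in "ABC" else "DEF"    # broadest heads
    raise ValueError(f"unknown index language: {language}")

# Hypothetical concept indexing of three documents.
DOCUMENTS = {
    "doc1": ["A1", "B1", "C1"],
    "doc2": ["A2", "B2", "C2"],
    "doc3": ["D1", "E1", "F1"],
}

def search(question, language):
    """Retrieve documents whose translated terms cover the translated question."""
    wanted = {translate(concept, language) for concept in question}
    hits = set()
    for doc_id, concepts in DOCUMENTS.items():
        indexed = {translate(concept, language) for concept in concepts}
        if wanted <= indexed:
            hits.add(doc_id)
    return hits

for language in ("I", "II", "III"):
    print(language, sorted(search(["A1", "B1", "C1"], language)))
```

The question for A1, B1, C1 retrieves only doc1 in language I, but in languages II and III it also brings out doc2, coded A2, B2, C2, which is the loss of relevance described above.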
`Considering again the basic performance curve (Fig. 3), we have shown that
`the maximum recall figure which can be obtained is limited by the level of
`exhaustive indexing; the maximum relevance figure is limited by the specificity
`possible in the index language. Therefore, a system which combines a high
`level of exhaustive indexing with the use of a highly specific index language
would have the potentiality, dependent upon the search programme, of
operating over the wide range of the curve between points A and B. However, a
system which combines a low level of exhaustivity and a less specific index
language might not be able to operate beyond the narrower limits shown by
points C and D, whatever type of search programme was used. However, if
this limited range gives a satisfactory performance figure, then it will be
obviously more economical to operate.
`It is now possible to consider the effects of exhaustivity and specificity in the
`search programme. Assume that the concept programme of a question requires
that a search should be made for A1, B1, C1 and D1. If it is decided to 'broaden'
the search, then this can be done in two ways. On the one hand, it is possible
to drop one of the concepts, so that the search is now for A1, B1 and C1. This
`we would describe as making the search less exhaustive. Alternatively, it is
`possible to make the search less specific, by substituting A for A1 or B for B1
`(where A is taken to be an inclusive term covering A1, A2, A3, etc.). It is
`obvious that by broadening the search in either of these ways, the recall ratio
`will be improved. In particular, however, we should consider the matter of
`specificity in indexing and specificity in the search programme. To return to the
`
`index languages considered earlier, if one has coded according to index language
`III, then it is useless to attempt a programme that has a high level of specificity.
`In fact, it is impossible to do this, since the index language does not include
`

[Figures 2 and 3: performance curves with relevance (0 to 100) on the horizontal axis; the text refers to points 1, 2 and 3 in Figure 2 and to points A, B, C and D in Figure 3.]

`index terms of the required specificity. However, with index language I, it is
`possible to have a highly specific search programme, which would give a high
relevance ratio but a low recall ratio (i.e. point 1 in Fig. 2). If it is desired to
`improve the recall ratio, then the search programme can be made less specific,
`until it reaches a level of specificity of the term codes of index language III.
`By this time it can be assumed that the recall ratio will have risen to the same
level as with index language III (i.e. point 3 in Fig. 2). It is, therefore, shown
that, as far as operating efficiency is concerned, there will be for every index
language (depending upon its hospitality for specific indexing) a maximum
`relevance ratio which cannot be exceeded. It will, however, always be possible
`(by varying the search programme) to improve the recall ratio along the fixed
`performance curve up to its maximum level, this being dependent on the
`exhaustivity of the indexing.
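
The two ways of broadening a search programme described in this section, dropping a concept (less exhaustive) and substituting a containing term (less specific), can be sketched as follows. The documents are hypothetical and are assumed to be coded in index language I; a bare letter such as "A" in the search programme is taken as the inclusive term covering A1, A2, A3, etc.

```python
# Hypothetical documents coded with the specific terms of index language I.
DOCUMENTS = {
    "doc1": {"A1", "B1", "C1", "D1"},
    "doc2": {"A1", "B1", "C1"},
    "doc3": {"A2", "B1", "C1", "D1"},
}

def term_matches(search_term, index_term):
    """A single letter is treated as the inclusive head for all its subdivisions."""
    if len(search_term) == 1:
        return index_term.startswith(search_term)
    return index_term == search_term

def search(programme):
    """Retrieve documents in which every term of the programme finds a match."""
    return {doc for doc, terms in DOCUMENTS.items()
            if all(any(term_matches(s, t) for t in terms) for s in programme)}

print(sorted(search(["A1", "B1", "C1", "D1"])))   # original programme
print(sorted(search(["A1", "B1", "C1"])))         # less exhaustive: D1 dropped
print(sorted(search(["A", "B1", "C1", "D1"])))    # less specific: A for A1
```

Either change retrieves an additional document in this sketch, but a different one in each case, which is why exhaustivity and specificity of the search programme are treated here as separate variables.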
`
`AIDS TO EFFICIENCY
`If, as is not wholly unreasonable, we consider as a median in the range of
`indexing languages, one which consists of nothing except an uncontrolled
`vocabulary of sought terms, then many of the indexing devices mentioned in an
`earlier section of this paper can be seen to be working in opposing directions.
`Any device which reduces the number of index terms is working towards
`improved recall, with the inevitable result that there will be a fall in relevance.
`Other devices, such as role indicators, are, in effect, increasing the number of
`index terms and are thereby improving relevance but decreasing recall. These
`devices have been discussed in an earlier section and the part they will play in
`the further work will be considered later, but we would now consider the effect
`of one type of device which is found in many operating index languages.
`In the introduction, we said that much of the present theoretical work being
`done on index languages was trivial. This comment is based on what appear to
`
`be logical deductions from the experimental work undertaken at Cranfield.
`The theoretical work which we have in mind is all concerned with what might
`be described as the grouping of the terms in the index language. This work is
`proceeding in many directions; facet classification, clumps, logical associations,
`word pairs, etc. To describe all this work as trivial shows that we have moved
`away from a main stream of development in information retrieval, and it may
`appear difficult to justify the statement. We would first make the point that we
`say that this work is trivial and do not say that it is useless. This could be
`translated into more precise figu
