`
Enhancement of Text
Representations Using Related
Document Titles‡

G. Salton*
Y. Zhang†
TR 86-728
January 1986

Department of Computer Science
Cornell University
Ithaca, NY 14853

‡ This study was supported in part by the National Science Foundation grant IST 83-16166.
* Department of Computer Science, Cornell University, Ithaca, NY 14853.
† Institute of Computer Technology, China Academy of Railway Sciences, Beijing, China.
`
`EXHIBIT 2009
`Facebook, Inc. et al.
`v.
`Software Rights Archive, LLC
`CASE IPR2013-00480
`
`

`

`
`
`
`
`
`
`

`

Enhancement of Text Representations Using Related Document Titles

G. Salton* and Y. Zhang**

Abstract

Various attempts have been made over the years to construct enhanced document representations by using thesauruses of related terms, term association maps, or knowledge frameworks that can be used to extract appropriate terms and concepts. None of the proposed methods for the improvement of document representation has proved to be generally useful when applied to a variety of different retrieval environments. Some recent work by Kwok suggests that document indexing may be enhanced by using title words taken from bibliographically related items. An evaluation of the process shows that many useful content words can be extracted from related document titles, as well as many terms of doubtful value. Overall, the procedure is not sufficiently reliable to warrant incorporation into operational automatic retrieval systems.

*Department of Computer Science, Cornell University, Ithaca, NY 14853.

**Institute of Computer Technology, China Academy of Railway Sciences, Beijing, China.

This study was supported in part by the National Science Foundation under grant IST 83-16166.
`
`

`

`
`
`
`
`
`
`

`

1. Term and Document Relations

Most existing methods for the automatic content analysis of written texts are based in part on the extraction of certain words contained in the original document texts. While many words appearing in ordinary text are in fact useful for content representation, it is often believed that the use of text words does not provide a complete description of text meaning. For this reason, various additional content analysis tools have been introduced in the hope of obtaining more complete text representations. Among these tools are thesauruses that contain groupings of related words [1,2], automatically constructed association maps based on co-occurrences of words in the texts of documents [3,4], and knowledge frameworks representing the facts and relationships that characterize particular subject areas. [5-7]

Various methodologies have been suggested to help in the construction of the content analysis tools, including for example probabilistic theories of information processing that account for the use of term relationships and associations [8-10], methods that include syntactic considerations for the construction of term phrases [11-13], and finally interactive procedures in which individual users may suggest term relationships of importance in their application based on a dialogue between user and system conducted from a user terminal. [14-16]

Two main problems arise when term associations are proposed for text identification and processing:

a) No theory exists which would help in distinguishing valuable term associations from less valuable ones, and no obvious help is available to aid in the construction of useful thesauruses, association maps, and knowledge bases. In practice, these tools are often tailored to particular text collections, and built for the occasion using ad-hoc, relatively nontransparent methods.

b) The available evaluation results indicate that the single term text identification methods which use no term association theories at all are preferable, on average, to the more complex indexing methods that include various kinds of term relationships. [17,18] More specifically, it appears easy to show advantages for certain term association methods when specially constructed vocabulary tools are available that have been tailored to specific collections or environments. Unfortunately, most of the term relationships appear to be of local value only, because the performance improvements are not maintained when the test conditions and retrieval procedures change. [19]

One conceptually simple method for the identification of term associations consists in identifying relationships among the documents of a collection, and using these to infer appropriate term associations. Specifically, a document clustering operation can be performed, leading to the recognition of groups of related documents, and these groupings can be used in turn to determine relationships among the terms assigned to particular document groups. Instead of performing formal clustering operations, relations between documents might be ascertained by utilizing for this purpose certain bibliographic citation links. In particular, two documents are bibliographically related when a citation link exists between them, that is, when one document cites the other, or when both documents are jointly cited by, or jointly cite, a third item. In these circumstances, one can assume that the items cover similar subjects, and hence the vocabulary of one item might help in describing a bibliographically related item. This notion is further explored in the remainder of this study.
`
2. Use of Related Title Words for Document Representation

Bibliographic citations attached to texts and documents have been used for many years for the generation of document relationships, the determination of influential bibliographic items, and the representation of changes in scientific disciplines over time. [20-25] Attempts have also been made to use bibliographic citations directly for document indexing. [26,27]

In a recent series of papers, Kwok has suggested that the content analysis and indexing of texts might be improved by using, in addition to the standard content identifiers, certain words extracted from the titles of bibliographically related documents. That is, if item A is bibliographically related to item B, then certain title words from B might be used to index document A, or vice versa. [28-30]

Consider, as an example, a particular document A as represented in Fig. 1. With respect to document A, four different types of document relationships may be distinguished:

a) Document A may refer to another document C by having C included in the reference list attached to A; C is then a cited document with respect to A.

b) Document A may itself be cited by some other document B, if A is included in the reference list of B; B is then a citing document with respect to A.

c) Document A may be cited in common with another document A' by a third document B'; in that case A and A' are cocited documents.

d) Finally, documents A and A" may both refer to some common third document C; in that case A and A" are bibliographically coupled.

The various document relationships with respect to document A are illustrated in Fig. 1.
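The four relationships can be derived mechanically from reference lists alone. The following is a minimal sketch, not the procedure used in the study: the function name `citation_relations`, the dictionary layout, and the toy document identifiers are all assumptions chosen to mirror the labels of Fig. 1.

```python
from collections import defaultdict

def citation_relations(references):
    """Derive the four relation types for every document.

    `references` maps a document id to the set of ids in its reference list
    (a hypothetical input format, assumed for this sketch).
    """
    # Invert the reference lists: document -> documents that cite it.
    cited_by = defaultdict(set)
    for doc, refs in references.items():
        for ref in refs:
            cited_by[ref].add(doc)

    relations = {}
    for doc in set(references) | set(cited_by):
        cited = set(references.get(doc, ()))      # documents that doc refers to
        citing = set(cited_by.get(doc, ()))       # documents that refer to doc
        cocited = set()                           # cited together with doc
        for c in citing:
            cocited |= references[c]
        cocited.discard(doc)
        coupled = set()                           # share a reference with doc
        for r in cited:
            coupled |= cited_by[r]
        coupled.discard(doc)
        relations[doc] = {"cited": cited, "citing": citing,
                          "cocited": cocited, "coupled": coupled}
    return relations

# Toy reference lists following the labels used in the text.
references = {
    "A": {"C"},
    "B": {"A"},
    "B'": {"A", "A'"},
    'A"': {"C"},
}
rel = citation_relations(references)
print(rel["A"])
```

Inverting the reference lists once keeps the cocitation and coupling lookups simple; for a large collection one would probably want to cap or weight these sets, which the sketch does not attempt.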
`
Kwok's basic notion consists in taking each item A included in a collection, and adding to the index terms normally used to represent A certain new terms taken from the titles of bibliographically related documents. In the experiments which follow, four types of citation relationships are examined using the following modified document collections:

a) the citing collection, where the document terms are supplemented by the title words from all citing documents (A is supplemented by terms from B)

b) the cited collection, where the document terms are supplemented by the title words from all cited documents (A is supplemented by terms from C)

c) the citing + cited collection, where the document terms are supplemented by the union of the two previous sets (A is supplemented by terms from B and C)

d) the cocited collection, where the added terms are taken from documents that are cocited with an item (A is supplemented by terms from A')

When terms taken from related documents are added as identifiers for particular documents, two kinds of changes may occur in the original term set used to identify the documents. First, a certain number of terms may actually be found that did not appear in the original term set used for a given document. In that case, the new term set identifying the item is larger than the original set. Second, a number of terms may be found in related documents that were already contained in the original set of terms prior to the term modification. In that case, no new terms may be added to the existing identifying term sets. However, the weights of the originally available terms may be appropriately changed.

In the experiments which follow, two kinds of term weights are used:

a) A term frequency (tf) weight, where the weight of a term is defined as the frequency of occurrence of the term in the document in question (or in an appropriately defined document excerpt such as an abstract);

b) a term frequency times inverse document frequency (tf × idf) weight, where the weight of a term is defined as the product of the term frequency multiplied by an inverse function of the document frequency (the number of documents in a collection to which a term is assigned). A typical value for idf might be obtained as $\log((N - n_i)/n_i)$, where $N$ is the collection size, and $n_i$ is the number of documents with term $i$.
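The two weighting schemes can be illustrated with a short sketch. It assumes documents are already reduced to lists of word stems; the helper names and the four toy documents are hypothetical. Note that the idf form quoted in the text is zero when a term occurs in exactly half the collection, negative beyond that, and undefined when a term occurs in every document, so a practical system would need to guard those cases.

```python
import math
from collections import Counter

def document_frequencies(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))
    return df

def tf_weights(tokens):
    """Term frequency weight: raw count of each term in the document."""
    return Counter(tokens)

def tf_idf_weights(tokens, collection_size, df):
    """tf x idf weight using idf = log((N - n_i) / n_i) as quoted in the text."""
    weights = {}
    for term, tf in Counter(tokens).items():
        n_i = df[term]
        weights[term] = tf * math.log((collection_size - n_i) / n_i)
    return weights

# Hypothetical stemmed documents, for illustration only.
docs = {
    "d1": ["concur", "proces", "proces"],
    "d2": ["memor", "alloc"],
    "d3": ["proces", "system"],
    "d4": ["index", "system"],
}
df = document_frequencies(docs)
w = tf_idf_weights(docs["d1"], len(docs), df)
print(tf_weights(docs["d1"]), w)
```

In this toy collection the stem "proces" occurs in two of four documents, so its idf is log(1) = 0 and it contributes nothing under tf × idf despite having the higher term frequency.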
`
When terms chosen from related documents are added to the term sets characterizing particular documents, the text or text excerpts of the related documents are merged with the text of the original document. The frequency of occurrence of the terms in the merged documents will therefore be different from the occurrence frequencies in the original documents, and the term frequency weights will change. For example, if a term exhibits a frequency weight of 3 in document A and a frequency weight of 2 in a related document B, the final weight of the term will be 5 after merging of the documents.
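The merging step amounts to adding term frequency vectors. The sketch below reproduces the 3 + 2 = 5 example from the text and also reports which terms are new and which merely changed weight; the function name `expand_document` and the stems other than the weights 3 and 2 are assumptions made for illustration.

```python
from collections import Counter

def expand_document(doc_terms, related_title_terms):
    """Merge the term counts of related titles into a document's term counts.

    Returns the merged counts, the set of newly added terms, and the set of
    original terms whose weight changed.
    """
    merged = Counter(doc_terms)
    for title_terms in related_title_terms:
        merged.update(title_terms)          # Counter.update adds counts
    new_terms = set(merged) - set(doc_terms)
    changed_terms = {t for t in doc_terms if merged[t] != doc_terms[t]}
    return merged, new_terms, changed_terms

# Document A holds a term with weight 3; related document B holds it with weight 2.
doc_a = Counter({"concur": 3, "proces": 1})
title_b = Counter({"concur": 2, "synchron": 1})

merged, new_terms, changed_terms = expand_document(doc_a, [title_b])
print(merged["concur"], new_terms, changed_terms)
```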
`
3. Experimental Results

Two different document collections are used to evaluate the indexing process based on the use of title words from bibliographically related documents: the CACM collection consisting of 3204 articles originally appearing in the Communications of the ACM between 1957 and 1979, and the CISI collection consisting of 146 documents in the field of documentation and library science originally received from the Institute for Scientific Information in Philadelphia. The searches for both collections were carried out with collections of 47 queries each, and the search results are averaged for the 47 queries used in each case.

The following information was originally available for each of the two sample collections:

CACM: original documents plus references to cited and citing documents; all cited and citing documents used are included in the original document collection of 3204 items.

CISI: original documents plus references to cocited documents; the cocited documents are not necessarily included in the base collection of 146 documents.

To perform a document expansion using title words for citing, cited and cocited documents, it was necessary to obtain the cocitations for CACM, and the cited and citing documents for CISI. Since all citations for CACM were internal to the collection, the set of cocitations could be generated from the available citing documents for that collection. For CISI, only cocitations were originally available, and the sets of citing and cited documents were extraneous to the basic collection. In that case, the set of citing and cited documents pertaining to each of the 146 collection items had to be obtained from the citation index and the source index parts of the Social Science Citation Index, respectively. [31]

The collection statistics for the CACM and CISI collections are summarized in Tables 1 and 2. For CACM, between 30 and 40 percent of the documents were actually altered by adding title words from bibliographically related items; one of the modification methods (citing + cited) affected over 50 percent of the collection items. Approximately similar expansion percentages apply to CISI, except that for the cocited method a large majority of the collection (85 percent of the documents) was affected. Approximately one tenth of the original document terms (normally between 3 and 5 terms per document) received changed term weights in the expansion process. More substantial changes were introduced by addition of new title terms not originally present in the documents. The figures shown at the bottom of Tables 1 and 2 show that about 6 to 8 new terms were added on average per document for CACM, except for the cocitation method which supplied nearly 27 new terms to each altered document. More substantial term additions occurred for CISI: over 30 terms were added to each altered document for the citing and citing + cited methods, and over 20 terms were added through cocitations.
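The two kinds of averages reported at the bottom of Tables 1 and 2 differ only in the denominator: the number of altered documents versus the full collection size. As a quick check, the sketch below recomputes the CACM cocitation figures from the totals given in Table 1 (26589 new terms, 985 altered documents, 3204 documents overall); the function name is an illustrative assumption.

```python
def mean_terms(total_terms, n_documents):
    """Average number of terms per document for a given denominator."""
    return total_terms / n_documents

# Totals taken from the CACM cocited column of Table 1.
new_terms_total = 26589
altered_documents = 985
collection_size = 3204

mean_among_altered = round(mean_terms(new_terms_total, altered_documents), 2)
mean_over_all = round(mean_terms(new_terms_total, collection_size), 2)
print(mean_among_altered, mean_over_all)
```

The first value corresponds to the "nearly 27 new terms" per altered document mentioned in the text; the second is the much smaller average over the whole collection.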
`
`

`

The actual recall-precision search results appear in Tables 3 and 4 for the CACM and CISI collections, respectively. The tables show search precision results averaged over 47 searches in each case, computed at ten recall levels from 0.1 to 1.0 in steps of 0.1. Two types of term weightings are used: the tf weights in the upper half of each table, and the tf × idf weights in the lower half.
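A sketch of how precision at fixed recall levels might be computed and averaged over queries follows. The text does not say how precision is interpolated between observed recall points; the sketch assumes the common convention of taking the highest precision observed at any rank whose recall meets or exceeds the level. The function names and toy ranking are hypothetical.

```python
def precision_at_recall_levels(ranking, relevant, levels=None):
    """Interpolated precision at fixed recall levels for one query.

    Assumes the interpolation rule: precision at level r is the maximum
    precision at any rank where recall >= r (0.0 if the level is never reached).
    """
    if levels is None:
        levels = [i / 10 for i in range(1, 11)]   # 0.1, 0.2, ..., 1.0
    points = []                                   # (recall, precision) at each hit
    hits = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))
    result = []
    for level in levels:
        candidates = [p for r, p in points if r >= level - 1e-9]
        result.append(max(candidates) if candidates else 0.0)
    return result

def average_over_queries(per_query):
    """Average the precision values level by level across queries."""
    n = len(per_query)
    return [sum(column) / n for column in zip(*per_query)]

# Toy query: two relevant documents, retrieved at ranks 1 and 4.
curve = precision_at_recall_levels(["d3", "d1", "d7", "d2"], {"d3", "d2"})
print(curve)
print(average_over_queries([curve, curve]))
```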
`
The CACM results of Table 3 show clearly that the addition to the document terms of title words from bibliographically related documents is beneficial, since the retrieval effectiveness improves by about 30 percent on average for the citing + cited method using tf weights, and by 15 percent for the citing + cited method using the more powerful tf × idf weights. An average effectiveness improvement above 10 percent is normally considered important enough to warrant serious attention, assuming of course that the needed bibliographically related documents are in fact accessible in practice.

Unfortunately, this optimistic conclusion is not maintainable when the CISI results of Table 4 are considered. In that case, none of the bibliographic expansion methods proved beneficial for the more effective tf × idf weighting system, the deterioration in effectiveness ranging from about 1 percent to as much as 7 percent for the citing + cited method. Even for the less effective tf weights, only the cocitations afford a modest performance improvement. It is obvious that for CISI many of the terms added in the document expansion process are in fact poor terms that do not help in retrieval. Since most of the term expansion processes lengthen the CISI documents by about 50 percent, one can conjecture that the changes in document indexing are too extensive and too sweeping in that case. The alterations in the identifying term sets were much more modest for CACM, and the effectiveness of the procedures was correspondingly greater.

The performance of the document expansion process may be illustrated by considering positive and negative examples for each of the two collections. Examples of an ideal document expansion process are shown in Tables 5 and 6 for the CACM and CISI collections. In the example of Table 5, relevant document number 2150 receives an initial retrieval rank of 700 with respect to query 7 (that is, 699 other documents are retrieved ahead of that particular document). In its original form, the document exhibits only two word stems in common with the query (PROCES and CONCUR), and both terms have low term frequency weights. When the document is expanded using bibliographical relationships, the important initial term CONCUR (from "concurrence") has a quadrupled weight, and another important term SYNCHRON (from "synchronization") is added to the term set. As a result, the output rank of document 2150 improves from 700 to 27.

The same phenomenon may be noted in Table 6 for CISI document 60 and query 34. In that case, the output rank improves from 48 to 15 because the important term INDIC (from "indexing") is added to the document in the expansion process with a high weight of 6.

The examples of Tables 7 and 8 demonstrate that the terms obtained from bibliographically related items may also produce substantial deterioration in performance. The output rank of relevant document 2902 with respect to query 27 falls from 18 to 37, because only one new term (SYSTEM) is added to the document, and that term is not especially useful for content identification in the CACM collection. The term SYSTEM is again added to document 113 of Table 8, and the expanded document has a lower retrieval rank (33) than the original (27).

The evaluation results of Tables 3 to 8 lead to the conclusion that the term association process based on bibliographically related title words is not reliable. Important terms may be supplied in some instances, producing substantial performance improvements; in other cases, the process adds indifferent or poor terms to the content descriptions of the documents. Since no obvious way exists for distinguishing the positive from the negative effects, the citation methodology cannot be recommended for inclusion in practical retrieval environments. It appears that any term association process, whether based on statistical word co-occurrence criteria or on intellectually constructed vocabulary aids, must include strict syntactical and/or semantic controls if the generation of inappropriate related term groups is to be prevented. Until a usable theory of term association is developed, it appears best to maintain the single term automatic indexing methods which are simple to implement, and which are known to produce reasonable retrieval performance.
`
`

`

References

[1] D. Soergel, Indexing Languages and Thesauri: Construction and Maintenance, Melville Publishing Company, Los Angeles, CA, 1974.

[2] G. Salton, Experiments in Automatic Thesaurus Construction for Information Retrieval, in Information Processing 71, North Holland Publishing Company, Amsterdam, 1972, 115-123.

[3] L.B. Doyle, Semantic Road Maps for Literature Searchers, Journal of the ACM, 8, 1961, 553-578.

[4] V.E. Giuliano, Automatic Message Retrieval by Associative Techniques, in Joint Man-Computer Languages, Mitre Corporation Report SS-10, Bedford, MA, 1962, 1-44.

[5] M. Minsky, A Framework for Representing Knowledge, in P.H. Winston, editor, The Psychology of Computer Vision, McGraw Hill Book Company, NY, 1975, 211-277.

[6] R.C. Schank and R.P. Abelson, Scripts, Plans, Goals and Understanding, Lawrence Erlbaum Associates, Hillsdale, NJ, 1977.

[7] R.J. Brachman and B.C. Smith, Special Issue on Knowledge Representation, SIGART Newsletter, No. 70, February 1980.

[8] C.J. van Rijsbergen, Information Retrieval, Second Edition, Butterworths, London, 1979.

[9] C.J. van Rijsbergen, A Theoretical Basis for the Use of Cooccurrence Data in Information Retrieval, Journal of Documentation, 33, 1979, 106-119.

[10] C.T. Yu, C. Buckley, K. Lam and G. Salton, A Generalized Term Dependence Model in Information Retrieval, Information Technology: Research and Development, 2, 1983, 129-154.

[11] P.H. Klingbiel, A Technique for Machine-Aided Indexing, Information Storage and Retrieval, 9:9, 1973, 477-494 and 9:2, 1973, 79-84.

[12] M. Dillon and A.S. Gray, FASIT: A Fully Automatic Syntactically Based Indexing System, Journal of the ASIS, 34:2, 99-108, 1983.

[13] G. Salton, Automatic Phrase Matching, in Readings in Automatic Language Processing, D.G. Hays, editor, Am. Elsevier Publishing Company, NY, 1966, 169-188.

[14] W.B. Croft, An Expert Assistant for a Document Retrieval System, Proc. RIAO-85 Conference, Grenoble, France, March 1985, 131-149.

[15] B.W. Ballard, J.C. Lusth and N.L. Tinkham, LCD-1: A Transportable, Knowledge Based Natural Language Processor for Office Environments, ACM Transactions on Office Information Systems, 2:1, January 1984, 1-25.

[16] B.J. Grosz, TEAM: A Transportable Natural Language Interface System, Proc. of Applied Natural Language Processing Conference, Association for Computational Linguistics, Santa Monica, CA, 1983, 39-45.

[17] C.W. Cleverdon and E.M. Keen, Factors Determining the Performance of Indexing Systems, Vol. 1: Design, Aslib Cranfield Research Project, Cranfield, England, 1966.

[18] G. Salton and M.E. Lesk, Computer Evaluation of Indexing and Text Processing, Journal of the ACM, 15:1, January 1968, 8-36.

[19] M.E. Lesk, Word-Word Associations in Document Retrieval Systems, American Documentation, 20:1, January 1969, 27-38.

[20] M.M. Kessler, Bibliographic Coupling Between Scientific Papers, American Documentation, 14:1, January 1963, 10-25.

[21] E. Garfield, Citation Indexes for Science, Science, 122:3159, 15 July 1955, 108-111.

[22] J. Margolis, Citation Indexing and Evaluation of Scientific Papers, Science, Vol. 155, 10 March 1967, 1213-1219.

[23] J.H. Westbrook, Identifying Significant Research, Science, 132:3435, 28 October 1960, 1229-1234.

[24] H. Small, Cocitation in the Scientific Literature: A New Measure of the Relationship between Two Documents, Journal of the ASIS, 24:4, July-August 1973, 265-269.

[25] J. Bichteler and E.A. Eaton, The Combined Use of Bibliographic Coupling and Cocitation in Document Retrieval, Journal of the ASIS, 31:4, July 1980, 278-282.

[26] M.M. Kessler, Comparison of Results of Bibliographic Coupling and Analytic Subject Indexing, Am. Documentation, 16:3, July 1965, 223-233.

[27] G. Salton, Automatic Indexing using Bibliographic Citations, Journal of Documentation, 27:2, June 1971, 98-110.

[28] K.L. Kwok, A Probabilistic Theory of Indexing and Similarity Measure Based on Cited and Citing Documents, Journal of the ASIS, 36:5, 1985, 342-351.

[29] K.L. Kwok, A Document-Document Similarity Measure Based on Cited Titles and Probability Theory and its Application to Relevance Feedback Retrieval, in Research and Development in Information Retrieval, C.J. van Rijsbergen, editor, Cambridge University Press, 1984, 221-232.

[30] K.L. Kwok, The Use of Titles and Cited Titles as Document Representations for Automatic Classification, Information Processing and Management, Vol. 11, 1975, 201-206.

[31] E. Garfield, Citation Indexing - Its Theory and Application in Science, Technology and Humanities, J. Wiley and Sons, New York, 1979.
`
`

`

[Figure: citation diagram centered on base document A. Citing document B refers to document A. Cited document C is cited by A. Document A' is cited by B, so documents A and A' are cocited. Document A" refers to C, so documents A and A" are bibliographically coupled.]

Document Pairs    Bibliographic Relation
B - A             citing-cited
A - C             citing-cited
A - A"            bibliographically coupled
A - A'            cocited

Citation Relations between Documents

Fig. 1
`
`

`

                                                          Cited       Citing      Cocited     Cited + Citing
                                                          Collection  Collection  Collection  Collection

Number of Documents Which Were Altered
  by Citation Process                                     1145        1111        985         1751

Proportion of Documents Which Were Altered
  by Citation Process                                     35.9%       34.7%       30.1%       54.7%

Total Number of Bibliographically Related Documents
  Used for Document Vector Alteration                     2652        2652        12050       5304

Number of Distinct Terms in Collection                    16951       16951       16951       16951

Mean Number of Terms per Document                         38.9        39.3        45.0        41.5

Total Number of Terms with Changed Term Weights           3580        3108        4375        6205

Mean Number of Terms with Changed Weights
  Among Altered Documents                                 3.12        2.80        4.44        3.54

Mean Number of Terms with Changed Weights
  for All Documents                                       1.12        0.97        1.37        1.94

Total Number of New Terms Added                           7154        8187        26589       15321

Mean Number of Added Terms Among Altered Documents        6.23        7.38        26.99       8.75

Mean Number of Added Terms for All Documents              2.23        2.56        8.30        4.78

CACM Collection Statistics
(3204 documents, 47 queries)

Table 1
`
`

`

                                                          Cited       Citing      Cocited     Cited + Citing
                                                          Collection  Collection  Collection  Collection

Number of Documents Which Were Altered
  by Citation Process                                     36          54          124         54

Proportion of Documents Which Were Altered
  by Citation Process                                     25%         37%         85%         37%

Total Number of Bibliographically Related Documents
  Used for Document Vector Alteration                     96          555         787         651

Number of Distinct Terms in Collection                    2074        2273        2044        2297

Mean Number of Terms per Document                         54.8        64.1        71.0        65.5

Total Number of Terms with Changed Term Weights           138         467         529         514

Mean Number of Terms with Changed Weights
  Among Altered Documents                                 3.83        8.65        4.27        9.52

Mean Number of Terms with Changed Weights
  for All Documents                                       0.95        3.20        3.62        3.52

Total Number of New Terms Added                           261         1624        2628        1829

Mean Number of Added Terms Among Altered Documents        7.25        30.07       21.19       33.87

Mean Number of Added Terms for All Documents              1.79        11.12       18.00       12.53

CISI Collection Statistics
(146 documents, 47 queries)

Table 2
`
`

`

          Original     Original    Original         Original     Original
Recall    Collection   + Cited     + Citing         + Cocited    + Cited + Citing

0.1       .3445        .3881       .4073 (+18%)     .3529        .4388 (+27%)
0.2       .2957        .3279       .3320 (+12%)     .2968        .3549 (+20%)
0.3       .1982        .2378       .2548 (+29%)     .2437        .2729 (+38%)
0.4       .1454        .2796       .1869 (+29%)     .1837        .2076 (+43%)
0.5       .1057        .1214       .1350 (+28%)     .1246        .1507 (+43%)
0.6       .0752        .0973       .0880 (+17%)     .0985        .1074 (+43%)
0.7       .0559        .0741       .0568 (+ 2%)     .0714        .0706 (+26%)
0.8       .0440        .0480       .0451 (+ 3%)     .0436        .0496 (+13%)
0.9       .0242        .0277       .0287 (+19%)     .0244        .0298 (+23%)
1.0       .0172        .0184       .0181 (+ 5%)     .0185        .0198 (+15%)

a) CACM Collection (term frequency weights)
(citing +16%, cited + citing +29%, cocited +13.6%)

          Original     Original    Original         Original     Original
Recall    Collection   + Cited     + Citing         + Cocited    + Cited + Citing

0.1       .5274        .5549       .5504 (+ 4%)     .5051        .5454 (+ 3%)
0.2       .4408        .4684       .4566 (+ 4%)     .4466        .4636 (+ 5%)
0.3       .3721        .4069       .4078 (+10%)     .3723        .4169 (+12%)
0.4       .2939        .3337       .3345 (+14%)     .3143        .3426 (+17%)
0.5       .2344        .2641       .2756 (+18%)     .2476        .2837 (+21%)
0.6       .1895        .2282       .2215 (+17%)     .2046        .2340 (+23%)
0.7       .1290        .1534       .1418 (+10%)     .1425        .1663 (+29%)
0.8       .0953        .1144       .1042 (+ 6%)     .1070        .1131 (+19%)
0.9       .0598        .0694       .0687 (+15%)     .0696        .0736 (+23%)
1.0       .0418        .0464       .0397 (- 5%)     .0444        .0410 (- 2%)

b) CACM Collection (term frequency times inverse document frequency)
(citing +9.3%, cited + citing +15%, cocited +7.1%)

Document Term Expansion - CACM

Table 3
`
`

`

          Original     Original    Original         Original     Original
Recall    Collection   + Cited     + Citing         + Cocited    + Cited + Citing

0.1       .3793        .3702       .3683 (- 3%)     .4237        .3638 (- 4%)
0.2       .3414        .3299       .3410 (  0%)     .3816        .3348 (- 2%)
0.3       .3014        .2853       .2880 (- 4%)     .3291        .2895 (- 4%)
0.4       .2818        .2707       .2669 (- 5%)     .2994        .2690 (- 5%)
0.5       .2547        .2380       .2477 (- 3%)     .2731        .2452 (- 4%)
0.6       .1844        .1837       .1838 (  0%)     .1911        .1842 (  0%)
0.7       .1728        .1722       .1714 (- 1%)     .1708        .1717 (- 1%)
0.8       .1461        .1457       .1513 (+ 4%)     .1501        .1526 (+ 4%)
0.9       .1231        .1217       .1242 (+ 1%)     .1309        .1269 (+ 3%)
1.0       .1139        .1123       .1148 (+ 1%)     .1263        .1166 (+ 2%)

a) CISI Collection (term frequency weights)
(citing -1%, cocited +6.9%, citing + cited -1.1%)

          Original     Original    Original         Original     Original
Recall    Collection   + Cited     + Citing         + Cocited    + Cited + Citing

0.1       .4909        .4762       .4937 (+ 1%)     .4484        .4955 (+ 1%)
0.2       .4308        .4260       .4372 (+ 1%)     .3928        .4349 (+ 1%)
0.3       .3634        .3572       .3590 (- 1%)     .3332        .3596 (- 1%)
0.4       .3290        .3213       .3221 (- 2%)     .3122        .3274 (  0%)
0.5       .3092        .3045       .3028 (- 2%)     .2898        .3100 (  0%)
0.6       .2114        .2108       .2160 (+ 2%)     .2358        .2078 (- 2%)
0.7       .1933        .1862       .1718 (-11%)     .1899        .1642 (-15%)
0.8       .1755        .1691       .1522 (-13%)     .1814        .1451 (-17%)
0.9       .1523        .1499       .1301 (-15%)     .1589        .1223 (-20%)
1.0       .1469        .1375       .1230 (-16%)     .1529        .1139 (-22%)

b) CISI Collection (term frequency times inverse document frequency)
(citing -5.6%, citing + cited -7.5%, cocited -16%)

Document Term Expansion - CISI

Table 4
`
`

`

CACM Query 7, Document 2150

Query 7:

I am interested in distributed algorithms - concurrent programs in which processes communicate and synchronize by using message passing. Areas of particular interest include fault-tolerance and techniques for understanding the correctness of these algorithms.

Document: 2150

.T (Title)
Concurrent Control with "Readers" and "Writers"
.W (Abstract)
The problem of the mutual exclusion of several independent processes from simultaneous access to a "critical section" is discussed for the case where there are two distinct classes of processes known as "readers" and "writers." The "readers" may share the section with each other, but the "writers" must have exclusive access. Two solutions are presented: one for the case where we wish minimum delay for the readers; the other for the case where we wish writing to take place as early as possible.
.B (Citation)
CACM October, 1971
.A (Authors)
Courtois, P. J.
Heymans, F.
Parnas, D. L.
.K (Keywords)
mutual exclusion, critical section, shared access to resources
.C (Computing Reviews Categories)
4.30 4.32

Original Document (Output Rank 700)

Query Terms in Document    Weights
PROCES                     2.00
CONCUR                     1.00

Modified Document (Output Rank 27)

Query Terms in Document    Weights    Remarks
PROCES                     2.00
PROGRAM                    2.00       new indifferent term
SYNCHRON                   2.00       new good term
CONCUR                     4.00       good term with increased weight

Positive Document Modification (CACM Collection)

Table 5
`
`

`

`- 21 -
`
`CISl Query j4. J.ocument
`
`Query 34:
`
`Methods of coding in computerized indexing systems
`
`Document:
`60
`
`What Is It?
`
`.T (Title)
`Information Science:
`.A (Author)
`Borko, H.
`.W (Abstract)
`In seeking a new sense of identity, we ask, in this article, the
`What is information science?
`What does information
`question:
`Tentative answers to these questions are given in the
`science do?
`hope of stimulating discussion that will help clarify the nature
`of our field and our work
`
`Original Document           Modified Document
`(Output Rank 48)            (Output Rank 15)
`
`Query Terms                 Query Terms
`in Document     Weights     in Document     Weights     Remarks
`
`SYSTEM          2.00        SYSTEM          9.00        weight increase
`COMPUT          2.00        COMPUT          1.00
`                            INDIC           6.00        new term
`
`Positive Document Modification (CISI Collection)
`
`Table 6
`
`CACM Query 27, Document 2902
`
`Query 27:
`
`Memory management aspects of operating systems
`
`Document:
`2902
`
`.T (Title)
`Dynamic Memory Allocation in Computer Simulation
`.W (Abstract)
`This paper investigates the performance of 35 dynamic memory allocation
`algorithms when used to service simulation programs as represented
`by 18 test cases. Algorithm performance was measured in terms of
`processing time, memory usage, and external memory fragmentation.
`Algorithms maintaining separate free space lists for each size of
`memory block used tended to perform quite well compared with other
`algorithms. Simple algorithms operating on memory ordered lists
`(without any free list) performed surprisingly well. Algorithms
`employing power-of-two block sizes had favorable processing requirements
`but generally unfavorable memory usage. Algorithms employing
`LIFO, FIFO, or memory ordered free lists generally performed poorly
`compared with others.
`.B (Citation)
`CACM November, 1977
`.A (Author)
`Nielsen, N. R.
`.K (Keywords)
`algorithm performance, dynamic memory allocation, dynamic memory
`management, dynamic storage allocation, garbage collection, list
`processing, memory allocation, memory management, programming
`techniques, simulation, simulation memory management, simulation
`techniques, space allocation, storage allocation
`
`Original Document           Modified Document
`(Output Rank 18)            (Output Rank 37)
`
`Query Terms                 Query Terms
`in Document     Weights     in Document     Weights     Remarks
`
`OPER            1.00        SYSTEM          2.00        new unimportant term
`MEM             8.00        OPER            1.00
`                            MEM             9.00        slight weight increase
`
`Negative Document Modification (CACM Collection)
`
`Table 7
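Table 7 shows a document whose query-term weights rise while its output rank falls from 18 to 37. One way such an outcome can arise is that rank is relative: it depends on the scores of every other document in the collection, all of which are modified as well. The sketch below illustrates only that general point. The scores are hypothetical and are not taken from the report, and `rank_of` is an illustrative helper name.

```python
def rank_of(doc_id, scores):
    """Return the 1-based rank of doc_id when documents are ordered by decreasing score."""
    ordered = sorted(scores, key=lambda d: scores[d], reverse=True)
    return ordered.index(doc_id) + 1


# Hypothetical similarity scores before and after modification.
# None of these numbers come from the report; they only illustrate the effect.
before = {"target": 0.40, "other_a": 0.35, "other_b": 0.30}
after = {"target": 0.45, "other_a": 0.50, "other_b": 0.48}

rank_before = rank_of("target", before)
rank_after = rank_of("target", after)
print(rank_before, rank_after)
```

In this toy collection the target document's own score increases, yet it drops from first to third place because the competing documents gain more.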
`
`

`

`CISI Query 19, Document 113
`
`Query 19:
`
`Techniques of machine matching and machine searching systems.
`Coding and matching methods.
`
`Document:
`119
`
`.T (Title)
`Measuring the Quality of Sociological Research: Problems in the
`Use of the Science Citation Index
`.A (Author)
`Cole, S.
`.W (Abstract)
`The problem of assessing the "quality" of scientific publications
`has long been a major impediment to progress in the sociology of
`science. Most researchers have typically paid homage to the belief
`that quantity of output is not the equivalent of quality and have
`then gone ahead and used publication counts anyway (Cole, 1963;
`Crane, 1965; Prince, 1963; Wilson, 1964). There seemed to be no
`practicable way to measure the quality of large numbers of papers
`or the life's work of large numbers of scientists. The invention
`of the Science Citation Index (SCI) a few years ago provides a
`new and reliable tool
