The SMART Automatic Document Retrieval System -
An Illustration

G. SALTON AND M. E. LESK
Harvard University*, Cambridge, Mass.

A fully automatic document retrieval system operating on the IBM 7094
is described. The system is characterized by the fact that several
hundred different methods are available to analyze documents and
search requests. This feature is used in the retrieval process by
leaving the exact sequence of operations initially unspecified, and
adapting the search strategy to the needs of individual users.

The system is used not only to simulate an actual operating
environment, but also to test the effectiveness of the various
available processing methods. Results obtained so far seem to indicate
that some combination of analysis procedures can in general be relied
upon to retrieve the wanted information. A typical search request is
used as an example in the present report to illustrate systems
operations and evaluation procedures.

H. KOLLER, Editor

1. Introduction

In 1957, Luhn suggested a fully automatic procedure for the processing
of written texts, based on the frequency of occurrence of words within
the texts [1]. Specifically, use of high-frequency words was advocated
for purposes of content identification, and document retrieval was to
be effected by manipulation of the corresponding word-frequency lists.
The suggested procedure, even though imperfect, is still used as the
basis for many automatic text-processing systems.

In the SMART retrieval system, an attempt is made to go beyond the
original word matching procedures by generating more effective content
indicators to identify documents and search requests. This is
accomplished in part by generating word stems from the original word
forms, by introducing synonym dictionaries to lessen the effects of
vocabulary variations, and, most importantly, by identifying relations
between certain words to be used as content indicators in conjunction
with the surrounding words.

Stored documents and search requests are processed without any prior
manual analysis by one of several hundred possible methods, and those
documents which most nearly match a given search request are extracted
from the document file. The system may be controlled by the user in
that a search request can be processed first in a standard mode; the
user is then free to analyze the output obtained, and depending on his
further requirements a reprocessing of the request may be ordered
under new conditions. The new output can again be examined, and the
process iterated until such time as the right kind and amount of
information are retrieved.

SMART is thus designed to overcome many of the shortcomings of
presently available automatic retrieval systems, and may serve as a
reasonable prototype for fully automatic document retrieval. The
following summarizing characteristics may be of principal interest.
(a) The information analysis is believed to be sufficiently deep and
refined to ensure the identification of most relevant material in
answer to most search requests.
(b) The varying needs of individual users are recognized by enabling
each user to call on many different text processing modes, and by
choosing a suitable sequence of procedures, eventually to obtain
satisfactory retrieval performance.
(c) The system can serve as a means for evaluating the effectiveness
of a large variety of automatic analysis procedures, in that the same
search requests can be processed against the same document collection
in many different ways and results compared.

The present report illustrates the capabilities of the system by using
a typical search request and exhibiting some of the processing steps
as well as the retrieval results. The detailed systems organization as
well as the main programming aspects are not included.¹

The work described in this study was supported by the National Science
Foundation under grant GN-245.
* Computation Laboratory.
¹ Additional, detailed descriptions of the SMART system may be found
in [2, 3].

Communications of the ACM, Volume 8 / Number 6 / June, 1965

2. Processing Options

The SMART retrieval system is designed around a general supervisory
system which can in turn call on many different subroutines. The
supervisor accepts input instructions to specify the type of operation
to be performed, as well as control data to choose the subroutines
which are to be called. At present, eight basic input instructions
are available, in addition to thirty-five different processing options
and six variable parameter settings.

Five basic dictionaries, or tables, are incorporated into the system:
an alphabetic-stem dictionary, also known as the thesaurus, designed
to supply each word stem with a number of syntactic and semantic
codes; an alphabetic-suffix table to obtain syntactic codes for word
suffixes; a numeric-concept hierarchy to represent various relations
between semantic categories; a syntactic (criterion) phrase dictionary
to aid in the syntactic processing; and a statistical-phrase list.

The following principal facilities are used in the system for purposes
of information analysis and general processing:
(a) a system for separating English words into stems and affixes,
using a dual left-to-right and right-to-left scanning procedure, and
incorporating extensive English morphological rules to detect doubling
of consonants, deletion of final "e", and "y" to "i" changes, before
addition of a suffix;
(b) a thesaurus look-up method using list-tracing methods to replace
word stems by "concept" numbers (the present thesaurus includes about
500 concepts in the computer literature corresponding to about 3000
English stems);
(c) a so-called "vacuous" thesaurus in which the original word stems
included in a text function as concepts;
(d) a hierarchical arrangement of the concepts included in the
thesaurus, and a complete set of list-processing methods which make it
possible, given any concept number, to find its "parent" in the
hierarchy, its "sons", its "brothers", and any of a set of possible
cross-references;
(e) statistical procedures to compute stem (or concept) similarity
coefficients based on co-occurrence of terms within the sentences of a
given document, or within the documents of a given collection;
association factors between documents can also be determined, as can
clusters (rather than only pairs) of related documents or related
concepts;
(f) syntactic-phrase-matching procedures which make it possible to
match the syntactically analyzed sentences of documents and search
requests with a precoded dictionary of "criterion" phrases; the phrase
matcher recognizes a large number of semantically equivalent but
syntactically quite different constructions, and assigns the same
concept numbers to all such equivalent constructions (as, for example,
to "information retrieval", "the retrieval of information", "the
retrieval of documents", "text processing", and so on); a dictionary
of about 120 criterion phrases, corresponding to several thousand
English constructions in the computer literature, is used at present,
and two phrases are defined as equivalent if concept numbers and
syntactic indicators match, and if the syntactic dependencies between
concepts are preserved;
(g) "statistical phrase" matching procedures which operate like
syntactic phrases except that no syntactic analysis is performed, and
syntactic dependencies are disregarded (a statistical phrase is thus
in fact equivalent to a set of concepts co-occurring in a sentence of
a document);
(h) a complete set of updating routines designed to alter the five
principal dictionaries included in the system (stem thesaurus, suffix
dictionary, concept hierarchy, statistical phrases and syntactic
criterion phrases);

[Flowchart. Legend: compulsory operations, optional operations. Exit
labeled "To Request Processing".]
FIG. 1. Preprocessing of input text including main content analysis
procedures

[Flowchart. Legend: compulsory, optional. Input labeled "PRE-PROCESSED
DOCUMENTS"; tapes labeled A2, A5, A6, A8, B6.]
FIG. 2. Request-document processing including hierarchical processing
and statistical correlations

(i) a supervisory system, called CHIEF, designed to decode a large
variety of input instructions and to arrange the processing sequence
in accordance with the instructions given.
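
Facilities (a) through (c) can be pictured with a small sketch. It is only an illustration under stated assumptions, not the SMART implementation: the suffix list, the dictionary-driven choice among candidate stems, and the names `THESAURUS`, `SUFFIXES`, `candidate_stems`, `look_up` and `null_look_up` are all hypothetical. Concept numbers 274 and 181 are taken from the example discussed later in the report; the stems they are attached to here, and every other entry, are invented.

```python
# Toy tables. The real thesaurus maps about 3000 stems to about 500 concepts.
# Concept numbers 274 and 181 occur in the report; the stems and all other
# entries below are invented for illustration.
THESAURUS = {"differential": 274, "equation": 181,
             "compute": 1, "run": 2, "study": 3}
SUFFIXES = ("ing", "ed", "es", "s")  # hypothetical suffix table


def candidate_stems(word):
    """Yield possible stems, undoing spelling changes made before a suffix."""
    word = word.lower()
    yield word                           # the word may already be a stem
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            base = word[: -len(suffix)]
            yield base                   # plain removal: equations -> equation
            if base[-1] == base[-2]:
                yield base[:-1]          # doubled consonant: running -> run
            if base.endswith("i"):
                yield base[:-1] + "y"    # "y" to "i" change: studies -> study
            yield base + "e"             # deleted final "e": computing -> compute


def look_up(word, thesaurus=THESAURUS):
    """Return (stem, concept number), or (word, None) if no stem is found."""
    for stem in candidate_stems(word):
        if stem in thesaurus:
            return stem, thesaurus[stem]
    return word.lower(), None


def null_look_up(word, thesaurus=THESAURUS):
    """'Vacuous' thesaurus: the stem itself functions as the concept."""
    stem, _ = look_up(word, thesaurus)
    return stem, stem
```

A word whose stem is absent from the table, such as "respect" in this toy dictionary, comes back with no concept number and can then be reported as not found.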

A flowchart of the complete system (exclusive of the dictionary
updating routines) is shown in Figures 1 and 2. Figure 1 describes the
processing which is in general performed only once for each document
or search request, including the lookup operations in the various
dictionaries and the syntactic analysis. The request processing
proper, consisting of the matching of requests and preprocessed
document vectors, is shown in Figure 2. This chart also illustrates
the hierarchical procedures, as well as the statistical term-term and
document-document correlations. Most of the procedures described in
Figures 1 and 2 are optional, as shown by the dotted lines. The
principal tape assignments included in the flowcharts are decoded in
Table 1.

TABLE 1. PRINCIPAL TAPE ASSIGNMENTS

Tape Number   Function
A2            Input parameters, input text, manual relevance judgments
A3            Output print tape for later printing
A5            Partially assembled concept vectors
A6            Preprocessed document vectors
A7            Merged concept vectors (preanalyzed and new)
A8            Input to and output from syntax analyzer
B1            Words not found in dictionary
B2            Correlations between document and request vectors
B5            Library tape including thesaurus, hierarchy and phrase
              dictionaries
B6            Scratch tape and merged concept vectors

3. Input Specifications

A typical analysis is best described by using an information search
actually run on the computer as an illustration. Figure 3 shows the
introduction of a new document on differential equations (code:
DIFFERNTL EQ). This document serves as search request, and is compared
against a previously stored collection of 405 document abstracts.²

The complete document is printed at the top of Figure 3. The results
of the thesaurus lookup, shown in the second part of the figure,
indicate that one word stem (RESPECT), occurring in sentence 2, word
14, of the document could not be found in the stem dictionary. This
word is effectively ignored in the remainder of the process.

Before a run is undertaken, instructions are given to the supervisor
concerning the processing mode and the type of output desired. This
information is shown in short form in the third part of Figure 3. A
longer, decoded version of a set of processing instructions is
included in Figure 4, described later in the report. Following the
processing instructions, new documents may be introduced from the
input tape, and previously available documents are processed from a
separate data tape. This is done under control of a special
instruction set recognized by the supervisor.

² The stored collection consists of the 405 abstracts of documents in
the computer literature published during 1959 in the Transactions of
the IRE on Electronic Computers. The document abstracts are numbered
from 1 to 405 for identification.

The instructions listed in the lower part of Figure 3 receive the
following interpretation:

*LIST DIFFERNTL EQ   The document whose 12-character identifier
                     (DIFFERNTL EQ) follows, appears next on the input
                     tape in binary form.
*LIKE DIFFERNTL EQ   The designated document is identified as a request
                     for later matching with other documents.
*TIME                The current time is printed.
*TAPE                The supervisor switches from the input tape to the
                     data tape and treats this tape as if it were
                     another input.
*NOTE                A comment follows next.
*LIST                The document file is introduced starting with
                     document 1 (first five documents shown in Figure 3).

A typical processing record is reproduced in decoded form in Figure 4.
It may be noted that in the example shown, version 2 of the regular
(Harris) thesaurus is used to normalize the vocabulary. No statistical
processing is done, but a syntactic analysis is performed instead, and
the syntactic phrases detected by matching the incoming sentences with
the criterion tree dictionary are weighted 3 to 1 compared with
ordinary (nonphrase) concepts. Concepts derived from the titles of
documents are weighted equally with all other concepts. The "cosine"
function is used to correlate the analyzed request with the document
identifications, and all documents whose correlation coefficient
exceeds 0.35 are printed out as answers.

4. Information Analysis

As an example of the kind of information analysis included in the
SMART system, Figure 5 shows an excerpt of the output print produced
by the syntactic matching process. The first sentence of the request
DIFFERNTL EQ is processed: the sentence structure diagram produced by
the syntactic analysis of the sentence is shown at the top of Figure
5. The format reflects the syntactic dependency structure of the
sentence, and is produced by the Kuno-Oettinger syntactic analyzer
incorporated into the SMART system [4, 5].

The matching process between the syntactically analyzed sentence and
the set of criterion trees included in the syntactic-phrase dictionary
is illustrated in the center part of Figure 5. It is seen that the
combination of sentence nodes 12 and 14, corresponding to concept
numbers 181 and 274, or equivalently to the words "equations" and
"differential", respectively, match a criterion tree labeled DIFEQU
with serial 47. Similarly, sentence nodes 9 and 7 in combination match
the tree NUMERI with serial 87. The concept numbers corresponding to
these two trees are therefore attached to the search request on
"differential equations" which is being analyzed.

The bottom part of Figure 5 shows this last operation. Specifically,
concept 274, originally introduced by thesaurus lookup through the
word "differential", and concept 181, corresponding to "equations",
are replaced by the new phrase concept 379 corresponding to
"differential equations". Concepts 13 ("number") and 11 ("analysis")
are replaced by 375 ("numerical analysis"). These two phrase concepts
are weighted three times more heavily than were the original component
concepts.
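
The replacement step just described can be sketched as follows. This is a simplified illustration, not the graph-matching procedure of the system: it checks only that all component concepts are present and ignores syntactic indicators and dependencies. The phrase table uses the concept numbers quoted in the text; the function name `apply_phrases`, the unit weight of 12 for a single concept occurrence, and the rule "phrase weight = 3 x unit weight" are assumptions made for the example.

```python
# Phrase concept -> component concepts, using the numbers quoted in the text.
PHRASES = {379: (274, 181),   # "differential equations"
           375: (13, 11)}     # "numerical analysis"


def apply_phrases(vector, phrases=PHRASES, phrase_factor=3, unit=12):
    """Replace component concepts by their phrase concept.

    `vector` maps concept numbers to weights. Assumption: a detected phrase
    receives phrase_factor * unit, i.e. three times the weight assumed here
    for a single ordinary concept occurrence.
    """
    result = dict(vector)
    for phrase, components in phrases.items():
        if all(c in vector for c in components):
            for c in components:
                result.pop(c, None)       # drop the component concepts
            result[phrase] = phrase_factor * unit
    return result
```

For instance, a vector holding concepts 274 and 181 comes back with both removed and concept 379 added, while a vector holding only one of the two is returned unchanged.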

In order to obtain a match between a criterion tree and a sentence
part, it is necessary to compare the concept numbers attached to the
components, the syntactic indicators and the syntactic dependency
structures. A graph matching process is used for this purpose which
has previously been described in detail [6, 7].

Further aspects of the information analysis procedures included in
this system are shown in Figures 6 and 7. Figure 6 is a composite
print showing the indexing products obtained for two documents (the
original search request, and document number 1 from the abstract
collection) by each of three different analysis methods. These
automatically generated indexing products are effectively equivalent
to the manually assigned keyword sets which are common in ordinary
coordinate-indexing systems; it is the comparison between the indexing
products representing documents and search requests which is used to
obtain the similarity coefficients needed for retrieval. In the SMART
system, hundreds of different indexing products may be obtained by
suitable alterations of the analysis procedures.

The three analysis methods illustrated in Figure 6 correspond
respectively to the use of the regular thesaurus to obtain concept
numbers from word stems, the use of the "null" thesaurus (that is, of
original word stems with weights), and the use of the regular
thesaurus followed by a statistical phrase detection. In the center
section of Figure 6, up to six alphabetic characters are printed for
each word stem together with a weight (multiplied by 12 for internal
reasons). For the two analysis procedures which make use of a
thesaurus lookup, the word stems are replaced by concept numbers and
the remainder of the six-character field is filled out by mnemonic
characters to provide a clue to the significance of the corresponding
concept.

A comparison of the index vectors produced by the various methods
shows that new concepts may be introduced by switching to a new
analysis procedure; furthermore, concepts common to two or more of the
indexing systems may nevertheless be weighted differently in each one.
Consider, for example, the index vectors for the document DIFFERNTL
EQ. Using the null thesaurus, the stem DIFFEREN (listed as DIFFER and
obtained from the word "differential") is assigned a weight of 24. The
corresponding concept number 274 obtained from the thesaurus is listed
with a weight of 36. Using the statistical phrase matcher, a new
concept 379 is created to represent the phrase "differential
equations", with a weight of 72. Thus, the shift from word stems, to
thesaurus, to statistical phrases, assigns an increasingly larger
importance to the notion of "differential equations".

Figure 7 shows an example of a change produced in the index vector for
DIFFERNTL EQ by using the hierarchy. The vectors are shown both before
and after expansion; a comparison indicates that new concepts are
introduced through the hierarchy, and that some existing concepts
receive a change in emphasis. For example, concept 110 is assigned a
weight of 12 before expansion, and of [illegible] afterwards.
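
A minimal sketch of how a concept hierarchy might support both the "parent", "sons" and "brothers" queries and the kind of vector expansion shown in Figure 7. The report does not state the expansion rule, so the rule used here (each concept passes half of its weight to its parent) is purely an assumption, as are the class and function names and the parent concepts 901 and 902, which are invented.

```python
from collections import defaultdict


class Hierarchy:
    """Concept tree stored as a child -> parent table plus its inverse."""

    def __init__(self, parent_of):
        self.parent_of = dict(parent_of)
        self.sons_of = defaultdict(list)
        for child, parent in self.parent_of.items():
            self.sons_of[parent].append(child)

    def parent(self, concept):
        return self.parent_of.get(concept)

    def sons(self, concept):
        return sorted(self.sons_of.get(concept, []))

    def brothers(self, concept):
        p = self.parent(concept)
        return [] if p is None else [s for s in self.sons(p) if s != concept]


def expand_with_parents(vector, hierarchy, share=0.5):
    """Add each concept's parent, giving it `share` of the child's weight.

    Assumed rule for illustration only. Parents absent from the vector are
    introduced; parents already present gain weight.
    """
    expanded = dict(vector)
    for concept, weight in vector.items():
        p = hierarchy.parent(concept)
        if p is not None:
            expanded[p] = expanded.get(p, 0) + weight * share
    return expanded


# Invented links: 901 is a hypothetical parent of concepts 274 and 181.
toy = Hierarchy({274: 901, 181: 901, 901: 902})
before = {274: 36, 181: 24, 901: 12}
after = expand_with_parents(before, toy)
```

In this toy run the existing concept 901 changes weight and the new concept 902 appears, which mirrors the two effects described for Figure 7: new concepts introduced, and a change in emphasis for existing ones.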

The usefulness of the various indexing products, and therefore of the
analysis procedures which give rise to them, must be determined by an
evaluation procedure. This is further discussed in Section 6 of this
report.

5. Information Retrieval

Following the information analysis, the index vectors derived from
documents and search requests are compared in order to obtain for each
document a coefficient of similarity with each search request. Figure
8 shows, as an example, the request-document correlations for the
regular thesaurus run, obtained by using the "cosine" function to
correlate the request DIFFERNTL EQ with the stored document
collection. The output of Figure 8 is presented in three different
forms: first in increasing document order, next in decreasing
correlation order, and finally as a type of histogram. The histogram
shows, for example, that exactly one document had a correlation with
the request equal to or greater than 0.60, 8 documents exceeded 0.40,
60 documents exceeded 0.20, 165 documents exceeded 0.10, and so on.
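
The correlation step can be sketched as below, assuming index vectors are held as mappings from concept number to weight. The cosine formula is the standard one; the function names, the sparse-dictionary representation and the threshold list are choices made for the example rather than details given in the report.

```python
import math


def cosine(a, b):
    """Cosine of the angle between two sparse concept-weight vectors."""
    dot = sum(w * b[c] for c, w in a.items() if c in b)
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0


def correlate(request, documents):
    """Correlation of one request with every document, keyed by document."""
    return {doc: cosine(request, vec) for doc, vec in documents.items()}


def ranked(correlations, cutoff=0.0):
    """Documents whose correlation exceeds the cutoff, best first."""
    hits = [(doc, r) for doc, r in correlations.items() if r > cutoff]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


def counts_at_or_above(correlations, thresholds=(0.60, 0.40, 0.20, 0.10)):
    """Cumulative counts, in the spirit of the histogram of Figure 8."""
    return {t: sum(1 for r in correlations.values() if r >= t)
            for t in thresholds}
```

Calling `ranked(correlate(request, documents), cutoff=0.35)` would list the answers in decreasing correlation order down to the cutoff used in the run described earlier.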

The correlations shown in Figure 8 are produced for analysis purposes
and are not intended to be given to the user. The user receives his
"answers" in one of two forms, shown in Figures 9 and 10. In either
case, the documents which receive the highest correlation with the
search request are listed in decreasing correlation order down to the
cutoff specified in the processing instructions (see Figure 4). The
shorter version of the output provides one line per document,
including only the document identification, correlation coefficient
and document title; the example of Figure 9 shows the output obtained
for the "titles only" run, where the text of each document is
disregarded and titles only are used in the analysis.

The more complete form of the output is shown in Figure 10. Here the
search request is reproduced at the top of the page, as in Figure 9;
however, complete bibliographic citations are given for each document.
The regular thesaurus was used to obtain the output of Figure 10; this
run was previously illustrated also in Figure 8.

A very condensed form of the output, consisting for each document
listed of only one twelve-character identifier, including document
number and the first few characters of the title, is also produced.
Figure 11 shows examples of the short-form answers obtained with a
variety of processing methods for the request on differential
equations. The documents are again listed in decreasing correlation
order, but correlation coefficients are omitted. The cutoff, which may
be different from one processing method to the other, applies as
before.

[Computer printout in five labeled parts: original request, thesaurus
lookup, short form processing instructions, search instruction with
time information, and partial list of document titles.]

ENGLISH TEXT PROVIDED FOR DOCUMENT DIFFERNTL EQ
SEPTEMBER 28, 1964    PAGE 345

GIVE ALGORITHMS USEFUL FOR THE NUMERICAL SOLUTION OF ORDINARY
DIFFERENTIAL EQUATIONS AND PARTIAL DIFFERENTIAL EQUATIONS ON DIGITAL
COMPUTERS. EVALUATE THE VARIOUS INTEGRATION PROCEDURES (TRY
RUNGE-KUTTA, MILNE-S METHOD) WITH RESPECT TO ACCURACY, STABILITY, AND
SPEED.

WORDS IN DOCUMENT DIFFERNTL EQ NOT FOUND IN THESAURUS
WORD NOT FOUND: RESPECT    KIND: STEM    SENTENCE AND WORD NUMBERS: 2, 14

INSTRUCTION CARDS TO SMART SUPERVISOR
[instruction card text only partly legible: ANSWER REQUESTS, ...,
MAXCON 511, REQUEST CORRELATIONS, ..., TEXTS PROCESSED]

*LIST DIFFERNTL EQ NUMERICAL DIGITAL SOLN OF DIFFERENTIAL EQUATIONS
*LIKE DIFFERNTL EQ
*TIME
THE CURRENT TIME IS 72.1 MINUTES. YOU WILL REMEMBER THAT START-OF-JOB
WAS AT 69.9 MINUTES, WHILE THE CLOCK READ 71.8 WHEN EXECUTION BEGAN.
*TAPE
*NOTE THIS IS THE HARRIS THESAURUS VERSION TWO LOOKUP
*LIST 1A COMPUTER ORIENTED TOWARD SPATIAL PROBLEMS
*LIST 2MICRO-PROGRAMMING
*LIST 3THE ROLE OF LARGE MEMORIES IN SCIENTIFIC COMMUNICATIONS
*LIST 4A NEW CLASS OF DIGITAL DIVISION METHODS
*LIST 5ANALYSIS OF SHIFT REGISTER COUNTERS

FIG. 3. Typical SMART processing instructions

[Computer printout in two labeled parts. Syntactic analysis output:
the text, node numbers, and strings of sentence no. 1 of the request.
Syntactic tree output: node correspondences for the tree with index
DIFEQU, serial no. 47 (sentence nodes 12 and 14), and for the tree
with index NUMERI, serial no. 87 (sentence nodes 9 and 7); the
criterion routine processed 1 sentence, with 2 matches of 2 distinct
indices; a closing table lists the trees detected syntactically in
document DIFFERNTL EQ with their concepts and component concepts.]

FIG. 5. Syntactic phrase matching

INSTRUCTION CARDS TO SMART SUPERVISOR
SEPTEMBER 28, 1964

LIBRARY USED WAS VERSION NUMBER 2 OF THE HARRIS THESAURUS.
THESAURUS DISCRIMINATES NOT MORE THAN 511 CONCEPTS.
ENGLISH TEXTS WERE PRINTED DURING LOOKUP.
WORDS NOT FOUND WERE PRINTED.

STATISTICAL INTRA-DOCUMENT PROCESSING --
NONE.

STATISTICAL PHRASES --
NONE.

CRITERION TREES --
A SYNTACTIC ANALYSIS WAS PRINTED FOR EACH SENTENCE.
CRITERION TREES DETECTED WERE PRINTED.
NODE CORRESPONDENCES OF TREES TO SENTENCES WERE PRINTED.
SYNTACTIC PHRASES HAD WEIGHT OF 3.0

THE ABOVE DATA WAS SUPPLIED BY THE PROGRAMMER AND MAY BE INCORRECT.

THE FOLLOWING DATA IS FROM INSTRUCTIONS FOR THIS RUN WHICH DEFINITELY
WERE EXECUTED.

TITLES WERE GIVEN A WEIGHT OF 1.0

DOCUMENT CORRELATION --
REQUEST CORRELATIONS WERE PRINTED.
CORRELATION MODE USED WAS COSINE.
CUTOFF WAS 0.3500

HIERARCHY --
NONE.

CONCEPT PROCESSING --
NONE.

REQUESTS WERE ANSWERED.
AUTO-EVALUATION WAS REQUESTED AND WILL BE ATTEMPTED.

FIG. 4. Typical processing record (long form)

[Computer printout: OCCURRENCES OF CONCEPTS AND PHRASES IN DOCUMENTS.
Index vectors with weights for documents DIFFERNTL EQ and 1A COMPUTER
are listed in three panels: REGULAR THESAURUS, NULL THESAURUS, and
STAT. PHRASE LOOK-UP.]

FIG. 6. Document vectors generated by three analysis methods

[Computer printout on changes to the document-concept matrix through
use of the hierarchy: concept vector for DIFFERNTL EQ before
expansion.]

[Computer printout: ANSWERS TO REQUESTS FOR DOCUMENTS ON SPECIFIED
TOPICS, page 169.]