H. P. Luhn

A Statistical Approach to Mechanized Encoding
and Searching of Literary Information*

Abstract: Written communication of ideas is carried out on the basis of statistical probability in that a writer
chooses that level of subject specificity and that combination of words which he feels will convey the most
meaning. Since this process varies among individuals and since similar ideas are therefore relayed at different
levels of specificity and by means of different words, the problem of literature searching by machines
still presents major difficulties. A statistical approach to this problem will be outlined and the various steps
of a system based on this approach will be described. Steps include the statistical analysis of a collection of
documents in a field of interest, the establishment of a set of "notions" and the vocabulary by which they
are expressed, the compilation of a thesaurus-type dictionary and index, the automatic encoding of documents
by machine with the aid of such a dictionary, the encoding of topological notations (such as branched
structures), the recording of the coded information, the establishment of a searching pattern for finding
pertinent information, and the programming of appropriate machines to carry out a search.

1. Introduction
The essential purpose of literature searching is to find
those documents within a collection which have a bearing
on a given topic. Many of the systems and devices, such
as classifications and subject-heading lists, that have been
developed in the past to solve the problems encountered
in this searching process are proving inadequate. The
need for new solutions is at present being intensified by
the rapid growth of literature and the demand for higher
levels of searching efficiency.

Specialists in the literature searching field are optimistic
about the future application of powerful electronic
devices in obtaining more satisfactory results. A successful
mechanical solution is unlikely, however, if such
modern devices are to be viewed merely as agents for
accelerating systems heretofore fitted to human capabilities.
The ultimate benefits of mechanization will be
realized only if the characteristics of machines are better
understood and systems are developed which exploit
these characteristics to the fullest. Rather than subtilize
the artful classificatory schemes now in use, new systems
would replace them in large part by mechanical routines
based on rather elementary reasoning.

The major technical effort involved in substituting
mechanical for intellectual means must, of course, be
justified by the improved results obtained. However, if
partial mechanical substitution for human effort cannot
be found in automation, there is a real danger that the
demand for professional talent will become too great to
fill. In view of the foreseeable strain, the most efficient
use of talent will have to be made even by automatic
systems. The operating requirements of these systems
will, above all, have to be well adapted to the degree of
education and experience of generally available personnel.

Language difficulties, too, will have to be met. The
problems stemming from the mere volumes of literature
to be searched are being continually aggravated by the
increasing accession of foreign-language documents that
rate consideration on an equal level with domestic
material. To be of real value, future automatic systems
will have to provide a workable means of overcoming
the language barrier.

*Presented at American Chemical Society meeting in Miami, April 8, 1957.

IBM JOURNAL • OCTOBER 1957

• Complexity levels of information systems
The general terms in which the problem of literature
searching has been treated might indicate the possibility
of a general, or universal, solution. It would be unrealistic
to assume that such is practical or desirable. It is
quite important to establish some differentiating criteria
by which information reference arrays may be distinguished
and graded as to their make-up, objectives, and
uses. It will then be possible to recognize better the existence
of different levels and the necessity of applying
appropriately different techniques to their mechanization.
The following list of six information systems in order of
increasing complexity, insofar as mechanical solution is
concerned, may be of use in indicating the differences in
information levels:
1. Ready reference look-up systems of facts such as
indexes, dictionaries, and parts catalogues.
2. Systems of limited and narrowly defined categories
especially where, as in lists of specifications, categories
are repetitive. (Personnel histories, medical case histories,
etc.)
3. Systems of the kind found commonly in chemistry that
deal with inventories of uniquely definable structures
and their interrelations and transformations.
4. Systems of mathematics, logic, and law that are based
on disciplined concepts of human intellect.
5. Systems dealing with the exploitation of natural
phenomena and objects, as in the applied sciences and
technology.
6. Systems, of which pure fiction is the extreme, dealing
with unrestricted association of human notions.

While there may be criteria other than those listed
above according to which the spectrum of information
can be graduated, the important fact to recognize here is
that there are radically different classes of information. It
therefore makes little sense to discuss a literature searching
system without also identifying the portion of the
spectrum to which the system is to be applied.

• Distribution of human effort

Since the graduation of the above list ranges from explicit
factual listings to the abstract concepts of creative writing,
it seems unavoidable that the efficiency of recognition
of desired information will decrease in this direction. The
various systems might therefore be characterized by their
recognition potential and the amount and distribution of
human and machine effort required. It seems to be an
inescapable fact that the less disciplined the language, the
greater the human effort that must be expended somewhere
in the system.

There are four distinct phases of human effort involved:
1. The design, setup, and maintenance of the system
proper.
2. The interpretation and introduction of information
into the system.
3. The programming of wanted information for mechanical
recognition.
4. The interpretation of selected records to determine
whether they are relevant to the wanted information.

To arrive at an optimum process for a given information
level, the question of the quality and proportion of
human effort to be expended at each of these phases
must be answered.

• Time considerations

The introduction of time as an additional variable will
change proportions quite considerably. If, for instance,
any kind of information must be located in a matter of
minutes, the possible maximum of skilled effort will have
to be spent at the input phase of the system and in an
equal degree on every entry into the system. If, however,
time requirements are less pressing, input procedures
that require medium skill and minimum effort may be
chosen so that the skilled effort can be concentrated at
the output phase on only a small fraction of the records
of the collection. In the latter case, the fact that only a
small fraction of the records of a collection will ever be
selected should result in a reduction of the overall effort.
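
The trade-off just described can be made concrete with a little arithmetic. The following is a minimal sketch and not part of the original paper: the function name `total_effort`, the effort units, and every numerical figure are assumptions chosen only for illustration.

```python
def total_effort(n_records, input_effort, output_effort, selected_fraction):
    """Overall skilled effort for a collection (arbitrary units).

    Every record pays the input-phase cost; only the selected
    fraction of records pays the output-phase cost.
    """
    input_total = n_records * input_effort
    output_total = n_records * selected_fraction * output_effort
    return input_total + output_total


# Hypothetical figures: 10,000 records, 2 per cent ever selected.
# Heavy editing at input, little work left at output.
heavy_input = total_effort(10_000, input_effort=10.0,
                           output_effort=1.0, selected_fraction=0.02)
# Light encoding at input, skilled interpretation at output.
light_input = total_effort(10_000, input_effort=2.0,
                           output_effort=15.0, selected_fraction=0.02)

print(heavy_input, light_input)
```

Under these assumed figures the light-input arrangement requires far less effort in total, because the costly output-phase work is applied to only a small fraction of the records.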

Time may affect a system in another way that makes
the shift of skilled effort to the output phase more desirable.
Excessive editing obviously increases the likelihood
of bias due to current interests, experiences, and points of
view. In consequence the usefulness of the system will be
reduced as emphases and interests change. It would therefore
appear that the less information is classified and
contracted at the input, the more it will lend itself to
dynamic interpretation at the output phase.

• A proposed solution by statistical methods

The following paragraphs will present the basis and
organization of a literature searching system which utilizes
statistical methods in conjunction with a high degree
of mechanization. The principles involved represent an
extension and refinement of those discussed in an earlier
paper.1 Although the specific system described is primarily
designed to satisfy the requirements of information
level 5 of the foregoing list, it may also be found adaptable
to levels 4 and 6.

Generally speaking, the proposed system is based on
what are variously referred to as cross-indexing,
multidimensional-indexing, coordinate-indexing, multiple-aspect-indexing
and encoded-abstract techniques. Actual practices
vary from lifting key words of a text by manual
editing to interpretive analysis by logical formulas of
well-defined concepts. M. R. Hyslop has given a general
description and a bibliography of methods using these
techniques.2

2. The statistical aspects of communicating ideas
Communication of ideas by way of words is carried out
on the basis of statistical probability. We speculate that
by using certain words we will be able to produce in
somebody else's mind, a mood and disposition resembling
our own state of mind which resulted from an actual
experience or a process of thought. In order to communicate
an idea, we break it down into a series of little ideas,
i.e. more elementary ideas for which previous and common
experience might have led to an agreement of
meaning. We extend this process until we feel that we
have reached a level of conventional notions, a level at
which communication can be accomplished. This level
may vary depending on the degree of similarity of common
experiences. The fewer experiences we have in
common, the more words we must use.

A picture of this process, if it can be drawn at all,
might look something like the triangular portion of Fig. 1.

Figure 1 Communication of ideas.
Breakdown of basic idea into elementary concepts on experiential level common to reader and writer.

The process of communicating ideas is dynamic when
it can be performed by means of the spoken word. In the
first place, the addressor can size up the addressee and
adjust the process of subdividing his idea to the level of
common experience which most probably exists between
the two. Secondly, guided by the feedback of the addressee's
reactions and questions, the addressor may readjust
to a reasonably optimum level or change his
strategy of composition.

The process assumes static qualities as soon as ideas
are expressed in writing. Here the addressor has to make
certain assumptions as to the make-up of the potential
addressee and as to which level of common experience
he should choose. Since the addressor has to rely on some
kind of indirect feedback, he might therefore be guided
by the degree to which the written expressions of ideas
of others have raised the level of common experience
relative to the concepts he wishes to communicate.

The most general such guidance is furnished by the
dictionary. Here the verbal expressions of ideas at a
given level of common experience are defined in terms
of verbal expressions at other levels so that a broad
domain of common experience is assured. Thus, the dictionary
is a periodical report to word users on the ideas
which currently are most often conveyed by the words
in use. The level at which the lexicographer breaks off
his reporting may vary and is, of course, dictated by
economical factors. This is so because in the extreme
he would have to quote substantial parts of the current
literature to explain the slightest differentiations of ideas.
It is the task of special dictionaries to bring more remote
areas of experience into the common domain by explaining
ideas at higher levels.

In writing, the addressor may then take the special
dictionary of his field of interest as a next approximation
of a level of common experience on which to communicate
with the addressee. However, since the lexicographer
can never be up to date, there still remains a gap which
the addressor will have to fill to permit the addressee to
adjust himself to the desired level. This he may do by
referring the addressee, by means of a bibliography, to
that portion of the literature which the lexicographer has
not as yet analyzed.

It may be assumed that the means and procedures just
mentioned permit communication to be accomplished in
a satisfactory manner. If it is possible to establish a level
of common experience, it seems to follow that there is
also a common denominator for ideas between two or
more individuals. Thus the statistical probability of combinations
of similar ideas being similarly interpreted
must be very high.

If it were possible to recognize idea building blocks
irrespective of the words used to evoke them, these
building blocks might be considered the elements of a
syntax of notions. Communication could then be carried
out by relaying these notions by means of agreed-upon
symbols. Since these symbols would be independent of
style and language, they would help to overcome language
barriers. A symbol system of this kind would be
most useful in facilitating the process of information
recognition by automatic means.

3. Possible building blocks for a statistical system
The lack of uniformity of structure and the arbitrariness
of word usage make literature an unwieldy subject for
automation. Its information content must first be represented
and organized into a form that can be operated
on by a machine, for only then can the degree of
similarity between any two records be automatically
determined. The most efficient means of transforming
information for machine interpretation would be those
that permit the application of a minimum of logical
machine instructions for access to relevant information.

The very nature of free-style renderings of information
seems to preclude any system based on precise relationships
and values, such as has been developed in the
field of mathematics. Only by treatment of this problem
as a statistical proposition is a systematic approach possible.
The objectives of a system based on this proposition
would be first to transform information into arrays of
normalized idea building blocks and then to discover
similarities in the respective building-block patterns of
these arrays by means of a statistical analysis. It could be
reasonably assumed that the more closely two arrays are
matched, the greater the probability that the records they
represent contain similar information.
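
The idea of matching two arrays of building blocks can be sketched in a few lines. This is only an illustration under stated assumptions: the paper does not prescribe a particular measure, so the overlap ratio used here (shared occurrences divided by combined occurrences), the function name `match_score`, and the notion codes such as "N12" are all hypothetical.

```python
from collections import Counter


def match_score(notions_a, notions_b):
    """Degree of match between two arrays of notion codes, from 0.0 to 1.0.

    Each array is treated as a multiset, so repeated notions count.
    The score is shared occurrences over combined occurrences
    (an assumed measure, not one given in the paper).
    """
    a, b = Counter(notions_a), Counter(notions_b)
    combined = sum((a | b).values())   # multiset union
    if combined == 0:
        return 0.0
    shared = sum((a & b).values())     # multiset intersection
    return shared / combined


# Hypothetical encoded records
record_1 = ["N12", "N40", "N40"]
record_2 = ["N40", "N40", "N7"]
print(match_score(record_1, record_2))
```

A higher score is read only as a higher probability of similar content, in keeping with the statistical character of the proposal.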

It is true that the principle of pattern matching has
been applied previously in searching systems.3 The emphasis
here, however, is on the use of notions as a basis
for pattern derivation. Where such non-precise elements
can be used as building blocks, the possibility of creating
a practical information retrieval system is substantially
increased.

In the process of communicating ideas, an author
pursues a certain plan of organizing his ideas. The external
evidence of such a plan is the grouping of his
ideas into chapters, paragraphs, and sentences. Figure 1
illustrates how this organization may come about. Notions
are most closely and specifically related to each
other within a sentence. One sentence immediately following
another might either be related in its entirety to
previous notions or serve to relate these notions to new
ones. The same might be said of succeeding sentences.
However, a significant new argument is usually introduced
in a new paragraph. A still more decisive change
of aspects might be denoted by the start of a new chapter.

This conscious division by the author furnishes one
key to the relatedness of his notions, which, although not
always accurate, may generally be accepted as a significant
and meaningful element of the information he is
attempting to relay. We may therefore consider several
degrees of relationship; namely, the first-order relationship
of notions within a sentence, a second-order relationship
between sentences within a paragraph and their
respective notions, a third-order relationship between
paragraphs within a chapter, and still higher orders for
larger divisions.
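
These orders of relationship can be derived mechanically once a text has been divided into sentences and paragraphs. The sketch below is an assumption-laden illustration: it treats its input as a single chapter given as nested lists, and the function name `relation_orders` and the sample notions are hypothetical.

```python
from itertools import combinations


def relation_orders(chapter):
    """Map each pair of notions to its closest order of relationship.

    chapter: list of paragraphs; a paragraph is a list of sentences;
    a sentence is a list of notions.
    Order 1 = same sentence, 2 = same paragraph, 3 = same chapter.
    """
    orders = {}

    def note(notions, order):
        # Record every unordered pair, keeping the lowest order seen.
        for pair in combinations(sorted(set(notions)), 2):
            orders[pair] = min(order, orders.get(pair, order))

    for paragraph in chapter:
        for sentence in paragraph:
            note(sentence, 1)
        note([n for sentence in paragraph for n in sentence], 2)
    note([n for p in chapter for s in p for n in s], 3)
    return orders


# Hypothetical chapter of two paragraphs
chapter = [
    [["heat", "metal"], ["metal", "stress"]],
    [["crack"]],
]
for pair, order in sorted(relation_orders(chapter).items()):
    print(pair, order)
```

Higher orders for larger divisions would follow the same pattern with one more level of nesting.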

The sum of the relationships and divisions, as far as
the author is concerned, is the entirety of his message or
paper. However, since it is desirable to make the paper
or document comparable with other similar documents,
a still higher level of grouping is indicated, and this is the
level of common experience previously discussed. It was
argued that a level, or field, of common experience was a
requirement for communication. It follows that the more
specific the field, the closer will be the agreement among
the notions used in the mental process of people associated
with that field. It therefore seems important and
helpful to recognize these fields and to establish them as
a next order or level of division.

• Notions and "technese"

Communications at this specialized level are made as
though in a foreign tongue, in that people in various
specific fields each speak a "native" technical language.
However, since notions are here to be considered independently
of their implementation by words, we are
referring to the syntax of notions of the specialist. This
syntax of notions might be called technese, for lack of a
suitable existing term. We may talk, for example, about
the technese of the chemist, the lawyer, or the electrical
engineer.

For each kind of technese, a notion may be expressed
in the words of any desired language. The association of
words and notions will of course be typical of a given
field, and the more specialized the field, the more complex
may be the notion expressed in a single word. It
must be emphasized that language per se remains incidental.
The notions, which are the essential elements in
all technese, are assumed to be independent of any
language.

After individual special fields have been established,
a final grouping would be required to embrace the totality
of special fields. The notions to be applied at this level
would necessarily be more general and the process of
matching would be carried out by way of appropriately
broader notions.

In addition to the hierarchical organization just described,
there is another kind of division which should
be introduced to facilitate the adjustment of a system to
the constant expansion of knowledge and the associated
adaptations and changes of language. This may be done
by starting a new division or "age class" of documents at
given intervals, as time progresses. For each new interval
the system would be updated to reflect, for the ensuing
period, the changes during the preceding period. The
process of searching would then be performed first for
the current period, then for the preceding period and so
on, and to the extent dictated by the results obtained.

The use of age classes seems to be the only method by
which a collection may be divided into mutually exclusive
sections. The searching of a collection in retrogressive
steps or by predetermined age groups is bound to
shorten the average time of a search. It also appears
useful in many instances to search the most recent literature
first.
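
A retrogressive search over age classes might be sketched as follows. The function name `search_by_age_class`, the relevance test, and the stopping rule (stop once a wanted number of hits is reached) are assumptions made for illustration; the paper says only that the search proceeds to the extent dictated by the results obtained.

```python
def search_by_age_class(age_classes, is_relevant, enough):
    """Search the newest age class first and step back in time.

    age_classes: list of record lists, newest class first.
    is_relevant: function deciding whether a record is a hit.
    enough: stop stepping back once this many hits are found
            (an assumed stopping rule).
    """
    hits = []
    for records in age_classes:
        hits.extend(r for r in records if is_relevant(r))
        if len(hits) >= enough:
            break  # older classes are never scanned
    return hits


# Hypothetical collection: three age classes, newest first
classes = [["a1", "b1"], ["a2"], ["a3"]]
found = search_by_age_class(classes, lambda r: r.startswith("a"), enough=2)
print(found)
```

Because the loop stops as soon as the results suffice, the oldest classes are often left unscanned, which is the source of the shortened average search time.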

The above system of notions and their degree of relatedness
is not necessarily the sole system by which
comparable patterns may be derived. Certain classes of
information elements such as names or symbolism of
structure, e.g., chemical structures, flow diagrams, circuit
diagrams, road maps, etc., might demand rather specific
identifications. The notations used to represent these
elements would assume the same status as that accorded
to notions.

4. The limitations of serial communication

The process of communicating notions by means of
words can only be performed in serial fashion. In order
to overcome this basic limitation, intricate devices have
to be incorporated into a language to instruct the addressee
how to relate notions in ways other than those
given by the linear sequence of words. By means of
additional words, the addressee is told how to construct
a mental image of the multidimensional conceptions of
the idea being communicated. Since these instructions
may become rather involved and subject to misinterpretation,
it is advantageous to utilize pictorial presentations.
When thus supplemented, serial language lends itself
much more readily to the investigation and description
of multidimensional relationships.

This limitation of serial communication and its associated
problems also inheres in data-processing machines.
Communication is carried out on a serial basis in the
same sense as among humans. For pictorial representations,
the machine is at a disadvantage, at least at the
present stage of the art. The best that can be done is to
instruct the machine to create a multidimensional array
and to further instruct the machine to analyze all the
many relationships contained in this array. For a machine
to do this, it must have an internal memory where
it can store the representation and analyze it over and
over again in accordance with a specific program.

The organization and recording of information capable
of being analyzed in the above fashion, as well as the
development of programs directing the machine to do
this, is a very exacting procedure. The machine, having
only logic to its credit, cannot function unless information
and instructions are given it in strictly logical language.
A system in which relationships between notions
were to be given and explicitly recognized would therefore
be dependent upon a major intellectual effort for
interpreting meanings and relationships and translating
them into unique notations. As with current classificatory
schemes, this effort would have to be repeated for each
new document. When it came to the searching operation,
inquiries would have to be similarly interpreted and
encoded. The machine would then have to recognize
similarity of representations through an iterative process
of identifying and comparing each of the specific relationships
given.

The question arises whether similarity of multidimensional
representations might not be established by more
direct methods without reliance on an internal memory
machine. It might be argued that, while it is true that a
given number of various notional or pictorial elements
could theoretically be related in countless patterns, only
a very limited number of these patterns represent meaningful
information. Moreover, each additional pattern,
in association, further limits the number of meaningful
interpretations applicable in the particular case.

On these grounds it would be possible to disregard
specific and explicit relationships and merely investigate
whether certain elements happen to be associated and
to what degree. Such a substitution of statistical for
critical criteria would facilitate the establishment of similarity
by matching. The more two representations agreed
in given elements and their distribution, the higher would
be the probability of their representing similar information.
The actual matching process would be performed
through a serial scanning of records. The machine used
need not be capable of temporarily storing blocks of
information in an internal memory.
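
In present-day terms, such a serial scan corresponds to a single pass over a stream of records in which only the record under the reading head is examined. The following sketch is illustrative only: the record layout, the function name `scan`, and the measure of agreement (the share of query notions found in a record) are assumptions.

```python
def scan(records, query_notions, threshold):
    """Serially scan encoded records and yield those that agree with the query.

    records: any iterable of (document_id, notions) pairs, read once,
             in order, as from a tape; no block of records is stored.
    threshold: minimum share of query notions a record must contain
               (an assumed criterion).
    """
    query = set(query_notions)
    for document_id, notions in records:
        shared = len(query & set(notions))
        degree = shared / len(query) if query else 0.0
        if degree >= threshold:
            yield document_id, degree


# Hypothetical tape of three records
tape = iter([
    ("d1", ["a", "b", "c"]),
    ("d2", ["a", "x"]),
    ("d3", ["y"]),
])
for hit in scan(tape, ["a", "b"], threshold=0.5):
    print(hit)
```

Only the association of elements is tested; no relationship between the elements is interpreted, which is what allows the pass to remain strictly serial.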

As will be seen, the type of scanning suggested is
applicable to the statistical searching system presented in
the following sections. The system concerns itself mainly
with information represented by the written word. In the
above discussion, however, reference was made to pictorial
representations without indicating how these might
be organized either for the purpose of exhaustive analysis
by machines with internal memory or scanning and
matching on a statistical basis. By way of example, the
reader will find the first kind of system presented in a
paper by Ascher Opler4 and the second kind in a paper
by the author.5

5. The organization of a statistical searching system

• Objective

The primary objective of the proposed system is the
minimization of the intellectual effort of professionals at
the document encoding stage of the system so that this
day-by-day routine may be performed by automatic
means and with a minimum of professional personnel, in
accordance with a few simple rules. The intellectual
effort of professionals, who are relieved of the routine
encoding task, is now shifted to and concentrated at the
creative stage of setting up the system itself. This effort
will be quite substantial when a system is first installed
and will call for above-average talent. Thereafter, a
moderate effort will be required periodically to update
the system.

• Creating a dictionary of notions

The procedure to be described is similar to the one used
by P. M. Roget for compiling his Thesaurus of English
Words.6 Roget created categories of words that had a
family resemblance on a conceptual level. He arrived at
approximately 1000 of these categories for the entirety
of experience. Under such a category as "space" he lists
all words and phrases that include any notion of spatiality.
This procedure, as adapted here, also relies greatly
upon the techniques used in the preparation of concordances
of significant works in literature such as have
been applied in connection with the complete works
of St. Thomas Aquinas as described in a recent article.7
The virtue of such a procedure is that it provides for the
greatest possible extent of mechanization. In the form
presented it is most applicable to a collection of documents
embracing a specialized field that would normally
be pertinent to a research activity serving an industrial
concern.

The first step in the procedure is the establishment of
a basic sample drawn from the collection (Fig. 2). Since
the system is to be a dynamic one, such a sample should
consist of the "youngest" age group, comprising all accessions
from the present back to a judiciously selected
date. The choice of these data should in part be governed
by the number of useful documents obtained.

Figure 2 Information searching system: creation of
a dictionary of notions. (Diagram labels: STATISTICAL BREAKDOWN, DEFINITIONS.)

The next step consists of transcribing the sample documents
into punched or magnetic tape, i.e., into a form
which will permit subsequent mechanical operations on
the information. Inasmuch as certain grammatical features
of words should be recognized in subsequent steps
of the procedure, it would be advantageous to identify
certain classes such as nouns, adjective qualifiers, and
names by special symbols. Eventually these differentiations
may be determined by machine, as they will have
to be when the art of machine translation is perfected.

The third step is the preparation of a card index of all
transcribed sentences. A concordance worked out with
these cards will then result in the grouping of words of
similar or related meaning into "notional" families. This
is so similar to the work required for the creation of a
thesaurus such as Roget's, that the basic organization of
such books may well serve as the skeleton for this process.

The formation of notional families constitutes a major
intellectual effort to be undertaken by experts thoroughly
familiar with the habits of communication among people
associated with the special field of the subject literature.
These experts would endeavor to differentiate the notional
families so as to resolve the material in terms of
an optimum number of equally weighted elements. For
instance, in a field that specializes in electricity the
notion "electricity" would be common to most documents
and therefore worthless as a discriminating element.
On the other hand, in this same field, the notion
"butterfly" would be entirely too specific for a separate
notional family. Instead, the notion "electricity" would
have to be broken down into an appropriate number of
subnotions in accordance with qualifying adjectives that
might accompany it, while the notion "butterfly" would
have to be relegated to a notional family of broader
aspects, such as the notions "insects," "animals," or
"living things," depending on the overall frequency of
occurrence of such notions.

In one of many possible systems, for instance, it is
assumed that nouns (including gerunds) and, where
necessary, qualified nouns are capable of providing an
effective set of discriminating notions. Such nouns would
then be grouped into notional families in accordance
with the principles established by Roget. The physical
result would be a dictionary in two parts. The first part
would be the listing in some systematic order of the
notional families, each identified by an index symbol
such as a number or key word. Each of these would
represent a listing of the words from the sample documents
which are related with respect to the notion they
express. If more than one language is involved, the words
within a family might be segregated by language. The
second part would be an alphabetic index of the words
occurring in the first part, giving the key word and index
number of the one or several notional families of which
the given word is a member. This index may also be
segregated by language.
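
The two-part dictionary lends itself to a simple data structure. The sketch below is a hypothetical illustration: the family numbers, key words, and member words are invented, and the function name `build_index` is an assumption. The first part maps each index number to a key word and its member words; the second part is derived from the first.

```python
def build_index(families):
    """Derive the alphabetic word index (part two) from the families (part one).

    families: {index_number: (key_word, [member words])}
    Returns {word: [(index_number, key_word), ...]} in alphabetic order,
    listing every family of which the word is a member.
    """
    index = {}
    for number, (key_word, words) in families.items():
        for word in words:
            index.setdefault(word.lower(), []).append((number, key_word))
    return dict(sorted(index.items()))


# Part one: invented notional families for illustration only
families = {
    101: ("current", ["current", "amperage", "flow"]),
    102: ("motion", ["flow", "movement"]),
}

# Part two: alphabetic index of words
index = build_index(families)
for word, memberships in index.items():
    print(word, memberships)
```

A word such as "flow" appears under both families, reflecting the provision that a word may belong to one or several notional families. Segregation by language could be added by keying each part on a language code as well.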

As far as intellectual effort is concerned, the establishment
of notional categories at the word level, as practiced
in this system, appears to have advantages over the
development of classification or subject headings. It is
estimated that the number of notional categories required
will be less than a thousand and that this number will
grow at a very low rate. In the case of classifications or
subject headings, the number and growth rate would
probably be substantially higher. Also it is likely that
it is easier to establish notional family membership for
the words than to define exact classes of subject headings
and their subsequent interpretations. Lastly, the reduction
in the effort required to maintain and update the
system should prove significant.
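
How many notional categories result depends on how families are divided or merged. The frequency criterion described earlier for "electricity" and "butterfly" can be sketched as a simple count over a sample of encoded documents. The thresholds, the function name `review_families`, and the sample data are assumptions for illustration; the paper leaves the actual judgment to experts.

```python
from collections import Counter


def review_families(encoded_docs, too_common=0.8, too_rare=0.05):
    """Suggest a treatment for each notion from its share of documents.

    encoded_docs: list of documents, each a list of notions.
    too_common / too_rare: assumed thresholds on the fraction of
    documents in which a notion occurs.
    """
    total = len(encoded_docs)
    doc_freq = Counter(n for doc in encoded_docs for n in set(doc))
    advice = {}
    for notion, count in doc_freq.items():
        share = count / total
        if share > too_common:
            advice[notion] = "split into subnotions"
        elif share < too_rare:
            advice[notion] = "merge into broader family"
        else:
            advice[notion] = "keep"
    return advice


# Hypothetical sample from a field that specializes in electricity
sample = (
    [["electricity", "insulation"]] * 15
    + [["electricity", "relay"]] * 14
    + [["electricity", "butterfly"]]
)
print(review_families(sample))
```

Such a count could only prompt the experts; deciding which subnotions or broader families to form remains the intellectual effort the paper describes.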

• The encoding of documents

The encoding of the documents of the sample may now
be carried out with the aid of the dictionary of notions.
This process consists of recording each document in
terms of notional elements and thereby creating patterns
which later will serve as a means of recognizing conceptual
similarity in varying degrees between documents.

This process might best be carried out on the basis
of the prevalent patterns of literary organization. Implied
here is the capability for recognizing various levels of
the relatedness of notions as reflected by the author's
formation of sentences, paragraphs, chapters, et cetera.
There is also the probability that the more frequently a
notion and combination of notions occur, the more importance
the author attaches to them as reflecting the
essence of his overall idea.
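
A minimal sketch of such an encoding follows. It is not the procedure of the paper but an illustration under simplifying assumptions: each word is taken to belong to a single notional family, sentences are split on end punctuation, and the word index, the function name `encode`, and the sample text are hypothetical.

```python
import re
from collections import Counter


def encode(text, word_index):
    """Record a text as notion codes, sentence by sentence.

    word_index: {word: family_number}; assumes one family per word,
    a simplification of the dictionary of notions.
    Returns (sentences, frequency): a list of code lists, one per
    sentence containing any known word, and a count of each code.
    """
    sentences = []
    for sentence in re.split(r"[.!?]+", text):
        words = re.findall(r"[a-z]+", sentence.lower())
        codes = [word_index[w] for w in words if w in word_index]
        if codes:
            sentences.append(codes)
    frequency = Counter(code for codes in sentences for code in codes)
    return sentences, frequency


# Hypothetical index and text
word_index = {"current": 101, "amperage": 101, "relay": 205, "switch": 205}
text = "The relay limits the current. High amperage trips the switch."
sentences, frequency = encode(text, word_index)
print(sentences)
print(frequency)
```

The sentence grouping preserves the first-order relatedness of notions, while the frequency count gives a rough indication of the importance the author attaches to each notion.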

At this point the question arises: To what degree of
specificity must notions and their relationships specifically
be encoded to arrive at a practical measure of comparison?
The answer probably cannot be given without a background
of practical experience. A practical method is to
start with a broad system and to determine by experience
whether and where refinements are needed. If, on the
other hand, a s
