`Facebook Inc. Ex. 1204
`
`

`

`Datenverarbeitung im Recht
`Archive for the entire field of legal informatics, legal cybernetics,
`and data processing in law and public administration.
`Citation form: DVR
`
`Editors:
`Dr. jur. Bernt Bühnemann, Wissenschaftlicher Oberrat at the Universität Hamburg
`Professor Dr. jur. Dr. rer. nat. Herbert Fiedler, Universität Bonn/Gesellschaft für
`Mathematik und Datenverarbeitung, Birlinghoven
`Dr. jur. Hermann Heussner, Presiding Judge at the Bundessozialgericht, Kassel,
`Lecturer at the Universität Gießen
`Professor Dr. jur. Dr. phil. Adalbert Podlech, Technische Hochschule Darmstadt
`Professor Dr. jur. Spiros Simitis, Universität Frankfurt a. M.
`Professor Dr. jur. Wilhelm Steinmüller, Universität Regensburg
`Dr. jur. Sigmar Uhlig, Regierungsdirektor at the Bundesministerium der Justiz, Bonn
`(Managing Editor)
`
`Advisory editors and regular contributors:
`Dr. Helene Bauer Bernet, Service juridique commission C. E., Brussels - Pierre
`Catala, Professeur à la Faculté de Droit de Paris, Directeur de l'Institut de Recherches
`et d'Etudes pour le Traitement de l'Information Juridique de Montpellier -
`Prof. Dr. jur. Wilhelm Dodenhoff, Presiding Judge at the Bundesverwaltungsgericht,
`Berlin - Dr. Aviezri S. Fraenkel, Department of Applied Mathematics, The Weizmann
`Institute of Science, Rehovot - Prof. Dr. jur. Dr. phil. Klaus J. Hopt, M. C. J.,
`Universität Tübingen - Prof. Ejan Mackaay, Director of the Jurimetrics Research
`Group, Université de Montréal - mr. Jan Th. M. Palstra, Nederlandse Economische
`Hogeschool, Rotterdam - Professor Dr. Jürgen Rödig †, Universität Gießen -
`Direktor Stb. Dr. jur. Otto Simmler, Administrative Bibliothek und Österreichische
`Rechtsdokumentation im Bundeskanzleramt, Vienna - Professor Dr. Lovro Sturm,
`Institute of Public Administration, University in Ljubljana - Professor Dr. jur. Dieter
`Suhr, Freie Universität Berlin - Professor Colin F. Tapper, Magdalen College,
`Oxford - lic. jur. Bernhard Vischer, UNIDATA AG, Zürich - Dr. Vladimir Vrecion,
`Faculty of Law of the Charles University in Prague.
`
`Managing Editor:
`Dr. Sigmar Uhlig, An der Düne 13, D-5300 Bonn-Tannenbusch,
`Telephone 0 22 21/66 13 78 (private); 0 22 21/5 81 or 58 48 27 (office)
`Editorial assistant:
`Dieter Hebebrand, Fliederweg 1, D-3501 Niestetal, Telephone 05 61/52 46 31 (private);
`05 61/30 73 62 (Bundessozialgericht)
`
`Manuscripts, editorial enquiries and review copies should be sent to the Managing
`Editor, business communications to the publisher. No responsibility is accepted for
`unsolicited manuscripts.
`Contributions are accepted only on condition that the author does not treat
`the same subject simultaneously in another journal. In submitting the
`manuscript the author also transfers to the publisher, for the duration of the
`copyright term, the right to authorise the making of photomechanical
`copies in commercial enterprises for internal use,
`provided that each photocopied sheet bears a revenue stamp of the Inkassostelle des
`Börsenvereins des Deutschen Buchhandels, Großer Hirschgraben 17/19, 6000 Frankfurt
`a. M., at the tariff in force at the time.
`
`Colin F. H. Tapper
`
`Citation Patterns in Legal Information Retrieval
`
`Overview
`
`A. State of the Art
`1. The Established Method and Its Defects
`2. Improvements
`3. An alternative
`4. Defects of the Alternative
`
`B. Citation Patterns
`1. Citations and Research
`2. Citations and Information Retrieval
`3. Citations and Vectors
`4. Some Problems
`5. Uses of Citation Vectors
`6. Weighting
`
`A. State of the Art1
`
`1. The Established Method and Its Defects
`
`In the late 1950's and early 1960's it first became possible to contemplate the use
`of computers to assist in the retrieval of legal information. Those years saw the
`formation of a special sub-committee of the American Bar Association, the
`launching of a number of journals specifically devoted to computer applications in
`law and predominantly to legal information retrieval,2 and the establishment of a
`number of research programs.3 Largely owing to the work of John Horty at the
`University of Pittsburgh the direction taken by most of these experiments, at least
`in the Anglo-American legal world, was towards the use of the full-text of legal
`documents for retrieval systems. There were a number of reasons for this. The
`main one was a distrust of any screen put between the lawyer confronted by the
`problem and the information made available to him. Indexing and abstracting are
`essentially methods of reducing the bulk of the information presented to the
`lawyer so as to make it of manageable magnitude.
`Information regarded by the indexer or abstracter as less important is either
`discarded altogether or restated in a more general and more concise form. It is
`then possible for the lawyer to scan this reduced version and to select those parts
`which seem relevant to his problem. With the advent of the computer it seemed
`that things had changed. The machine could scan any amount of information in a
`very short space of time, and so long as its judgment of relevance was satisfactory
`could save the lawyer work by presenting him with all and with only relevant
`
`1 See generally Tapper 'Computers and the Law' chs. 5-7.
`2 Jurimetrics Journal (formerly Modern Uses of Logic in Law), Law and Computer
`Technology, Rutgers Journal of Computers and Law, Datenverarbeitung im Recht.
`3 Among the earliest experiments were those of Professor Horty at the University of
`Pittsburgh, Colin Tapper at Magdalen College, Oxford and Aviezri Fraenkel at the
`Weizmann Institute of Science.
`
`material. It was argued that this method optimised the interaction of man and
`machine by restricting the machine to the mechanical job of searching for
`matches between words specified by the lawyer, and yet still allowed full scope for
`the creativity of the lawyer in selecting the words to be matched.
`
`At first some felt that this approach was best suited to statutory materials because
`their volume was smaller and the use of words was relatively more precise than in
`case-law. Others took the view that case-law was more suitable than statute on the
`basis that the greater volatility of statutory materials more than compensated for
`their relatively smaller bulk, and that the relative poverty of statutory vocabulary
`could lead to failure to retrieve relevant information. Conversely it was felt that the
`trend towards ever cheaper storage of information reduced the force of the
`argument based on bulk. In fact the best argument in favour of concentrating on
`statutory materials within full-text systems was hardly ever deployed. This is that
`the sort of search which is commonly employed in the statutory area corresponds
`much more closely to the scanning technique of full-text than does the sort of
`search commonly required in the area of case-law. Another argument is simply that
`there is less scope for reduction in statutory materials, where every word is
`authoritative, than in case-law, where only the rule in a case can ever be authoritative.
`As usual, however, the decisive arguments were economic ones. No system
`offering access to statutes alone is economically viable or psychologically acceptable
`in an environment in which both statutes and case-law have to be used. So
`working systems developed for widespread use catered for both statute and
`case-law as a data-base.
`
`These systems were essentially full-text systems based on the principle of word
`matching. This requires the lawyer to state his problem in terms of the words and
`combinations of words which he would expect to find in any document relevant to
`his problem. All of these terms are pregnant with difficulty. What is to count as a
`word? When is one word different from another? What sorts of combination are
`allowed? What principle of individuation is to be applied to legal documents?
`These questions have received pragmatic solutions. In general a word is equated
`with a string of characters terminated either by a space or by some punctuation,
`though there are exceptions to this rough definition. Different strings are regarded
`as different words. Strings can be combined into lists and lists into final search
`formulations by Boolean operators and semantic distance measured in terms of
`document, sentence and words. In legislation the section is usually regarded as
`the basic unit, in case-law the case. In general also so as to reduce storage
`requirements in the concordances used by such systems 'common words' are
`omitted. At first such systems were operated in batch mode, but increasingly they
`are offered on an interactive basis. This means that the user types in the words
`which are to characterise the answers to his problem, and is given the opportunity
`to review his characterisation in the light of interim results. The results are
`commonly expressed first as a numerical value representing the number of
`documents which satisfy the user's characterisation. The user then has the option
`of having the whole or part of the text of those documents displayed, or of
`modifying his characterisation. This process goes on until the user is satisfied by
`the responses obtained from the system at which stage he can get a hard copy,
`i.e., one printed on paper, of either the references to or text of those documents
`which satisfy his final characterisation.
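The word-matching scheme described above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the stop list `COMMON_WORDS`, the three sample documents, and the helper names are hypothetical and do not reproduce any particular commercial system. The sketch treats a word as a run of letters or digits, builds a concordance that omits common words, and answers a Boolean query with an optional distance condition measured in words.

```python
import re
from collections import defaultdict

# Hypothetical stop list; each real system chose its own "common words".
COMMON_WORDS = {"the", "a", "of", "by", "in", "was", "to", "and", "not"}

def tokenize(text):
    """Split text into lower-cased strings ended by a space or punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_concordance(documents):
    """Map each uncommon string to {document id: [word positions]}."""
    concordance = defaultdict(lambda: defaultdict(list))
    for doc_id, text in documents.items():
        for position, word in enumerate(tokenize(text)):
            if word not in COMMON_WORDS:
                concordance[word][doc_id].append(position)
    return concordance

def docs_with_any(concordance, strings):
    """Boolean OR over a list of strings."""
    found = set()
    for s in strings:
        found |= set(concordance.get(s, {}))
    return found

def docs_within(concordance, first, second, max_distance):
    """Documents in which the two strings occur within max_distance words."""
    hits = set()
    shared = set(concordance.get(first, {})) & set(concordance.get(second, {}))
    for doc_id in shared:
        if any(abs(p - q) <= max_distance
               for p in concordance[first][doc_id]
               for q in concordance[second][doc_id]):
            hits.add(doc_id)
    return hits

documents = {
    "s1": "The driver of the car was negligent.",
    "s2": "Negligence of the automobile driver was not proved.",
    "s3": "The taxi licence was revoked.",
}
concordance = build_concordance(documents)

# (car OR automobile) AND driver
car_or_auto_and_driver = (docs_with_any(concordance, ["car", "automobile"])
                          & docs_with_any(concordance, ["driver"]))
print(len(car_or_auto_and_driver), sorted(car_or_auto_and_driver))
print(sorted(docs_within(concordance, "driver", "car", 3)))
```

The first printed value corresponds to the document count that an interactive system would report before the user decides whether to display the texts or to revise the formulation.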
`It is plain that these methods harbour a number of defects. These may be classified
`into two broad groups. The first includes those which affect the level of performance
`in terms of the quality of the material produced, and the second those which
`otherwise affect the acceptability of the system to users. So far as the first is
`concerned it is easier to propound theoretical reasons for it than to produce
`empirical evidence, since there has been no published report of any systematic
`test of the more advanced systems now being offered commercially. Such empirical
`evidence as there is relates only to cruder and earlier experimental systems.4
`That evidence is somewhat equivocal in suggesting that while machine performance
`does tend to retrieve relevant information which cannot be recovered in any
`other way, it does so only at the expense of recovering a vast amount of irrelevant
`information also. The explanation lies in a combination of several factors. First it
`occurs because the nature of the procedure relies upon occurrence and co-occurrence
`of character strings as a unique indicator of meaning. This process has a
`number of drawbacks. In the first place, very similar meanings can be encapsulated
`in very different character strings, so all must be specified. Secondly, the same
`character string can encapsulate very different meanings in different contexts. The
`former raises problems of synonyms and, much more potently, of levels of
`abstraction. The latter raises that of homologues, that is, identical strings with
`different meanings. Thus 'auto', 'automobile', and 'car',
`although all different character strings, have meanings which are substantially
`similar; so, too, 'Chevrolet', 'car' and 'vehicle' can easily have the same meaning in
`the context of some legal problems though they may not do so in all; and then
`'jury' in the context of trial by jury has a different meaning from 'jury' in the sense
`of a temporary maritime repair although the character strings are identical. This
`means that in order to characterise his meaning uniquely and accurately the user
`must specify all possible synonyms, particularisations and generalisations (including
`all their different grammatical forms), and exclude all identical character
`strings having different meanings. The latter task can only be accomplished by
`way of the context in which the character strings appear. So some lists of strings
`must be combined and some use made of combinatorial logic. It follows that the
`user must not only be able to specify in advance all the different ways of
`expressing the meaning he wishes to include, but also all the different ways of
`expressing at least one other meaning which he wishes to find
`associated with the original meaning and the way in which the two are to be linked.
`It will be a great help to him in doing this to be able to think of the strings most
`likely to be associated with the other unwanted meanings of the string he wishes to
`use so as to be sure that they are excluded from the combined list. Thus if a lawyer
`wishes to find cases dealing with temporary maritime repairs, he must not only ask
`for occurrences of the string 'jury', which if specified alone would deluge him with
`unwanted references to jury trial, but must also specify some association with, for
`
`4 Summarised in Tapper op. cit. ch. 6.
`example, 'sail' or 'mast' or 'rig', remembering in the last case to be very careful
`about position so as to exclude cases referring to jury rigging in the unintended
`sense. It should by now be apparent wherein the difficulty lies. In order to retrieve
`all the relevant information the user must be able to specify all the possible
`synonyms, particularisations and generalisations, whereas in order to retrieve only
`relevant information he must combine his strings in such a way as to exclude every
`possible ambiguity of every string in every list. In practice these two conflicting
`tasks have to be balanced against each other so that the user has to be content
`with as much of the relevant information as is compatible with not getting too
`much irrelevant material, or more commonly vice versa.
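The balance between the two tasks can be made concrete with a small worked sketch. The five documents, the relevance judgments, and the function names are all hypothetical, and the two ratios computed (the share of relevant documents retrieved, and the share of retrieved documents that are relevant) are the measures conventionally called recall and precision, which the text does not name. The broad question asks for 'jury' alone; the narrow one also demands 'sail', 'mast' or 'rig'.

```python
def matches(text, required_any, associated_any=None):
    """True if the text has one of required_any and, if given, one of associated_any."""
    words = set(text.lower().replace(",", " ").replace(".", " ").split())
    if not words & set(required_any):
        return False
    if associated_any is None:
        return True
    return bool(words & set(associated_any))

def recall_and_precision(retrieved, relevant):
    """Share of relevant documents retrieved, and share of retrieved documents that are relevant."""
    hits = len(retrieved & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision

# Hypothetical documents; d2 and d3 concern temporary maritime repairs.
documents = {
    "d1": "The jury returned a verdict for the plaintiff.",
    "d2": "The ship sailed home under a jury mast.",
    "d3": "A jury rudder was fitted after the storm.",
    "d4": "The jury heard evidence about the mast.",
    "d5": "Trial by jury was refused.",
}
relevant = {"d2", "d3"}

broad = {d for d, text in documents.items() if matches(text, ["jury"])}
narrow = {d for d, text in documents.items()
          if matches(text, ["jury"], ["sail", "mast", "rig"])}

print(recall_and_precision(broad, relevant))
print(recall_and_precision(narrow, relevant))
```

In this toy collection the narrow question loses the 'jury rudder' case because no associated string was foreseen, and it still admits a jury-trial case that happens to mention a mast, which is the pair of failures the paragraph describes.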
`
`It is at this point that the question of the general acceptability of these methods to
`potential users becomes apparent. It is neither easy to think in these ways nor is it
`customary for lawyers to do so. Such thinking is not common in any other context
`nor is it taught at law school. It can to some extent be taught, and most systems
`intended for wide use offer either instruction courses or practice manuals, or both.
`But the really effective method of learning to use such systems is by trial and error.
`After a short course or after reading a manual a lawyer can use the system in the
`sense of operating the terminal so as to secure some relevant results. But efficient
`use of the system in the sense of securing all the relevant information, and only the
`relevant information in the shortest possible time, is something which is acquired
`only over long periods of constant use. This is expensive in both time and money.
`In addition, such potent factors as the conservatism of many members of the legal
`profession, particularly older and more senior members, and the inability and
`reluctance of many to acquire the physical dexterity required to operate a keyboard
`effectively contribute to the relative reluctance of the legal profession to
`embrace these new systems.
`
`A further problem which has gradually become more apparent relates to access to
`the information. One of the reasons for the development of computerised methods
`was that the volume of legal material was increasing at such a startling rate that it
`could not be handled by conventional means. The choice of full-text as a method
`for computerised retrieval was taken in the teeth of that reasoning on the ground
`stated earlier that the cost of holding and securing access to data held in computer
`stores was decreasing at a time when the costs of all other forms and methods were
`increasing. That was and is true. But what it tended to gloss over was the initial
`cost of transforming existing legal information into a machine readable form, the
`unwieldiness of the enormous volumes of material required to be stored and the
`difficulty of keeping it, and especially the statutory material, up to date. It was also
`the case that the promoters of computerised legal information retrieval systems
`did not always have the legal rights to reproduce relevant legal materials where the
`copyright was in the hands of private publishers. This would not have been too
`serious if the development of automatic devices for transforming printed books
`into a computer readable format (optical character recognition devices) had been
`more successful, or even if the publishers of law reports had co-operated with
`those collecting data for retrieval systems by producing their printed versions from
`computerised typesetting processes and making the corrected tapes available. It
`seems also that the expense of maintaining and making accessible so large a
`volume of material as the full-text system demands has imposed considerable
`strains and constraints upon the commercial systems. It has tended to make them
`rather more expensive and rather more selective than would be ideal from the
`point of view of either users or promoters.
`All of these factors have contributed to slowing down the development of computerised
`legal information retrieval though they have not stopped it altogether,
`which in itself testifies to the felt need for some form of speedier access to legal
`information.
`
`2. Improvements
`
`These factors have not passed unnoticed by the designers of information retrieval
`systems, and various devices have been suggested to help overcome them. First
`there is the problem of specifying all of the strings necessary to retrieve all of the
`relevant material. In theory this can be mitigated in two ways, either by increasing
`the number of strings in the original question, or by increasing the number of
`answers that the original question retrieves. Most systems have tended to concentrate
`on the former. The precise method depends to some extent upon the way in
`which the particular system requires an enquiry to be prepared. If it is by the
`specification of lists of strings then a number of possibilities suggest themselves.
`The most obvious of these is the use of an automatic thesaurus. The difficulty with
`this solution is that it would be extremely difficult to prepare such a thesaurus in
`advance, and it is difficult to see how exactly it could cater for variation as between
`different legal fields. Thus the string 'election' has one set of synonyms in the
`context of equity and another in that of constitutional law. Even more difficult
`problems are posed by particularisations and generalisations; it would, for example,
`be difficult to give all the possible particular forms of a string like 'reasonable' in
`the context of the law of negligence. A different approach to this problem, and one
`adopted by the DATUM system in Montreal, is to develop the thesaurus ex post
`facto. Thus final and presumably successful lists of strings are stored and the
`strings in each list are regarded as equivalent for the purpose of enriching the
`strings specified by the current user. This may either be automatic, or, if the system
`is interactive, may operate by supplying the suggested equivalent strings to the
`user for his acceptance or rejection. It should be pointed out that some human
`intervention is required either at the stage of grouping the strings or at that of
`using them just because of the problem of homologues. In some systems questions
`can be asked not in the form of lists of strings accompanied by Boolean
`connectors and positional specification, but by a simple natural language question.
`Such systems often have to apply rules to break the question down into a
`succession of strings and operators. Since there is likely to be only one string to
`each operator, it is further necessary to amplify them. One method would be to
`include in the system a dictionary of word roots so that other strings with the same
`root could be supplied. The same could be done for grammatical variation. The
`difficulty here lies in the premise that all strings coming from the same root can
`first be identified and second regarded as equivalent. The same objection applies
`
`to a comparable solution to the problem of different grammatical forms. Of course
`these objections are less serious in an interactive environment where the user can
`check the amplification of his question, though this may be contrary to the
`intention behind the natural language question approach which is to spare the
`lawyer involvement with that sort of exercise. Two other techniques which can be
`used in either context involve truncation and contextual display. The former is
`quite commonly provided in retrieval systems. It permits the user to instruct the
`system to include all strings having a certain number of characters in certain
`positions, typically consecutively at the beginning. Thus the user might specify
`'taxa+' where '+' indicates to the system that all strings beginning 'taxa' are to be
`included, thus bringing in 'taxable' and 'taxation'. This is a very crude tool and it is
`necessary to be very careful in its use. Thus in the example given above it may be
`noted that 'tax' and 'taxing' are not included, but any specification short enough to
`include them such as 'tax+' would also include a huge variety of unwanted strings
`such as 'taxi' and 'taxonomy'. The other method is to display the string specified in
`its alphabetic place in the system's concordance so that the user may select which
`of the alphabetically close forms should be included. This is a natural preliminary
`to the method discussed above, and is subject to the same objection, namely that
`words may be so widely separated alphabetically that they might not be detected. It
`is not clear how effective such techniques are in amplifying lists of strings. There is
`no empirical evidence derived from testing systems with and without such facilities.
`In its absence it is possible to infer from the near universal adoption of some
`at least of these techniques that they are to some extent effective, but from the
`equally intensive search for ways to improve them, that they are not perfect.
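The behaviour of truncation can be seen in a short sketch over an alphabetically ordered concordance. The vocabulary list and the function name `expand_truncation` are hypothetical; only the '+' convention and the example strings come from the text. Because the concordance is sorted, all strings sharing a stem sit together, which is also why the alphabetic display described above is a natural companion to truncation.

```python
from bisect import bisect_left

def expand_truncation(sorted_concordance, pattern):
    """Return every string that begins with the stem of a pattern such as 'taxa+'."""
    if not pattern.endswith("+"):
        return [pattern] if pattern in sorted_concordance else []
    stem = pattern[:-1]
    # Binary search for the first string that could begin with the stem.
    start = bisect_left(sorted_concordance, stem)
    expanded = []
    for word in sorted_concordance[start:]:
        if not word.startswith(stem):
            break  # sorted order: no later string can share the stem
        expanded.append(word)
    return expanded

# Hypothetical slice of a system's concordance.
vocabulary = sorted(["tax", "taxable", "taxation", "taxi", "taxing", "taxonomy", "tenant"])

print(expand_truncation(vocabulary, "taxa+"))
print(expand_truncation(vocabulary, "tax+"))
```

The first expansion misses 'tax' and 'taxing'; the second recovers them only at the price of 'taxi' and 'taxonomy', so the user is left to choose between the two failures.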
`The converse problem is that of reducing the amount of irrelevant material that
`would otherwise be retrieved. Here, as indeed with the problem just discussed, the
`most potent device is probably the use of an interactive system with the facility for
`supplying the number of documents retrieved by each search formulation and
`displaying any part of those documents. In the case of document reduction this
`may be made still more effective by including facilities for the display of those
`parts of the material physically adjacent to a required string; this is the keyword in
`context (KWIC) approach, in which the user can specify as much context in
`physical terms as he chooses. In interactive systems it is more common to achieve
`a similar result by highlighting required strings in the displayed text either by the
`use of bold characters, reverse video or some similar technique. This may be
`further enhanced by the provision of keys allowing the text to be skipped to the
`next highlighted term. In this way it is hoped that the user will be able to identify
`speedily, and to reject retrieved documents which are irrelevant because one of
`the required strings has proved ambiguous in meaning. Another method devoted
`not to preventing the retrieval of irrelevant material but to marshalling all the
`material is to present it to the user not in the order in which the documents appear
`in the database, but in descending order of relevance to the user's search
`formulation. Techniques to achieve this are known as ranking algorithms. They
`usually work on the comparison of the frequency of occurrence of the required
`strings in individual documents and in the database as a whole. A cruder basis
`would simply rely upon the frequency of occurrence of the required strings in the
`
`documents retrieved. Once again there is no hard evidence of the effectiveness of
`such techniques. The main complaint from those who have used them is that they
`tend to exalt documents more by length than relevance, presumably because the
`longer the document the more likely it is that any given string will appear, and
`appear more frequently. Nor is it generally very easy for users to understand the
`probable effects of choosing particular algorithms.
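The complaint about length can be reproduced with a small sketch. Both scoring functions are assumptions made for illustration: `crude_score` counts occurrences of the required strings, while `relative_score` is only one possible reading of "comparison of the frequency of occurrence in individual documents and in the database as a whole", namely the ratio of a string's rate in the document to its rate in the database. The text does not specify the formula used by any actual system.

```python
from collections import Counter

def crude_score(words, required):
    """Total occurrences of the required strings in one document."""
    counts = Counter(words)
    return sum(counts[s] for s in required)

def relative_score(words, required, database_counts, database_size):
    """Assumed formula: rate in the document divided by rate in the whole database."""
    if not words:
        return 0.0
    counts = Counter(words)
    score = 0.0
    for s in required:
        if database_counts[s] == 0:
            continue
        document_rate = counts[s] / len(words)
        database_rate = database_counts[s] / database_size
        score += document_rate / database_rate
    return score

# Hypothetical documents: a short pertinent one and a long discursive one.
documents = {
    "short": "negligence of the driver".split(),
    "long": ("the court reviewed the facts at length " * 5
             + "negligence negligence").split(),
}
required = ["negligence"]
database_counts = Counter(w for words in documents.values() for w in words)
database_size = sum(database_counts.values())

by_crude = sorted(documents,
                  key=lambda d: crude_score(documents[d], required),
                  reverse=True)
by_relative = sorted(documents,
                     key=lambda d: relative_score(documents[d], required,
                                                  database_counts, database_size),
                     reverse=True)
print(by_crude)
print(by_relative)
```

Under the crude count the long document is placed first merely because it has room for a second occurrence; dividing by document length reverses the order, though whether that reversal matches a lawyer's judgment of relevance is exactly what remains untested.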
`
`3. An alternative
`Because of the difficulties which were experienced with the string matching
`approach to legal information retrieval by computer an alternative, often referred
`to as the vector technique, was developed in the late 1960's. So far it has not been
`adopted by any widely used system in law,5 but it has a number of interesting
`features which could meet some of the objections raised above. In outline this
`technique substitutes the concept of finding an approximate match for the whole
`document in place of a precise match for part of it. The established system relies
`upon a precise fit between word strings: if the precise match is achieved then the
`document is considered relevant; if it fails in even the slightest respect then it is
`disregarded. It is, in the opinion of many, just this feature which makes it
`impossible for the established system to function efficiently. It presupposes that
`information is either relevant or irrelevant, and that this can be detected by testing
`for the presence or absence of particular character strings. The proponents of this
`alternative approach argue that relevance is a continuous variable in that legal
`material is more or less relevant, and thus that no all or nothing test can for this
`reason ever be completely successful. It may be interjected at this point that a
`more plausible position would accept the arguments of both sides, but direct them
`to different types of legal search. Thus there clearly are some sorts of legal search,
`such as those for all statutory uses of particular words or phrases, or all case-law
`discussion of particular provisions which are all or nothing in the required sense,
`and for which the established technique is ideal. Nevertheless, it is undeniable
`that much case-law searching is of a less precise nature, and it is to this that the
`vector technique is most appropriately directed.
`It too proceeds on the basis of a machine readable version of the full text of the
`material to be analysed, and usually common words are omitted, though there are
`some particular applications of the alternative technique such as the detection of
`authorship which depend upon the presence of common words. This full-text
`requirement presents a few problems since even in the established system common
`words appear in the original machine readable version, not because they are
`required in the retrieval procedure itself but because it is easier to prepare the text
`with the words included than to exclude them, and because it would make the text
`impossible to read if they were excluded from the version which is displayed
`during the retrieval procedures or printed out after the procedures have been
`completed. Similarly in just the same way as the established one the alternative
`
`5 This technique was first suggested by Professor Salton and forms the basis of the SMART
`retrieval system. It has been adapted to legal materials by Bryan Niblett and Gillian
`Boreham at the University of Kent whose generous assistance to the author is gratefully
`acknowledged.
`system constructs an inverted file of the uncommon words, though there is one
`difference in that the concordance has no need to store full positional information.
`It is sufficient merely to note the document references for each string. This
`concordance is then supplemented by a file or vector for each document which
`records the strings contained in that document. At this point, different applications
`of the alternative approach diverge. One possibility is to store these strings
`together with an indication of their frequency of occurrence in that particular
`document. It is then possible to accept a question in natural language and to
`compare the strings or vector used in the formulation of the question with the
`strings or vector which represent the documents contained in the database. This
`will not be on the basis of attempting to find a complete match, but the nearest
`match. Thus it will not matter if some of the strings in the question do not appear in
`some of the documents, but rather the documents will be presented to the user in
`the order of closeness of fit. Thus this technique incorporates a natural ranking
`algorithm. It is possible to exploit this feature still further. It is unlikely that any
`natural language formulation of a question in the true sense, that is a sentence that
`one might use in ordinary speech or writing would contain much repetition of key
`terms. It is thus possible that some relevant documents would contain none of the
`strings appearing in the question. On the other hand, some relevant documents
`almost certainly would contain some of the strings. This system has the facility not
`only to compare documents with the question that is asked, but also with each
`other. Thus it is quite likely that a document which matches a particular question
`very well may itself match another document very well although the other document
`does not match the question at all just because none of its strings happens to
`have been specified in that particular formulation. The vector system could be
`organised so as to indicate such a second document as having possible relevance
`to the question. In this way a document would be indicated which could not
`possibly be found by the established system (working on an unenriched natural
`question basis). This alternative can of course operate on the same sort of
`question formulation as the established system, namely on lists of strings connected
`by Boolean operators. The only difference would be that no positional qualifiers
`could be accepted. Similarly the alternative system could employ most of the
`methods described above to improve search formulation such as string truncation,
`root specification, KWIC and highlighted displays. An exception would be the case
`mentioned above where a document is retrieved without the appearance of any
`string which has been explicitly specified.
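A minimal sketch may help to fix the idea of nearest-match retrieval. The measure of closeness used here, the cosine of the angle between two frequency vectors, is an assumption chosen for brevity; the text speaks only of closeness of fit. The sample documents, the stop list, the threshold, and all function names are likewise hypothetical.

```python
import math
from collections import Counter

COMMON_WORDS = frozenset({"the", "a", "of", "in", "was", "by", "for", "to", "is"})

def vector(text):
    """Frequency vector of the uncommon strings in a text."""
    return Counter(w for w in text.lower().split() if w not in COMMON_WORDS)

def closeness(u, v):
    """Cosine of the angle between two frequency vectors (assumed measure)."""
    dot = sum(u[s] * v[s] for s in u if s in v)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0

def rank_documents(question, documents):
    """Documents in descending order of closeness to the question."""
    q = vector(question)
    scored = [(closeness(q, vector(text)), doc_id)
              for doc_id, text in documents.items()]
    return sorted(scored, reverse=True)

def related_documents(doc_id, documents, threshold):
    """Other documents whose vectors lie close to that of doc_id."""
    base = vector(documents[doc_id])
    return [other for other, text in documents.items()
            if other != doc_id and closeness(base, vector(text)) >= threshold]

documents = {
    "d1": "landlord liable for injury to tenant on defective staircase",
    "d2": "lessor held responsible where lessee hurt on defective staircase",
    "d3": "contract for sale of goods",
}
question = "is the landlord liable for injury to the tenant"

ranking = rank_documents(question, documents)
best = ranking[0][1]
print(best, related_documents(best, documents, threshold=0.3))
```

Here the second document shares no string with the question, so its closeness to the question is zero, yet it is indicated through its resemblance to the best match. This is the second-order retrieval the paragraph describes, and it also shows why highlighting cannot help in that case: there is no specified string in the document to highlight.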
`
`A further enhancement possible in the alternative system is to give different
`weights to particular strings. This can be done either on the basis of the frequency
`of occurrence of a string, and is indeed very commonly done in that context, or
`perhaps by reference to the string's position in the question, or by the explicit
`prescription of the user. In any event the precise amount and effect of the
`weighting has to be decided upon and implemented. This makes it slightly
`awkward to allow user's prescription since few users would want to be bothered
`with transmuting an intuitive preference into a precise mathematical value.
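The need to fix a precise value for each weight can be illustrated as follows. The weighting rule used, the logarithm of the number of documents divided by the number containing the string, is an assumption borrowed from common retrieval practice and is not prescribed by the text; the document vectors and function names are hypothetical. A user's own prescription enters as a simple multiplier.

```python
import math

def frequency_weights(document_vectors):
    """Assumed scheme: weight = log(N / number of documents containing the string)."""
    n = len(document_vectors)
    doc_freq = {}
    for vec in document_vectors.values():
        for s in vec:
            doc_freq[s] = doc_freq.get(s, 0) + 1
    return {s: math.log(n / df) for s, df in doc_freq.items()}

def weighted_score(question_strings, doc_vector, weights, user_weights=None):
    """Sum of frequency x system weight x optional user-prescribed multiplier."""
    user_weights = user_weights or {}
    return sum(doc_vector.get(s, 0) * weights.get(s, 0.0) * user_weights.get(s, 1.0)
               for s in question_strings)

# Hypothetical document vectors (string -> frequency in the document).
document_vectors = {
    "d1": {"court": 2, "easement": 1},
    "d2": {"court": 3},
    "d3": {"court": 1, "estoppel": 2},
}
weights = frequency_weights(document_vectors)
question = ["court", "easement"]

for doc_id, vec in document_vectors.items():
    print(doc_id, weighted_score(question, vec, weights))

# A user who regards 'easement' as twice as important must say so numerically.
print(weighted_score(question, document_vectors["d1"], weights, {"easement": 2.0}))
```

A string such as 'court', present in every document, receives a weight of zero under this rule, so that only the rarer string discriminates between documents. The final line shows the awkwardness noted above: the user's intuitive preference has to be supplied as a number.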
`
`4. Defects of the Alternative
`
`To some extent the alternative suffers from the same defects as the established system. Thus it
`requires a machine-readable and up to date version of the legal documents it
`includes. The enormous bulk of full-text presents to that extent the same problem.
`But in fact it is even more serious here since the amount of computing involved in
`this technique is very much greater than in the established systems. The established
`technique is based on string matching which essentially involves a trivial
`amount of subtraction. Thus if the string 'cab' should be represented by the value
`'312' it is merely a matter of subtracting '312' from all the other values and
`accounting it an exact match when the result is '000'. In the alternative system the
`long sequences of values which constitute the vectors have to be summed and
`very large numbers manipulated to compute correlation coefficients either between
`question formulations and documents, or even wors