JOURNAL OF LAW AND INFORMATION SCIENCE
Published by the Faculties of Law, and Mathematical and Computing Sciences
New South Wales Institute of Technology

Vol. 1 No. 2

1982

EDITORIAL BOARD

Chairman
Hon. Mr. Justice M.D. Kirby,
Chairman, Australian Law Reform Commission

Editor
Dr. R.A. Brown,
Lecturer in Law, N.S.W.I.T.

Members

Mr. G.W. Bartholomew,
Dean, Faculty of Law,
N.S.W.I.T.

Mr. D. Biles,
Assistant Director (Research)
Australian Institute of Criminology

Professor J. Bing,
Norwegian Research Centre for
Computers and Law, Oslo University

Professor A. Blackshield,
Dept. of Legal Studies
La Trobe University

Professor P. Catala,
Institut de Recherches et d'Etudes
pour le Traitement de l'Information
Juridique, Université de Montpellier I

Dr. V.X. Gledhill,
Dean, Faculty of Mathematical
and Computing Sciences,
N.S.W.I.T.

Professor J. Goulet,
Faculté de Droit
Université Laval, Quebec

Mr. C.H. Gray,
Dept. of Mathematics and Statistics,
C.S.I.R.O.

Mr. W.A. Steiner,
Librarian
Institute of Advanced Legal Studies

Mr. C. Tapper,
All Souls Reader in Law,
Magdalen College, Oxford

Mr. R.J. Watt,
Senior Lecturer in Law,
N.S.W.I.T.

Professor D.N. Weisstub,
Professor of Law and Psychiatry,
Osgoode Hall Law School

Professor D. Whalan,
Faculty of Law,
Australian National University
`
Facebook Ex. 1015
`
This volume may be cited as
(1982) 1 J.L.I.S.

Articles, books for review, subscriptions and all inquiries should be
addressed to The Editor, Journal of Law and Information Science,
C/O Faculty of Law, New South Wales Institute of Technology,
P.O. Box 123, Broadway, N.S.W. 2007, Australia.

Copyright © 1982 New South Wales Institute of Technology

All rights reserved. Subject to the law of Copyright no part of this publication
may be reproduced, stored in a retrieval system or transmitted in any form
or by any means electronic, mechanical, photocopying, recording or otherwise,
without the permission of the owner of the copyright. All inquiries seeking
permission to reproduce any part of this publication should be addressed in
the first instance to The Editor, Journal of Law and Information Science,
C/O Faculty of Law, New South Wales Institute of Technology, P.O. Box 123,
Broadway, N.S.W. 2007, Australia.

Printed in Singapore by Tak Seng Press Pte. Ltd.,
147, Hill Street, Singapore 0617.
`
Editorial, p. vii

Articles
  Legality — Information Technology and the Laws of Evidence
    T.H. Smith, p. 89
  Theories of Information in Law (Recent Developments in the Discipline
  of Rechts- und Verwaltungsinformatik -RVI- in Germany) A Background Paper
    H. Burkert, p. 120
  The Use of Citation Vectors for Legal Information Retrieval
    C. Tapper, p. 131
  Computerisation of Legal Material in Australia
    P.J. Ward, p. 162

Casenote
  Conwell v. Tapfield
    D.I. Robinson, p. 175

Book Reviews
  The Solicitor and the Silicon Chip
    R.A. Brown, p. 179
  Control & Audit of Small/Medium Computer Systems
    R.J. Watt, p. 180
`
THE USE OF CITATION VECTORS FOR LEGAL
INFORMATION RETRIEVAL

C. TAPPER
`
Colin Tapper is one of the founders of the study of
computers and law in England, and this paper adds to his
contributions to the field. As its title indicates, the article
is concerned with the use of case citations as selection vectors
in legal information retrieval, and, in particular, with the value
of citation vectors in comparison to the usual semantic vectors
currently used.

The author details recent experiments with citation vectors
in the United States and at the Norwegian Research Centre
for Computers and Law (NRCCL). The comparative results
of using citation vectors against semantic vectors in these
experiments are documented and considered, and Mr. Tapper
provides some valuable discussion of the algorithms used in
computing and assessing vectors in data retrieval. Despite the
complexity of this work, it will be of great value to all
interested in the field of computers and law, because of its
implications for the future development of legal data retrieval.
`
The first section of this article is intended for those who have no,
or little, previous awareness of legal information retrieval techniques.
Since the main aim of the article is to explain the theory behind the
substitution of citation vectors for such methods, those who have the
requisite familiarity with matching and vector based systems as applied
to law might prefer to start with the second section.
`
1. Current Legal Information Retrieval Techniques
It is now about 25 years since Professor John Horty, at the University
of Pittsburgh in Pennsylvania, first succeeded in applying
computerised methods to the retrieval of legal information. It is a
tribute to his insight that the techniques which he devised remain the
bedrock of virtually all of the systems which operate in the world
today. The essence of the technique is the identification in the text
of a document of a word, or words, in a particular combination which
have been selected by the lawyer as being likely to indicate the
relevance of that document to the lawyer's problem. As normally
implemented, the system creates a concordance of the full legal texts
constituting the database of the system, excluding only words of such
low prima facie information content that they are highly unlikely to
be nominated by lawyers as search terms. Each concordance item
then becomes a potential search term, and searches are typically
conducted by the nomination of classes of words, for example synonyms,
grammatical variations, particularisations and generalisations, which
must occur in a given relationship to other similar classes in a
document in order for it to satisfy the search request as a potentially
relevant document. So as to accomplish this process the lawyer must
first accustom himself to thinking in terms of word occurrence rather
than directly in terms of the meaning of a document. He must be
comprehensive in his classification of terms, and he must be able to
specify the appropriate logical relationship in terms of Boolean logic
and relative sequential occurrence in order to secure an answer. In
a commercially operational system he will also be well-advised to
consider very carefully not only whether his categorisation is
appropriate, but whether it is the most efficiently appropriate formulation
of his search, since the more efficient the search the quicker and
cheaper it becomes.
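The concordance-and-classes procedure described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Horty's implementation: the stop list, the three sample documents and the names `build_concordance` and `search` are invented for the example, and positional (sequence) operators are left out.

```python
import re
from collections import defaultdict

# Hypothetical stop list of "low information" strings (an assumption).
STOP_WORDS = {"the", "a", "of", "to", "by", "was", "for", "in"}

def build_concordance(documents):
    """Map every non-stop string to {doc_id: [positions of occurrence]}."""
    concordance = defaultdict(lambda: defaultdict(list))
    for doc_id, text in documents.items():
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        for position, token in enumerate(tokens):
            if token not in STOP_WORDS:
                concordance[token][doc_id].append(position)
    return concordance

def search(concordance, classes):
    """Return documents holding at least one term from EVERY class (Boolean AND of ORs)."""
    result = None
    for term_class in classes:
        hits = set()
        for term in term_class:
            hits |= set(concordance.get(term, {}))
        result = hits if result is None else result & hits
    return result or set()

# Invented sample texts, for illustration only.
documents = {
    "case1": "The infant was injured by the escape of oil.",
    "case2": "The minor signed a contract for oil.",
    "case3": "Escape of gas injured a child.",
}
concordance = build_concordance(documents)
matches = search(concordance, [{"minor", "infant", "child"}, {"escape", "leakage"}])
print(sorted(matches))  # ['case1', 'case3']
```

The second argument to `search` shows why the lawyer must think in word classes: omitting "child" from the first class would silently lose `case3`.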
`
It is no exaggeration to say that this process teems with problems
both for the system designer and for the average lawyer. Many of
these can be mitigated by proper training and continual practice.
Some of them are more intractable. At the level of the selection of
words with prima facie low information content there is the difficulty
that "word" is strictly speaking an inaccurate designation. "Words"
in the system also encompasses such things as numbers and
abbreviations, and would more properly be described as strings of characters.
In this extended sense it is rarely possible to predict with certainty
that a given string has no information content. Most systems for
example exclude the string "A" on the basis that the upper case
indefinite article is rarely essential to a search. This may be true,
but it is not sufficient to justify the exclusion of the string "A" from
the concordance since "A" does have meaning in some contexts,
for example, the Australian abbreviation "A" followed immediately
by "L" followed immediately by "R". In the United States "A" is
itself an abbreviation for an important series of reports. It is, of
course, immaterial that the abbreviation occurs only in some other
jurisdiction if material from that jurisdiction can ever be reported
in one's own.
`
Semantics and syntax present further difficulties. A basic problem
of a semantic nature is that character strings may not denote
concepts uniquely or exclusively. In many contexts the strings
"minor", "infant", "child", "juvenile", "boy" and "girl" are equivalent,
in others they are not. Conversely strings like "office", "bank", "safe",
"deposit" and "flag" have more than one meaning. In the former
case one of the problems is to think of all of the possible alternatives
so as to include them in the search formulation, in the latter it is to
think of them so as to draft the combination of classes in such a
way as to exclude the unintended meanings. Given the presumptively
inclusive range of search terms this can be extremely difficult. Thus
in one search of British material when seeking documents relevant
to the Gas Board it was found that the string "Gas" was ambiguous
because there had been an Indian litigant of that name in one case.
To some extent these problems interact with each other. For example,
when truncation is used in an effort to avoid the former problem so
far as grammatical variants are concerned, that is, specifying a word
root followed by a special character to retrieve all strings commencing
with that root, extra problems are created in relation to unanticipated
homographs.
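A small sketch may make the truncation hazard concrete. The function name `expand_truncation` and the list of indexed strings are assumptions chosen for illustration. The root "infan" is meant to catch grammatical variants of "infant", but it also sweeps in an unrelated string.

```python
def expand_truncation(root, concordance_terms):
    """Return every indexed string that begins with the given root."""
    return sorted(term for term in concordance_terms if term.startswith(root))

# Hypothetical set of strings present in a concordance.
terms = {"infant", "infants", "infancy", "infantry", "minor", "child"}
print(expand_truncation("infan", terms))
# ['infancy', 'infant', 'infantry', 'infants']
```

Here "infantry" is retrieved along with the intended variants, which is the kind of unanticipated match the text warns about.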
`
Syntax creates problems because English often permits a wide
range of word orders to convey similar meanings, so the effectiveness
of positional logic to avoid problems is reduced. Thus the meaning
of "man bites dog" cannot be distinguished from the meaning of
"dog bites man" by the simple expedient of requiring the string
"man" to precede "dog" in the document, since the same meaning
would be conveyed by "dog was bitten by man".
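The limitation can be seen in a minimal sketch of a positional test. The helper `precedes` is a hypothetical name, and whitespace tokenisation is a simplifying assumption.

```python
def precedes(text, first, second):
    """True if `first` occurs somewhere before `second` in the text."""
    tokens = text.lower().split()
    if first not in tokens or second not in tokens:
        return False
    return tokens.index(first) < tokens.index(second)

for sentence in ("man bites dog", "dog bites man", "dog was bitten by man"):
    print(sentence, "->", precedes(sentence, "man", "dog"))
# man bites dog -> True
# dog bites man -> False
# dog was bitten by man -> False
```

The third sentence means the same as the first, yet the positional requirement rejects it.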
`
A final difficulty which may be mentioned here is that the end
product of a search on a system such as this is the specification of
a number of cases from the database which satisfy the search criteria.
In many systems the order of presentation of these cases is random
and reflects only the internal organisation of the database; in others
it reflects a crude judgment of potential interest, such as higher court
before lower and most recent first. The essential point is that since
degrees of relevance are not distinguished it is impossible to rank
responses in order of relevance to the query.
`
A number of expedients have been adopted to try to meet this
last point. In some cases the basic Horty method has been retained
for the selection of relevant responses, but then further computation
has been carried out to indicate an order of relevance. Some such
systems use a statistical algorithm operating upon a comparison
of string frequency in the documents retrieved and in
the database as a whole. Others permit the lawyer to assign weights
to search terms and by algorithms relating to these contrive an order
of relevance. Another encourages users to multiply classes of search
terms, and then ranks responses by reference to the number of classes
represented in selected documents. All however suffer from the
disadvantage that lawyers rarely have enough understanding of the
significance of the algorithm or the way in which it is likely to work
to be able to use such aids successfully. It may be noted parenthetically
at this point that Horty type systems tend to err on the side
of over-retrieval, and the most effective way of homing in on the
most relevant material is by interacting with the system so as to refine
the search formulation in the light of the system's response to the
previous formulation. This is clearly done most effectively by the
person who understands the problem best, namely the lawyer who
has himself identified and formulated the problem. For this reason
systems in which lawyers themselves operate the system tend to give
much better results than those in which the task is delegated to others.
This constraint is thoroughly beneficial since it forces designers to
create systems which are clearly thought out and very simple to operate.
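Of the ranking expedients mentioned, the last (ordering responses by the number of search classes represented) is the simplest to illustrate. The following is a hedged sketch: the function name `rank_by_classes`, the sample texts and the tie-breaking by document identifier are all assumptions, and no particular system is being reproduced.

```python
def rank_by_classes(documents, classes):
    """Order documents by how many search classes they represent (most first)."""
    scored = []
    for doc_id, text in documents.items():
        tokens = set(text.lower().split())
        score = sum(1 for term_class in classes if tokens & term_class)
        if score:  # documents matching no class are not retrieved at all
            scored.append((score, doc_id))
    # Highest score first; ties broken alphabetically by document id.
    return [doc_id for score, doc_id in sorted(scored, key=lambda p: (-p[0], p[1]))]

# Invented sample texts, for illustration only.
documents = {
    "a": "escape of oil from pipeline",
    "b": "oil contract dispute",
    "c": "negligent dredging caused escape of oil from pipeline",
}
classes = [{"escape", "leakage"}, {"oil"}, {"pipeline", "conduit"}, {"negligent", "reckless"}]
print(rank_by_classes(documents, classes))  # ['c', 'a', 'b']
```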
`
Largely because of the drawbacks of full text systems of the
Horty type a totally different technique was tried by an American
information scientist, Professor Gerard Salton. The aim of this system
was essentially twofold. It sought to overcome the difficulty in
nominating search terms so precisely as to define relevant documents
with complete precision, and as a desirable corollary it sought to
present results in order of relevance. The essence of the new method
was to replace the technique of seeking a precise match for part of
the document, typically words and phrases, by seeking instead
an approximate match for the whole document. The responses could
then be presented in order of their approximation. This is a highly
ingenious idea. In lay rather than mathematical terms it looks for
the degree of overlap between the terminology of different documents.
If for example the lawyer has a problem involving the escape of oil
from an undersea pipeline caused by negligent dredging by a harbour
authority it might be thought that a document which had a very high
frequency of occurrence of the strings "escape", "oil", "undersea",
"pipeline", "negligent", "dredging", "harbour" and "authority" might
be more relevant than one which had a lower frequency of the use
of such terms, or than one which had a high frequency of occurrence
of the strings "escape", "oil", "pipeline" and "negligent", but made
no reference to "dredging", "harbour" or "authority". It might be
more of a problem to say intuitively which of the two above rejected
alternatives was the more relevant. The way in which the technique
operates is to consider each document as constituted by the different
strings it contains, and the weight of those strings as being the frequency
with which they occur in the document. It is then possible to use
a mathematical algorithm to calculate the similarity between the two
documents, on a scale running from zero when there is no overlapping
of strings to one where both the assortment of strings and their
respective weights are identical.
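The text does not say which similarity algorithm is meant. As an illustration only, the sketch below uses the cosine measure, which is commonly associated with Salton-style vector systems and has the stated range of zero to one for frequency weights. The names `term_vector` and `cosine_similarity` and the sample strings are assumptions. Note that the cosine also reaches one when the weights are merely proportional rather than identical.

```python
import math
from collections import Counter

def term_vector(text):
    """Represent a document as string -> frequency of occurrence."""
    return Counter(text.lower().split())

def cosine_similarity(u, v):
    """Cosine of the angle between two weighted vectors (0.0 to 1.0 for counts)."""
    dot = sum(weight * v.get(term, 0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

problem = term_vector("escape oil undersea pipeline negligent dredging harbour authority")
full_overlap = term_vector("escape oil undersea pipeline negligent dredging harbour authority")
partial_overlap = term_vector("escape oil pipeline negligent")
no_overlap = term_vector("will testator probate")

print(cosine_similarity(problem, full_overlap))      # very close to 1
print(cosine_similarity(problem, partial_overlap))   # between 0 and 1
print(cosine_similarity(problem, no_overlap))        # 0.0
```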
`
Unfortunately this method also has its drawbacks in a legal
context. As the example quoted above indicates it too relies upon
the specification of target strings. It is true that the non-occurrence
of some in a document searched may not be fatal, but the occurrence
of different equivalents will inevitably be overlooked, and a document
containing them may be decisively undervalued. Suppose in the
context of the previous example that a case referred only to the "leakage"
of "hydrocarbon products" from an "undersea conduit" as a result
of the "reckless operations" of a "docks board". Homographs also
continue to present a problem. Suppose a lawyer has a problem
relating to the validity of a will. He will surely find that the string
"will" occurs so frequently in so many documents as to mask those
with which he is really most concerned. A related difficulty is that
while words with plenty of common synonyms will tend to be less
indicative of content than they deserve, words with no positive negation
but which rely instead upon a negative operator will tend to be equally
undeservedly more indicative. A further problem is that this system
dispenses with syntax altogether; it simply regards a document as
the strings it contains and the frequency of their occurrence, or in
the jargon of information science, as a vector. If these vectors are
identical then the documents are regarded as identical by ascribing
the similarity or correlation value of one. Yet it is possible for
documents, while equivalent in this sense, nevertheless to be very
different in meaning because of their word order. Thus in this system
"dog bites man" is regarded as the equivalent of "man bites dog"
because both would be expressed within the system as the vector
"bites" (1); "dog" (1); and "man" (1).
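Two of these drawbacks can be shown in a few lines. This is a sketch only; `word_vector` is a hypothetical helper and the query and case strings are taken loosely from the examples in the text.

```python
from collections import Counter

def word_vector(text):
    """A document reduced to its strings and their frequencies."""
    return Counter(text.lower().split())

# Word order is discarded, so the two sentences become the same vector.
a = word_vector("dog bites man")
b = word_vector("man bites dog")
print(sorted(a.items()))  # [('bites', 1), ('dog', 1), ('man', 1)]
print(a == b)             # True

# Equivalent wording is overlooked: only one string is shared.
query = word_vector("escape oil undersea pipeline")
case = word_vector("leakage hydrocarbon products undersea conduit")
print(set(query) & set(case))  # {'undersea'}
```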
`
A further, and practical, disadvantage of this technique is that
it involves the system in an inordinate amount of computing. In the
Horty system which involves only string recognition, Boolean operation
and positional determination, the number of computations rises
arithmetically with the length of the document. The Salton system saves
on Boolean operation and positional determination, but it then requires
a series of mathematical operations rising geometrically with the
length of the document, each repeated at least twice. If the further
step is taken of establishing the similarities of all of the documents
within the database then the number of times these computations
must be performed is about half of the square of the number of
documents within the collection.¹
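The pairwise count can be checked directly. The sketch below enumerates every unordered pair of documents in a small hypothetical collection and compares the total with N(N-1)/2; the function name `pair_count` is an assumption.

```python
from itertools import combinations

def pair_count(n):
    """Number of distinct document pairs among n documents: n(n-1)/2."""
    return n * (n - 1) // 2

documents = ["d1", "d2", "d3", "d4", "d5"]
pairs = list(combinations(documents, 2))
print(len(pairs), pair_count(len(documents)))  # 10 10

# For a larger collection the count approaches half the square of N.
n = 10_000
print(pair_count(n), n * n // 2)  # 49995000 50000000
```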
`
It was against this state of the art that the possible substitution
of citations for words in a Salton style system was first considered.
This article will explain the theoretical basis for such a substitution,
it will then describe the experiments which have been conducted to
test the theory and their results, and will finally suggest ways in which
the technique may be further developed.
`
2. Theory of Citation Vectors
The main difference between citation vectors and word based
vector systems of the type pioneered by Salton is, as might be imagined,
one of the content of the vectors. Whereas Salton characterised a
document as a vector the elements of which were words and the
weights of which were their frequency of occurrence, in a citation
vector system the elements are citations and their weights could be
their frequency, but could equally well represent a number of other
parameters, as will be explained below.
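A citation vector can be sketched in the same shape as a word vector, with citations as the elements. In the illustration below the weights are plain citation frequencies and the comparison uses the cosine measure; both choices are assumptions, as are the function names and the placeholder identifiers `CASE-A` to `CASE-D`, which stand in for real citations.

```python
import math
from collections import Counter

def citation_vector(cited):
    """Represent a case by the authorities it cites, weighted by frequency."""
    return Counter(cited)

def similarity(u, v):
    """Cosine similarity between two citation vectors."""
    dot = sum(weight * v.get(cite, 0) for cite, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical cases and the authorities each one cites.
case_x = citation_vector(["CASE-A", "CASE-A", "CASE-B"])
case_y = citation_vector(["CASE-A", "CASE-C"])
case_z = citation_vector(["CASE-D"])

print(round(similarity(case_x, case_y), 3))  # 0.632
print(similarity(case_x, case_z))            # 0.0
```

Swapping the frequency weight for some other parameter only requires changing how the values in each vector are computed; the comparison step stays the same.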
`
There are several reasons for choosing to represent legal
documents by citation vectors rather than by word based vectors. By far
the most important of these is that the method corresponds well with
the intuitive approach that lawyers have to legal research. If a lawyer
knows that a particular case deals with his problem he is likely to
use it in at least two ways. The first is to exploit it to investigate
recent material which may not have got into the textbooks, and which
his usually haphazard personal up-dating system may have missed.
He simply scans the recent material for cases which may have cited
the case he knows. It has been discovered that many lawyers in the
United States use the ordinary full text matching system LEXIS for
just this purpose. It is ideal for this application since in a
computerised legal information retrieval system material becomes available
on the computer much faster than it can be published in conventional
hard copy form. It may also happen that the given case, if very
recent, has itself undergone some change. This possibility led Lawyers'
Co-Operative Publishing Company to devise, first for their own
in-house use, then publicly, and finally in association with LEXIS, a
special computerised system, AUTOCITE, for checking the accuracy
and currency of citations. A second way of using citations to assist
with research is to use the well-known case to indicate the relevant
part of an encyclopaedia or textbook by reference to the Table of
Cases. It is significant that no scholarly legal publication can ever
afford to omit such a Table.
`
¹ In fact N(N-1)/2, where N is the number of documents in the database.
`
Because the use of citations as a research tool is so important
in law it has been developed there far more thoroughly than in any
other field of study. The United States, partly because of its
multiplicity of jurisdictions, partly because of its federal character, partly
because of the volume of litigation, partly because of the structure
of the law, and partly because of the practices of the law reporters,
has by far the largest volume of case reporting of any common law
jurisdiction. It is not surprising that this is reflected by the most
far-reaching and sophisticated citation reference system ever devised.
The most outstanding feature of this system is Shepard's Citator.
It tracks the history of every reported decision in minute detail using
an elaborate structure of codes, superscripts and subscripts to indicate
the effects of the subsequent reference and the particular point in
the cited case to which reference has been made. This magnificent
work is the main resort of most American lawyers engaged on any
piece of legal research. It may be asked why, if Shepard is so useful,
there is any necessity to replace it with an automated system. The
answer is that Shepard's greatest virtues, its comprehensiveness and
its currency, are also its greatest weaknesses. It is cripplingly
expensive to buy and maintain if a thorough coverage is required; it
is awkward to use because at least three, and sometimes four, separate
volumes need to be consulted to trace the complete history of an
early decision; it is hard to interpret because of the mass of detail
crammed into a tiny space in coded form; and it becomes far too
time-consuming to follow up all of the references to even one popular
case, let alone the snowball effect of Shepardising all the references
yielded by Shepard to the original point-of-entry case. It is believed
that by the use of a computerised system of citation vectors the
advantages of citation based research can be maintained, but in a
very much more efficient way.
`
Citations also have very substantial advantages over words as
the elements in a vector based system. It will be recalled that among
the disadvantages of word based systems were found to be the
difficulties created by synonyms, grammatical variants, particularisations,
generalisations and homographs. In all of these respects citation
based systems have the advantage. The only sense in which a case
citation can be said to have a synonym is in relation to parallel
reports of the same decision. This is, however, very easily dealt with.
Often the different citations appear in the text so that the appropriate
one can be extracted, but more fundamentally such synonymity is
fixed and invariable so it presents no problem to have a simple
conversion table built into the system to permit automatic
transformation into the chosen style. There are clearly no such things
as grammatical variations, particularisations or generalisations. It is
however worth mentioning a special problem with citations which
relates to the page referencing. Ideally such references should contain
two page numbers, that of the first page of the cited report and that
of the page in it to which reference is made. Unfortunately while
this sometimes occurs, it is not uncommon for one or other of the
single page references to be used. In a well-developed system this
feature could actually be turned to advantage because where there
was a missing first page reference a simple table look-up technique
should be able to yield the first page number. It might then be
possible to exploit differences in the point of citation as possible
parameters for element weighting in the correlation algorithm. If only
the first page reference is given then it would require manual
intervention, and perhaps even the exercise of delicate judgment, to assign
the page number of the point to which reference was made. It is,
after all, sometimes a mystery to know how a particular citation
supports the proposition for which it is prayed in aid. Case citations
are not, in the same sense as words, capable of being homographs
either. Each citation is exclusive. It always refers to the same report.
It may however be ambiguous in the sense that the case may contain
more than one discrete point, and thus the same citation may perfectly
properly occur in two subsequent cases, each dealing with a totally
different subject. This phenomenon poses difficult problems for a
system based on citation vectors to handle as will be seen later.
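The two table look-ups described above (converting parallel citations to a chosen style, and recovering a missing first page from a pinpoint page) might be sketched as follows. Everything named here is hypothetical: the report abbreviations `XXX` and `YYY`, the page numbers, and the functions `canonical` and `first_page` are invented for illustration.

```python
import bisect

# Hypothetical conversion table: parallel citation -> chosen canonical form.
PARALLEL = {"(1975) 2 YYY 45": "[1975] 1 XXX 100"}

# Hypothetical table of the first pages of every report in a volume, sorted.
FIRST_PAGES = {"[1975] 1 XXX": [1, 100, 250, 400]}

def canonical(citation):
    """Transform a parallel citation into the chosen style, if it is known."""
    return PARALLEL.get(citation, citation)

def first_page(volume, pinpoint_page):
    """Find the first page of the report that contains the pinpoint page."""
    starts = FIRST_PAGES[volume]
    index = bisect.bisect_right(starts, pinpoint_page) - 1
    if index < 0:
        raise ValueError("page precedes the first report in the volume")
    return starts[index]

print(canonical("(1975) 2 YYY 45"))     # [1975] 1 XXX 100
print(first_page("[1975] 1 XXX", 117))  # 100
```

The reverse case, where only the first page is given, has no such mechanical solution, which is the point the text makes about manual intervention.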
`
So far it has been assumed that citation vectors are confined to
the citation of cases. While it is true that cases are much easier to
handle, it is worth considering the possibility of including citations
to statutory materials which are no less indicative of meaning than
are citations to other cases. This consideration is, of course, confined
to retrospective citation since it is highly uncommon for statutes to
refer to cases explicitly. It is also worth considering statutory citations
because there is a tiny proportion of cases which cite, and are cited
by, no other cases, but which do cite statutes. It is a very rare case
indeed which cites no authority of any sort to justify the decision
arrived at. Statutes present two especially acute problems. The first
is that there is such a wide variety of notation that it is extremely
difficult to devise a framework into which all statutory provisions
can be cast. This is not unconnected with the second problem which
concerns the dissection of statutes into their component parts. A
difficulty is to know when to stop. It is clear that if two cases merely
cite the same statute they may have too little in common for it to
be significant. In conventional information retrieval, here too following
Horty's original technique, the statutory section is usually taken to
be the minimum unit of reference. It is not clear that for the purposes
of citation vectors significance should not be attached to lower levels
of subordination. It is here that the link to the former problem
becomes apparent. In some jurisdictions at some periods of history
the fashion is to use long undivided sections, in others it is to divide
statutes into very short self-contained sections, in still others it is to
build into the statutory sections a complicated hierarchy of
sub-sections and sub-sub-sections. It is not immediately obvious how
all of this can be rendered into a sufficiently homogeneous form for
the operation of the algorithms essential to the success of the vector
system.
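The question of where to stop can be made concrete with a naive sketch. It assumes, purely for illustration, that a statutory citation has already been parsed into a tuple of levels (act, section, sub-section, paragraph), which is exactly the step the text identifies as difficult. The function name and the "Hypothetical Act 1970" are invented.

```python
from collections import Counter

def truncate_statute_citation(citation, depth=2):
    """Keep only the first `depth` levels, e.g. depth 2 = (act, section)."""
    return citation[:depth]

# Hypothetical citations, already split into hierarchical levels.
cited = [
    ("Hypothetical Act 1970", "s 5", "(1)", "(a)"),
    ("Hypothetical Act 1970", "s 5", "(2)"),
    ("Hypothetical Act 1970", "s 9"),
]

by_section = Counter(truncate_statute_citation(c, depth=2) for c in cited)
by_subsection = Counter(truncate_statute_citation(c, depth=3) for c in cited)

print(len(by_section))     # 2 distinct vector elements at section level
print(len(by_subsection))  # 3 distinct vector elements at sub-section level
```

At section level the first two citations collapse into one element; at sub-section level they are kept apart. Neither depth is obviously right for every drafting style.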
`
The most obvious difference between statutory and case materials
is that statutory material is subject to subsequent amendment and
modification in a formal way. This presents another formidable
problem for the vector method. It is not uncommon for a statutory
section to have its meaning changed quite substantially by such
amendment. If the amended section is cited before and after the
amendment should this count as the same citation in both cases?
It would seem that it should not. In this case a further problem is
posed for the design of an appropriate notation since it will no longer
bear so close a resemblance to the conventional form, which by
hypothesis is identical in both cases. The situation is made still
worse in cases where the original version of the statutory section
dealt with two distinct points. If it is amended so far as one of them
is concerned, but left unchanged so far as the other is concerned,
then the system must break down. Whether the two versions are
treated as the same or as different the system will be inaccurate in
the case of citation for one point or the other. The only way out
would be to make an artificial sub-division, and then to assign reference
to one or the other by manual means, but this is impractical in a
large working system.
`
A major advantage which working with citations rather than
words possesses is that citations have more, and more readily
quantifiable, parameters than have words. In the case of words the only
objectively quantifiable criterion is the frequency of occurrence. It is
true that other values could be assigned manually, but it is hard to
see how they could be controlled in any satisfactory way in a working
system. Citations on the other hand are different. They do also
have a dimension of frequency which cannot sensibly be ignored.
If a case makes dozens of references to another that other is likely
to be more significant for the citing case than another case which is
cited only once. This however raises a problem which does not
exist with words. It is not difficult to determine whether or not a
word occurs, and to count the separate occurrences. With citations
it is equally easy to discover whether a source has been cited. It is
much less easy to know how many times it has been cited. Suppose
a judge discusses a case extensively throughout his judgment, and
because it has become familiar he ceases to repeat its name, and the
reporter stops footnoting references to it by the standard coding.
It is clear that such a judgment makes more reference to that case than
to another which may be referred to in full form once at the beginning
and once at the end of the judgment, perhaps in support of some
quite parenthetic comment by the judge. It is clear therefore that
the former case must count as having been cited more than once,
but there is no obvious and objective criterion to establish just how
often.
`
Other possible parameters present no such thorny problems. One
of the most obvious is age, in the sense of the time difference between
the citing and cited case. This is clearly quantifiable, and about the
only problem is to know whether to take the relevant date as the
date of decision or the date of reporting, bearing in mind that the
gap between the two can occasionally be very substantial. Other
possible parameters include such things as the level within the hierarchy
at which the case was decided, and the degree of remoteness of the
jurisdiction. It is
true that here the numerical value which may be assigned is not
objectively determined, but the application of the standard is objective.
Thus one may attach a high numerical value to a decision of the
High Court. This value will be an arbitrary one, but its assignment
will be objectively determined. Here the main problem is to reconcile
different structures in different jurisdictions so as to make values
homogeneous. For example should the Australian Federal Court be
given the same value as the Federal Courts of Appeals in the United
States, or the same as the Federal District Courts? It is felt that,
at least so far as the common law jurisdictions are concerned, this
should not be an impossible task, though even here historical change
can present problems. Other possible parameters are more
controversial. It is possible that one ought to pay some attention to the
degree of approval or disapproval accorded by the citing case, whether
it has followed, distinguished, overruled or doubted the previous case,
for example. Such a judgment is made by Shepard, but it is extremely
doubtful whether it is practical to control the ascertainment of this
opinion, and in any event it too might be very much affected by the
practices of different jurisdictions.
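How such parameters might be combined into a single element weight can be sketched as below. The text does not give a formula, so every number and the form of the combination are assumptions: the court values are arbitrary (as the text itself says any such values must be), and the choice to let older citations count for less is an illustrative guess rather than anything the article asserts.

```python
# Arbitrary illustrative values for the level in the court hierarchy.
COURT_LEVEL_VALUE = {"high_court": 3.0, "appeal_court": 2.0, "trial_court": 1.0}

def citation_weight(frequency, citing_year, cited_year, court_level, age_decay=0.02):
    """Combine frequency, age and court level into one element weight.

    age_decay is a hypothetical tuning constant; the multiplicative form
    is an assumption made for this sketch only.
    """
    age = max(citing_year - cited_year, 0)
    age_factor = 1.0 / (1.0 + age_decay * age)
    return frequency * COURT_LEVEL_VALUE[court_level] * age_factor

# A fifty-year-old High Court decision cited twice, against a
# same-year trial decision cited once.
old_high = citation_weight(2, 1980, 1930, "high_court")
new_trial = citation_weight(1, 1980, 1980, "trial_court")
print(old_high > new_trial)  # True
```

The assignment of a value to each court is arbitrary, but once the table is fixed its application to any given citation is mechanical, which is the sense of objectivity the text relies on.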
`
For these reasons citation vectors appeared, in theory, to offer
a useful supplement to full text matching systems for the retrieval
of legal information, and to be much more promising than word based
vector systems. The next step was to test the theory by experiment.
`
3. Experiments with Citation Vectors
In much the same way as it is wise to test a theory by experiment
before putting it into practice, it is also wise to test an experimental
design by a pilot project. An opportunity arose in 1975 to conduct
such a pilot study of citation vectors at the Law School of the University
of Stanford in California. The object of the pilot study was to see
whether the practical implementation of a vector citation experiment
would involve unforeseen difficulties. At that time there were
well-established programmes for conducting vector based work, among
them some interesting work conducted under the auspices of Dr. Bryan
Niblett at the University of Kent. These programmes incorporated
standard techniques for calculating the degree of similarity between
vectors, and also of grouping the documents together into clusters on
the basis of the similarity coefficients so established. The very first
pilot experiments were conducted with the generous assistance of
Dr. Niblett using the Kent programmes on some English data compiled
at Stanford. The data were taken from a volume of the English
Criminal Appeal Reports, and included as vector elements only cited
cases without any weighting. The first runs were evaluated on an
intuitive basis and it seemed that prima facie the technique was
grouping cases together in a sufficiently satisfactory manner to justify

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket