throbber
c12) United States Patent
`Page
`
`111111111111111111111111111111111111111111111111111111111111111111111111111
`US006285999Bl
`US 6,285,999 Bl
`Sep.4,2001
`
`(10) Patent No.:
`(45) Date of Patent:
`
`(54) METIIOD FOR NODE RANKING IN A
`LINKED DATABASE
`
`(75)
`
`Inventor: Lawrence Page, Stanford, CA (US)
`
`(73) Assignee: The Board of Trustees of the Leland
`Stanford Junior University, Stanford,
`CA(US).
`-
`-
`-
`
`( *) Notice:
`
`Subject to any disclaimer, the term of this
`patent is extended or adjusted under 35
`U.S.C. 154(b) by 0 days.
`
`(21) Appl. No.: 09/004,827
`
`(22) Filed:
`
`Jan. 9,1998
`
`Related U.S. Application Data
`(60) Provisional application No. 60/035,205, filed on Jan. 10,
`1997.
`Int. CJ.7
`...................................................... G06F 17/30
`(51)
`(52) U.S. CJ ..................................... 707/5; 707n; 707/501
`(58) Field of Search .................................... 707/100, 5, 7,
`707/513, 1-3, 10, 104, 501; 345/440; 382!226,
`229, 230, 231
`
`(56)
`
`References Cited
`
`U.S. PATENT DOCUMENTS
`
`4,953,106 • 8/1990 Gansner et al. .. ................... 345/440
`5,450,535 • 9/1995 North ................................... 395/140
`5/1998 Mauldin ............................... 395/610
`5,748,954
`5,752,241 • 5/1998 Cohen ...................................... 707/3
`5,832,494 * 11/1998 Egger et al .......................... 707/102
`5,848,407 • 12/1998 Ishikawa et al. ........................ 707/2
`6,014,678 •
`l/2000 Inoue et at. .......................... 707/501
`
`OTIIER PUBliCATIONS
`
`S. Jeromy Carriere et al, "Web Query: Searching and Visu(cid:173)
`alizing the Web through Connectivity", Computer Networks
`and ISDN Systems 29 (1997). pp. 1257-1267.*
`Wang et al"Prefetching in Worl Wide Web", IEEE 1996, pp.
`28-32.*
`Ramer et al "Similarity, Probability and Database Organi(cid:173)
`sation: Extended Abstract", 1996, pp. 272.276.*
`
`Craig Boyle "To link or not to link: An empirical comparison
`of Hypertext linking strategies". ACM 1992, pp. 221-231.*
`L. Katz, "A new status index derived from sociometric
`analysis," 1953, Psychometricka, vel. 18, pp. 39-43.
`C.H. Hubbell, "An input-<Jutput approach to clique identi(cid:173)
`fication sociometry," 1965, pp. 377-399.
`· Mi.Zrnclii -et aL, "Techil.iques fofaiSaggregatiiJ.g centiility - ·
`scores in social networks," 1996, Sociological Methodology,
`pp. 26-48.
`E. Garfield, "Citation analysis as a tool in journal evalua(cid:173)
`tion," 1972, Science, vel. 178, pp. 471-479.
`Pinski et al., "Citation influence for journal aggregates of
`scientific publications: Theory, with application to the lit(cid:173)
`erature of physics," 1976, Inf. Proc. And Management, vel.
`12, pp. 297-312.
`N. Geller, "On the citation influence methodology of Pinski
`and Narin," 1978, Inf. Proc. And Management, vol. 14, pp.
`93-95.
`P. Doreian, "Measuring the relative standing of disciplinary
`journals," 1988, Inf. Proc. And Management, vol. 24, pp.
`45-56.
`
`(List continued on next page.)
`
`Primary Examiner-Thomas Black
`Assistant Examiner-Vyen Le
`(74) Attorney, Agent, or Firm-Harrity & Snyder L.LP.
`
`(57)
`
`ABSTRACT
`
`A method assigns importance ranks to nodes in a linked
`database, such as any database of documents containing
`citations, the world wide web or any other hypermedia
`database. The rank assigned to a document is calculated
`from the ranks of documents citing it. In addition, the rank
`of a document is calculated from a constant representing the
`probability that a browser through the database will ran(cid:173)
`domly jump to the document. The method is particularly
`useful in enhancing the performance of search engine results
`for hypermedia databases, such as the world wide web,
`whose documents have a large variation in quality.
`
`29 Claims, 3 Drawing Sheets
`
`B
`0.2
`
`EXHIBIT 2086
`Facebook, Inc. et al.
`v.
`Software Rights Archive, LLC
`CASE IPR2013-00480
`
`

`

`US 6,285,999 Bl
`Page 2
`
`01HER PUBLICATIONS
`
`P. Doreian, "A measure of standing for citation networks
`within a wider environment," 1994, Inf. Proc. And Manage(cid:173)
`ment, vol. 30, pp. 21-31.
`Botafogo et al., "Structural analysis of hypertext: Identifying
`hierarchies and useful metrics," 1992, ACM Trans. Inc.
`Systems, vol. 10, pp. 142-180.
`Mark E. Frisse, "Searching for information in a hypertext
`medical handbook," 1988, Communications of the ACM,
`vol. 31, No. 7, pp. 880--886.
`Massimo Marchiori, "The quest for correct information on
`the Web: Hyper search engines," 1997, Computer Networks
`and ISDN Systems, vol. 29, No. 8-13, pp. 1225-1235.
`Oliver A McBryan, "GENVL and WWWW: Tools for
`taming the web," 1994, Proceedings of the first International
`Wold Wide Web Conference, pp. 1-13.
`
`Carriere et al., "WebQuery: Searching and visualizing the
`web through connectivity," 1997, Proc. 6'h International
`World Wide Web Conference, pp. 1-14.
`
`Arocena et al., "Applications of a web query language,"
`1997, Computer Networks and ISDN Systems, vol. 29, No.
`8-13, pp. 1305-1316.
`
`Jon M. Kleinberg, "Authoritative sources in a hyperlinked
`environment," 1998, Proc. Of the 9'h Annual ACM-SIAM
`Symposium on Discrete Algorithms, pp. 668-677.
`
`Henzinger et al., "Measuring index quality using random
`walks on the web", 1999, Pro c. of the 8'h International World
`Wide Web Conference, pp. 213-225.
`
`* cited by examiner
`
`

`

`U.S. Patent
`US. Patent
`
`Sep.4,2001
`Sep. 4, 2001
`
`Sheet 1 013
`Sheet 1 of 3
`
`US 6,285,999 Bl
`US 6,285,999 B1
`
`8
`
`c
`
`
`
`A
`
`
`
`FIG. 1
`FIG. 1
`
`

`

`U.S. Patent
`US. Patent
`
`Sep.4,2001
`Sep. 4, 2001
`
`Sheet 2 0f3
`Sheet 2 of 3
`
`US 6,285,999 Bl
`US 6,285,999 B1
`
`A
`0.4
`
`0.2
`
`B
`0.2
`
`
`
`c
`0.4
`
`FIG. 2
`FIG. 2
`
`

`

`U.S. Patent
`
`Sep.4,2001
`
`Sheet 3 of 3
`
`US 6,285,999 Bl
`
`101
`~--------------~--------------~~
`
`START
`
`+
`
`SELECT AN INITIAL N-DIMENSIONAL VECTOR p0
`
`103
`,,
`~--------------~--------------~~
`
`COMPUTE AN APPROXIMATION Pn TO A
`STEADY-STATE PROBABILITY Pco IN
`ACCORDANCE WITH THE EQUATION Pn = Anp0
`
`105
`,.
`~--------------~--------------~~
`
`DETERMINE A RANK r[k] FOR NODE k FROM A
`k1h COMPONENT OF Pn
`
`+
`
`DONE
`
`FIG. 3
`
`

`

`US 6,285,999 Bl
`
`1
`METHOD FOR NODE RANKING IN A
`LINKED DATABASE
`
`CROSS-REFERENCES TO RELATED
`APPLICATIONS
`
`This application claims priority from U.S. provisional
`patent application Ser. No. 60/035,205 filed Jan. 10, 1997,
`which is incorporated herein by reference.
`
`STATEMENT REGARDING GOVERNMENT
`SUPPORT
`
`This invention was supported in part by the National
`Science Foundation grant number IRI-9411306-4. The Gov(cid:173)
`ernment has certain rights in the invention.
`
`FIELD OF THE INVENTION
`
`This invention relates generally to techniques for analyz(cid:173)
`ing linked databases. More particularly, it relates to methods
`for assigning ranks to nodes in a linked database, such as any 20
`database of documents containing citations, the world wide
`web or any other hypermedia database.
`
`BACKGROUND OF THE INVENTION
`
`2
`Hyperlink Search Engine, developed by IDD Information
`Services, (http://rankdex.gari.com/) uses backlink informa(cid:173)
`tion (i.e., information from pages that contain links to the
`current page) to assist in identifying relevant web docu-
`5 ments. Rather than using the content of a document to
`determine relevance, the technique uses the anchor text of
`links to the document to characterize the relevance of a
`document. The idea of associating anchor text with the page
`the text points to was first implemented in the World Wide
`10 Web Worm (Oliver A McBryan, GENVL and WWWW:
`Tools for Taming the Web, First International Conference on
`the World Wide Web, CERN, Geneva, May 25-27, 1994).
`The Hyperlink Search Engine has applied this idea to assist
`in determining document relevance in a search. In particular,
`15 search query terms are compared to a collection of anchor
`text descriptions that point to the page, rather than to a
`keyword index of the page content. A rank is then assigned
`to a document based on the degree to which the search terms
`match the anchor descriptions in its backlink documents.
`The well known idea of citation counting is a simple
`method for determining the importance of a document by
`counting its number of citations, or backlinks. The citation
`rank r(A) of a document which has n backlink pages 1s
`simply
`r(A)=n.
`In the case of databases whose content is of relatively
`uniform quality and importance it is valid to assume that a
`highly cited document should be of greater interest than a
`document with only one or two citations. Many databases,
`30 however, have extreme variations in the quality and impor(cid:173)
`tance of documents. In these cases, citation ranking is overly
`simplistic. For example, citation ranking will give the same
`rank to a document that is cited once on an obscure page as
`to a similar document that is cited once on a well-known and
`35 highly respected page.
`
`25
`
`SUMMARY
`
`Various aspects of the present invention provide systems
`and methods for ranking documents in a linked database.
`One aspect provides an objective ranking based on the
`relationship between documents. Another aspect of the
`invention is directed to a technique for ranking documents
`within a database whose content has a large variation in
`quality and importance. Another aspect of the present inven(cid:173)
`tion is to provide a document ranking method that is scalable
`and can be applied to extremely large databases such as the
`world wide web. Additional aspects of the invention will
`become apparent in view of the following description and
`50 associated figures.
`One aspect of the present invention is directed to taking
`advantage of the linked structure of a database to assign a
`rank to each document in the database, where the document
`rank is a measure of the importance of a document. Rather
`55 than determining relevance only from the intrinsic content of
`a document, or from the anchor text of backlinks to the
`document, a method consistent with the invention deter(cid:173)
`mines importance from the extrinsic relationships between
`documents. Intuitively, a document should be important
`60 (regardless of its content) if it is highly cited by other
`documents. Not all citations, however, are necessarily of
`equal significance. A citation from an important document is
`more important than a citation from a relatively unimportant
`document. Thus, the importance of a page, and hence the
`65 rank assigned to it, should depend not just on the number of
`citations it has, but on the importance of the citing docu(cid:173)
`ments as well. This implies a recursive definition of rank: the
`
`Due to the developments in computer technology and its
`increase in popularity, large numbers of people have recently
`started to frequently search huge databases. For example,
`internet search engines are frequently used to search the
`entire world wide web. Currently, a popular search engine
`might execute over 30 million searches per day of the
`indexable part of the web, which has a size in excess of 500
`Gigabytes. Information retrieval systems are traditionally
`judged by their precision and recall. What is often neglected,
`however, is the quality of the results produced by these
`search engines. Large databases of documents such as the
`web contain many low quality documents. As a result,
`searches typically return hundreds of irrelevant or unwanted
`documents which camouflage the few relevant ones. In order
`to improve the selectivity of the results, common techniques 40
`allow the user to constrain the scope of the search to a
`specified subset of the database, or to provide additional
`search terms. These techniques are most effective in cases
`where the database is homogeneous and already classified
`into subsets, or in cases where the user is searching for well 45
`known and specific information. In other cases, however,
`these techniques are often not effective because each con(cid:173)
`straint introduced by the user increases the chances that the
`desired information will be inadvertently eliminated from
`the search results.
`Search engines presently use various techniques that
`attempt to present more relevant documents. Typically,
`documents are ranked according to variations of a standard
`vector space model. These variations could include (a) how
`recently the document was updated, and/or (b) how close the
`search terms are to the beginning of the document. Although
`this strategy provides search results that are better than with
`no ranking at all, the results still have relative! y low quality.
`Moreover, when searching the highly competitive web, this
`measure of relevancy is vulnerable to "spamming" tech(cid:173)
`niques that authors can use to artificially inflate their docu(cid:173)
`ment's relevance in order to draw attention to it or its
`advertisements. For this reason search results often contain
`commercial appeals that should not be considered a match to
`the query. Although search engines are designed to avoid
`such ruses, poorly conceived mechanisms can result m
`disappointing failures to retrieve desired information.
`
`

`

`US 6,285,999 Bl
`
`4
`ments B and C are pointers to document A. In this case we
`say that B and C are backlinks of A, and that A is a forward
`link of B and of C. Documents B and C also have other
`forward links to documents that are not shown.
`Although the ranking method of the present invention is
`superficially similar to the well known idea of citation
`counting, the present method is more subtle and complex
`than citation counting and gives far superior results. In a
`simple citation ranking, the rank of a document A which has
`n backlink pages is simply
`r(A)=n.
`According to one embodiment of the present method of
`ranking, the backlinks from different pages are weighted
`differently and the number of links on each page is normal(cid:173)
`ized. More precisely, the rank of a page A is defined
`according to the present invention as
`
`a:
`r(Bn))
`(r(BJ)
`r(A) = - + (1- a:) - - + · · · + - - ,
`IBII
`IBnl
`N
`
`20
`
`3
`rank of a document is a function of the ranks of the
`documents which cite it. The ranks of documents may be
`calculated by an iterative procedure on a linked database.
`Because citations, or links, are ways of directing
`attention, the important documents correspond to those 5
`documents to which the most attention is directed. Thus, a
`high rank indicates that a document is considered valuable
`by many people or by important people. Most likely, these
`are the pages to which someone performing a search would
`like to direct his or her attention. Looked at another way, the 10
`importance of a page is directly related to the steady-state
`probability that a random web surfer ends up at the page
`after following a large number of links. Because there is a
`larger probability that a surfer will end up at an important
`page than at an unimportant page, this method of ranking 15
`pages assigns higher ranks to the more important pages.
`In one aspect of the invention, a computer implemented
`method is provided for scoring linked database documents.
`The method comprises the steps of:
`obtaining a plurality of documents, at least some of the
`documents being linked documents, at least some of the
`documents being linking documents, and at least some
`of the documents being both linked documents and
`linking documents, each of the linked documents being 25
`pointed to by a link in one or more of the linking
`documents; assigning a score to each of the linked
`documents based on scores of the one or more linking
`documents; and processing the linked documents
`according to their scores.
`Additional aspects, applications and advantages will
`become apparent in view of the following description and
`associated figures.
`
`30
`
`where B1 , . . . , Bn are the backlink pages of A, r(B1 ), . . . ,
`r(Bn) are their ranks, IB 1 I, ... , IBnl are their numbers of
`forward links, and a is a constant in the interval [0,1], and
`N is the total number of pages in the web. This definition is
`clearly more complicated and subtle than the simple citation
`rank. Like the citation rank, this definition yields a page rank
`that increases as the number of backlinks increases. But the
`present method considers a citation from a highly ranked
`backlink as more important than a citation from a lowly
`ranked backlink (provided both citations come from back-
`link documents that have an equal number of forward links).
`In the present invention, it is possible, therefore, for a
`document with only one backlink (from a very highly ranked
`35 page) to have a higher rank than another document with
`many backlinks (from very low ranked pages). This is not
`the case with simple citation ranking.
`The ranks form a probability distribution over web pages,
`so that the sum of ranks over all web pages is unity. The rank
`40 of a page can be interpreted as the probability that a surfer
`will be at the page after following a large number of forward
`links. The constant a in the formula is interpreted as the
`probability that the web surfer will jump randomly to any
`web page instead of following a forward link. The page
`45 ranks for all the pages can be calculated using a simple
`iterative algorithm, and corresponds to the principal eigen(cid:173)
`vector of the normalized link matrix of the web, as will be
`discussed in more detail below.
`In order to illustrate the present method of ranking,
`50 consider the simple web of three documents shown in FIG.
`2. For simplicity of illustration, we assume in this example
`that r=O. Document A has a single backlink to document C,
`and this is the only forward link of document C, so
`r(A)=r(C).
`Document B has a single backlink to document A, but this
`is one of two forward links of document A, so
`r(B)=r(A)/2.
`Document C has two backlinks. One backlink is to
`document B, and this is the only forward link of document
`60 B. The other backlink is to document A via the other of the
`two forward links from A Thus
`r(C)=r(B)+r(A)/2.
`In this simple illustrative case we can see by inspection
`that r(A)=0.4, r(B)=0.2, and r(C)=0.4. Although a typical
`value for a is -0.1, if for simplicity we set a=0.5 (which
`corresponds to a 50% chance that a surfer will randomly
`jump to one of the three pages rather than following a
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`FIG. 1 is a diagram of the relationship between three
`linked hypertext documents according to the invention.
`FIG. 2 is a diagram of a three-document web illustrating
`the rank associated with each document in accordance with
`the present invention.
`FIG. 3 is a flowchart of one embodiment of the invention.
`
`DETAILED DESCRIPTION
`
`Although the following detailed description contains
`many specifics for the purposes of illustration, anyone of
`ordinary skill in the art will appreciate that many variations
`and alterations to the following details are within the scope
`of the invention. Accordingly, the following embodiments of
`the invention are set forth without any loss of generality to,
`and without imposing limitations upon, the claimed inven(cid:173)
`tion. For support in reducing the present invention to
`practice, the inventor acknowledges Sergey Erin, Scott
`Hassan, Rajeev Motwani,Alan Steremberg, and Terry Wino(cid:173)
`grad.
`A linked database (i.e. any database of documents con(cid:173)
`taining mutual citations, such as the world wide web or other
`hypermedia archive, a dictionary or thesaurus, and a data(cid:173)
`base of academic articles, patents, or court cases) can be
`represented as a directed graph of N nodes, where each node
`corresponds to a web page document and where the directed
`connections between nodes correspond to links from one
`document to another. A given node has a set of forward links
`that connect it to children nodes, and a set of backward links
`that connect it to parent nodes. FIG. 1 shows a typical 65
`relationship between three hypertext documents A, B, and C.
`As shown in this particular figure, the first links in docu-
`
`55
`
`

`

`US 6,285,999 Bl
`
`6
`present invention. Because the rank of various different
`documents can vary by orders of magnitude, it is convenient
`to define a logarithmic rank
`
`5
`forward link), then the mathematical relationships between
`the ranks become more complicated. In particular, we then
`have
`r(A)=%+r(C)/2,
`r(B)=%+r(A)/4, and
`r(C)=%+r(A)/4+r(B)/2.
`The solution in this case is r(A)=1%9, r(B)=10/39, and
`r(C)=15f39.
`In practice, there are millions of documents and it is not
`possible to find the solution to a million equations by 10
`inspection. Accordingly, in the preferred embodiment a
`simple iterative procedure is used. As the initial state we
`may simply set all the ranks equal to 1/N. The formulas are
`then used to calculate a new set of ranks based on the
`existing ranks. In the case of millions of documents, suffi(cid:173)
`cient convergence typically takes on the order of 100 itera(cid:173)
`tions. It is not always necessary or even desirable, however,
`to calculate the rank of every page with high precision. Even
`approximate rank values, using two or more iterations, can
`provide very valuable, or even superior, information.
`The iteration process can be understood as a steady-state
`probability distribution calculated from a model of a random
`surfer. This model is mathematically equivalent to the expla(cid:173)
`nation described above, but provides a more direct and
`concise characterization of the procedure. The model 25
`includes (a) an initial N-dimensional probability distribution
`vector p0 where each component p0[i] gives the initial
`probability that a random surfer will start at a node i, and (b)
`an NxN transition probability matrix A where each compo(cid:173)
`nent A[i][j] gives the probability that the surfer will move 30
`from node i to node j. The probability distribution of the
`graph after the surfer follows one link is p1=Ap0, and after
`two links the probability distribution is p2 =Ap1=A2p0.
`Assuming this iteration converges, it will converge to a
`steady-state probability
`
`5
`
`which assigns a rank of 0 to the lowest ranked node and
`increases by 1 for each order of magnitude in importance
`higher than the lowest ranked node.
`"FIG. 3 shows one embodiment of a computer imple(cid:173)
`mented method for calculating an importance rank for N
`linked nodes of a linked database. At a step 101, an initial
`15 N-dimensional vector p0 is selected. An approximation Pn to
`a steady-state probability p= in accordance with the equation
`Pn=Anp 0 is computed at a step 103. Matrix A can be an NxN
`transition probability matrix having elements A[i][j] repre(cid:173)
`senting a probability of moving from node ito node j. At a
`20 step 105, a rank r[k] for node k from a k'h component of Pn
`is determined.".
`In one particular embodiment, a finite number of itera-
`tions are performed to approximate poo. The initial distri(cid:173)
`bution can be selected to be uniform or non-uniform. A
`uniform distribution would set each component of p0 equal
`to 1/N. A non-uniform distribution, for example, can divide
`the initial probability among a few nodes which are known
`a priori to have relatively large importance. This non(cid:173)
`uniform distribution decreases the number of iterations
`required to obtain a close approximation to poo and also is
`one way to reduce the effect of artificially inflating relevance
`by adding unrelated terms.
`In another particular embodiment, the transition matrix A
`is given by
`
`35
`
`a:
`A = N 1 + (1- a:)B,
`
`which is a dominant eigenvector of A The iteration circu- 40
`lates the probability through the linked nodes like energy
`flows through a circuit and accumulates in important places.
`Because pages with no links occur in significant numbers
`and bleed off energy, they cause some complication with
`computing the ranking. This complication is caused by the 45
`fact they can add huge amounts to the "random jump" factor.
`This, in turn, causes loops in the graph to be highly empha(cid:173)
`sized which is not generally a desirable property of the
`model. In order to address this problem, these childless
`pages can simply be removed from the model during the 50
`iterative stages, and added back in after the iteration is
`complete. After the childless pages are added back in,
`however, the same number of iterations that was required to
`remove them should be done to make sure they all receive
`a value. (Note that in order to ensure convergence, the norm 55
`of P; must be made equal to 1 after each iteration.) An
`alternate method to control the contribution of the childless
`nodes is to only estimate the steady state by iterating a small
`number of times.
`The rank r[i] of a node i can then be defined as a function 60
`of this steady-state probability distribution. For example, the
`rank can be defined simply by r[i]=poo[i]. This method of
`calculating rank is mathematically equivalent to the iterative
`method described first. Those skilled in the art will appre(cid:173)
`ciate that this same method can be characterized in various 65
`different ways that are mathematically equivalent. Such
`characterizations are obviously within the scope of the
`
`where 1 is an NxN matrix consisting of all 1s, a is the
`probability that a surfer will jump randomly to any one
`of the N nodes, and B is a matrix whose elements
`B[ i ][j] are given by
`
`B[i][J] =
`
`_1_ if node i points to node j
`n;
`0 otherwise,
`
`{
`
`where n; is the total number of forward links from node
`i. The (1-a) factor acts as a damping factor that
`limits the extent to which a document's rank can be
`inherited by children documents. This models the
`fact that users typically jump to a different place in
`the web after following a few links. The value of a
`is typically around 15%. Including this damping is
`important when many iterations are used to calculate
`the rank so that there is no artificial concentration of
`rank importance within loops of the web.
`Alternatively, one may set a=O and only iterate a few
`times in the calculation.
`Consistent with the present invention, there are several
`ways that this method can be adapted or altered for various
`purposes. As already mentioned above, rather than including
`the random linking probability a equally among all nodes,
`it can be divided in various ways among all the sites by
`changing the 1 matrix to another matrix. For example, it
`could be distributed so that a random jump takes the surfer
`
`

`

`US 6,285,999 Bl
`
`7
`to one of a few nodes that have a high importance, and will
`not take the surfer to any of the other nodes. This can be very
`effective in preventing deceptively tagged documents from
`receiving artificially inflated relevance. Alternatively, the
`random linking probability could be distributed so that
`random jumps do not happen from high importance nodes,
`and only happen from other nodes. This distribution would
`model a surfer who is more likely to make random jumps
`from unimportant sites and follow forward links from
`important sites. A modification to avoid drawing unwar(cid:173)
`ranted attention to pages with artificially inflated relevance
`is to ignore local links between documents and only consider
`links between separate domains. Because the links from
`other sites to the document are not directly under the control
`of a typical web site designer, it is then difficult for the
`designer to artificially inflate the ranking. A simpler
`approach is to weight links from pages contained on the
`same web server less than links from other servers. Also, in
`addition to servers, internet domains and any general mea(cid:173)
`sure of the distance between links could be used to deter(cid:173)
`mine such a weighting.
`Additional modifications can further improve the perfor(cid:173)
`mance of this method. Rank can be increased for documents
`whose backlinks are maintained by different institutions and
`authors in various geographic locations. Or it can be
`increased if links come from unusually important web
`locations such as the root page of a domain.
`Links can also be weighted by their relative importance
`within a document. For example, highly visible links that are
`near the top of a document can be given more weight. Also, 30
`links that are in large fonts or emphasized in other ways can
`be given more weight. In this way, the model better approxi(cid:173)
`mates human usage and authors' intentions. In many cases
`it is appropriate to assign higher value to links coming from
`pages that have been modified recently since such informa- 35
`tion is less likely to be obsolete.
`Various implementations of the invention have the advan(cid:173)
`tage that the convergence is very fast (a few hours using
`current processors) and it is much less expensive than
`building a full-text index. This speed allows the ranking to
`be customized or personalized for specific users. For
`example, a user's home page and/or bookmarks can be given
`a large initial importance, and/or a high probability of a
`random jump returning to it. This high rating essentially
`indicates to the system that the person's homepage and/or 45
`bookmarks does indeed contain subjects of importance that
`should be highly ranked. This procedure essentially trains
`the system to recognize pages related to the person's inter(cid:173)
`ests. The present method of determining the rank of a
`document can also be used to enhance the display of 50
`documents. In particular, each link in a document can be
`annotated with an icon, text, or other indicator of the rank of
`the document that each link points to. Anyone viewing the
`document can then easily see the relative importance of
`various links in the document.
`The present method of ranking documents in a database
`can also be useful for estimating the amount of attention any
`document receives on the web since it models human
`behavior when surfing the web. Estimating the importance
`of each backlink to a page can be useful for many purposes 60
`including site design, business arrangements with the
`backlinkers, and marketing. The effect of potential changes
`to the hypertext structure can be evaluated by adding them
`to the link structure and recomputing the ranking.
`Real usage data, when available, can be used as a starting 65
`point for the model and as the distribution for the alpha
`factor.
`
`8
`This can allow this ranking model to fill holes in the usage
`data, and provide a more accurate or comprehensive picture.
`Thus, although this method of ranking does not necessar(cid:173)
`ily match the actual traffic, it nevertheless measures the
`5 degree of exposure a document has throughout the web.
`Another important application and embodiment of the
`present invention is directed to enhancing the quality of
`results from web search engines. In this application of the
`present invention, a ranking method according to the inven-
`10 tion is integrated into a web search engine to produce results
`far superior to existing methods in quality and performance.
`A search engine employing a ranking method of the present
`invention provides automation while producing results com(cid:173)
`parable to a human maintained categorized system. In this
`approach, a web crawler explores the web and creates an
`15 index of the web content, as well as a directed graph of
`nodes corresponding to the structure of hyperlinks. The
`nodes of the graph (i.e. pages of the web) are then ranked
`according to importance as described above in connection
`with various exemplary embodiments of the present inven-
`20 tion.
`The search engine is used to locate documents that match
`the specified search criteria, either by searching full text, or
`by searching titles only. In addition, the search can include
`the anchor text associated with backlinks to the page. This
`25 approach has several advantages in this context. First,
`anchors often provide more accurate descriptions of web
`pages than the pages themselves. Second, anchors may exist
`for images, programs, and other objects that cannot be
`indexed by a text-based search engine. This also makes it
`possible to return web pages which have not actually been
`crawled. In addition, the engine can compare the search
`terms with a list of its backlink document titles. Thus, even
`though the text of the document itself may not match the
`search terms, if the document is cited by documents whose
`titles or backlink anchor text match the search terms, the
`document will be considered a match. In addition to or
`instead of the anchor text, the text in the immediate vicinity
`of the backlink anchor text can also be compared to the
`search terms in order to improve th

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket