The PageRank Citation Ranking: Bringing Order to the Web

January 29, 1998

Abstract
The importance of a Web page is an inherently subjective matter, which depends on the reader's interests, knowledge and attitudes. But there is still much that can be said objectively about the relative importance of Web pages. This paper describes PageRank, a method for rating Web pages objectively and mechanically, effectively measuring the human interest and attention devoted to them.
We compare PageRank to an idealized random Web surfer. We show how to efficiently compute PageRank for large numbers of pages. And, we show how to apply PageRank to search and to user navigation.

1 Introduction and Motivation

The World Wide Web creates many new challenges for information retrieval. It is very large and heterogeneous. Current estimates are that there are over 150 million web pages with a doubling life of less than one year. More importantly, the web pages are extremely diverse, ranging from "What is Joe having for lunch today?" to journals about information retrieval. In addition to these major challenges, search engines on the Web must also contend with inexperienced users and pages engineered to manipulate search engine ranking functions.
However, unlike "flat" document collections, the World Wide Web is hypertext and provides considerable auxiliary information on top of the text of the web pages, such as link structure and link text. In this paper, we take advantage of the link structure of the Web to produce a global "importance" ranking of every web page. This ranking, called PageRank, helps search engines and users quickly make sense of the vast heterogeneity of the World Wide Web.

1.1 Diversity of Web Pages

Although there is already a large literature on academic citation analysis, there are a number of significant differences between web pages and academic publications. Unlike academic papers, which are scrupulously reviewed, web pages proliferate free of quality control or publishing costs. With a simple program, huge numbers of pages can be created easily, artificially inflating citation counts. Because the Web environment contains competing profit seeking ventures, attention getting strategies evolve in response to search engine algorithms. For this reason, any evaluation strategy which counts replicable features of web pages is prone to manipulation. Further, academic papers are well defined units of work, roughly similar in quality and number of citations, as well as in their purpose - to extend the body of knowledge. Web pages vary on a much wider scale than academic papers in quality, usage, citations, and length.
A random archived message posting asking an obscure question about an IBM computer is very different from the IBM home page. A research article about the effects of cellular phone use on driver attention is very different from an advertisement for a particular cellular provider. The average web page quality experienced by a user is higher than the quality of the average web page. This is because the simplicity of creating and publishing web pages results in a large fraction of low quality web pages that users are unlikely to read.
There are many axes along which web pages may be differentiated. In this paper, we deal primarily with one - an approximation of the overall relative importance of web pages.

1.2 PageRank

In order to measure the relative importance of web pages, we propose PageRank, a method for computing a ranking for every web page based on the graph of the web. PageRank has applications in search, browsing, and traffic estimation.
Section 2 gives a mathematical description of PageRank and provides some intuitive justification. In Section 3, we show how we efficiently compute PageRank for as many as 518 million hyperlinks. To test the utility of PageRank for search, we built a web search engine called Google (Section 5). We also demonstrate how PageRank can be used as a browsing aid in Section 7.3.

2 A Ranking for Every Page on the Web

2.1 Related Work

There has been a great deal of work on academic citation analysis [Gar95]. Goffman [Gof71] has published an interesting theory of how information flow in a scientific community is an epidemic process.
There has been a fair amount of recent activity on how to exploit the link structure of large hypertext systems such as the web. Pitkow recently completed his Ph.D. thesis on "Characterizing World Wide Web Ecologies" [Pit97, PPR96] with a wide variety of link based analysis. Weiss et al. [WVS+96] discuss clustering methods that take the link structure into account. Spertus [Spe97] discusses information that can be obtained from the link structure for a variety of applications. Good visualization demands added structure on the hypertext and is discussed in [MFH95, MF95]. Recently, Kleinberg [Kle98] has developed an interesting model of the web as Hubs and Authorities, based on an eigenvector calculation on the co-citation matrix of the web.
Finally, there has been some interest in what "quality" means on the net from the library community [Til].
It is natural to try to apply standard citation analysis techniques to the web's hypertextual citation structure. One can simply think of every link as being like an academic citation. So, a major page like http://www.yahoo.com/ will have tens of thousands of backlinks (or citations) pointing to it.
The fact that the Yahoo home page has so many backlinks generally implies that it is quite important. Indeed, many of the web search engines have used backlink count as a way to try to bias their databases in favor of higher quality or more important pages. However, simple backlink counts have a number of problems on the web. Some of these problems have to do with characteristics of the web which are not present in normal academic citation databases.
2.2 Link Structure of the Web

While estimates vary, the current graph of the crawlable Web has roughly 150 million nodes (pages) and 1.7 billion edges (links). Every page has some number of forward links (outedges) and backlinks (inedges) (see Figure 1). We can never know whether we have found all the backlinks of a particular page, but if we have downloaded it, we know all of its forward links at that time.

Figure 1: A and B are Backlinks of C

`
`Web pages vary greatly in terms of the number of backlinks they have. For example7 the
`Netscape home page has 62.804 backlinks in our current database compared to most pages which
`have just a few backlinks. Generally, highly linked pages are more “important” than pages with
`few links. Simple citation counting has been used to speculate on the future winners of the Nobel
`
`Prize [San95]. PageRank provides a more sophisticated method for doing citation counting.
`The reason that PageRank is interesting is that there are many cases where simple Citation
`counting does not correspond to our common sense notion of importance. For example7 if a web
`page has a link off the Yahoo home page, it may be just one link but it is a very important one.
`This page should be ranked higher than many pages with more links but from obscure places.
`PageRank is an attempt to see how good an approximation to “importance” can be obtained just
`from the link structure.

2.3 Propagation of Ranking Through Links

Based on the discussion above, we give the following intuitive description of PageRank: a page has high rank if the sum of the ranks of its backlinks is high. This covers both the case when a page has many backlinks and when a page has a few highly ranked backlinks.
2.4 Definition of PageRank

Let u be a web page. Then let F_u be the set of pages u points to and B_u be the set of pages that point to u. Let N_u = |F_u| be the number of links from u and let c be a factor used for normalization (so that the total rank of all web pages is constant).
We begin by defining a simple ranking, R, which is a slightly simplified version of PageRank:

    R(u) = c \sum_{v \in B_u} \frac{R(v)}{N_v}

Figure 2: Simplified PageRank Calculation
This formalizes the intuition in the previous section. Note that the rank of a page is divided evenly among its forward links to contribute to the ranks of the pages they point to. Note that c < 1 because there are a number of pages with no forward links and their weight is lost from the system (see Section 2.7). The equation is recursive but it may be computed by starting with any set of ranks and iterating the computation until it converges. Figure 2 demonstrates the propagation of rank from one pair of pages to another. Figure 3 shows a consistent steady state solution for a set of pages.
Stated another way, let A be a square matrix with the rows and columns corresponding to web pages. Let A_{u,v} = 1/N_u if there is an edge from u to v and A_{u,v} = 0 if not. If we treat R as a vector over web pages, then we have R = cAR. So R is an eigenvector of A with eigenvalue c. In fact, we want the dominant eigenvector of A. It may be computed by repeatedly applying A to any nondegenerate start vector.
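
To make this concrete, here is a minimal power iteration sketch in Python with NumPy (our illustration, not part of the paper); the three-page graph is hypothetical, and the matrix is written in the orientation A[v, u] = 1/N_u so that applying it propagates rank along links:

    import numpy as np

    # Hypothetical 3-page web: page 0 links to pages 1 and 2,
    # page 1 links to page 2, and page 2 links to page 0.
    # A[v, u] = 1/N_u when u links to v, so every column sums to 1.
    A = np.array([[0.0, 0.0, 1.0],
                  [0.5, 0.0, 0.0],
                  [0.5, 1.0, 0.0]])

    R = np.ones(3) / 3                     # any nondegenerate start vector
    for _ in range(1000):                  # repeatedly apply A
        R_next = A @ R
        R_next /= np.abs(R_next).sum()     # keep the total rank constant
        if np.abs(R_next - R).sum() < 1e-10:
            break
        R = R_next
    print(R)                               # the dominant eigenvector of A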
There is a small problem with this simplified ranking function. Consider two web pages that point to each other but to no other page. And suppose there is some web page which points to one of them. Then, during iteration, this loop will accumulate rank but never distribute any rank (since there are no outedges). The loop forms a sort of trap which we call a rank sink.
To overcome this problem of rank sinks, we introduce a rank source:

Definition 1 Let E(u) be some vector over the Web pages that corresponds to a source of rank. Then, the PageRank of a set of Web pages is an assignment, R', to the Web pages which satisfies

    R'(u) = c \sum_{v \in B_u} \frac{R'(v)}{N_v} + cE(u)        (1)

such that c is maximized and ||R'||_1 = 1 (||R'||_1 denotes the L_1 norm of R').

Figure 3: Simplified PageRank Calculation

Figure 4: Loop Which Acts as a Rank Sink

where E(u) is some vector over the web pages that corresponds to a source of rank (see Section 6). Note that if E is all positive, c must be reduced to balance the equation. Therefore, this technique corresponds to a decay factor. In matrix notation we have R' = c(AR' + E). Since ||R'||_1 = 1, we can rewrite this as R' = c(A + E x 1)R' where 1 is the vector consisting of all ones. So, R' is an eigenvector of (A + E x 1).
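
This eigenvector formulation can be checked directly on a small example. The following sketch (ours; the three-page A and uniform E are hypothetical) recovers R' as the dominant eigenvector of (A + E x 1):

    import numpy as np

    A = np.array([[0.0, 0.0, 1.0],     # hypothetical link matrix, columns sum to 1
                  [0.5, 0.0, 0.0],
                  [0.5, 1.0, 0.0]])
    E = np.full(3, 0.05)               # uniform rank source with ||E||_1 = 0.15

    M = A + np.outer(E, np.ones(3))    # (A + E x 1): E is added to every column
    w, V = np.linalg.eig(M)
    R = np.abs(np.real(V[:, np.argmax(np.real(w))]))
    R /= R.sum()                       # normalize so that ||R'||_1 = 1
    print(R)                           # satisfies R' = c(A + E x 1)R', c = 1/max(w)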

2.5 Random Surfer Model

The definition of PageRank above has another intuitive basis in random walks on graphs. The simplified version corresponds to the standing probability distribution of a random walk on the graph of the Web. Intuitively, this can be thought of as modeling the behavior of a "random surfer". The "random surfer" simply keeps clicking on successive links at random. However, if a real Web surfer ever gets into a small loop of web pages, it is unlikely that the surfer will continue in the loop forever. Instead, the surfer will jump to some other page. The additional factor E can be viewed as a way of modeling this behavior: the surfer periodically "gets bored" and jumps to a random page chosen based on the distribution in E.
So far we have left E as a user defined parameter. In most tests we let E be uniform over all web pages with value α. However, in Section 6 we show how different values of E can generate "customized" page ranks.
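
To illustrate the model (our sketch, not the paper's; the toy graph is hypothetical and the 15% jump probability matches the ||E||_1 = 0.15 used in Section 6), the visit frequencies of a simulated surfer approximate the PageRank distribution:

    import random

    links = {0: [1, 2], 1: [2], 2: [0]}        # hypothetical toy web
    pages = list(links)

    visits = dict.fromkeys(pages, 0)
    page = random.choice(pages)
    STEPS = 200_000
    for _ in range(STEPS):
        visits[page] += 1
        if random.random() < 0.15 or not links[page]:
            page = random.choice(pages)        # bored (or stuck): jump via uniform E
        else:
            page = random.choice(links[page])  # follow a random forward link
    print({p: round(visits[p] / STEPS, 3) for p in pages})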

2.6 Computing PageRank

The computation of PageRank is fairly straightforward if we ignore the issues of scale. Let S be almost any vector over Web pages (for example E). Then PageRank may be computed as follows:

    R_0 <- S
    loop:
        R_{i+1} <- A R_i
        d <- ||R_i||_1 - ||R_{i+1}||_1
        R_{i+1} <- R_{i+1} + d E
        δ <- ||R_{i+1} - R_i||_1
    while δ > ε

Note that the d factor increases the rate of convergence and maintains ||R||_1. An alternative normalization is to multiply R by the appropriate factor. The use of d may have a small impact on the influence of E.
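
This loop translates directly into code. Below is a minimal sketch in Python with NumPy, under the assumption that A fits in memory as a column-stochastic matrix with dangling pages removed (function and parameter names are ours):

    import numpy as np

    def compute_pagerank(A, E, epsilon=1e-8):
        # Iterate R <- AR, re-injecting the lost rank d along E, until the
        # L1 change between successive iterations drops below epsilon.
        R = E.copy()                                 # R_0 <- S, here S = E
        while True:
            R_next = A @ R                           # R_{i+1} <- A R_i
            d = np.abs(R).sum() - np.abs(R_next).sum()   # ||R_i||_1 - ||R_{i+1}||_1
            R_next += d * E                          # R_{i+1} <- R_{i+1} + dE
            delta = np.abs(R_next - R).sum()         # δ <- ||R_{i+1} - R_i||_1
            R = R_next
            if delta <= epsilon:                     # while δ > ε
                return R

With a uniform E, the d correction keeps ||R||_1 fixed while playing the role of the rank source, as the note above describes.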

2.7 Dangling Links

One issue with this model is dangling links. Dangling links are simply links that point to any page with no outgoing links. They affect the model because it is not clear where their weight should be distributed, and there are a large number of them. Often these dangling links are simply pages that we have not downloaded yet, since it is hard to sample the entire web (in our 24 million pages currently downloaded, we have 51 million URLs not downloaded yet, and hence dangling). Because dangling links do not affect the ranking of any other page directly, we simply remove them from the system until all the PageRanks are calculated. After all the PageRanks are calculated, they can be added back in, without affecting things significantly. Notice that the normalization of the other links on the same page as a link which was removed will change slightly, but this should not have a large effect.
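
A sketch of this removal step (our illustration, assuming the link structure fits in a dictionary mapping each downloaded page to its forward links):

    def remove_dangling(links):
        # Repeatedly prune links to pages outside the crawl or with no
        # forward links; each removal can expose new dangling pages, so a
        # few iterations are needed (as noted in Section 3.1).
        links = {u: set(vs) for u, vs in links.items()}
        while True:
            known = set(links)
            links = {u: vs & known for u, vs in links.items()}
            dangling = {u for u, vs in links.items() if not vs}
            if not dangling:
                return links
            links = {u: vs for u, vs in links.items() if u not in dangling}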

3 Implementation

As part of the Stanford WebBase project [PB98], we have built a complete crawling and indexing system with a current repository of 24 million web pages. Any web crawler needs to keep a database of URLs so it can discover all the URLs on the web. To implement PageRank, the web crawler simply needs to build an index of links as it crawls. While a simple task, it is non-trivial because of the huge volumes involved. For example, to index our current 24 million page database in about five days, we need to process about 50 web pages per second. Since there are about 11 links on an average page (depending on what you count as a link), we need to process 550 links per second. Also, our database of 24 million pages references over 75 million unique URLs which each link must be compared against.
Much time has been spent making the system resilient in the face of many deeply and intricately flawed web artifacts. There exist infinitely large sites, pages, and even URLs. A large fraction of web pages have incorrect HTML, making parser design difficult. Messy heuristics are used to help the crawling process. For example, we do not crawl URLs with /cgi-bin/ in them. Of course it is impossible to get a correct sample of the "entire web" since it is always changing. Sites are sometimes down, and some people decide to not allow their sites to be indexed. Despite all this, we believe we have a reasonable representation of the actual link structure of the publicly accessible web.

3.1 PageRank Implementation

We convert each URL into a unique integer, and store each hyperlink in a database using the integer IDs to identify pages. Details of our implementation are in [PB98]. In general, we have implemented PageRank in the following manner. First we sort the link structure by Parent ID. Then dangling links are removed from the link database for reasons discussed above (a few iterations removes the vast majority of the dangling links). We need to make an initial assignment of the ranks. This assignment can be made by one of several strategies. If the computation is going to iterate until convergence, in general the initial values will not affect the final values, just the rate of convergence. But we can speed up convergence by choosing a good initial assignment. We believe that careful choice of the initial assignment and a small finite number of iterations may result in excellent or improved performance.
Memory is allocated for the weights for every page. Since we use single precision floating point values at 4 bytes each, this amounts to 300 megabytes for our 75 million URLs. If insufficient RAM is available to hold all the weights, multiple passes can be made (our implementation uses half as much memory and two passes). The weights from the current time step are kept in memory, and the previous weights are accessed linearly on disk. Also, all the access to the link database, A, is linear because it is sorted. Therefore, A can be kept on disk as well. Although these data structures are very large, linear disk access allows each iteration to be completed in about 6 minutes on a typical workstation. After the weights have converged, we add the dangling links back in and recompute the rankings. Note that after adding the dangling links back in, we need to iterate as many times as was required to remove the dangling links. Otherwise, some of the dangling links will have a zero weight. This whole process takes about five hours in the current implementation. With less strict convergence criteria, and more optimization, the calculation could be much faster. Or, more efficient techniques for estimating eigenvectors could be used to improve performance. However, it should be noted that the cost required to compute the PageRank is insignificant compared to the cost required to build a full text index.
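
As a sketch of one such disk-friendly pass (our reconstruction; the binary file layouts and names are hypothetical), the sorted link file and the previous weights are both read front to back, so only the new weight vector must stay in RAM:

    import struct
    from itertools import groupby

    LINK = struct.Struct("<ii")  # one record: (parent_id, child_id), sorted by parent

    def stream_links(path):
        with open(path, "rb") as f:
            while rec := f.read(LINK.size):
                yield LINK.unpack(rec)

    def one_pass(links_path, prev_ranks_path, num_pages):
        # One iteration R_{i+1} = A R_i. Since links are sorted by parent,
        # the previous-rank file is read linearly; only the new
        # single-precision weights are held in memory.
        new_ranks = [0.0] * num_pages
        with open(prev_ranks_path, "rb") as prev:
            for parent, edges in groupby(stream_links(links_path), key=lambda e: e[0]):
                prev.seek(4 * parent)                 # one 4-byte float per page
                rank = struct.unpack("<f", prev.read(4))[0]
                children = [child for _, child in edges]
                share = rank / len(children)          # divide rank over forward links
                for child in children:
                    new_ranks[child] += share
        return new_ranks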

4 Convergence Properties

As can be seen from the graph in Figure 5, PageRank on a large 322 million link database converges to a reasonable tolerance in roughly 52 iterations. The convergence on half the data takes roughly 45 iterations. This graph suggests that PageRank will scale very well even for extremely large collections as the scaling factor is roughly linear in log n.
One of the interesting ramifications of the fact that the PageRank calculation converges rapidly is that the web is an expander-like graph. To understand this better, we give a brief overview of the theory of random walks on graphs; refer to Motwani-Raghavan [MR95] for further details. A random walk on a graph is a stochastic process where at any given time step we are at a particular node of the graph and choose an outedge uniformly at random to determine the node to visit at the next time step. A graph is said to be an expander if it is the case that every (not too large) subset of nodes S has a neighborhood (set of vertices accessible via outedges emanating from nodes in S) that is larger than some factor α times |S|.
Figure 5: Convergence of PageRank Computation (total difference from previous iteration vs. number of iterations, for the 322 million link and 161 million link databases)

5 Searching with PageRank

The benefits of PageRank are the greatest for underspecified queries. For example, a query for "Stanford University" may return any number of web pages which mention Stanford (such as publication lists) on a conventional search engine, but using PageRank, the university home page is listed first.

5.1 Title Search

To test the usefulness of PageRank for search we implemented a search engine that used only the titles of 16 million web pages. To answer a query, the search engine finds all the web pages whose titles contain all of the query words. Then it sorts the results by PageRank. This search engine is very simple and cheap to implement. In informal tests, it worked remarkably well. As can be seen in Figure 6, a search for "University" yields a list of top universities. This figure shows our MultiQuery system which allows a user to query two search engines at the same time. The search engine on the left is our PageRank based title search engine. The bar graphs and percentages shown are a log of the actual PageRank with the top page normalized to 100%, not a percentile which is used everywhere else in this paper. The search engine on the right is Altavista. You can see that Altavista returns random looking web pages that match the query "University" and are the root page of the server (Altavista seems to be using URL length as a quality heuristic).
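
A minimal sketch of this title engine (ours; the in-memory dictionaries stand in for the real 16 million page index):

    def title_search(query, titles, pagerank):
        # titles maps url -> page title; pagerank maps url -> rank.
        # Return pages whose titles contain every query word,
        # ordered purely by PageRank.
        words = set(query.lower().split())
        hits = [url for url, title in titles.items()
                if words <= set(title.lower().split())]
        return sorted(hits, key=pagerank.get, reverse=True)

For example, title_search("university", titles, pagerank)[:10] would produce a list like the left column of Figure 6.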

Figure 6: Comparison of Query for "University"
Web Page                                                PageRank (average is 1.0)
Download Netscape Software                              11589.00
http://www.w3.org/                                      10717.70
Welcome to Netscape                                      8673.51
Point: It's What You're Searching For                    7930.92
Web-Counter Home Page                                    7254.97
The Blue Ribbon Campaign for Online Free Speech          7010.39
CERN Welcome                                             6562.49
Yahoo!                                                   6561.80
Welcome to Netscape                                      6203.47
Wusage 4.1: A Usage Statistics System For Web Servers    5963.27
The World Wide Web Consortium (W3C)                      5672.21
Lycos, Inc. Home Page                                    4683.31
Starting Point                                           4501.98
Welcome to Magellan!                                     3866.82
Oracle Corporation                                       3587.63

Table 1: Top 15 Page Ranks: July 1996

5.2 Rank Merging

The reason that the title based PageRank system works so well is that the title match ensures high precision, and the PageRank ensures high quality. When matching a query like "University" on the web, recall is not very important because there is far more than a user can look at. For more specific searches where recall is more important, the traditional information retrieval scores over full-text and the PageRank should be combined. Our Google system does this type of rank merging. Rank merging is known to be a very difficult problem, and we need to spend considerable additional effort before we will be able to do a reasonable evaluation of these types of queries. However, we do believe that using PageRank as a factor in these queries is quite beneficial.

5.3 Some Sample Results

We have experimented considerably with Google, a full-text search engine which uses PageRank. While a full-scale user study is beyond the scope of this paper, we provide a sample query in Appendix A. For more queries, we encourage the reader to test Google themselves [BP].
Table 1 shows the top 15 pages based on PageRank. This particular listing was generated in July 1996. In a more recent calculation of PageRank, Microsoft has just edged out Netscape for the highest PageRank.

5.4 Common Case

One of the design goals of PageRank was to handle the common case for queries well. For example, a user searched for "wolverine", remembering that the University of Michigan system used for all administrative functions by students was called something with a wolverine in it. Our PageRank based title search system returned the answer "Wolverine Access" as the first result. This is sensible since all the students regularly use the Wolverine Access system, and a random user is quite likely to be looking for it given the query "wolverine". The fact that the Wolverine Access site is a good common case is not contained in the HTML of the page.
Even if there were a way of defining good meta-information of this form within a page, it would be problematic since a page author could not be trusted with this kind of evaluation. Many web page authors would simply claim that their pages were all the best and most used on the web.
It is important to note that the goal of finding a site that contains a great deal of information about wolverines is a very different task than finding the common case wolverine site. There is an interesting system [Mar97] that attempts to find sites that discuss a topic in detail by propagating the textual matching score through the link structure of the web. It then tries to return the page on the most central path. This yields good results for queries like "flower"; the system will return good navigation pages from sites that deal with the topic of flowers in detail. Contrast that with the common case approach which might simply return a commonly used commercial site that had little information except how to buy flowers. It is our opinion that both of these tasks are important, and a general purpose web search engine should return results which fulfill the needs of both of these tasks automatically. In this paper, we are concentrating only on the common case approach.

5.5 Subcomponents of Common Case

It is instructive to consider what kind of common case scenarios PageRank can help represent. Besides a page which has high usage, like the Wolverine Access site, PageRank can also represent a collaborative notion of authority or trust. For example, a user might prefer a news story simply because it is linked directly from the New York Times home page. Of course such a story will receive quite a high PageRank simply because it is mentioned by a very important page. This seems to capture a kind of collaborative trust, since if a page was mentioned by a trustworthy or authoritative source, it is more likely to be trustworthy or authoritative. Similarly, quality or importance seems to fit within this kind of circular definition.

6 Personalized PageRank

An important component of the PageRank calculation is E, a vector over the Web pages which is used as a source of rank to make up for the rank sinks such as cycles with no outedges (see Section 2.4). However, aside from solving the problem of rank sinks, E turns out to be a powerful parameter to adjust the page ranks. Intuitively the E vector corresponds to the distribution of web pages that a random surfer periodically jumps to. As we see below, it can be used to give broad general views of the Web or views which are focused and personalized to a particular individual.
We have performed most experiments with an E vector that is uniform over all web pages with ||E||_1 = 0.15. This corresponds to a random surfer periodically jumping to a random web page. This is a very democratic choice for E since all web pages are valued simply because they exist.
Although this technique has been quite successful, there is an important problem with it. Some Web pages with many related links receive an overly high ranking. Examples of these include copyright warnings, disclaimers, and highly interlinked mailing list archives.
Another extreme is to have E consist entirely of a single web page. We tested two such E's - the Netscape home page, and the home page of a famous computer scientist, John McCarthy. For the Netscape home page, we attempt to generate page ranks from the perspective of a novice user who has Netscape set as the default home page. In the case of John McCarthy's home page we want to calculate page ranks from the perspective of an individual who has given us considerable contextual information based on the links on his home page.
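
Both choices of E are easy to write down explicitly (a sketch; reusing the total mass of 0.15 from our uniform experiments for the single-page case is our assumption):

    def uniform_E(pages, total=0.15):
        # The democratic choice: every page gets an equal share of rank source.
        return {p: total / len(pages) for p in pages}

    def single_page_E(pages, home_page, total=0.15):
        # A personalized choice: the entire rank source sits on one page,
        # e.g. the Netscape home page or John McCarthy's home page.
        return {p: (total if p == home_page else 0.0) for p in pages}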
In both cases, the mailing list problem mentioned above did not occur. And, in both cases, the respective home page got the highest PageRank and was followed by its immediate links.
Web Page Title                               John McCarthy's View    Netscape's View
                                             PageRank Percentile     PageRank Percentile
John McCarthy's Home Page                    100.00%                  99.23%
John Mitchell (Stanford CS Theory Group)     100.00%                  93.89%
Venture Law (Local Startup Law Firm)          99.94%                  99.82%
Stanford CS Home Page                        100.00%                  99.83%
University of Michigan AI Lab                 99.95%                  99.94%
University of Toronto CS Department           99.99%                  99.09%
Stanford CS Theory Group                      99.99%                  99.05%
Leadershape Institute                         95.96%                  97.10%

Table 2: Page Ranks for Two Different Views: Netscape vs. John McCarthy

From that point, the disparity decreased. In Table 2, we show the resulting page rank percentiles for an assortment of different pages. Pages related to computer science have a higher McCarthy-rank than Netscape-rank, and pages related to computer science at Stanford have a considerably higher McCarthy-rank. For example, the Web page of another Stanford Computer Science Dept. faculty member is more than six percentile points higher on the McCarthy-rank. Note that the page ranks are displayed as percentiles. This has the effect of compressing large differences in PageRank at the top of the range.
Such personalized page ranks may have a number of applications, including personal search engines. These search engines could save users a great deal of trouble by efficiently guessing a large part of their interests given simple input such as their bookmarks or home page. We show an example of this in Appendix A with the "Mitchell" query. In this example, we demonstrate that while there are many people on the web named Mitchell, the number one result is the home page of a colleague of John McCarthy named John Mitchell.

6.1 Manipulation by Commercial Interests

These types of personalized PageRanks are virtually immune to manipulation by commercial interests. For a page to get a high PageRank, it must convince an important page, or a lot of non-important pages, to link to it. At worst, you can have manipulation in the form of buying advertisements (links) on important sites. But, this seems well under control since it costs money. This immunity to manipulation is an extremely important property. This kind of commercial manipulation is causing search engines a great deal of trouble, and making features that would be great to have very difficult to implement. For example, fast updating of documents is a very desirable feature, but it is abused by people who want to manipulate the results of the search engine.
A compromise between the two extremes of uniform E and single page E is to let E consist of all the root level pages of all web servers. Notice this will allow some manipulation of PageRanks. Someone who wished to manipulate this system could simply create a large number of root level servers all pointing at a particular site.
7 Applications

7.1 Estimating Web Traffic

Because PageRank roughly corresponds to a random web surfer (see Section 2.5), it is interesting to see how PageRank corresponds to actual usage. We used the counts of web page accesses from an NLANR [NLA] proxy cache and compared these to PageRank.
