throbber
lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll
`
`US007028029B2
`
`c12) United States Patent
`Kamvar et al.
`
`(10) Patent No.:
`(45) Date of Patent:
`
`US 7,028,029 B2
`Apr.ll, 2006
`
`(54) ADAPTIVE COMPUTATION OF RANKING
`
`(75)
`
`Inventors: Sepandar D. Kamvar, Palo Alto, CA
`(US); Taber H. Haveliwala, Mountain
`View, CA (US); Gene H. Golub,
`Stanford,_CA (US)
`
`(73) Assignee: Google Inc., Mountain View, CA (US)
`( * ) Notice:
`
`Subject to any disclaimer, the term of this
`patent is extended or adjusted under 35
`U.S.C. 154(b) by 0 days.
`
`(21) Appl. No.: 10/925,189
`
`(22) Filed:
`
`Aug. 23, 2004
`
`(65)
`
`Prior Publication Data
`
`US 2005/0027685 AI
`
`Feb. 3, 2005
`
`Related U.S. Application Data
`(63) Continuation-in-part of application No. 10/646,331,
`filed on Aug. 22, 2003.
`
`(60) Provisional application No. 60/458,921, filed on Mar.
`28,2003.
`
`(51)
`
`Int. Cl.
`(2006.01) .
`G06F 17130
`(52) U.S. Cl. ................................ 707/S; 3/7; 715/501.1
`(58) Field of Classification Search .................... 707/3,
`707/100, 5, 101, 10; 706/15; 715/500
`See application file for complete search history.
`
`(56)
`
`References Cited
`
`U.S. PATENT DOCUMENTS
`
`6,560,600 Bl"' 5/2003 Broder .......................... 70717
`6,584,468 Bl"' 6/2003 Gabriel eta! ................. 707/IO
`6,799,176 Bt• 9/2004 Page ............................. 707/5
`6,87I,202 Bl * 3/2005 Broder .......................... 707/7
`2003/0204502 AI* I0/2003 Tomlin eta! .................. 707!5
`2003/0208478 AI "' 11/2003 Harvey . .. . ...... ... .. ... ........ 707/3
`2003/0208482- AI "' H/2003 --IGm-et at .. ~ ................ ;~ 707/3
`2004/0024752 AI* 2/2004 Manber eta! ................. 707/3
`2004/0111412 AI* 6/2004 Broder .......................... 707!7
`
`OTHER PUBLICATIONS
`Arasu, A, eta!., "PageRank Computation and the Structure
`of the Web: Experiments and Algorithms," Proceedings of
`the 11th lnt'l World Wide Web Conj, Poster Track, 2002.
`Bharat, K., et a!., "Improved Algorithms for Topic Distilla(cid:173)
`tion in a Hyperlinked Environment," Proceedings of the
`ACM-SIGIR, 1998.
`
`(Continued)
`
`Primary Examiner-Jean M. Corrielus
`(74) Attomey, Agent, or Firm-Morgan, Lewis & Beckius
`LLP
`
`(57)
`
`ABSTRACT
`
`A system and method is disclosed in which a ranking
`function for a set of document rank values is iteratively
`solved with respect to a set of linked documents until a first
`stability condition is satisfied. After such condition is satis(cid:173)
`fied, some of the ranks will have converged. The ranking
`function is modified to take into account these converged
`ranks so as to reduce the ranking function's computation
`cost. The modified ranking function is then solved until a
`second stability condition is satisfied. After such condition is
`satisfied more of the ranks will have converged. The ranking
`function is again modified and process continues until
`complete.
`
`6,112,202 A
`6,285,999 BI •
`
`8/2000 Kleinberg
`9/200I Page ............................. 70715
`
`31 Claims, S Drawing Sheets
`
`EXHIBIT 2104
`Facebook, Inc. et al.
`v.
`Software Rights Archive, LLC
`CASE IPR2013-00480
`
`

`
`US 7,028,029 B2
`Page 2
`
`OTHER PUBLICATIONS
`
`Haveliwala, T., "Efficient Computation of PageRank," Stan(cid:173)
`ford University Technical Report, 1999.
`Haveliwala, T., "Topic Sensitive PageRank," Proceedings of
`the 11th Int'l World Wide Web Conf, 2002.
`Haveliwala, T., eta!., "The Second Eigenvalue of the Google
`Matrix," Stanford University Technical Report, 2003.
`Jeh, G., eta!., "Scaling Personalized Web Search," Proceed(cid:173)
`ings ofthe 12th Int'l World Wide Web Conf, 2003.
`
`Kamvar, S., et a!., "Exploiting the Block Structure of the
`Web for Computing PageRank," Stanford University Techni(cid:173)
`cal Report, 1999.
`Kamvar, S., et a!., "Extrapolation Methods for Accelerating
`PageRank Computations," Proceedings of the 12th Int 'l
`World Wide Web Conf, 2003.
`Page, L., et a!., "The PageRank Citation Ranking: Bringing
`Order to the Web," Stanford Digital Libraries Working
`Paper, 1998.
`* cited by examiner
`
`

`
`U.S. Patent
`
`Apr. 11, 2006
`
`Sheet 1 of 5
`
`US 7,028,029 B2
`
`Search Engine 100
`
`-
`
`Indexers
`
`106 ' Document
`
`Index
`108
`
`------------------~-·
`
`Second Level
`Controller
`120
`
`t
`
`Doc.
`Index
`Server
`
`•••
`
`t
`
`Doc.
`Index
`Server
`122n
`
`I
`
`122a
`
`Link Records
`124
`
`'-
`
`RL Resolver
`126
`
`Link Map(s)
`128
`age Ranke~
`130
`/
`
`Page Ranks
`132
`
`Quantizers
`134
`
`1
`
`r-------r1 ------------------~
`I
`I
`I
`
`Search Engine
`Back End System
`102
`
`Search Engine
`Front End System
`104
`
`Fig. 1
`
`

`
`U.S. Patent
`
`Apr. 11, 2006
`
`Sheet 2 of 5
`
`US 7,028,029 B2
`
`210
`
`1
`
`0.9
`
`.0.6
`
`(/)
`Q)
`
`0.7
`
`C> m-
`a...~ 0.6
`
`-·-oro
`c"S 0.5
`.Q E
`t:::
`0(.) c..-e
`a...
`
`::::::1
`
`0.4
`
`0.3
`
`0.2
`
`0.1
`
`00
`
`5
`
`10
`
`15
`
`20
`
`25
`
`JO
`
`35
`
`40
`
`45
`
`Convergence Time
`
`Fig. 2
`
`

`
`U.S. Patent
`
`Apr. 11, 2006
`
`Sheet 3 of 5
`
`US 7,028,029 B2
`
`Create a directed graph of linked documents
`
`Associate all nodes with the set of non-converged nodes
`
`302
`
`304
`
`Calculate iteration ranking values of a
`ranking function for the nodes in the set of
`non-converged nodes
`
`312
`
`no
`
`ave a predefine
`number of iterations
`finished?
`
`Identify nodes in the non-converged set
`whose ranking values have converged to a
`predefined iteration tolerance
`
`Associate the identified nodes to a set of
`converged nodes and remove them from
`the set of non-converged nodes
`
`310
`
`314
`
`316
`
`Fig. 3
`
`

`
`U.S. Patent
`
`Apr. 11, 2006
`
`Sheet 4 of 5
`
`US 7,028,029 B2
`
`J
`X (k+1)
`1
`
`X (k+1)
`2
`
`402
`
`404
`J -
`
`A11
`
`A12
`
`A2.1
`
`A2.2
`
`412
`
`406
`
`A1,n
`
`~.n
`
`X (k)
`1
`
`X (k)
`2
`
`(k+1)
`xn-m
`/ ,..---- '
`(k+1 '
`'X
`n-m+1
`\
`
`/
`
`(k+1)
`xn-m+2
`
`1
`I
`I
`I
`I
`I
`I
`I
`I
`\
`
`\
`I
`I
`I
`I
`I
`I
`I
`J
`I
`
`I
`
`X (k+1)
`' n
`
`' --c_
`410
`
`-
`
`An-m,1
`
`An-m,2
`
`An-m,n
`
`An-m+1,1 An-m+1,2
`
`An-m+1,n
`
`An-m+2,1 An-m+2,2
`
`An-m+2,n
`
`An1
`
`An,2
`
`Ann
`
`Fig. 4
`
`(k+1
`xn-m
`........ - ...... ,
`
`\
`
`/
`
`I
`
`'(_k
`1 Xn-m+1 1
`I
`(k
`xn-m+2
`
`I
`I
`I
`
`\ X (k)
`\
`/
`n
`' '~
`408
`
`X (k+1)
`1
`
`\
`I
`I
`I
`I
`I
`I
`I
`I
`
`I
`I
`I
`
`•••
`X (k+1)
`' n
`
`\ ',,_--(_
`
`502
`
`506
`
`f
`
`508
`
`/
`
`514
`
`/
`
`A1.1
`
`A1 n-m
`
`A1,n-m+1
`
`A1 n
`
`··· An-m n-m
`
`An-m,n-m+1
`
`· · ·
`
`An-m n
`
`- -------------------~--------------------
`A n-m+1,n-m
`
`An-m+1,1
`
`An-m+1 n-m+1
`
`· · · An-m+1,n
`
`518 J -
`
`X (k)
`1
`
`X
`(k+1
`n-m
`
`\ 512
`
`An,n-m+1
`
`\
`\
`
`\ 510
`
`Fig. 5
`
`X (k)
`n
`
`I
`1
`J
`
`I
`
`',-<_
`504
`
`

`
`U.S. Patent
`
`Apr. 11, 2006
`
`Sheet 5 of 5
`
`US 7,028,029 B2
`
`Computer 600
`
`./
`Operating System
`
`Memory 606
`
`CPU
`602
`
`I 608~
`
`__.~61 0
`
`User interface
`I Display ~ -612
`I Keyboard I
`_/
`
`Communication
`interface
`_;
`
`604
`
`614
`
`Network Communication
`Module
`Page Ranker
`
`Computation Module
`
`Modification Module
`
`Removal Module
`
`Modifier Module
`
`Identification Module
`
`Convergence Module
`
`Fig. 6
`
`v
`v
`
`f
`f
`v
`v
`v
`v
`v
`
`616
`
`618
`
`130
`
`620
`
`622
`
`624
`
`626
`
`628
`
`630
`
`

`
`US 7,028,029 B2
`
`1
`ADAPTIVE COMPUTATION OF RANKING
`
`RELATED APPLICATIONS
`
`This application is a continuation-in-part of U.S. Utility
`patent application Ser. No. 10/646,331, filed Aug. 22, 2003,
`which claimed priority on U.S. Provisional Patent Applica(cid:173)
`tion No. 60/458,921 filed Mar. 28, 2003, both of which are
`incorporated by reference herein in their entirety.
`
`STATEMENT OF GOVERNMENT SPONSORED
`SUPPORT
`
`This invention was supported in part by the National
`Science Foundation under Grant No. IIS-0085896 and Grant
`No. CCR-9971010. The US Government has certain rights
`in this invention.
`
`2
`based ranking. For example, U.S. Pat. No. 6,285,999 to Page
`discloses a technique used by the Google search engine for
`assigning a rank to each document in a hypertext database.
`According to the link-based ranking method of Page, the
`rank of a node is recursively defined as a function of the
`ranks of its parent nodes. Looked at another way, the rank of
`a node is the steady-state probability that an arbitrarily long
`random walk through the network will end up at the given
`node. Thus, a node will tend to have a high rank if it has
`10 many parents, or if its parents have high rank.
`Although link-based ranking techniques are improve(cid:173)
`ments over prior techniques, in the case of an extremely
`large database, such as the world wide web which contains
`billions of pages, the computation of the ranks for all the
`15 pages can take considerable time. Accordingly, it would be
`valuable to provide techniques for calculating page ranks
`with greater computational efficiency.
`
`TECHNICAL FIELD
`
`SUMMARY
`
`The present invention relates generally to improved tech(cid:173)
`niques for analyzing large directed graphs for use in com(cid:173)
`puter systems, and in particular to reducing the computa(cid:173)
`tional complexity of assigning ranks to nodes.
`
`BACKGROUND
`
`20
`
`In one embodiment, the invention includes iteratively
`solving a ranking function for a set of document rank values
`with respect to a set of linked documents until a first stability
`condition is satisfied. The ranking function is modified so as
`25 to reduce the ranking function's computation cost and then
`the modified ranking function is solved until a second
`stability condition is satisfied.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`The aforementioned features and advantages of the inven(cid:173)
`tion as well as additional features and advantages thereof
`will be more clearly understood hereinafter as a result of a
`detailed description of embodiments of the invention when
`taken in conjunction with the drawings. Like reference
`numerals refer to corresponding parts throughout the several
`views of the drawings.
`FIG. 1 illustrates a search engine environment in accor(cid:173)
`dance with an embodiment of the present invention.
`FIG. 2 illustrates a graph of the cumulative proportion of
`converged pages versus convergence time in accordance
`with an embodiment of the invention.
`FIG. 3 illustrates solving the ranking function in accor(cid:173)
`dance with an embodiment of the invention
`FIG. 4 illustrates a ranking function in accordance with an
`embodiment of the invention.
`FIG. 5 illustrates a ranking function in accordance with an
`embodiment of the invention.
`FIG. 6 illustrates a computer system in accordance with
`50 an embodiment of the invention.
`
`A search engine is a software program designed to help a
`user access files stored on a computer, for example on the
`World Wide Web (WWW), by allowing the user to ask for 30
`documents meeting certain criteria (e.g., those containing a
`given word, a set of words, or a phrase) and retrieving files
`that match those criteria. Web search engines work by
`storing information about a large number of web pages
`(hereinafter also referred to as "pages" or "documents"), 35
`which they retrieve from the WWW. These documents are
`retrieved by a web crawler or spider, which is an automated
`web browser which follows the links it encounters in a
`crawled document. The contents of each successfully
`crawled document are indexed, thereby adding data con- 40
`cerning the words or terms in the document to an index
`database for use in responding to queries. Some search
`engines, also store all or part of the document itself, in
`addition to the index entries. When a user makes a search
`query having one or more terms, the search engine searches 45
`the index for documents that satisfY the query, and provides
`a listing of matching documents, typically including for each
`listed document the URL, the title of the document, and in
`some search engines a portion of document's text deemed
`relevant to the query.
`It can be useful for various purposes to rank or assign
`importance values to nodes in a large linked database. For
`example, the relevance of database search results can be
`improved by sorting the retrieved nodes according to their
`ranks, and presenting the most important, highly ranked 55
`nodes first. Alternately, the search results can be sorted based
`on a query score for each document in the search results,
`where the query score is a function of the document ranks as
`well as other factors.
`One approach to ranking documents involves examining 60
`the intrinsic content of each document or the back-link
`anchor text in parents of each document. This approach can
`be computationally intensive and often fails to assign high(cid:173)
`est ranks to the most important documents. Another
`approach to ranking involves examining the extrinsic rela(cid:173)
`tionships between documents, i.e., from the link structure of
`the directed graph. This type of approach is called a link-
`
`DESCRIPTION OF EMBODIMENTS
`
`The techniques of the present invention may used in a
`search engine environment where the linked database is
`generated from crawling a number of documents, such as the
`Internet. FIG. 1 is a block diagram of one such typical search
`engine environment. As illustrated in FIG. 1, a search engine
`100 has a back end system 102 and a front end system 104.
`The layout of the search engine system 100 is merely
`exemplary and can take on any other suitable layout or
`configuration.
`The back end system 102 may include one or more
`crawlers 105 (also known as spiders), one or more document
`65 indexers 106 and a document index 108. To index the large
`number of Web pages that exist on the worldwide web, the
`web crawler 104 locates and downloads web pages and other
`
`

`
`US 7,028,029 B2
`
`3
`information (hereinafter also referred to as "documents"). In
`some embodiments, a set of content filters 110 identify and
`filter out duplicate documents, and determine which docu(cid:173)
`ments should be sent to the document indexers 106 for
`indexing. The document indexers 106 process the down(cid:173)
`loaded documents, creating a document index 108 of terms
`found in those documents. If a document changes, then the
`document index 108 is updated with new information. Until
`a document is indexed, it is generally not available to users
`of the search engine 100.
`The front end system 104 may include a web server 112,
`one or more controllers 114, a cache 118, a second level
`controller 120 and one or more document index servers
`122a, ... , 122n. The document index 108 is created by the
`search engine 100 and is used to identifY documents that
`contain one or more terms in a search query. To search for
`documents on a particular subject, a user enters or otherwise
`specifies a search query, which includes one or more terms
`and operators (e.g., Boolean operators, positional operators,
`parentheses, etc.), and submits the search query to the search 20
`engine 100 using the web server 112.
`The controller 114 is coupled to the web server 112 and
`the cache 118. The cache 118 is used to speed up searches
`by temporarily storing previously located search results. In
`some embodiments, the cache 118 is distributed over mul(cid:173)
`tiple cache servers. Furthermore, in some embodiments, the
`data (search results) in the cache 118 is replicated in a
`parallel set of cache servers.
`While the following discussion describes certain func(cid:173)
`tions as being performed by one or more second level
`controllers 120, it should be understood that the number of
`controllers (114, 120) and the distribution of functions
`among those controllers may vary from one implementation
`to another. The second level controller 120 communicates
`with one or more document index servers 122a, ... , 122n.
`The document index servers 122a, ... , 122n (or alternately,
`one of the controllers 114, 120) encode the search query into
`an expression that is used to search the document index 108
`to identify documents that contain the terms specified by the
`search query. In some embodiments, the document index
`servers 122 search respective partitions of the document
`index 108 generated by the back end system 102 and return
`their results to the second level controller 120. The second
`level controller 120 combines the search results received
`from the document index servers 122a, ... , 122n, removes
`duplicate results (if any), and forwards those results to the
`controller 114. In some embodiments, there are multiple
`second level controllers 120 that operate in parallel to search
`different partitions of the document index 108, each second
`level controller 120 having a respective set of document
`index servers 122 to search respective sub-partitions of
`document index 108. In such embodiments, the controller
`114 distributes the search query to the multiple second level
`controllers 120 and combines search results received from
`the second level controllers 120. The controller 114 also
`stores the search query and search results in the cache 118,
`and passes the search results to the web server 112. A list of
`documents that satisfY the search query is presented to the
`user via the web server 112.
`In some embodiments, the content filters 110, or an
`associated set of servers or processes, identifY all the links
`in every web page produced by the crawlers 105 and store
`information about those links in a set of link records 124.
`The link records 124 indicate both the source URL and the
`target URL of each link, and may optionally contain other 65
`information as well, such as the "anchor text" associated
`with the link. A URL Resolver 126 reads the link records 124
`
`4
`and generates a database 128 of links, also called link maps,
`which include pairs of URLs or other web page document
`identifiers. In some embodiments, the links database 128 is
`used by a set of one or more Page Rankers 130 to compute
`Page Ranks 132 for all the documents downloaded by the
`crawlers. These Page Ranks 132 are then used by the
`controller 114 to rank the documents returned in response to
`a query of the document index 108 by document index
`servers 122. Alternately, the document index servers 122
`10 may utilize the Page Ranks 132 when computing query
`scores for documents listed in the search results. In certain
`embodiments of the present invention, the back end system
`102 further comprises quantizers 134 that are used to
`quantize data in Page Ranks 132. Brin and Page, "The
`15 Anatomy of a Large-Scale Hypertextual Search Engine," 7th
`International World Wide Web Conference, Brisbane, Aus(cid:173)
`tralia, provides more details on how one type of Page Rank
`metric can be computed. Other types of link-based on
`non-link based ranking techniques could also be utilized.
`A link-based ranking system, such as PageRank, makes
`the assumption that a link from a page u to a page v can be
`viewed as evidence that page v is an "important" page. In
`particular, the amount of importance conferred on page v by
`page u is proportional to the importance of page u and
`25 inversely proportional to the number of pages to which page
`u points. Since the importance of page u is itself not known,
`determining the importance for every page i requires an
`iterative fixed-point computation.
`In one embodiment, the importance of a page i is defined
`30 as the probability that at some particular time step, a random
`web surfer is at page i. Provided that the surfer chooses one
`of the links on page i, that link is chosen with a probability
`of 1 divided by the number of outlinks from page i, when the
`probability of choosing any of the outlinks is uniform across
`35 the outlinks. A transition probability matrix, P, may be
`created where P(i,j) is provided as 1/deg(i), where deg(i)
`represents the number of outlinks from page i. In other
`embodiments, P(i,j) could take into consideration certain
`personalization information for an individual or for a group,
`40 or could take into account other information derived from
`page i itself and/or elsewhere, and need not be uniform over
`each outlink from a given page.
`Some web pages have no outlinks, but for P to be a more
`useful transition probability matrix, every node must have at
`45 least 1 outgoing transition, i.e., P should have no rows
`consisting of all zeros. A matrix P can be converted into a
`more useful transition matrix by adding a complete set of
`outgoing transitions to pages with outdegree(O), i.e., no
`outlinks, to account for the probability that the surfer visiting
`50 that page randomly jumps to another page. In one embodi(cid:173)
`ment, the row for a page having no outlinks is modified to
`account for a probability that the surfer will jump to a
`different page uniformly across all pages, i.e., each element
`in the row becomes 1/n, where n is the number of nodes, or
`55 pages. In another embodiment, the modification could be
`non-uniform across all nodes and take into account person(cid:173)
`alization information. This personalization information
`might cause certain pages to have a higher probability
`compared to others based on a surfer's preferences, surfing
`60 habits, or other information. For example, if a surfer fre(cid:173)
`quently visits http://www.google.com, the transition prob(cid:173)
`ability from page i to the Google homepage would be higher
`than a page that the user infrequently visits. Another modi-
`fication to P may take into account the probability that any
`random surfer will jump to a random Web page (rather than
`following an outlink). The destination of the random jump is
`chosen according to certain probability distributions. In
`
`

`
`US 7,028,029 B2
`
`5
`some embodiments, this is uniform across all pages and in
`some embodiments this distribution is non-uniform and
`based on certain personalization information. Taking the
`transpose of the twice modified matrix P provides a matrix
`A. In the matrix P, a row i provided the transition probability
`distribution for a surfer at node i, whereas in the matrix A
`this is provided by column i. Mathematically this can be
`represented as:
`
`A~(c(P+D)+(1-c)E)T,
`
`where Pis a probability transition where P(i,j) represents the
`probability that the surfer will choose one of the links on i
`to page j; D represents the probability that a surfer visiting
`a page with no outlinks will jump to any other page; E
`represents the probability that a surfer will not choose any of 15
`the links and will jump to another page; and (1-c) represents
`a de-coupling factor indicating how likely it is that a surfer
`will jump to a random Web page, while c represents a
`coupling factor indicating how likely it is that a surfer will
`select one of the links in a currently selected or viewed page. 20
`Assuming that the probability distribution over all the
`nodes of the surfer's location at time 0 is given by xC0l, then
`the probability distribution for the surfer's location at time
`k is given by xCk)=A (k)x(o). The unique stationary distribution
`of the Markov chain is defined as limk~=xCkl, which is
`equivalent to limk~=ACklxC0l, and is independent of the
`initial distribution xC0 l. This is simply the principal eigen(cid:173)
`vector of the matrix A and the values can be used as ranking
`values. One way to calculate the principal eigenvector
`begins with a uniform distribution xC0 l=v and computes
`successive iterations of the ranking function, xCk)=A xCk- 1
`),
`until convergence. Convergence can be defined when two
`successive iterations of the ranking function produce a
`difference within a tolerance value. Various method can be
`used to determine tolerance values based on desired con- 35
`vergence characteristics or how much variation exists as the
`tolerance decreases.
`FIG. 2 illustrates an exemplary cumulate plot of conver(cid:173)
`gence times using the above described iterative process. The
`x-axis represents convergence by iteration number and the
`y-axis represents the cumulative proportion of document
`rank values that have converged. At a point 210, it can be
`seen, for this exemplary data set, that a large number of
`ranks have converged by point 210 within 20 iterations, but 45
`the final ranks take a significantly longer time to converge.
`Embodiments of the invention take advantage of this
`skewed distribution of convergence times to reduce the
`computational cost required for the determination of the full
`set of document rank values. Computational cost can be
`reduced by reducing the number of operations that must be
`performed and/or simplifying the types that must be pre(cid:173)
`formed. Additionally, reducing the need to move items in
`and out of main memory can have an effect on computa(cid:173)
`tional cost. By not recalculating the ranks of those ranks
`which have converged during a particular cycle of iterations,
`embodiments of the invention reduce the computation cost
`of determining document rank values.
`Referring to FIG. 3, a directed graph oflinked documents
`is initially created (302) where each document is represented 60
`by a node in the graph, and all nodes are associated with the
`set of nodes whose document rank values have not con(cid:173)
`verged (304). If the set of nodes which have not converged
`is empty (306-yes), then all the nodes have converged and
`the process ends (308). If the set of nodes which have not 65
`converged is not empty (330-no ), then an iteration of the
`function is calculated for those nodes which have not
`
`25
`
`30
`
`6
`converged (310). A predetermined number of iterations are
`completed per given cycle before examining which nodes'
`document rank values have converged. Accordingly, if a
`predetermined number of iterations for the current cycle has
`not been completed (312-no), then an additional iteration is
`calculated (310). On the other hand, if the predetermined
`number of iterations for the cycle been completed (312-yes ),
`then those nodes whose ranks have converged are identified
`(314). The number of iterations per cycle can be chosen in
`10 different ways and in some embodiments may depend on the
`balancing the computation cost of identifYing the nodes
`which have converged and modifying the ranking function
`versus computing the iterations. For example, the number of
`iterations could be chosen from a number between 5 and 15.
`In other embodiments, the number of iterations prior to
`identifying converged ranks could vary depending on a
`given cycle, with successive cycles having different number
`of iterations. For example, when the number of iterations for
`a cycle has been met (312-yes), the number of iterations for
`the next loop could be modified, such that the next iterative
`cycle would end after a different set of iterations, and so on.
`In other embodiments, instead of basing the end of a cycle
`on whether a number of iterations have been completed, the
`cycle is based on a proportion of nodes whose rank has
`converged. For example, the first cycle of iterations could
`complete after 25% of the nodes have converged. The
`proportion for the next cycle could be set to be an additional
`25% or some other percentage. One of ordinary skill in the
`art will readily recognize other ways this concept can be
`expanded using various criteria to end the iterative cycle.
`After the iteration cycle is complete (312-yes), those
`nodes whose document ranking value has converged to
`within a predefined iteration tolerance are identified (314).
`In some embodiments, the same tolerance value is used for
`each cycle of iteration and in other embodiments, the
`tolerance value could vary depending on the iterative cycle.
`Tolerances values could be selected from 0.00001 to 0.01, or
`other values. Those nodes which have converged are disas(cid:173)
`sociated with the set of non-converged nodes (316). The
`40 process continues until all document rank values have
`converged or some other type of ending mechanism is
`triggered. Other triggering mechanisms might include, for
`example, identifying convergence for a specific subset of
`nodes.
`In other embodiments, a first phase of rank computation
`may be computed using an initial tolerance level for con(cid:173)
`vergence as described above and using the phase tolerance
`level for each cycle of iteration in the phase. However,
`another phase of rank computation could follow using a
`50 second tolerance level for the cycles in the phase and using
`the ranks previously computed in the first phase as respec(cid:173)
`tive, initial document rank values in the next phase of rank
`computation. In some embodiments, the second tolerance
`level is smaller by an order of magnitude than the previous
`55 phase. In some embodiments, more than two phases are used
`with successively narrower tolerances for convergence.
`When the nodes whose document rank values are asso-
`ciated with the converged set, their document rank values
`are no longer calculated. In some embodiments, computing
`only document rank values which have not converged takes
`advantage of the matrix structure of the ranking function. As
`mentioned above, in some embodiments, the ranking func(cid:173)
`tion can be described as xCkl=A xCk- 1
`). At some time k, some
`of the document rank values will have converged. FIG. 4
`illustrates a ranking function accordance with some embodi(cid:173)
`ments where some of the rank values have converged.
`Colunm 402 of FIG. 4 illustrates the document rank value at
`
`

`
`US 7,028,029 B2
`
`7
`the k+ 1st iteration of the ranking function for node, or
`document, i, x?+ 1l. The document ranking values for the
`k+l" iteration are given by the matrix multiplication of A
`(shown at 404) by the k'h iteration of the document rank
`values x, (k) (shown at 406). The ranks which have converged
`by iteration k can be represented by xn-m+l (k) to xn (k) (shown
`at 408), where n represents the total number of nodes, or
`documents, and m represents the number of document rank
`values which have converged. Accordingly, the values for
`xn-m>l(k+l) to xn (k+l) (shown at 410) at the k+l st iteration will
`be the same as xn-m+l (k) to xn (k) (shown at 408) and those
`document rank values need not be calculated again. In some
`embodiments, only the calculations for those nodes which
`have not converged (shown at 412) are calculated. The
`ranking function is modified to remove those rows from the
`calculation. In some embodiments, the rows of colunm 402
`and matrix 404 corresponding to the converged nodes 410
`are not read into memory. In some embodiments, the matrix
`multiplication needed for rows corresponding to the con(cid:173)
`verged ranks are simply ignored and not calculated. In other
`embodiments the rows of 402 and 404 corresponding to the
`converged ranks are replaced by all zeros (which signifi(cid:173)
`cantly reduces computation time). In these embodiments, the
`colunm 406 is not affected since the converged values
`therein are used in the ranking function iteration. In some
`embodiments, the rows are initially ordered by decreasing
`order of convergence based on a previous solving of the
`ranking function. This has the effect of keeping longer
`converging nodes in main memory and reducing the amount
`of memory accesses to read portions of the modified ranking
`function into memory during the course of the computation.
`As mentioned earlier, reducing the amount of memory
`accesses can significantly reduce computation cost.
`During each cycle of iteration, the contributions to the
`rank of a non-converged node from the converged nodes is 35
`a constant. Accordingly, in some embodiments these con(cid:173)
`tributions are only calculated once per cycle of iteration.
`These embodiments can be understood with reference to
`FIG. 5. After a period of iterations, the nodes 502 have
`converged as described above. Accordingly, the values at 40
`504 will remain constant throughout each iteration cycle
`until another examination of convergence is made (314 and
`316 of FIG. 3). The matrix 506 now may be thought of as
`consisting of 4 partitions 508, 510, 512, 514. The partition
`508 illustrates the contributions that the non-converged 45
`nodes make to other non-converged nodes 516 (also called
`sub-matrix 516). The partition 510 illustrates the contribu(cid:173)
`tions that converged nodes make to converged nodes. The
`partition 512 illustrates the contributions that the non(cid:173)
`converged nodes make to the converged nodes. Finally, the 50
`partition 514 illustrates the contributions that the converged
`nodes make to the non-converged nodes 516. When matrix
`518 (the previous document ranks values) is multiplied
`against a row i in matrix 506, the multiplication products
`corresponding to values in partition 514 are constants. 55
`Therefore, to modifY the ranking function even further, some
`embodiments only calculate the products produced by mul(cid:173)
`tiplying partition 514 (representing contributions of the
`converged nodes to the non-converged nodes) once per

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket