`US007260573Bl
`
`c12) United States Patent
`Jeh et al.
`
`(10) Patent No.:
`(45) Date of Patent:
`
`US 7,260,573 Bl
`Aug. 21, 2007
`
`(54) PERSONALIZING ANCHOR TEXT SCORES
`IN A SEARCH ENGINE
`
`(75)
`
`Inventors: Glen Jeh, San Francisco, CA (US);
`Taber H. Haveliwala, Mountain View,
`CA (US); Sepandar D. Kamvar, Palo
`..... Alto, CA (US)
`-
`
`(73) Assignee: Google Inc., Mountain View, CA (US)
`
`( *) Notice:
`
`Subject to any disclaimer, the term of this
`patent is extended or adjusted under 3 5
`U.S.C. 154(b) by 543 days.
`
`(21) Appl. No.: 10/848,599
`
`(22) Filed:
`
`May 17,2004
`
`(51)
`
`Int. Cl.
`(2006.01)
`G06F 17130
`(52) U.S. Cl ................ 707/7; 707/3; 707/10; 715/501.1
`(58) Field of Classification Search .................... 707/7,
`707/3, 10; 715/501.1
`See application file for complete search history.
`
`(56)
`
`References Cited
`
`U.S. PATENT DOCUMENTS
`
`5,920,859 A *
`6,285,999 B1
`6,636,848 B1 *
`2002i0 169770 A1 •
`2003/0208482 A1 *
`2004/0215606 A1 •
`2005/007174I AI*
`2005/0114324 AI •
`2005/0I65781 AI*
`2005/0240580 A1 *
`OTHER PUBLICATIONS
`
`7/1999 Li ................................. 707/5
`9/2001 Page ............................. 707/5
`10/2003 Aridor eta! ................... 707/3
`lli2002 Kim eta!. . .................... 707/5
`lli2003 Kim et a!. .... .. ... ... ...... . . . 707/3
`10/2004 Cossock ........................ 707/3
`.. .......... 715/500
`3/2005 Achmya ct a!.
`5/2005 Mayer ........................... 707/3
`712005 Kraft eta! ..................... 70717
`1012005 Zamir eta! .................... 707/4
`
`Tsukasa Hirashirna eta!, "Context-Sensitive Filtering for Browsing
`in Hypertext", ACM I998, pp. 119-126.*
`
`Brin, S., eta!., "The Anatomy of a Laige-Scale Hypertextual Web
`Search Engine," In Proc. of the 7th Int'l World Wide Web Conf.,
`1998, 26 pages.
`Haveliwala, T.H., "Topic-Sensitive PageRank," In Proc. of the lith
`lnt'l Worl~ Wide Web Conf., May 2002, 10 pages.
`Jeh, G.; et al.; "Scaling· Personalized Web·Search," In Proc. of the
`12th Int'l World Wide Web Conf., 2003, 9 pages.
`Kamvar, S.D., eta!., "Exploiting the Block Structure of the Web for
`Computing PageRank," Stanford Univ. Technical Report, 2003, 13
`pages.
`* cited by examiner
`Primary Examiner-Uyen Le
`(74) Attorney, Agent, or Firm-Morgan, Lewis & Beckius
`LLP
`
`(57)
`
`ABSTRACT
`
`A search engine identifies a list of documents from a set of
`documents in a database in response to a set of query terms.
`For each document in the list, the search engine determines
`an information retrieval score based on its content and the
`query terms, and also identifies a set of source documents
`that have links to the document and that also have anchor
`text satisfYing a predefined requirement with respect to the
`query terms. The search engine calculates a personalized
`page importance score for each of the identified source
`documents according to a set of user-specific parameters and
`accumulates the personalized page importance scores to
`produce a personalized anchor text score for the document.
`The personalized anchor text score is then combined with
`the document's information retrieval score to generate a
`personalized ranking for the document. The documents are
`ordered according to their respective personalized rankings.
`
`51 Claims, 6 Drawing Sheets
`
`EXHIBIT 2103
`Facebook, Inc. et al.
`v.
`Software Rights Archive, LLC
`CASE IPR2013-00480
`
`
`
`U.S. Patent
`
`Aug. 21, 2007
`
`Sheet 1 of 6
`
`US 7,260,573 Bl
`
`110-1
`
`1 http:iiwww.stantord.edu/bentindex.html
`
`Ben's Homepage
`
`<a.:_:firef=http://www.gostanford.com>Sports News~/~?.>
`-~--------------------------------------------------------------------------
`
`c 110-2
`
`;(]
`
`1 http://www.gostanford.com
`
`I
`
`I
`
`(
`
`120-1
`
`I http://www.geocity.com/cindy/index.html
`
`Cindy's Homepage
`c_~a=fire~t;ii;;:ii~-.~~~~~~~-~~sp~rt~--N~~s-~ia·>--{
`------------------------------------------------------------------------------
`
`(
`
`120-2
`
`I v I http://www.espn.com
`
`Fig.1
`
`
`
`U.S. Patent
`
`Aug. 21, 2007
`
`Sheet 2 of 6
`
`US 7,260,573 Bl
`
`Search Engine System
`
`200 J
`
`Search Query
`
`User Info
`204
`
`,---------- ----------·----·----------- --".-· -----
`: Server
`210
`
`User
`14-------1~ Info DB
`206
`
`Inverted
`Content
`Index
`208
`
`' ' ' '
`' ' ' '
`
`'
`
`'
`'
`'
`' '
`'
`'
`'
`
`'
`
`' ' ' ' '
`' ' ' ' '
`' ' ' '
`' ' ' ' '
`
`'
`'
`'
`L---- --------- ------------------ ---· -- -- ·- --------------+- -----------------------------------------------------
`
`Page
`Importance
`Ranker
`215
`
`Indexer
`217
`
`Network Crawler 220
`
`Fig. 2
`(Prior Art)
`
`
`
`U.S. Patent
`
`Aug. 21, 2007
`
`Sheet 3 of 6
`
`US 7,260,573 Bl
`
`(203
`
`Search_..
`Query
`
`Indexes/
`Databases
`
`208,209,211
`
`(302
`
`Dl (IR score, URL, PR, List of Source Pages)
`~ D2 (IR score, URL, PR, List of Source Pages)
`
`Dl:
`Source Page 1 (URL, PR, Anchor Text
`of Link)
`t . - ._ Source Page 2 (URL, PR, Anchor Text
`ofLink)
`~~
`
`~~~.
`
`_r312
`
`3M I
`
`User
`Pro fi I e f----l~
`
`Personalized
`AT Score
`Generator
`310
`
`Dl:
`Source Page 1 LA Score
`Source Page 2 LA Score
`
`02: ...
`
`_r320
`
`Accumulator
`
`.----....L.._--.v 322
`D1: AT Score
`D2: AT Score
`
`.-------._r 324
`Search Result
`IR Scores
`Ranking
`Function
`
`326 ~ Personalized
`Document
`Rankings
`
`328 \.._,-------'----,
`Order Search
`Results
`
`330~ Search Results,
`Ordered by
`Personalized
`Rankings
`
`Fig. 3
`
`
`
`U.S. Patent
`
`Aug. 21, 2007
`
`Sheet 4 of 6
`
`US 7,260,573 Bl
`
`Determine an IR score for a L-3
`
`document based on its content and a
`set of query terms
`
`0
`
`'
`having links to the document and l-/
`
`Identify a set of source documents
`
`42
`0
`
`anchor text matching at least one of
`the query terms
`
`,,
`Calculate personalized page 0
`
`importance scores for the identified
`source documents according to a set
`of user-specified parameters
`
`30
`
`,
`
`Accumulate a personalized AT score
`for the document based on the
`personalized page importance
`scores of the identified source
`documents
`
`/
`
`4
`50
`
`Generate a personalized ranking for __)
`
`the document based on its I R score
`and personalized AT score
`
`60
`
`r
`
`End
`
`Fig. 4
`
`
`
`U.S. Patent
`
`Aug. 21, 2007
`
`Sheet 5 of 6
`
`US 7,260,573 Bl
`
`Identify a list of documents that
`satisfy a set of query terms
`
`510
`
`.------------------------------------------------------ ------------------------------------------------------~:
`For each
`candidate
`document
`
`410
`
`Determine an IR score for a
`document based on its content
`
`Identify a set of source documents
`having links to the document and
`anchor text matching at least one of
`the query terms
`
`Calculate personalized page
`importance scores for the identified
`source documents according to a set
`of user-specified parameters
`
`Accumulate a personalized AT score
`for the document based on the
`personalized page importance
`scores of the identified source
`documents
`
`430
`
`450
`
`' ' '
`
`Generate a personalized ranking for
`the document based on its IR score
`and personalized AT score
`'
`.._ ______________________________________________________ ------------------------------------------------------
`
`460
`
`Order the list of documents
`according to their personalized
`ran kings
`
`590
`
`End
`
`Fig. 5
`
`
`
`U.S. Patent
`
`Aug. 21, 2007
`
`Sheet 6 of 6
`
`US 7,260,573 Bl
`
`Memory 612 ~
`
`Operating System
`
`Network Communication Module
`
`System Initialization Module
`
`Search Engine 600
`
`Query Processor
`
`~
`CPf]
`'----I
`614 ~
`
`602
`
`604\
`
`User interface
`
`I Display ~ t-606
`II Keyboard I
`~608
`
`61(>__
`
`Network
`interface
`
`Fig. 6
`
`Search Result Ranking Function
`
`Personalized Score Generator
`
`Personalized AT Score Generator
`
`Persorv=!!iz:ed !R Score GP.nP.rator
`
`.
`
`.
`
`(System) Page Importance Score Database
`
`Inverted Content Index Database
`
`Inverted Anchor Text Index Database
`
`User Information Database
`
`User Profile (User1)
`
`.
`
`.
`
`Network Crawler
`
`Content Indexer
`
`Page Importance Ranker
`
`Anchor Text Indexer
`
`.
`
`Search Results
`
`Document#1
`
`Information Retrieval Score
`
`Pagelmportance~core
`(Personalized or System)
`Personalized Anchor Text Score
`: .
`Document#N
`
`Information Retrieval Score
`Page Importance Score
`(Personalized or System)
`Personalized Anchor Text Score
`
`616
`
`618
`
`v
`v
`v
`v
`v
`v
`622
`v
`310
`622
`Lr
`
`620
`
`205
`
`324
`
`v
`v
`v
`v
`v
`v
`v
`v
`v
`
`J
`
`v
`v
`v
`v
`v
`
`209
`
`208
`
`211
`
`206
`
`306
`
`306
`
`220
`
`213
`
`215
`
`217
`
`302/330
`
`640
`
`642
`
`644
`
`646
`
`
`
`US 7,260,573 Bl
`
`1
`PERSONALIZING ANCHOR TEXT SCORES
`IN A SEARCH ENGINE
`
`CROSS-REFERENCE TO RELATED
`APPLICATIONS
`
`This application is related to patent application Ser. No.
`10/646,331, filed Aug. 22, 2003, "Improved Methods For
`Ranking Nodes In Large Directed Graphs," which is hereby
`incorporated by reference.
`
`FIELD OF THE INVENTION
`
`The present invention relates generally to the field of
`search engines for locating documents in a computer net(cid:173)
`work system, and in particular, to a system and method of
`personalizing search results produced by a search engine in
`accordance with user-specific parameters.
`
`BACKGROUND OF THE INVENTION
`
`Search engines provide a powerful tool for locating
`documents in a large database of documents, such as the
`documents on the World Wide Web (WWW) or the docu(cid:173)
`ments stored on the computers of an Intranet. The docu(cid:173)
`ments are located in response to a search query submitted by
`a user. A typical search query includes only two to three
`terms. As the number of documents accessible via the
`Internet grows, the number of documents that match the
`search query may also increase. However, not every docu(cid:173)
`ment matching the search query is equally important from
`the user's perspective. As a result, a user might be over(cid:173)
`whelmed by the enormous number of documents retrieved
`by a search engine, if the search engine did not order the 35
`search results based on their relevance to the user's query.
`One approach to improving the relevance of search results
`to a search query is to use the link structure of the documents
`in the database, such as the links between documents on the
`WWW, to compute global "importance" scores for the
`documents in the database. These scores are used to affect
`the order of search results when they are presented to the
`user. This approach is sometimes referred to as the PageR(cid:173)
`ank algorithm. Amore detailed description of the PageRank
`algorithm can be found in the article "The Anatomy of a 45
`Large-Scale Hypertextual Search Engine" by S. Brin and L.
`Page, 7'h International World Wide Web Conference, Bris(cid:173)
`bane, Australia and U.S. Pat. No. 6,285,999, both of which
`are hereby incorporated by reference as background infor-
`mation.
`An important assumption of the PageRank algorithm is
`that there is a "random surfer" who starts his web surfing at
`a randomly selected web page and keeps clicking on the
`links embedded in the web pages, never clicking on the
`"back" button. Occasionally, the random surfer re-starts his
`surfing by randomly picking another web page. The prob(cid:173)
`ability that the random surfer visits (i.e., views or down(cid:173)
`loads) a web page is a function of its PageRank. A web page
`may have a high PageRank if there are many other web
`pages pointing to it, or if some of the web pages pointing to
`it have a high PageRank. For example, www.espn.com is a
`famous website reporting sports-related news. It is conceiv(cid:173)
`able that there are many web pages over the Internet having
`links to www.espn.com. In contrast, www.gostanford.com is
`a website that only reports news about the sports teams of
`Stanford University. For the purposes of this explanation, we
`will assume that www.espn.com is more frequently visited
`
`2
`by WWW users than www.gostanford.com, and we will
`further assume that www.espn.com has a higher PageRank
`than www.gostanford.com.
`For each link in the link structure (representing links
`between the documents in the database), there is a pair of
`source and destination web pages. Source pages are also
`sometimes called "referring" pages. Further, many links in
`source web pages are associated with text that describes the
`destination web page of the link. Such text, commonly
`10 referred to as anchor text, often provides a more concise and
`accurate description than the destination web page itself and
`therefore can be used in determining the relevance of the
`destination web page to a particular query. FIG. 1 provides
`two examples of the link structure between different web
`15 pages. Each of the source web pages 110-1 and 120-1 has an
`embedded link pointing to one of the two destination web
`pages 110-2 and 120-2, respectively. An anchor text "Sports
`News" is associated with each link, characterizing the key
`feature of the corresponding destination page. When a user
`20 submits a query for "sports news" to a search engine (such
`as the Google search engine) that considers a web page's
`PageRank and anchor text, the engine may return both web
`pages 110-2 and 120-2. If so, the www.espn.com web page
`120-2 would likely be displayed higher in the search results
`25 than the www.gostanford.com web page 110-2 because page
`120-2 has a higher PageRank than page 110-2. It is noted
`that the Google search engine, as of late 2003, determines
`the position of a document in a set of search results as a
`function of the PageRanks of the documents in the search
`30 results, the query terms, the documents in the search results,
`and the anchor text of links to those documents. For pur(cid:173)
`poses of this discussion, we have assumed that large differ(cid:173)
`ences in the PageRanks of two documents often determine
`their relative position in a set of search results.
`When using a conventional search engine, the ordering of
`documents in a set of search results may be less than optimal
`for a user with specific personal preferences. In particular,
`documents of highest interest to the user may be positioned
`lower in the search results than one or more other docu-
`40 ments. It would be desirable to have a system and method of
`making the order of documents in a set of search results
`more attuned to a user's personal preferences, and it would
`be desirable for such a system to be computationally fea(cid:173)
`sible.
`
`SUMMARY
`
`In a method of personalizing the search results produced
`by a search engine, in accordance with a set of user-specific
`50 parameters, a search engine produces a set of search results
`in response to a query. The search results identify a set of
`documents, each of which is assigned an information
`retrieval score based on its content and the query terms. In
`some embodiments, the information retrieval score is a
`55 query dependent score that does not take into account the
`user-specific parameters. For a document in the identified set
`of documents, the method identifies a set of source docu(cid:173)
`ments having links to the document. The anchor text of each
`of the identified source documents is examined to determine
`60 if the anchor text satisfies a predefined requirement with
`respect to the query terms. After identifYing the source
`documents whose anchor text satisfies the predefined
`requirement, a personalized page importance score is com(cid:173)
`puted for each of the identified source documents according
`65 to the set of user-specific parameters.
`A personalized anchor text score is generated for the
`document by accumulating the personalized page impor-
`
`
`
`US 7,260,573 Bl
`
`3
`tance scores of the identified source documents. The per(cid:173)
`sonalized anchor text score of a document is combined with
`the document's information retrieval score to produce a
`personalized ranking for the document. The personalized
`ranking can be used in ordering the document in the search
`results.
`In one embodiment, the personalized page importance
`score of a document is a personalized link analysis score (a
`personalized PageRank is sometimes used for this score),
`which is based on an analysis of the linkages between
`documents that are directly or indirectly linked to this
`document. In some embodiments, the user-specific param(cid:173)
`eters includes a list of use favored websites, or includes URL
`keywords suitable for identifYing user favored websites. The
`user-specific parameters may be provided by the user, col(cid:173)
`lected from a third-party having such information, or
`derived by analysis of the user's previous search queries and
`the documents selected by the user from the search results of
`the user's previous search queries.
`In another aspect, a search engine system is configured to
`personalize the search results produced by a search engine,
`in accordance with a set of user-specific parameters, using
`the methodologies summarized above.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`The aforementioned features and advantages of the inven(cid:173)
`tion as well as additional features and advantages thereof
`will be more clearly understood hereinafter as a result of a
`detailed description of preferred embodiments of the inven(cid:173)
`tion when taken in conjunction with the drawings.
`FIG. 1 provides two examples of the link structure
`between different web pages, the two source web pages of
`the two examples having the same anchor text.
`FIG. 2 is a simplified diagram illustrating a search engine
`system that utilizes user specific information to produce
`personalized search results.
`FIG. 3 is a conceptual diagram of a method for ordering
`search results in accordance with a user profile.
`FIG. 4 is a flowchart of a method of determining a
`personalized ranking for a document in a set of search
`results.
`FIG. 5 is a flowchart of a method of ordering a set of
`search results in accordance with a user profile.
`FIG. 6 is a block diagram of a search engine configured
`to generate personalized search results.
`Like reference numerals refer to corresponding parts
`throughout the several views of the drawings.
`
`DESCRIPTION OF EMBODIMENTS
`
`Referring to FIG. 1, assume that a user named Adam is
`looking for a website covering Stanford's sports teams. For
`the purposes of this explanation, we will assume that Adam
`would prefer that the search engine return www.gostanford- 55
`.com 110-2 ahead ofwww.espn.com 120-2. To achieve this
`goal, one approach would be to allow a user like Adam to
`instruct the search engine to personalize the rankings of
`search results by providing appropriate user information
`such as the user's background information or a plurality of 60
`favorite websites. For example, Adam may register with the
`search engine that he prefers web pages whose URL
`includes the term "Stanford" over other web pages.
`In FIG. 1, source page 110-1 (www.stanford.edu/ben/
`index.html) has a URL that includes "stanford", while 65
`source page 120-1 (www.geocity.com/cindy/index.html)
`does not. If Adam enters a search query of"sports news", the
`
`25
`
`4
`query terms will be found in the anchor text of both of these
`source pages 110-1, 120-1. Furthermore, based on Adam's
`registered search preferences, source page 110-1 should
`receive a higher score than source page 120-1. Described
`below are embodiments of search engine systems and meth(cid:173)
`ods for ranking and ordering search results in accordance
`with a user's preferences. Using these systems and methods,
`the web page www.gostandford.com 110-2 may be ranked
`higher and ordered before web age www.espn.com 120-2,
`10 depending on a variety of factors that are taken into account
`by the search result ranking function of the search engine.
`FIG. 2 is a simplified diagram illustrating a search engine
`system 200. The search engine system 200 is implemented
`in a client-server network environment, which comprises
`15 one or more client computers 201 and one or more server
`computers 210. Prior to any searches being performed on
`behalf of users, a network crawler 220 (sometimes called a
`web crawler) locates and downloads documents from a
`document database or network 219 (e.g., the Internet or an
`20 Intranet). In some embodiments, these documents are pro(cid:173)
`cessed by a content indexer to produce a set of indexes and
`databases are generated
`The documents downloaded by the network crawler 220
`are stored in the server and analyzed by different compo(cid:173)
`nents of the server 210. For instance, when a document
`arrives at the server 210, a content indexer 213 generates
`inverted content index entries for the document, which are
`stored in or added to the inverted content index 208. A page
`30 importance ranker 215 computes the document's page
`importance score. In some embodiments, the page impor(cid:173)
`tance score is the document's PageRank, which is a score
`generated using a specific link analysis methodology that
`propagates rank through links. A document's PageRank is
`35 based on the PageRanks of the documents that have links to
`the document. The resulting page importance scores are
`stored in a database 209. In other embodiments, the page
`importance scores could be replaced by another set of
`scores, such as scores produced using another link analysis
`40 methodology or scores produced using yet another page
`importance determination methodology.
`An anchor text indexer 217 is responsible for generating
`an inverted anchor text index 211 from the links in each
`document received by the server 210, including the text
`45 surrounding the links. The links and corresponding text are
`extracted from each document and stored in records identi(cid:173)
`:tying the source document, the target document associated
`with a link, and the anchor text associated with the link.
`When a sufficient number of such records have been accu-
`50 mulated, an inverted anchor text index 211 is generated,
`mapping anchor text terms to the documents that are the
`target of the corresponding links. In some embodiments, the
`inverted anchor text index 211 is merged with or incorpo(cid:173)
`rated in the inverted content index 208. More information
`about anchor text indexing is provided in U.S. patent appli(cid:173)
`cation Ser. No. 10/614,113, filed Jul. 3, 2003, "Anchor Text
`Indexing in a Web Crawler System", which is hereby
`incorporated by reference.
`A user of the system 200 first submits a search query 203
`through a client 201. A search query typically includes a set
`of query terms, which identifY terms to be included in
`documents that satisfY the search query. The search query is
`processed by a query processor 205 on the server side. Based
`on the search query, the query processor 205 generates
`search results 207, typically a list of documents that satisfy
`the search query 203, and returns the search results to client
`201.
`
`
`
`US 7,260,573 Bl
`
`10
`
`5
`Within server 201, the query processor 205 communicates
`with various databases to identifY documents that satisfY the
`search query and to determine how to order the search
`results. In some embodiments these includes the inverted
`content index 208, the page importance scores database 209
`and the inverted anchor text index 211. For example, the
`database that stores the inverted content index 208 first
`returns a set of documents identifiers, which identifY docu(cid:173)
`ments that contain the query terms of the search query.
`Optionally, the query processor 205 may submit the same
`query to the database storing the inverted anchor text index
`211 to find another set of documents that satisfY the search
`query. It is possible that a document may appear in both sets
`of documents. Finally, the two sets of documents are sub(cid:173)
`mitted to the page importance scores database 209 and 15
`ordered in accordance with their respective page importance
`scores. In some systems, the ordering of documents is query
`dependent, taking into account the documents' query inde(cid:173)
`pendent page importance scores (one example of which is
`PageRank) as well as the position(s) in which the query 20
`terms are found in the documents within the search result.
`The search engine system 200 further includes a user
`information database 206, with personalization information
`for its users. The personalization information for each
`respective user is herein called a user profile. The term user 25
`profile is used here without limitation on the particular data
`structures and methodology used to store the personalization
`information.
`In some embodiments, in addition to submitting a search
`query 203 to the server 210, a user can also submit user 30
`information 204 to the server 210. The user information 204
`may be in the form of a user profile or a set of user-specific
`parameters characterizing a user's background and prefer(cid:173)
`ences. In one embodiment, user information 204 is submit(cid:173)
`ted to the server together with the search query 302. In 35
`another embodiment, user information is submitted to the
`server separately. In yet another embodiment, the user
`information 204, is derived by the server 210 at least in part
`from search queries previously submitted by the user and by
`the documents in the search results that the user chooses to 40
`view or use. In some embodiments, when the server 210
`receives (or derives) user information 204, it stores such
`information in a user information database 206 and associ(cid:173)
`ates the user information with a unique user ID. In other
`embodiments, the server 210 receives the user information 45
`204 with each search query and does not retain such user
`information for use when processing subsequent search
`queries.
`In some embodiments, the user information is used to
`compute personalized page importance scores for at least a
`subset of the documents retrieved by the network crawler
`220. The Page Importance Ranker 215 generates a system
`page importance score for each document, as well as a set of
`user-specific (personalized) page importance scores for at
`least a subset of the retrieved documents. The Page Impor(cid:173)
`tance Ranker 215 utilizes the user-specific parameters to
`compute a personalized page importance score for at least a
`subset of the documents retrieved by the server 210 from the
`network or document database 219. In one embodiment, the
`Page Importance Ranker 215 accomplishes this task using
`an efficient link analysis calculation method taught in greater
`detail in the co-pending application Ser. No. 10/646,331,
`filed Aug. 22, 2003, which is hereby incorporated by refer(cid:173)
`ence. Conceptually, when computing personalized page
`importance scores, the Page Importance Ranker 215 boosts
`the page importance scores of documents that are deemed to
`match the user-specific parameters, which in tum boosts the
`
`6
`downstream documents linked to those documents. From
`another viewpoint, the Page Importance Ranker 215 boosts
`the page importance scores of documents of each host whose
`URL matches one or more of the user-specific parameters.
`In some embodiments, a document is be deemed to match
`(or not match) user-specific parameters solely based on the
`URL of the document. In other embodiments, a document is
`deemed to match the user-specified parameters based not
`only on the URL of the document, but also based on the
`content of the document, and/or based on the anchor text
`content of links to the document. When a document is
`deemed to match the user-specific parameters in a user
`profile (e.g., if the URL of the document includes any of
`URL keywords in the user profile), the document is assigned
`a personalized page importance score specified by a param-
`eter in the user profile. For example, the user profile may
`specifY for each URL keyword a particular page importance
`score adjustment that is to be applied to matching docu(cid:173)
`ments. When a document matches more than one URL
`keyword, the largest such page importance score adjustment
`is applied to the document. In other embodiments, the user
`profile may specifY the adjustment or assigrnnent of person(cid:173)
`alized page importance scores in other ways.
`Note that the personalized page importance score of a
`document is only a function of the document, the user(cid:173)
`specific parameters, and the link structure through which the
`document is related to other documents. In other words, the
`personalized page importance score of a document is a
`ranking factor that is independent of any individual search
`query submitted by the user to the search engine. However,
`it should be noted that in some embodiments the server 210
`is configured to generate at least some of the user-specific
`parameters based on previous search queries submitted by
`the user, and thus a user's search queries may indirectly
`affect a document's personalized page importance score. For
`example, if a user has submitted many queries related to the
`standard aptitude test (SAT), the server 210 may update his
`user information and incorporate this information into the
`set of user-specific parameters.
`In some embodiments, due to storage and computational
`limitations, the number of documents for which personal(cid:173)
`ized page importance scores are stored by the server 210 is
`limited. For instance, the server 210 may store up to N
`personalized page importance scores for each user, and
`furthermore will store personalized page importance scores
`only for those documents where the personalized page
`importance scores differs form the system (non-personal-
`50 ized) page importance scores. In another example, the server
`210 may store up to N personalized page importance score
`adjustment values, each for documents whose address or
`URL has a prefix indicating a respective web host. When the
`server 210 is processing a search query from a user for
`55 which it has a user profile that includes a stored set person(cid:173)
`alized page importance scores, it retrieves page importance
`scores from both the page importance scores database 209
`and the user information database 206. Where a page
`importance score for a particular document is found in both
`60 databases 215, 206, the page importance score from the user
`information database 206 is used (or, in some embodiments,
`the adjustment value from the user information database is
`applied to the system page importance score). Alternately,
`the server 210 first retrieves page importance scores from
`65 the user information database 206, and then retrieves page
`importance scores from the page importance scores database
`209 only for those documents for which a personalized page
`
`
`
`US 7,260,573 Bl
`
`7
`importance score (for the user who submitted the search
`query being processed) is not found in the user information
`database 206.
`In some embodiments, a personalized page importance
`score is determined for a respective document at runtime,
`while the server 210 is processing a search query. In
`particular, the personalized page importance score is gener(cid:173)
`ated by determining the set of documents that have links
`referring to the respective document, determining personal(cid:173)
`ized page importance scores for the referring documents 10
`based on the user profile of the user, and then computing a
`personalized page importance score for the respective docu(cid:173)
`ment as a function of the personalized page importance
`scores of the referring documents. This methodology may be
`extended to the "grandparent" documents that refer to the 15
`referring documents. While the personalized page impor(cid:173)
`tance scores produced by this runtime methodology may
`differ from those produced by a Page Importance Ranker
`using a full network link analysis, it avoids or largely
`reduces the persistent storage of personalized page impor- 20
`tance scores.
`In the remainder of this document, when a document's
`page importance score is retrieved or otherwise determined,
`the source of the page importance score will be understood
`to be either a page importance scores database 209, or a
`personalized page importance score associated with the user
`whose search query is being processed.
`The personalized page importance scores of source docu(cid:173)
`ments having links to a destination document can be used in
`many different ways. For example, the personalized page
`importance scores can be used in a method for generating a
`personalized and query-dependent ranking for the destina(cid:173)
`tion document. Such ranking can be used for ordering the
`destination document in the search results or for other
`further analysis.
`FIG. 3 is a conceptual representation of one embodiment
`of a method for personalizing the ordering of documents in
`a set of search results. In this em