US 20070100817A1

(19) United States
(12) Patent Application Publication
Acharya et al.

(10) Pub. No.: US 2007/0100817 A1
(43) Pub. Date: May 3, 2007
`
`(54) DOCUMENT SCORING BASED ON
`DOCUMENT CONTENT UPDATE
`
(75) Inventors: Anurag Acharya, Campbell, CA (US);
Jeffrey Dean, Palo Alto, CA (US); Paul
Haahr, San Francisco, CA (US);
Monika Henzinger, Corseaux (CH);
Steve Lawrence, Mountain View, CA
(US); Karl Pfleger, Mountain View, CA
(US); Simon Tong, Mountain View, CA
(US)
`
`Correspondence Address:
`HARRITY SNYDER, LLP
`11350 Random Hills Road
`SUITE 600
`FAIRFAX, VA 22030 (US)
`
`(73) Assignee: GOOGLE INC., Mountain View, CA
`
(21) Appl. No.: 11/562,285

(22) Filed: Nov. 21, 2006
`
`Related U.S. Application Data
`
(62) Division of application No. 10/748,664, filed on Dec.
31, 2003.

(60) Provisional application No. 60/507,617, filed on Sep.
30, 2003.
`
Publication Classification

(51) Int. Cl.
G06F 17/30 (2006.01)
(52) U.S. Cl. ........ 707/5
`
(57) ABSTRACT
`
`A system may determine a measure of how a content of a
`document changes over time, generate a score for the
`document based, at least in part, on the measure of how the
`content of the document changes over time, and rank the
`document with regard to at least one other document based,
`at least in part, on the score.
`
[Representative drawing: search engine 125 with document locator 310, history component 320, ranking component 330, and document corpus 340; see FIG. 3]
`
`EXHIBIT 2109
`Facebook, Inc. et al.
`v.
`Software Rights Archive, LLC
`CASE IPR2013-00479
`
`

`
Patent Application Publication May 3, 2007 Sheet 1 of 4 US 2007/0100817 A1

[FIG. 1: network 100 with clients 110 and servers 120, 130, and 140]
`
`

`
[FIG. 2 (Sheet 2 of 4): client/server entity 110-140, in which bus 210 connects the processor, main memory, ROM, storage device, input devices, output devices, and communication interface]
`
`

`
[FIG. 3 (Sheet 3 of 4): search engine 125 with document locator 310, history component 320, ranking component 330, and document corpus 340]
`
`

`
[FIG. 4 (Sheet 4 of 4): flowchart. 410: identify documents; 420: obtain history data associated with documents; 430: score documents based, at least in part, on history data]
`
`

`
`
`DOCUMENT SCORING BASED ON DOCUMENT
`CONTENT UPDATE
`
`RELATED APPLICATION
`
`[0001] This application is a divisional of U.S. patent
`application Ser. No. 10/748,664, filed Dec. 31, 2003, which
claims priority under 35 U.S.C. § 119 based on U.S.
Provisional Application No. 60/507,617, filed Sep. 30, 2003, the
`disclosures of which are incorporated herein by reference.
`
`BACKGROUND OF THE INVENTION
`
`[0002] 1. Field of the Invention
`
[0003] The present invention relates generally to information
retrieval systems and, more particularly, to systems and
`methods for generating search results based, at least in part,
`on historical data associated with relevant documents.
`
`[0004] 2. Description of Related Art
`
`[0005] The World Wide Web ("web") contains a vast
amount of information. Search engines assist users in locating
desired portions of this information by cataloging web
`documents. Typically, in response to a user's request, a
`search engine returns links to documents relevant to the
`request.
`
`[0006] Search engines may base their determination of the
`user's interest on search terms (called a search query)
`provided by the user. The goal of a search engine is to
`identify links to high quality relevant results based on the
`search query. Typically, the search engine accomplishes this
`by matching the terms in the search query to a corpus of
`pre-stored web documents. Web documents that contain the
`user's search terms are considered "hits" and are returned to
`the user.
`
[0007] Ideally, a search engine, in response to a given
`user's search query, will provide the user with the most
`relevant results. One category of search engines identifies
`relevant documents based on a comparison of the search
`query terms to the words contained in the documents.
Another category of search engines identifies relevant
documents using factors other than, or in addition to, the presence
`of the search query terms in the documents. One such search
`engine uses information associated with links to or from the
`documents to determine the relative importance of the
`documents.
`
`[0008] Both categories of search engines strive to provide
`high quality results for a search query. There are several
`factors that may affect the quality of the results generated by
`a search engine. For example, some web site producers use
`spamming techniques to artificially inflate their rank. Also,
`"stale" documents (i.e., those documents that have not been
`updated for a period of time and, thus, contain stale data)
`may be ranked higher than "fresher" documents (i.e., those
`documents that have been more recently updated and, thus,
`contain more recent data). In some particular contexts, the
`higher ranking stale documents degrade the search results.
`
`[0009] Thus, there remains a need to improve the quality
`of results generated by search engines.
`
`SUMMARY OF THE INVENTION
`[0010] Systems and methods consistent with the principles
of the invention may score documents based, at least in part,
on history data associated with the documents. This scoring
may be used to improve search results generated in connection
with a search query.
`
`[0011] According to one aspect, a method may include
`determining a measure of how a content of a document
`changes over time; generating a score for the document
`based, at least in part, on the measure of how the content of
`the document changes over time; and ranking the document
`with regard to at least one other document based, at least in
`part, on the score.
`
`[0012] According to another aspect, a method may include
`determining a first rate of change in a content of a document
`in a first time period; determining a second rate of change in
`the content of the document in a second time period;
`comparing the first rate of change and the second rate of
`change to determine whether there is an increase or a
decrease in the rate of change in the content of the
document; generating a score for the document based, at least in
`part, on whether there is an increase or a decrease in the rate
`of change in the content of the document; and ranking the
`document with regard to at least one other document based,
`at least in part, on the score.
`
`[0013] According to yet another aspect, a method may
`include receiving a search query; performing a search based,
at least in part, on the search query to identify a group of
search result documents; determining a date on which a
content changed for each of the search result documents in
a set of the search result documents in the group; determining
an average date-of-change of the search result documents
in the set of search result documents based, at least in
`part, on the determined dates; generating a score for a search
`result document in the set of search result documents based,
`at least in part, on a difference between the determined date
`associated with the search result document and the average
`date-of-change of the search result documents in the set of
`search result documents; and ranking the search result
`document with regard to at least one other one of the search
`result documents based, at least in part, on the score.
`
`[0014] According to a further aspect, a method may
include determining a measure of how anchor text associated
with a link pointing to a document changes over time;
`generating a score for the document based, at least in part,
`on the measure of how the anchor text associated with the
`link pointing to the document changes over time; and
`ranking the document with regard to at least one other
`document based, at least in part, on the score.
`[0015] According to another aspect, a system may include
`means for determining whether a topic associated with a
`document changes over time; means for generating a score
for the document based, at least in part, on whether the
`topic associated with the document changes; and means for
`ranking the document with regard to at least one other
`document based, at least in part, on the score.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
[0016] The accompanying drawings, which are incorporated
in and constitute a part of this specification, illustrate
`an embodiment of the invention and, together with the
`description, explain the invention. In the drawings,
`[0017] FIG. 1 is a diagram of an exemplary network in
`which systems and methods consistent with the principles of
`the invention may be implemented;
`
`

`
`
`[0018] FIG. 2 is an exemplary diagram of a client and/or
`server of FIG. 1 according to an implementation consistent
`with the principles of the invention;
`
`[0019] FIG. 3 is an exemplary functional block diagram of
`the search engine of FIG. 1 according to an implementation
`consistent with the principles of the invention; and
`
[0020] FIG. 4 is a flowchart of exemplary processing for
scoring documents according to an implementation consistent
with the principles of the invention.
`
`DETAILED DESCRIPTION
`
`[0021] The following detailed description of the invention
`refers to the accompanying drawings. The same reference
`numbers in different drawings may identify the same or
`similar elements. Also, the following detailed description
`does not limit the invention.
`
`[0022] Systems and methods consistent with the principles
`of the invention may score documents using, for example,
`history data associated with the documents. The systems and
`methods may use these scores to provide high quality search
`results.
`
`[0023] A "document," as the term is used herein, is to be
`broadly interpreted to include any machine-readable and
`machine-storable work product. A document may include an
`e-mail, a web site, a file, a combination of files, one or more
`files with embedded links to other files, a news group
`posting, a blog, a web advertisement, etc. In the context of
`the Internet, a common document is a web page. Web pages
often include textual information and may include embedded
information (such as meta information, images, hyperlinks,
etc.) and/or embedded instructions (such as Javascript,
`etc.). A page may correspond to a document or a portion of
`a document. Therefore, the words "page" and "document"
`may be used interchangeably in some cases. In other cases,
`a page may refer to a portion of a document, such as a
sub-document. It may also be possible for a page to correspond
to more than a single document.
`
[0024] In the description to follow, documents may be
`described as having links to other documents and/or links
`from other documents. For example, when a document
`includes a link to another document, the link may be referred
`to as a "forward link." When a document includes a link
`from another document, the link may be referred to as a
`"back link." When the term "link" is used, it may refer to
`either a back link or a forward link.
`
`Exemplary Network Configuration
`
`[0025] FIG. 1 is an exemplary diagram of a network 100
`in which systems and methods consistent with the principles
`of the invention may be implemented. Network 100 may
`include multiple clients 110 connected to multiple servers
`120-140 via a network 150. Network 150 may include a
`local area network (LAN), a wide area network (WAN), a
`telephone network, such as the Public Switched Telephone
`Network (PSTN), an intranet, the Internet, a memory device,
`another type of network, or a combination of networks. Two
`clients 110 and three servers 120-140 have been illustrated
`as connected to network 150 for simplicity. In practice, there
`may be more or fewer clients and servers. Also, in some
`instances, a client may perform the functions of a server and
`a server may perform the functions of a client.
`
`[0026] Clients 110 may include client entities. An entity
`may be defined as a device, such as a wireless telephone, a
personal computer, a personal digital assistant (PDA), a laptop,
or another type of computation or communication
device, a thread or process running on one of these devices,
and/or an object executable by one of these devices. Servers
`120-140 may include server entities that gather, process,
`search, and/or maintain documents in a manner consistent
`with the principles of the invention. Clients 110 and servers
`120-140 may connect to network 150 via wired, wireless,
`and/or optical connections.
`
[0027] In an implementation consistent with the principles
`of the invention, server 120 may include a search engine 125
`usable by clients 110. Server 120 may crawl a corpus of
`documents (e.g., web pages), index the documents, and store
`information associated with the documents in a repository of
`crawled documents. Servers 130 and 140 may store or
`maintain documents that may be crawled by server 120.
`While servers 120-140 are shown as separate entities, it may
`be possible for one or more of servers 120-140 to perform
`one or more of the functions of another one or more of
`servers 120-140. For example, it may be possible that two or
`more of servers 120-140 are implemented as a single server.
`It may also be possible for a single one of servers 120-140
`to be implemented as two or more separate (and possibly
`distributed) devices.
`
`Exemplary Client/Server Architecture
`[0028] FIG. 2 is an exemplary diagram of a client or server
`entity (hereinafter called "client/server entity"), which may
`correspond to one or more of clients 110 and servers
`120-140, according to an implementation consistent with the
`principles of the invention. The client/server entity may
`include a bus 210, a processor 220, a main memory 230, a
`read only memory (ROM) 240, a storage device 250, one or
`more input devices 260, one or more output devices 270, and
`a communication interface 280. Bus 210 may include one or
`more conductors that permit communication among the
`components of the client/server entity.
`
[0029] Processor 220 may include one or more conventional
processors or microprocessors that interpret and
`execute instructions. Main memory 230 may include a
`random access memory (RAM) or another type of dynamic
`storage device that stores information and instructions for
`execution by processor 220. ROM 240 may include a
`conventional ROM device or another type of static storage
`device that stores static information and instructions for use
by processor 220. Storage device 250 may include a magnetic
and/or optical recording medium and its corresponding
`drive.
`
[0030] Input device(s) 260 may include one or more
conventional mechanisms that permit an operator to input
information to the client/server entity, such as a keyboard, a
mouse, a pen, voice recognition and/or biometric mechanisms,
etc. Output device(s) 270 may include one or more
conventional mechanisms that output information to the
operator, including a display, a printer, a speaker, etc.
Communication interface 280 may include any transceiver-like
mechanism that enables the client/server entity to communicate
with other devices and/or systems. For example,
communication interface 280 may include mechanisms for
communicating with another device or system via a network,
such as network 150.
`
`

`
`
[0031] As will be described in detail below, the client/
server entity, consistent with the principles of the invention,
performs certain searching-related operations. The client/
server entity may perform these operations in response to
processor 220 executing software instructions contained in a
computer-readable medium, such as memory 230. A
computer-readable medium may be defined as one or more
physical or logical memory devices and/or carrier waves.
`
`[0032] The software instructions may be read into memory
`230 from another computer-readable medium, such as data
storage device 250, or from another device via communication
interface 280. The software instructions contained in
memory 230 may cause processor 220 to perform processes
that will be described later. Alternatively, hardwired circuitry
may be used in place of or in combination with
software instructions to implement processes consistent with
the principles of the invention. Thus, implementations consistent
with the principles of the invention are not limited to
`any specific combination of hardware circuitry and software.
`
`Exemplary Search Engine
`
`[0033] FIG. 3 is an exemplary functional block diagram of
search engine 125 according to an implementation consistent
with the principles of the invention. Search engine 125
`may include document locator 310, history component 320,
`and ranking component 330. As shown in FIG. 3, one or
`more of document locator 310 and history component 320
`may connect to a document corpus 340. Document corpus
`340 may include information associated with documents that
`were previously crawled, indexed, and stored, for example,
`in a database accessible by search engine 125. History data,
`as will be described in more detail below, may be associated
`with each of the documents in document corpus 340. The
`history data may be stored in document corpus 340 or
`elsewhere.
`
[0034] Document locator 310 may identify a set of documents
whose contents match a user search query. Document
`locator 310 may initially locate documents from document
`corpus 340 by comparing the terms in the user's search
`query to the documents in the corpus. In general, processes
`for indexing documents and searching the indexed collection
`to return a set of documents containing the searched terms
`are well known in the art. Accordingly, this functionality of
`document locator 310 will not be described further herein.
`
`[0035] History component 320 may gather history data
`associated with the documents in document corpus 340. In
implementations consistent with the principles of the invention,
the history data may include data relating to: document
inception dates; document content updates/changes; query
analysis; link-based criteria; anchor text (e.g., the text in
which a hyperlink is embedded, typically underlined or
otherwise highlighted in a document); traffic; user behavior;
domain-related information; ranking history; user
maintained/generated data (e.g., bookmarks); unique words, bigrams,
and phrases in anchor text; linkage of independent
`peers; and/or document topics. These different types of
`history data are described in additional detail below. In other
`implementations, the history data may include additional or
`different kinds of data.
`
`[0036] Ranking component 330 may assign a ranking
`score (also called simply a "score" herein) to one or more
documents in document corpus 340. Ranking component
330 may assign the ranking scores prior to, independent of,
or in connection with a search query. When the documents
are associated with a search query (e.g., identified as relevant
to the search query), search engine 125 may sort the
documents based on the ranking score and return the sorted
set of documents to the client that submitted the search
query. Consistent with aspects of the invention, the ranking
score is a value that attempts to quantify the quality of the
documents. In implementations consistent with the principles
of the invention, the score is based, at least in part, on
the history data from history component 320.
`
`Exemplary History Data
`
`Document Inception Date
`
`[0037] According to an implementation consistent with
`the principles of the invention, a document's inception date
`may be used to generate (or alter) a score associated with
that document. The term "date" is used broadly here and
may, thus, include time and date measurements. As
described below, there are several techniques that can be
used to determine a document's inception date. Some of
these techniques are "biased" in the sense that they can be
influenced by third parties desiring to improve the score
associated with a document. Other techniques are not biased.
Any of these techniques, combinations of these techniques,
or yet other techniques may be used to determine a
document's inception date.
`
`[0038] According to one implementation, the inception
`date of a document may be determined from the date that
`search engine 125 first learns of or indexes the document.
`Search engine 125 may discover the document through
crawling, submission of the document (or a representation/
summary thereof) to search engine 125 from an "outside"
source, a combination of crawl or submission-based indexing
techniques, or in other ways. Alternatively, the inception
`date of a document may be determined from the date that
`search engine 125 first discovers a link to the document.
`
[0039] According to another implementation, the date on which
the domain associated with a document was registered may be used
as an indication of the inception date of the document.
`According to yet another implementation, the first time that
`a document is referenced in another document, such as a
`news article, newsgroup, mailing list, or a combination of
`one or more such documents, may be used to infer an
`inception date of the document. According to a further
`implementation, the date that a document includes at least a
`threshold number of pages may be used as an indication of
`the inception date of the document. According to another
`implementation, the inception date of a document may be
`equal to a time stamp associated with the document by the
server hosting the document. Other techniques, not specifically
mentioned herein, or combinations of techniques could
`be used to determine or infer a document's inception date.
`
`[0040] Search engine 125 may use the inception date of a
`document for scoring of the document. For example, it may
`be assumed that a document with a fairly recent inception
`date will not have a significant number of links from other
`documents (i.e., back links). For existing link-based scoring
`techniques that score based on the number of links to/from
`a document, this recent document may be scored lower than
`an older document that has a larger number of links (e.g.,
back links). When the inception dates of the documents are
considered, however, the scores of the documents may be
modified (either positively or negatively) based on the
documents' inception dates.
`
`[0041] Consider the example of a document with an
`inception date of yesterday that is referenced by 10 back
`links. This document may be scored higher by search engine
`125 than a document with an inception date of 10 years ago
`that is referenced by 100 back links because the rate of link
`growth for the former is relatively higher than the latter.
`While a spiky rate of growth in the number of back links
may be a factor used by search engine 125 to score documents,
it may also signal an attempt to spam search engine
125. Accordingly, in this situation, search engine 125 may
actually lower the score of a document(s) to reduce the effect
of spamming.
`
`[0042] Thus, according to an implementation consistent
`with the principles of the invention, search engine 125 may
`use the inception date of a document to determine a rate at
`which links to the document are created (e.g., as an average
`per unit time based on the number of links created since the
`inception date or some window in that period). This rate can
`then be used to score the document, for example, giving
`more weight to documents to which links are generated
`more often.
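
As a rough illustration of paragraphs [0041] and [0042], the following Python sketch computes an average link-creation rate from a document's inception date. The function name `link_creation_rate`, the per-day unit, and the sample dates are assumptions made for this example; the specification does not fix a unit of time or a windowing scheme.

```python
from datetime import date, timedelta


def link_creation_rate(link_dates, inception, today):
    """Average number of back links created per day since the inception date.

    Hypothetical helper: the per-day unit is an assumption for illustration.
    """
    # Treat anything younger than one day as one day old to avoid dividing by zero.
    elapsed_days = max((today - inception).days, 1)
    counted = [d for d in link_dates if inception <= d <= today]
    return len(counted) / elapsed_days


today = date(2007, 5, 3)

# Document created yesterday that already has 10 back links.
new_doc = link_creation_rate([today] * 10, today - timedelta(days=1), today)

# Document created about 10 years ago that has 100 back links.
old_inception = today - timedelta(days=3650)
old_links = [old_inception + timedelta(days=36 * i) for i in range(100)]
old_doc = link_creation_rate(old_links, old_inception, today)
```

Under these assumptions the newer document has the higher rate, which matches the example in [0041]. A sudden spike in the same rate could equally be treated as a spam signal and used to lower the score.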
`
[0043] In one implementation, search engine 125 may
modify the link-based score of a document as follows:

H = L / log(F + 2),
`
`where H may refer to the history-adjusted link score, L may
`refer to the link score given to the document, which can be
`derived using any known link scoring technique (e.g., the
`scoring technique described in U.S. Pat. No. 6,285,999) that
`assigns a score to a document based on links to/from the
`document, and F may refer to elapsed time measured from
`the inception date associated with the document (or a
`window within this period).
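
A minimal sketch of the history-adjusted link score, reading the formula as H = L / log(F + 2). The specification states neither the base of the logarithm nor the unit of F, so the natural logarithm and an elapsed time in days are assumptions here, as is the function name.

```python
import math


def history_adjusted_link_score(link_score, elapsed):
    """Return H = L / log(F + 2) for link score L and elapsed time F.

    Assumptions: natural logarithm, and F measured in days.
    """
    if elapsed < 0:
        raise ValueError("elapsed time must be non-negative")
    # Adding 2 keeps the logarithm positive even when F is 0 (log 2 > 0).
    return link_score / math.log(elapsed + 2)
```

For a fixed link score L, the adjusted score H falls as F grows, so a recent document needs fewer links than an old one to reach the same H.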
`
`[0044] For some queries, older documents may be more
`favorable than newer ones. As a result, it may be beneficial
`to adjust the score of a document based on the difference (in
`age) from the average age of the result set. In other words,
`search engine 125 may determine the age of each of the
`documents in a result set (e.g., using their inception dates),
determine the average age of the documents, and modify the
`scores of the documents (either positively or negatively)
`based on a difference between the documents' age and the
`average age.
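
The adjustment in [0044] can be sketched as follows. The specification says only that scores may be modified, positively or negatively, based on the difference between a document's age and the average age; the linear form, the `weight` parameter, and the function name are assumptions chosen for illustration.

```python
def adjust_scores_by_age(scores, ages, weight):
    """Scale each score by its document's relative distance from the average age.

    A positive weight favors documents older than average; a negative weight
    favors newer ones. Both the linear form and the weight are assumptions.
    """
    if not ages or len(scores) != len(ages):
        raise ValueError("scores and ages must be non-empty and the same length")
    average_age = sum(ages) / len(ages)
    if average_age == 0:
        return list(scores)
    return [s * (1 + weight * (a - average_age) / average_age)
            for s, a in zip(scores, ages)]


adjusted = adjust_scores_by_age([1.0, 1.0, 1.0], [10, 20, 30], weight=0.5)
```

With a positive weight the oldest document gains and the newest loses. Flipping the sign of `weight` would suit queries where newer documents are preferred.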
`
[0045] In summary, search engine 125 may generate (or
`alter) a score associated with a document based, at least in
`part, on information relating to the inception date of the
`document.
`
`Content Updates/Changes
`
`[0046] According to an implementation consistent with
`the principles of the invention, information relating to a
`manner in which a document's content changes over time
`may be used to generate (or alter) a score associated with
`that document. For example, a document whose content is
`edited often may be scored differently than a document
`whose content remains static over time. Also, a document
`having a relatively large amount of its content updated over
`time might be scored differently than a document having a
`relatively small amount of its content updated over time.
`
[0047] In one implementation, search engine 125 may
generate a content update score (U) as follows:

U = f(UF, UA),
`
`where f may refer to a function, such as a sum or weighted
`sum, UF may refer to an update frequency score that
`represents how often a document (or page) is updated, and
UA may refer to an update amount score that represents how
`much the document (or page) has changed over time. UF
`may be determined in a number of ways, including as an
`average time between updates, the number of updates in a
`given time period, etc.
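
A hedged sketch of U = f(UF, UA) with f taken to be a weighted sum, which is one of the options the text names. The equal default weights, the integer day numbering, and the function names are assumptions for this example, and UF is computed as the number of updates in a given time period.

```python
def update_frequency(update_days, window_start, window_end):
    """UF as the number of updates per day within [window_start, window_end)."""
    length = window_end - window_start
    if length <= 0:
        raise ValueError("window must have positive length")
    in_window = [d for d in update_days if window_start <= d < window_end]
    return len(in_window) / length


def content_update_score(uf, ua, w_frequency=0.5, w_amount=0.5):
    """U = f(UF, UA), with f chosen here as a weighted sum."""
    return w_frequency * uf + w_amount * ua


# Three of the four updates fall inside the 10-day window.
uf = update_frequency([1, 3, 5, 40], window_start=0, window_end=10)
u = content_update_score(uf, ua=0.5)
```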
`
`[0048] UA may also be determined as a function of one or
`more factors, such as the number of "new" or unique pages
`associated with a document over a period of time. Another
`factor might include the ratio of the number of new or
`unique pages associated with a document over a period of
`time versus the total number of pages associated with that
`document. Yet another factor may include the amount that
`the document is updated over one or more periods of time
`(e.g., n% of a document's visible content may change over
`a period t (e.g., last m months)), which might be an average
`value. A further factor might include the amount that the
`document (or page) has changed in one or more periods of
`time (e.g., within the last x days).
`
`[0049] According to one exemplary implementation, UA
`may be determined as a function of differently weighted
`portions of document content. For instance, content deemed
`to be unimportant if updated/changed, such as Javascript,
comments, advertisements, navigational elements, boilerplate
material, or date/time tags, may be given relatively
`little weight or even ignored altogether when determining
`UA. On the other hand, content deemed to be important if
`updated/changed (e.g., more often, more recently, more
`extensively, etc.), such as the title or anchor text associated
`with the forward links, could be given more weight than
`changes to other content when determining UA.
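
One way to picture the differently weighted portions described in [0049] is a weighted average over named sections of a document. The section names, the specific weights, and the default weight of 1.0 for unlisted sections are all hypothetical; the specification only says that some content may be ignored or down-weighted and other content given more weight.

```python
# Hypothetical weights: zero for content the text treats as unimportant,
# larger values for the title and forward-link anchor text.
SECTION_WEIGHTS = {
    "title": 3.0,
    "anchor_text": 2.0,
    "body": 1.0,
    "javascript": 0.0,
    "comments": 0.0,
    "advertisements": 0.0,
    "navigation": 0.0,
    "boilerplate": 0.0,
    "date_time_tags": 0.0,
}


def update_amount(changed_fraction, weights=SECTION_WEIGHTS):
    """Weighted average of the fraction of each section that changed (0.0 to 1.0)."""
    total_weight = sum(weights.get(name, 1.0) for name in changed_fraction)
    if total_weight == 0:
        return 0.0  # only ignored sections were present
    weighted = sum(weights.get(name, 1.0) * frac
                   for name, frac in changed_fraction.items())
    return weighted / total_weight
```

With these weights, a page whose only changes are in scripts or advertisements yields UA = 0, while a title change dominates the result.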
`
`[0050] UF and UA may be used in other ways to influence
`the score assigned to a document. For example, the rate of
`change in a current time period can be compared to the rate
of change in another (e.g., previous) time period to determine
whether there is an acceleration or deceleration trend.
`Documents for which there is an increase in the rate of
`change might be scored higher than those documents for
`which there is a steady rate of change, even if that rate of
`change is relatively high. The amount of change may also be
`a factor in this scoring. For example, documents for which
`there is an increase in the rate of change when that amount
`of change is greater than some threshold might be scored
higher than those documents for which there is a steady rate
of change or for which the amount of change is less than the threshold.
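
The trend comparison in [0050] can be sketched as below. The multiplier value of 1.2, the threshold parameter, and the function names are assumptions; the specification says only that accelerating documents might be scored higher, optionally subject to an amount-of-change threshold.

```python
def rate_of_change(amount_changed, period_length):
    """Amount of change per unit time for one period."""
    if period_length <= 0:
        raise ValueError("period_length must be positive")
    return amount_changed / period_length


def trend_multiplier(previous_rate, current_rate, amount_changed,
                     amount_threshold=0.0, boost=1.2):
    """Return a score multiplier above 1.0 only for an accelerating document
    whose amount of change exceeds the threshold; otherwise return 1.0."""
    accelerating = current_rate > previous_rate
    if accelerating and amount_changed > amount_threshold:
        return boost
    return 1.0
```

A document with a steady but high rate of change receives no boost here, which reflects the distinction the paragraph draws between acceleration and a merely high rate.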
`
[0051] In some situations, data storage resources may be
`insufficient to store the documents when monitoring the
`documents for content changes. In this case, search engine
`125 may store representations of the documents and monitor
`these representations for changes. For example, search
`engine 125 may store "signatures" of documents instead of
`the (entire) documents themselves to detect changes to
`document content. In this case, search engine 125 may store
`a term vector for a document (or page) and monitor it for
relatively large changes. According to another implementation,
search engine 125 may store and monitor a relatively
small portion (e.g., a few terms) of the documents that are
determined to be important or the most frequently occurring
(excluding "stop words").
`[0052] According to yet another implementation, search
`engine 125 may store a summary or other representation of
`a document and monitor this information for changes.
`According to a further implementation, search engine 125
`may generate a similarity hash (which may be used to detect
`near-duplication of a document) for the document and
`monitor it for changes. A change in a similarity hash may be
considered to indicate a relatively large change in its associated
document. In other implementations, yet other techniques
may be used to monitor documents for changes. In
`situations where adequate data storage resources exist, the
`full documents may be stored and used to determine changes
`rather than some representation of the documents.
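
As an illustration of storing a signature instead of the full document, the sketch below builds a term vector (optionally truncated to the most frequent terms, excluding stop words) and flags a relatively large change when two signatures diverge. Cosine similarity, the 0.8 threshold, the tiny stop-word list, and all function names are assumptions; the specification does not say how term vectors are compared, and this is not the similarity hash it also mentions.

```python
import math
import re
from collections import Counter

# A tiny illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for"}


def term_vector(text, top_k=None):
    """Term-frequency signature of a document, excluding stop words.

    If top_k is given, keep only the top_k most frequent terms.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    if top_k is not None:
        counts = Counter(dict(counts.most_common(top_k)))
    return counts


def cosine_similarity(a, b):
    """Cosine of the angle between two term vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def changed_substantially(old_signature, new_signature, threshold=0.8):
    """True when the two signatures are less similar than the threshold."""
    return cosine_similarity(old_signature, new_signature) < threshold
```

Only the signatures need to be kept between crawls, so storage grows with the number of retained terms rather than with document size.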
`[0053] For some queries, documents with content that has
`not recently changed may be more favorable than documents
`with content that has recently changed. As a result, it may be
`beneficial to adjust the score of a document based on the
`difference from the average date-of-change of the result set.
`In other words, search engine 125 may determine a date
`when the content of each of the documents in a result set last
`changed, determine the average date of change for the
documents, and modify the scores of the documents (either
`positively or negatively) based on a difference between the
`documents' date-of-change and the average date-of-change.
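
The average date-of-change computation in [0053] might look like the following sketch. Averaging via day ordinals, rounding to the nearest day, and the function names are assumptions; how the resulting difference modifies a score, and in which direction, is left open by the specification and is not modeled here.

```python
from datetime import date


def average_date(dates):
    """Mean of a list of dates, rounded to the nearest day."""
    if not dates:
        raise ValueError("dates must be non-empty")
    mean_ordinal = sum(d.toordinal() for d in dates) / len(dates)
    return date.fromordinal(round(mean_ordinal))


def days_from_average(dates):
    """Signed difference, in days, between each date and the average date."""
    avg = average_date(dates)
    return [(d - avg).days for d in dates]


last_changed = [date(2004, 1, 1), date(2004, 1, 3), date(2004, 1, 5)]
offsets = days_from_average(last_changed)
```

A negative offset marks a document changed earlier than the result-set average, and a positive offset one changed more recently.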
[0054] In summary, search engine 125 may generate (or
`alter) a score associated with a document based, at least in
`part, on information relating to a manner in which the
document's content changes over time. For very large documents
that include content belonging to multiple individuals
`or organizations, the score may correspond to each of the
`sub-documents (i.e., that content belonging to or updated by
`a single individual or organization).
`Query Analysis
`[0055] According to an implementation consistent with
`the principles of the invention, one or more query-based
`factors may be used to generate (or alter) a score associated
`with a document. For example, one query-based factor may
`relate to the extent to which a document is selected over time
`when the document is included in a set of search results. In
`this case, search engine 125 might score documents selected
`relatively more often/increasingly by users higher than other
`documents.
`[0056] Another query-based factor may relate to the
`occurrence of certain search terms appearing in queries over
`time. A particular set of search terms may increasingly
`appear in queries over a period of time. For example, terms
`relating to a "hot" topic that is gaining/has gained popularity
`or a breaking news event would conceivably ap
