`Mao et al.
`
`11111 111111111111111111 1111111111111111 IIII IIII IIIII Ill Ill Ill
`
`US006728704B2
`
`(10) Patent No.:
`(45) Date of Patent:
`
`US 6,728,704 B2
`Apr. 27, 2004
`
`(54) ME1HOD AND APPARATUS FOR MERGING
`RESUCT LISTS FROM MULTIPLE SEARCH
`ENGINES
`
`(75)
`
`Inventors: Jianchang Mao, San Jose, CA (US);
`Rajat Mukherjee, San Jose, CA (US);
`Prabhakar Raghavan, Saratoga, CA
`(US); Panayiotis Tsaparas, Toronto
`(CA)
`
`(73) Assignee: Verity, Inc., Sunnyvale, CA (US)
`
`( *) Notice :
`
`Subject to any disclaimer, the term of this
`patent is extended or adjusted under 35
`U.S.C. 154(b) by 212 days.
`
`(21) Appl. No.: 09/940,600
`Aug. 27, 2001
`
`(22) Filed:
`
`(65)
`
`Prior Publication Data
`
`US 2003/0041054 Al Feb. Tl, 2003
`(51) Int. Cl.7
`................................................ G06F 17/30
`(52) U.S. CL ........................................................... 707/3
`(58) Field of Search ................................. 7fJ7/1, 3, 5, 7
`
`(56)
`
`References Cited
`
`U.S. P,63'ENT DOCUMENTS
`
`6/1997 Driscoll
`5,642,502 A
`4/1998 Cohen et al.
`5,737,595 A
`5,802,515 A
`9/1998 Adar et al.
`5,806,061 A
`9/1998 Chaudhuri et al.
`6,006,225 A
`• 12/1999 Bowman et al. . ... ... ... ... .. 707/5
`6,078,914 A
`• 6/2000 Redfern ......................... 707/3
`8/2000 Christianson et al.
`6,102,969 A
`6,31:7 ,590 Bl • 12/2001 Chidlovskii et al. .. ... ... ... 707/5
`6,370,51:7 Bl • 4/2002 Singha!
`......................... 707/6
`6,415,281 Bl
`7/2002 Anderson
`6,546,388 Bl • 4/2003 &llund et al.
`
`. .. .. .. .. .. .. ... . 707/5
`
`OTIIER PUBUCXTIONS
`
`Ronald Fagin, et al., " Optimal Aggregation Algorithms for
`Middleware;' IBM Researcli Report RJ 10205, 2000, pp.
`1 40.
`
`Sergio A. Alvarez, et al., " Web Metasearch as Belief Aggre
`gation;' AAAl 2000 Worksliop on Al for Web Search, 2000,
`Copyright ©2000, American Association for Artificial Intel
`ligence (www.aaai.org).
`Clement Yu, et al., " A Methodology to Retrieve Text Docu
`from Multiple Databases;' Teclinical report
`meats
`60607 7053, U. of Illinois at Chicago, 1999, pp. 1 29.
`Yoav Freund, et al., " An Efficient Boosting Algorithm for
`Combining Preferences;' In Macliine Learning: Proceed
`ings of tire Fifteentli International Conference, 1998, pp.
`1 17.
`Luis Gravano, et al., "STARTS: Stanford Proposal for
`Internet Meta Searching," In Proceedings of ACM SIG
`MOD, 1997.
`Luis Gravano, et al., "Merging Ranks from Heterogeneous
`Internet Sources;' In Proceedings oftlie twenty tliird Inter
`national Conference on ~,y Large Databases (VLDB),
`Athens, Greece Aug. 26 29, 1997, pp. I ii, 196 205 .
`Ellen M. \k:iorhess, et al., " Multiple Search Engines in
`Database Merging;' In Proceedings of tire Second ACM
`International Conference on Digital Libraries (DL'97),
`1997.
`
`(List continued on next page.)
`
`Primary Examiner -Safet Metjahic
`Assistant Examiner -Haythim J. Alaubaidi
`(74) Attorney, Agent, or Finn -Cooley, Godward LLP
`
`(57)
`
`ABSTRACT
`
`This invention includes the step of transmitting a query to a
`set of search engines. Any result lists returned from these
`search engines is received, and a subset of entries in each
`result list is selected. Each entry in this subset is assigned a
`scoring value according to a scoring function, and each
`result list is then assigned a representative value according
`to the scoring values assigned to its entries. A merged list of
`entries is produced based upon the reJiesentative value
`assigned to each result list.
`
`23 Claims, 3 Drawing Sheets
`
`i
`
`'IJJ
`
`,..
`,._
`_, ...
`._
`'"
`
`.,-----
`2C
`,c
`,c
`,e
`
`"' "' ,e
`
`001
`
`GOOGLE 1001
`
`
`
`US 6,728,704 B2
`Page 2
`
`OIBER PUBLICATIONS
`
`0. Etzioni, et al., "Efficient Information Gathering on the
`Internet," In Proceedings of the 37th Annual Symposium on
`Foundations of Computer Science (FOCS), 1996.
`Ronald Fagin, et al., "Combining Fuzzy Information from
`Multiple Systems," In Proceedings of Principles of Data
`base Systems (PODS), 1996, pp. 1 33.
`Luis Gravano, et al., "Generalizing GLOSS to Vector Space
`Databases and Broker Hierarchies," In Proceedings of the
`twenty first International Conference on Very Large Data
`bases (VLDB), 1995, pp. 1 24.
`Ellen M. Voorhess, et al., "The Collection Fusion Problem,"
`In Proceedings of the Third Text Retrieval Conference
`(TREC 3), 1995.
`
`James P. Callan,et al., "Searching Disributed Collections
`with Inference Networks," In Proceedings of the 18th ACM
`International Conference on Research and Development in
`Information Retrieval (SIGIR '95), 1995, pp. 1 8.
`Brian T. Bartell, et al., "Automatic Combination of Multiple
`Ranked Retrieval Systems," In Proceedings of the 17th
`Annual International ACM SIGIR Conference on Research
`and Development in Information Retrieval, 1994.
`
`Vagelis HRISTIDIS, et al., "PREFER: A System for the
`Efficient Execution of Multi parametric Ranked Queries,"
`In Proceedings of ACM SIGMOD, May 21 24, 2001, pp.
`259 270.
`
`* cited by examiner
`
`002
`
`
`
`U.S. Patent
`
`Apr. 27, 2004
`
`Sheet 1 of 3
`
`US 6,728,704 B2
`
`20
`
`r 30
`
`Network Connection
`L 34
`
`32 \
`
`CPU
`
`36 ---
`
`40 -
`
`42 -
`
`44 -
`
`18
`r
`~ Communication
`Program
`Search Engine
`Program
`Index
`
`['.___
`
`....____
`
`....____
`
`Merging Program
`
`. .
`.
`
`/
`
`50
`
`54 \
`
`'
`Network Connection
`
`r
`
`52
`
`CPU
`
`56 --.
`
`/58
`
`"' Communication
`
`14\
`
`r
`
`10
`
`Network Connection
`
`r
`
`12
`
`CPU
`
`1 11
`
`,,,- 16
`
`/
`
`Browser
`
`Search Engine
`Program
`
`Merging Program
`.
`.
`.
`
`r-- - 18
`
`i'--....
`
`-- 19
`
`Program
`WWW Data
`Page(s)
`Database
`
`.
`. .
`. . .
`
`60 -
`
`I'--,
`
`62----
`
`I'--.
`
`5 /
`
`,__
`
`FIG. 1
`
`003
`
`
`
`U.S. Patent
`
`Apr. 27, 2004
`
`Sheet 2 of 3
`
`US 6,728,704 B2
`
`r 70
`
`Transmit query
`to search engines
`
`/72
`
`Receive
`result lists
`
`r74
`
`Select a subset of
`entries from each list
`
`/76
`
`Determine scoring value
`for each entry selected
`
`178
`
`For each list, determine
`a representative value
`for all scoring values
`
`/80
`
`Merge all entries into
`single list based on
`each individual
`representative value
`
`FIG. 2
`
`004
`
`
`
`U.S. Patent
`
`Apr. 27, 2004
`
`Sheet 3 of 3
`
`US 6,728,704 B2
`
`100
`
`110
`
`Search
`Engine
`
`Result List
`1A 9A
`2A 10A
`3A
`4A
`5A
`GA
`7A
`8A
`
`102
`
`104
`
`Search
`Engine
`
`Search
`Engine
`
`112
`
`114
`
`Result List
`1B
`2B
`3B
`4B
`5B
`6B
`'78
`
`Result List
`1C
`2C
`3C
`4C
`SC
`GC
`7C
`BC
`
`120
`
`122
`
`124
`
`Subset
`Scoring
`~ Value
`1A ~
`2A
`12
`3A
`10
`4A
`8
`
`Subset
`Scoring
`~ Value
`1B
`17
`28
`14.5
`3B
`21
`48
`7.3
`
`Subset
`Scoring
`Entry
`Value
`~
`14
`2C
`27
`3
`3C
`4C
`8.9
`
`130
`
`132
`
`134
`
`Avg. Scoring
`Value
`11 .25
`
`Avg. Scoring
`Value
`14.95
`
`Avg. Scoring
`Value
`13.225
`
`Merged List
`18
`28
`1C
`3B
`2C
`41:'
`1J\
`
`140 -
`
`005
`
`FIG. 3
`
`
`
`US 6,728,704 B2
`
`1
`METHOD AND APPARATUS FOR MERGING
`RESULT LISTS FROM MULTIPLE SEARCH
`ENGINES
`
`BRIEF DESCRIPTION OF IBE INVENTION
`
`This invention relates generally to search engine technol(cid:173)
`ogy. More specifically, this invention relates to reducing the
`computational overhead associated with merging results
`from multiple search engines.
`
`BACKGROUND OF THE INVENTION
`
`Contemporary computers often operate in a network
`environment that allows them to communicate with each
`other. Accordingly, they can exchange data and search and
`retrieve the contents of another computer in the same
`network. As individual computers can store information,
`large networks of computers can act as vast storehouses of
`information with each computer able to access this store(cid:173)
`house through the network.
`Searches for information in the networked computer
`environment may be cumbersome due to the sheer amount
`of information stored, or due to the complexity of finding
`information in large file structures. Indeed, with the advent
`of the World Wide Web (WWW) as well as other forms of
`computer networking, and the corresponding explosion in
`the amount of information available, it is now simply
`impractical for users to search for information manually. The
`ability of search engines to analyze enormous amounts of
`data and isolate useful information thus becomes of para(cid:173)
`mount importance.
`The use of search engines can speed information retrieval
`by automating the task of collecting information over a
`network of computers. In essence, users direct a computer to
`search for information much faster than a human ever could.
`Search engines are computer programs designed to seek out
`information based on instructions from the user. Typically,
`the user enters a set of instructions, often called a query,
`which instructs the search engine to search for specific types
`of information. Most contemporary search engines are
`designed to take a query, search a group of networked
`computers for information that satisfies the query, and return
`any results to the user. Often termed a result list, the data
`returned to the user normally contains, at a minimum, a
`number of entries or results that describe the locations of
`relevant information. Many times, this result list also
`includes an excerpt of the relevant information for the user's
`inspection, as well as a ranking. This ranking serves as a
`rough indicator of how well the returned information satis- 50
`fies the query, and is usually based on a numerical scoring
`value or metric.
`Almost all search engines work in this general manner.
`However, their architectures vary according to the context in
`which they operate. Search engines are currently constructed 55
`in at least three architectures: federated, peer-to-peer, and
`meta-search engines. Each is used to conduct different types
`of searches.
`Federated search engines are used in the client-server
`environment. A client or server may initiate a search for data
`located in various networked servers. Federated search
`engines are most commonly used in the WWW context, but
`need not be limited in this manner. Typical federated engines
`search the WWW by utilizing programs called bots or
`spiders to examine the content of information available on 65
`other computers and build an index consisting of the words
`or other data stored in these computers, as well as where they
`
`2
`are located. Once users enter a query consisting of words or
`data desired, the search engine searches its index for any
`locations that contain these words/data and returns a list of
`such locations. The result list returned is normally a list of
`5 each such returned location and any associated information,
`and may include Uniform Resource Locators (URLs) for
`finding WWW-based data, or other expressions of data
`location. The results or entries in this list are often ranked
`according to any of a number of criteria currently available,
`with the goal of presenting the most relevant results to the
`10 user first.
`One flaw in this type of search engine is the potential for
`inaccurate information. Because the WWW is so large,
`indices are updated only sporadically, meaning searches
`may not uncover the most recent information. Other types of
`15 search engines avoid the need for spiders and indices, and
`thus present users with up-to-date information more often.
`One example is the peer-to-peer search engine, which can
`also be used for other networks besides the WWW. These
`search engines operate in the peer-to-peer environment,
`20 where computers are simply linked together with no cen(cid:173)
`tralized servers and no distinct clients. They typically work
`by distributing a search to various peer computers, each of
`which can in turn farm out the search to other computers in
`the same network. In this way, individual computers search
`25 only the current contents of a few peers and not the entire
`WWW or other network. This eliminates the need to build
`a large index, and delivers to the user a real-time snapshot
`of the content of the network or the WWW.
`Finally, web meta-search engines can operate in either the
`30 client-server or peer-to-peer environment. These search
`engines typically act as aggregators that farm a WWW
`search out to other public web search engines, then process
`the results.
`A common thread amongst all types of search engines,
`35 including the three listed above, is that all usually involve
`the merging of result lists. Federated search engines typi(cid:173)
`cally farm out a search to different search engines, each of
`which has access to certain server databases. The federated
`search engine must then merge the result lists returned by
`40 each search engine. Peer-to-peer search engines, as men(cid:173)
`tioned above, distribute a search to other engines in the same
`peer network. These engines can then distribute the search to
`other computers, and so on. At each stage, the results
`returned may need to be collected and merged before being
`45 passed back up the chain. Finally, meta-search engines must
`merge the result lists sent back by each public web search
`engine.
`This merging tactic has its drawbacks. Currently, the
`merging of multiple result lists into a single list is usually
`accomplished by examining and ranking every single entry
`of every list. As one can imagine, this ranking process can
`become quite computationally intensive if the number of
`lists or the number of entries per list is large. Thus, for large
`lists or large numbers of lists, the computation time required
`by the merging process can nullify any advantage gained by
`operating multiple search engines at the same time.
`In view of this shortcoming, it would be highly desirable
`to merge entries from multiple result lists into a single list in
`a manner that avoids some of the computational overhead
`60 associated with current methods. Accomplishing this goal
`would improve the speed and efficiency with which useful
`information could be brought to people, thus reducing the
`tedium associated with many different tasks.
`SUMMARY OF IBE INVENTION
`This invention includes a method for merging multiple
`result lists from search engines.
`
`006
`
`
`
`US 6,728,704 B2
`
`10
`
`3
`The invention includes the step of transmitting a query to
`a set of search engines. Any result lists returned from these
`search engines is received, and a subset of entries from each
`result list is selected. Each entry in this subset is assigned a
`scoring value according to a scoring function, and each
`result list is then assigned a representative value according
`to the scoring values assigned to its entries. A merged list of
`entries is produced based upon the representative value
`assigned to each result list.
`The invention further includes a computer-readable
`memory to instruct a computer to merge multiple result lists
`from search engines. Executable instructions stored in the
`memory include instructions for selecting a subset of entries
`from each result list. Each entry in the subset is assigned a
`scoring value according to a scoring function. Each result
`list is assigned a representative value based on a function of
`scoring values assigned to its entries. The entries are then
`ranked based on the representative value assigned to their
`result list.
`This invention allows for a reduction in computational 20
`overhead when merging and re-ranking multiple result lists.
`Ranking of results is accomplished by evaluating a subset of
`entries instead of every single one, thus reducing the number
`of calculations required.
`
`4
`trolled by CPU 32 and connected to the network by network
`connection 34. It includes a computer-readable memory 36
`that contains a communication program 38 to allow the
`exchange of data across network 5. It also includes search
`5 engine program 40 that can be accessed by users or other
`computers 10. In a client-server configuration engine 40
`requires an index 42 to store keywords or other data.
`Merging program 44 is set up to merge different result lists.
`Typical networks may contain a number of computers like
`computer 30.
`Computer 50 may be a server computer or a peer of
`computer 10 or computer 30. It is also a standard computer
`controlled by CPU 52 and connected to network 5 by
`network connection 54. Computer-readable memory 56 con(cid:173)
`tains a communication program 58 that allows the exchange
`15 of electronic information. It also contains one or more
`WWW data pages 60 which can be browsed by those with
`access to network 5. Computer 50 may also contain other
`forms of data stored in database 62. Typical networks may
`contain a number of computers like computer 50.
`The present invention operates within a network of com-
`puters such as those shown in FIG. 1. More specifically, the
`present invention operates by engaging multiple search
`engines to process a query and merge the result lists for
`presentation to the user. In a typical client-server
`25 configuration, a user operating client computer 10 sends
`queries through transmission channel 20 to search engine
`40, which is resident on server 30. Through the use of
`spiders or bots whose operations are known in the art, the
`search engine 40 typically will have already built up a
`30 collection of locations (which can include URLs), along
`with the data contained in those locations, in index 42. For
`example, the bots would have already searched the contents
`of memory 36 of computers configured like computer 30.
`They would have also searched the contents of WWW data
`35 pages 60 and databases 62 of computers configured like
`computer 50. The content from these computers would be
`stored in index 42. Search engine 40 then cross-checks the
`words or other data contained in the query against the data
`contained in index 42 for matches. Locations in index 42
`containing data that matches the query are compiled into a
`result list. Search engine 40 typically transmits the same
`query to other search engines resident on computers con(cid:173)
`DETAILED DESCRIPTION OF THE
`figured like computer 30 and connected to transmission
`INVENTION
`channel 20. These other search engines then perform sepa-
`FIG. 1 illustrates a generalized computer network 5 that
`45 rate searches in the same manner as above, compile their
`may be operated in accordance with an embodiment of the
`own result lists, and return these lists to the computer 30 that
`present invention. This computer network 5 may operate in
`originated the search. The end result of the above is a set of
`a client-server, peer-to-peer, or other configuration, and may
`result lists that must be merged by merging program 44 and
`returned to computer 10 for display to the user.
`also be considered a representation of the WWW. The
`network 5 includes at least one computer 10 connected by 50
`Note that searches using the client-server configuration do
`transmission channel 20 to a group of computers 30 and 50.
`not require the use of search engine 18. The user can enter
`Transmission channel 20 may be any wire or wireless
`a query into computer 10 and transmit it through transmis(cid:173)
`transmission channel.
`sion channel 20 to search engine 40 of computer 20, where
`the search process could begin as described above, with
`Computer 10 is a standard computer controlled by a
`Central Processing Unit (CPU) 12 and connected to the rest 55 search engine 40 searching its index 42 and transmitting the
`query to other servers, and so on. In this case, merging
`of the computers in network 5 by network connection 14.
`program 44 would create a single merged list for transmis(cid:173)
`Computer 10 also includes a memory 16 that can be any
`sion back to computer 10.
`form of computer-readable memory. Memory 16 contains a
`browser program 17 that allows users to browse the WWW.
`The meta-search engine operates in similar fashion, with
`The memory 16 may also contain a search engine program 60
`the added capability of allowing search engine 40 to farm the
`18 and an associated merging program 19 for merging
`search out to other search engines located on other servers
`different result lists, however in a client-server configuration
`configured like computer 30. These search engines in turn
`the search engine is often resident on a different computer.
`compile result lists and return them to engine 40 for merging
`The search and merging operations may be performed on
`into a single list by merging program 44. This list is then
`any computer within the network 5.
`65 returned to computer 10.
`Computer 30 may be a server computer or simply another
`In a typical peer-to-peer configuration, a user operating
`peer of computer 10. It is also a standard computer con-
`computer 10 can often enter a query to search engine 18.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`For a better understanding of the nature and objects of the
`invention, reference should be made to the following
`detailed description taken in conjunction with the accom(cid:173)
`panying drawings, where:
`FIG. 1 illustrates a generalized computer network con(cid:173)
`structed in accordance with an embodiment of the invention.
`FIG. 2 illustrates processing operations in accordance
`with an embodiment of the invention.
`FIG. 3 illustrates an example of merging multiple result
`lists into a single list in accordance with an embodiment of
`the invention.
`Like reference numerals refer to corresponding parts
`throughout the several views of the drawings.
`
`40
`
`007
`
`
`
`US 6,728,704 B2
`
`10
`
`5
`Search engine 18 then performs two different tasks: it
`searches other computers on the network for data satisfying
`the query, and distributes that query to other search engines
`on the network 5. Here, search engine 18 searches the
`contents of memory 36 of peer computer 30, as well as
`memory 56 of peer computer 50, for data matching the
`query. A result list containing the locations of relevant data
`is then compiled. Search engine 18 also farms the same
`query out to the search engine 40 of peer computer 30, which
`conducts a search in similar fashion, examining the contents
`of peer computers like computers 30 and 50 and compiling
`the results into a list. Note that this process could continue
`recursively, with search engine 40 farming out the same
`query to other search engines in network 20, which in turn
`could farm the search out to other search engines, and so on.
`Each search engine would search some of its peer
`computers, compile a result list, and pass it back up the
`chain. At each stage in this recursive process, search engines
`have to merge the result lists that have been passed back to
`them, with the process continuing until a single merged list
`is created by merging program 19 and is displayed to the
`user.
`As can be seen above, in either the client-server or
`peer-to-peer configuration, as well as in the meta-search
`engine case, an essential step is the merging of several result
`lists into one list. In accordance with the present invention,
`one way to accomplish this merging is now described.
`FIG. 2 illustrates one embodiment of the processing
`operations according to the present invention. In typical
`operation, a query is transmitted to a first search engine, 30
`which in turn transmits the query to other search engines
`(block 70). Eventually, each of these other search engines
`returns a result list that is received by the first search engine
`(block 72). The first search engine then begins to merge the
`result lists according to the processing steps of the present 35
`invention. In essence, the result lists are merged with the
`goal of placing the most relevant entries first for the user's
`convenience. However, to reduce the associated computa(cid:173)
`tional overhead, lists are not merged based on an examina(cid:173)
`tion of every single entry. Rather, they are merged based on 40
`an examination of only a small number of entries from each
`list. Specifically, there is no requirement for examining the
`content of each result item.
`A subset of entries is selected from each list (block 74).
`Lists are merged according to these subsets, rather than an 45
`evaluation of every single entry of every single list. The
`subsets may be selected according to any technique for
`selecting a few items out of a larger group, but three
`preferred embodiments are given. In the first embodiment, a
`number n is chosen and the top n documents are selected 50
`from each list. In the second embodiment, a number n is
`again chosen. The merging program 19 or 44 then selects n
`documents that are uniformly spaced within each result list.
`In the third embodiment, a number n is chosen and n
`documents are selected at random from each list.
`The next processing step is to determine a scoring value
`for each entry in the various subsets selected (block 76).
`Scoring values are numbers that typically represent how
`closely the entry matches the query, where certain number
`ranges indicate an entry that is likely to be relevant to the
`user. This step is well-known in the art and includes such
`embodiments as setting the scoring value equal to the total
`number of times each word in the query appears in the entry.
`The present invention includes the step of determining
`scoring values according to any known method.
`The next processing step is to determine, for each list, a
`representative score of all scoring values determined for its
`
`6
`entries (block 78). The representative score may be the
`arithmetic average or a value proportional to the average for
`a set of scoring values. The present invention includes the
`step of determining this representative score according to
`5 any number of known techniques.
`The next processing step is to merge or rank all entries
`from every list based on the representative score for each list
`(block 80). Once each result list has a representative value
`assigned to it, it is merged with the others accordingly. Two
`preferred embodiments are given for accomplishing this
`operation. In the first embodiment, entries are merged by
`selecting the list with the highest representative value ( e.g.,
`the highest average scoring value). The first entry on the list
`that has not already been selected is then picked. That list's
`representative value is then decremented by a fixed amount
`and the process is repeated until all entries have been picked.
`If any representative value drops below zero after
`decrementing, it is reset to its initial value. In the second
`embodiment, entries are merged using a probabilistic
`approach. Each list is assigned a probability value equal to
`its representative value's percentage of the total represen(cid:173)
`tative values for all lists. Lists are then selected according to
`their probability value, with lists having higher probability
`values being more likely to be selected. When a list is
`25 selected, the first entry on that list that has not already been
`selected is picked. This process is repeated, with the total
`representative value being revised when all entries of a list
`are picked.
`FIG. 3 illustrates a specific example of the processing
`steps of the present invention. Three search engines 100,
`102, 104 process a query and return result lists 110, 112, 114
`for merging. Subsets 120, 122, 124 are selected, a scoring
`value is assigned to each entry, and in this example average
`scoring values 130, 132, 134 are calculated. Based on these
`average scoring values, result lists 110, 112, 114 are merged
`into merged list 140.
`The example of FIG. 3 shows the merging of three result
`lists, each with a specified number of entries, from three
`different search engines. However, the invention should not
`be construed as limited in this fashion; instead, it should be
`construed to include the merging of an arbitrary number of
`result lists, each including an arbitrary number of entries,
`from an arbitrary number of search engines.
`Result lists 110, 112, 114 each contain a number of
`entries. As above, each entry typically includes the location
`and an excerpt of relevant information. Subsets 120, 122,
`124 of result lists 110, 112, 114 are then selected, consisting
`of fewer entries than in each complete result list. The entries
`in subsets 120, 122, 124 are preferably but not necessarily
`selected according to any of the three embodiments
`described in connection with block 74 of FIG. 2. FIG. 3
`shows subsets selected according to the first embodiment,
`where the first n=4 entries are chosen from each list. If the
`55 subsets were selected according to the second embodiment,
`where say n is chosen as n=3, then 3 evenly-spaced entries
`would be selected from each list. In this case, subset 120
`would contain entries lA, 6A (rounding up), and lOA.
`Subset 122 would contain entries lB, 4B, and 7B. Subset
`60 124 would contain entries lC, SC (again rounding up), and
`SC. If the subsets were selected according to the third
`embodiment, with say n=S, then each subset 120, 122, 124
`would include 5 entries selected at random from result lists
`110, 112, 114 respectively.
`Once the entries in each subset are determined, a scoring
`value is assigned to them. As above, this scoring value can
`be assigned according to any scoring function known in the
`
`15
`
`20
`
`65
`
`008
`
`
`
`US 6,728,704 B2
`
`8
`
`15
`
`7
`art. Typically, scoring functions assign a numerical value
`based on the relevance of the entry to the query, with higher
`numerical values indicating greater relevance to the query.
`Typical scoring values are shown next to the entries in
`subsets 120, 122, 124. In this example, the scoring values are 5
`averaged to produce average scoring values 130, 132, 134.
`For example, the scoring values assigned to the entries in
`subset 120 are 15, 12, 10 and 8. The average of these four
`numbers is 11.25, shown in average scoring value 130.
`Result lists 110, 112, 114 can now be merged into merged 10
`list 140. As above, this is accomplished using the represen(cid:173)
`tative value assigned to each list. In this example, the
`representative value assigned to each list is an average
`scoring value.
`In this example, the list with the highest average scoring
`value is selected first. In FIG. 3, this is result list 112, having
`an average scoring value 132 of 14.95. The first unselected
`entry, lB, is selected first. Average scoring value 132 is then
`decremented by some amount. If that amount is chosen to be
`1.0, average scoring value 132 takes on a value of 14.95-
`1.0=13.95. Because 13.95 is still the highest average scoring 20
`value, 2B is chosen next and average scoring value 132 is
`decremented by another 1.0 to take on a value of 12.95.
`Now, the highest average scoring value is value 134, or
`13.225. Entry lC is thus the next entry selected. Scoring
`value 134 is then decremented to 12.225; value 132, which 25
`is at 12.95, is now the highest value again. Entry 3B is thus
`chosen next, and value 132 is decremented to 11.95. This
`means value 134 is now the highest value. Entry 2C is then
`chosen, and value 134 is decremented to 11.225. This means
`value 132 is again the highest value, so entry 4B is selected 30
`next and value 132 is decremented to 10.95. Value 130 is
`now the highest value, so entry lA is chosen and value 130
`is decremented to 11.25-1.0=10.25. This process repeats
`until all entries from all three lists are selected.
`According to another embodiment, each list 110, 112, 114 35
`is assigned a probability value equal to its average scoring
`value's percentage of the total of all average scoring values.
`Entries are then selected from each list based on its prob(cid:173)
`ability value. Here, for instance, the total of all average
`scoring values 130, 132, 134 is 11.25+14.95+13.225=
`39.425. This means result list 110 is assigned a probability
`value equal to (11.25/39.425)100%=28.54%. In like manner,
`result list 112 is assigned a probability value of (14.95/
`39.425)100%=37.92%, and result list 114 is assigned a
`probability value of (13.225/39.425)100%=33.54%. Result 45
`lists are then selected in pseudorandom fashion, where at
`each selection result list 110 has a 28.54% chance of being
`picked, list 112 has a 37.92% chance, and list 114 has a
`33.54% chance. Once a list is selected, the first entry that has
`not already been selected is picked. Once every entry in a list 50
`is selected, the total of all average scoring values is recal(cid:173)
`culated without that list's average scoring value, and the
`process continues until every entry of every list has been
`selected.
`The foregoing descriptions of specific embodiments of the 55
`present invention are presented for purposes of illustration
`and description. They are not intended to be exhaustive or to
`limit the invention to the precise forms disclosed. Obviously
`many modifications and variations are possible in view of
`the above teachings. The embodiments were chosen and 60
`desc