Koppel et al.

(10) Patent No.: US 7,257,766 B1
(45) Date of Patent: Aug. 14, 2007

(54) SITE FINDING

(75) Inventors: Moshe Koppel, Efrat (IL); Eyal Lanxner, Jerusalem (IL)

(73) Assignee: Egocentricity Ltd., Efrat (IL)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 652 days.

(21) Appl. No.: 09/60S,987

(22) Filed: Jun. 29, 2000

(51) Int. Cl.
     G06F 17/30 (2006.01)

(52) U.S. Cl. ........ 715/501.1; 715/513
(58) Field of Classification Search ........ 715/501.1, 715/513; 703/3, 5, 104.1
     See application file for complete search history.
`
(56) References Cited

U.S. PATENT DOCUMENTS

5,884,305 A      3/1999   Kleinberg et al.
5,890,172 A *    3/1999   Borman et al. ........ 715/501.1
5,960,429 A *    9/1999   Peercy et al. ........ 707/5
6,189,018 B1 *   2/2001   Newman et al. ........ 715/501.1
6,211,874 B1 *   4/2001   Himmel et al. ........ 345/781
6,226,655 B1 *   5/2001   Borman et al. ........ 715/501.1
6,421,675 B1 *   7/2002   Ryan et al. .......... 707/100
6,517,587 B2 *   2/2003   Satyavolu et al. ..... 715/501.1
`
OTHER PUBLICATIONS

Carriere, J. et al.; "WebQuery: Searching and Visualizing the Web through Connectivity"; http://www.cgl.uwaterloo.ca/Projects/Vanish/webquery_l.html.

"Google Search-Search the Web with Google!"; http://www.enablelink.com/google.htm.

Chakrabarti, S. et al.; "Mining the Web's Link Structure"; IEEE Computer; Aug. 1999; pp. 60-67.

Kleinberg, J.M.; "Authoritative Sources in a Hyperlinked Environment"; IBM Research Report RJ 10076 (91892), topic area "Computer Science"; May 29, 1997; pp. 1-31.

Bharat, K. et al.; "Improved Algorithms for Topic Distillation in a Hyperlinked Environment"; 1998; Research and Development in Information Retrieval; pp. 104-111; Retrieved from Internet: <http://citeseer.nj.nec.com/bharat98improved.html>.

Chakrabarti, S. et al.; "Automatic Resource Compilation by Analyzing Hyperlink Structure and Associated Text"; 1998; Proceedings of the 7th International World Wide Web Conference; Retrieved from Internet: <http://citeseer.nj.nec.com/chakrabarti98automatic.html>.

Kleinberg, J.M.; "Authoritative Sources in a Hyperlinked Environment"; Sep. 5, 1999; Journal of the ACM; vol. 46, No. 5; pp. 604-632; Retrieved from Internet: <http://citeseer.nj.nec.com/klienberg97authoritative.html>.

Pirolli, P. et al.; "Silk from a Sow's Ear: Extracting Usable Structures from the Web"; 1996; Proc. (ACM) Conf. Human Factors in Computing Systems; ACM Press; Retrieved from Internet: <http://citeseer.nj.nec.com/pirolli96silk.html>.

* cited by examiner

Primary Examiner-Stephen S. Hong
Assistant Examiner-Gregory J. Vaughn
`
(57) ABSTRACT

A method of finding WWW pages, each of which includes at least one list of links to desired Internet resources, comprising:

providing a list of URLs;
`
`automatically generating at least one query for an Internet
`search tool for WWW pages that include links to at least one
`URL of said list of URLs;
`
`executing said at least one generated query to provide search
`results that include at least one of said searched for WWW
`pages; and
`
`generating a response comprising at least one indication of
`one of said WWW pages, responsive to said search results.
`
`73 Claims, 2 Drawing Sheets
`
`
U.S. Patent    Aug. 14, 2007    Sheet 1 of 2    US 7,257,766 B1

[FIG. 1: schematic of configuration 100, showing user 102, Internet 104, search engine 106, index database 108 and result analyzer 110.]

[FIG. 2: flowchart 120. Key word search (122); filter results to obtain N sites (124); group results into groups of size K (126); perform N/K searches to find potential hubs (128); rank potential hubs (130); select small number of hubs (132); filter selected hubs (134); present hubs to user (136).]

U.S. Patent    Aug. 14, 2007    Sheet 2 of 2    US 7,257,766 B1

[FIG. 3: flowchart 160. Provide list of links (162); find potential hubs for links (164); filter potential hubs (166).]

`SITE FINDING
`
`FIELD OF THE INVENTION
`
`The present invention relates to searching for information
`on a data network, and especially to searching utilizing an
`analysis of the results of search engines.
`
`BACKGROUND OF THE INVENTION
`
It is known in the art to analyze data networks, such as journals and journal citations, to determine meta knowledge about the field.

IBM Inc. described a method of determining hubs and authorities on the Internet in U.S. Pat. No. 5,884,305, in U.S. patent application No. 08/813,749, filed Mar. 7, 1997, mentioned in the patent, and in "Authoritative Sources in a Hyperlinked Environment", by Jon M. Kleinberg, IBM research report RJ 10076 (91892), topic area "Computer Science", May 29, 1997, the disclosures of which are incorporated herein by reference. Hubs are Internet sites that contain links to many other sites in the same field, and authorities are sites that are pointed to by a significant number of relevant sites in a field. An iterative process was suggested to determine, from among a predetermined set of sites, a kernel of sites that match a hub or authority definition. The Kleinberg paper notes that the Internet is to be considered a different type of data network than journal articles.

A paper entitled "Mining the Web's Link Structure", by S. Chakrabarti et al., in IEEE Computer, August 1999, the disclosure of which is incorporated herein by reference, describes analyzing link structures of WWW pages to determine hubs and authorities. At a site "http://www.google.com", available on Feb. 1, 2000 and for some time before, a tool "googlescout" is provided for detecting WWW sites that are similar to a shown site, for example for finding competition.

A WWW page "www.cgl.uwaterloo.ca/Project/Vanish/webquery_l.html", apparently available at least from Dec. 11, 1996, the disclosure of which is incorporated herein by reference, describes the "webquery" project, in which the quality of a site that turns up in a search is evaluated based on the number of sites that link to the site and the number of links to other sites in the site.
`
SUMMARY OF THE INVENTION

An object of some embodiments of the invention is finding one or more hub sites or lists of WWW pages that cover a topic presented by a set of input sites. In an embodiment of the invention, the hubs or page lists are selected by virtue of their including links to a significant number of the sites in the set of input sites. An expected advantage of using hubs is that each may concentrate a large number of links to relevant sites, beyond those provided in the input set, and may also include additional information which can help a human user select certain sites for browsing.

An aspect of some embodiments of the invention relates to selecting a potential hub based on a statistical analysis of an Internet link structure, for example, using an approximation of a number of links from the potential hub to a set of input sites, rather than determining which sites from the input set are actually pointed to. In one embodiment of the invention, this determination is made by searching for potential hubs that include links to groups of input sites and then ranking the resulting potential hubs, based on the number of groups pointed to by each potential hub. As a potential hub might include links to more than one site in an input group, the approximation may be significantly different from the actual number of links between a potential hub and individual member sites of input groups. It is noted that in some embodiments, there is no final determination of which particular site is pointed to by the potential hub.

An aspect of some embodiments of the invention relates to a method of automatically determining a hub-potential of a site, for example for ranking hubs in a set of potential hubs or for finding potential hubs in a search. In one embodiment of the invention, a hub-potential of a site is determined based on structural properties of the site, for example, the existence of a list of links and/or the existence of a text paragraph (e.g., a review or description) for many of the links. Optionally, the number of links is determined by counting the occurrences of phrases which indicate the presence of links, such as "http:" or "href". Alternatively or additionally, a hub-potential may be determined based on the usage of key terms of the topic in the site in general and/or in anchor portions of the site in particular, such as a main title or a section heading. Alternatively or additionally, a hub-potential may be determined based on a usage of hub-typical words or phrases, such as "list of links", "links", "index", "list", "compilation" and/or "resources". Optionally, these words or phrases receive a higher score based on their location in the site, for example in a title or before a long list of links.

In one embodiment of the invention, the potential hubs are ranked and/or filtered before being analyzed in greater depth. Alternatively or additionally, the hub generation process may create a small set of potential hubs to begin with, for example using a threshold setting. Such ranking may include, for example, selecting only a subset of those sites that point to the input set of sites, for example based on the existence of a topic word in those sites, prior to analyzing the sites for hub-potential. In another example, potential hubs that are found using a search engine are required to include both a topic word and at least one link to one group of sites from the input set.

In an embodiment of the invention, hub-potential is characterized by rules, which may be phrased in a search engine command language, so that a search for the hubs using the search engine returns sites with a higher potential of being desired hubs. In an embodiment of the invention, the particular features of a search engine, for example, searching for URLs or links, disjunctive search and/or pipes, are used to perform one or more of the above activities, for example, group comparison, rule application and/or thresholding of potential hubs, more efficiently.

In one embodiment of the invention, an input set of sites is generated by a user providing a topic or topic words and generating, for example by one or more search engine(s) and/or Internet indexes, a list of sites relevant to that topic. Optionally, the list of sites is filtered prior to being used as a basis for finding hubs, for example by removing redundant and/or mirroring sites.

Alternatively or additionally, an input set of sites is generated from a user provided site. The user provided site can be analyzed to find a second set of sites that is similar to the provided site. One exemplary method of determining similarity is by finding hubs, as defined above, which point to the site and selecting links from those hubs as similar sites. Another exemplary method is to receive a short list of examples of such similar sites. Another exemplary method is finding sites that contain text similar to that of the provided site. Optionally, the user provides a set of sites, rather than a single site.
Optionally, hubs that point to the similar sites and not to the provided sites are determined. In some embodiments, these hubs are treated as hubs to which a link to the provided site should be added, for example by suggesting this to the hub operators.
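
The selection of hubs that point to similar sites but not to the provided site can be illustrated with a short sketch. It assumes the link lists of the candidate hubs have already been retrieved; the function name `hubs_missing_site` and all site names are hypothetical and are not taken from the specification.

```python
from typing import Dict, Iterable, List, Set


def hubs_missing_site(
    hub_links: Dict[str, Set[str]],
    provided_site: str,
    similar_sites: Iterable[str],
) -> List[str]:
    """Return hubs that link to a similar site but not to the provided site."""
    similar = set(similar_sites)
    return sorted(
        hub
        for hub, links in hub_links.items()
        if links & similar and provided_site not in links
    )


# Hypothetical data: the sites each candidate hub links to.
hub_links = {
    "hub-a.example": {"www.rival1.com", "www.rival2.com"},
    "hub-b.example": {"www.rival1.com", "www.mysite.com"},
    "hub-c.example": {"www.unrelated.com"},
}
candidates = hubs_missing_site(
    hub_links, "www.mysite.com", ["www.rival1.com", "www.rival2.com"]
)
print(candidates)
```

In this sketch only `hub-a.example` is reported, since it links to similar sites but lacks a link to the provided site.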
Alternatively or additionally, an input set of sites is generated by analyzing a user provided hub, a hub obtained from previous use of the hub-finder, or a hub constructed by combining search results/analysis of existing hubs or other user provided information.

Alternatively or additionally to providing a hub as an input, a list of a user's favorite bookmarks or recently or frequently traveled sites may be used as an input instead. Such lists may be considered to comprise a profile of a user, for example for advertisement targeting or for finding friends or partners. Such a user profiling tool can be used, in some embodiments of the invention, to extrapolate from an existing, studied group of users to a large group which is not studied in detail but whose browsing habits are known.

A set of sites may be filtered, manually or automatically, prior to being used as an input set, for example, by a user manually selecting a subset of links or a topic word for use in analyzing the suitability of the links.

Optionally, the resulting hubs are considered a set of hubs which are similar to the input hub or at least to an aspect of the input hub, and may thus be presented to a user.

In one embodiment of the invention, a set of similar hubs is analyzed to harvest information which may be useful, for example to the owner of the provided hub. In one example, the links of the similar hubs are collated, filtered and/or ranked, to detect links or textual descriptive material of links that are missing from the input hub and might be desirable. In another example, links that exist in the provided hub are ranked based on the particulars of the appearance of such links in the similar hubs. In another example, a new hub is created, possibly ad-hoc, based on the analyzed similar hubs.
The similar hubs that are found may be real hubs searched for in the Internet. Alternatively to finding Internet hubs, interest hubs of users may be determined. The entries in a database of users' browsing habits or favorite links may be considered as hubs, one for each user. The search for hubs then comprises searching in this database for users whose interest hubs are relevant to a provided set of input hubs. The expansion of sites into hubs may be performed on the Internet, in which case the found hubs reflect the common association of links. These hubs may be used to find links that exist in the database of user habits. Alternatively, the expansion of sites into hubs is also performed in the user habits database, in which case the found hubs reflect the preferences of the particular users in the database. A similarity between user browsing habits (or favorite links) and hub sites, which may be noted, is that both are lists of links that are organized by a thinking being to reflect a particular thought, topic or personality.
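
As a rough illustration of searching a user-habits database for relevant interest hubs, the following sketch ranks users by how many of the input links appear in their own link lists. The shared-link count is an assumed relevance measure chosen for simplicity (the text does not specify one), and the function name, threshold and data are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def relevant_users(
    user_links: Dict[str, Set[str]],
    input_links: Iterable[str],
    min_shared: int = 1,
) -> List[Tuple[str, int]]:
    """Rank users by the number of input links found in their interest hub."""
    wanted = set(input_links)
    scored = [(user, len(links & wanted)) for user, links in user_links.items()]
    kept = [(user, shared) for user, shared in scored if shared >= min_shared]
    # Most shared links first; ties broken alphabetically for stable output.
    return sorted(kept, key=lambda item: (-item[1], item[0]))


# Hypothetical database: one "interest hub" (set of links) per user.
user_links = {
    "alice": {"a.com", "b.com", "c.com"},
    "bob": {"c.com"},
    "carol": {"z.com"},
}
matches = relevant_users(user_links, ["a.com", "c.com"])
print(matches)
```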
An aspect of some embodiments of the invention relates to a method of presenting a list of hub sites. Alternatively or additionally to providing a list of sites, the sites may be provided along with auxiliary information, for example, information about link structure, such as the number of links, the number of unique links (not in other pages), the number of popular links (on at least k pages), the amount of explanation for each link and the method of ordering of links in the page (alphabetic, topical, regional, ranked, etc.), or information copied from the target pages, such as the links themselves and/or explanations about the links. Copied information may be collated, for example, by target link (or equivalent links), or grouped according to other criteria, such as length, alphabetic order, topic, rank, region and/or repetition.
There is thus provided in accordance with an exemplary embodiment of the invention, a method of finding WWW pages, each of which includes at least one list of links to desired Internet resources, comprising:
providing a list of URLs;
automatically generating at least one query for an Internet search tool for WWW pages that include links to at least one URL of said list of URLs;
executing said at least one generated query to provide search results that include at least one of said searched for WWW pages; and
generating a response comprising at least one indication of one of said WWW pages, responsive to said search results.
Optionally, the method comprises displaying said response to a user. Alternatively or additionally, said at least one URL comprises a plurality of URLs. Alternatively or additionally, said response is generated using a single search step and no iterations. Alternatively or additionally, said method comprises ranking said search results. Optionally, ranking of a WWW page is responsive to a number of groups of URLs pointed to by said WWW page.
In an exemplary embodiment of the invention, said generating at least one search query comprises:
dividing said list of URLs into a plurality of groups and generating at least a single query for each group, wherein said at least a single query does not differentiate which URL in said group is pointed to by the results of the search,
wherein said executing comprises executing said generated at least one query for a plurality of said groups, generating a plurality of result lists.
Optionally, all of said groups have a same number of members. Alternatively, at least three of said groups have a different number of members from each other.

In an exemplary embodiment of the invention, the method comprises collating said result lists into a single list of search results. Optionally, the method comprises ranking the contents of at least one of said result lists. Optionally, said collating is responsive to said ranking of said at least one of said result lists. Alternatively or additionally, said ranking is applied to said result list after it is generated. Optionally, the method comprises filtering said at least one result list responsive to said ranking.

In an exemplary embodiment of the invention, said ranking is applied to said result list during said execution. Optionally, said ranking is applied by adding at least one limitation to said at least one generated search query.

In an exemplary embodiment of the invention, said ranking comprises ranking responsive to a number of said URLs pointed to by said result list. Alternatively or additionally, said ranking comprises ranking responsive to a morphological property of pages of said at least one result list. Optionally, said morphological property comprises the existence of a link list.
In an exemplary embodiment of the invention, said ranking indicates a probability of a ranked page being a hub. Alternatively or additionally, said ranking comprises ranking responsive to the presence of at least one key word in pages of said at least one result list. Optionally, said key word comprises a word that is related to a content of said list of URLs. Alternatively or additionally, said key word comprises a word that serves as a statistical indicator that the page is a hub. Optionally, said key word is selected from the group "links", "index" and "resource".

In an exemplary embodiment of the invention, said providing comprises a user providing a list of URLs. Optionally, said user provided list of URLs comprises at least a part of a URL bookmark file.

In an exemplary embodiment of the invention, said providing comprises a user providing a WWW page including a list of URLs. Alternatively or additionally, said providing comprises:
a user providing one or more topic words; and
executing a preliminary search to find a list of URLs related to said one or more topic words.
Alternatively or additionally, said providing comprises:
a user providing a WWW page; and
executing a preliminary search to find a list of URLs that point to pages similar to the provided WWW page.
Optionally, said executing said at least one generated query comprises executing said at least one query to ignore WWW pages that include links to said user provided URL.

In an exemplary embodiment of the invention, the method comprises filtering said search results before said generating. Alternatively or additionally, said search tool comprises a search engine. Optionally, said executing said at least one query comprises executing using a pipe feature of said search engine to limit a second search step to a list of sites found in a first search step using said search engine.

In an exemplary embodiment of the invention, said response comprises a list of said WWW pages. Optionally, said response includes link statistics for said WWW pages. Optionally, said link statistics include a number of links in each WWW page. Alternatively or additionally, said link statistics include an indicator of a uniqueness of links in each WWW page. Alternatively or additionally, said link statistics include an indicator of an amount of information associated with links in each WWW page.

In an exemplary embodiment of the invention, said response comprises a list of links listed in at least one of said WWW pages. Optionally, said response comprises a list of links listed in at least a given number of said WWW pages. Optionally, said given number is greater than 1. Alternatively, said given number is greater than 2.

In an exemplary embodiment of the invention, said list is arranged by WWW pages. Alternatively or additionally, said list comprises information associated with a link in its corresponding WWW page. Alternatively or additionally, said list indicates pages not including a link to any URL in a predetermined list of URLs. Alternatively or additionally, said list indicates pages not including a link from the contents of any URL in a predetermined list of URLs. Optionally, said predetermined list is provided by a user.

There is also provided in accordance with an exemplary embodiment of the invention, a method of finding WWW pages, each of which includes at least one list of links to desired Internet resources, comprising:
providing at least one URL;
generating a list of URLs related to said at least one URL;
determining at least one WWW page that includes links to at least one URL of said list of URLs but not to said provided at least one URL; and
generating a response comprising at least one indication of one of said at least one WWW page.
Optionally, the method comprises displaying said response to a user. Alternatively or additionally, said at least one WWW page comprises a plurality of WWW pages. Optionally, said providing comprises providing a WWW page including a link to said at least one URL.

In an exemplary embodiment of the invention, said providing comprises providing a list of a plurality of URLs. Alternatively or additionally, generating a list of related URLs comprises generating a list of competition URLs. Alternatively or additionally, generating a list of related URLs comprises generating a list of similar URLs. Alternatively or additionally, generating a list of related URLs comprises finding WWW pages characterized in that a common WWW page includes links to at least one of said WWW pages and at least one of said at least one URL. Alternatively or additionally, said determining comprises executing a query on a search engine.

BRIEF DESCRIPTION OF THE DRAWINGS

Particular embodiments of the invention will be described with reference to the following description of some embodiments of the invention in conjunction with the figures, wherein identical structures, elements or parts which appear in more than one figure are optionally labeled with a same or similar number in all the figures in which they appear, in which:
FIG. 1 is a schematic illustration of a configuration of a search engine in accordance with an exemplary embodiment of the invention;
FIG. 2 is a flowchart of a method for finding hubs, in accordance with an exemplary embodiment of the invention; and
FIG. 3 is a flowchart of a method of finding sites similar to a provided list of sites, in accordance with an exemplary embodiment of the invention.

`
DETAILED DESCRIPTION OF SOME EMBODIMENTS

GENERAL

FIG. 1 is a schematic illustration of a configuration 100 of a search engine 106 in accordance with an exemplary embodiment of the invention. A user 102 uses search engine 106 for finding sites of interest on an Internet 104. The connection to search engine 106 is typically also through Internet 104, but this is not required. Typically, search engine 106 utilizes a database 108 that contains indexes and other information relating to WWW pages known to search engine 106. In a typical search engine, a user provides terms and the engine responds with a list of sites that include some of the terms. Some more advanced search engines also provide sites that appear to be related for various reasons. In some embodiments of the invention, a directory including an index of which sites link to which other sites is used as a search tool.

A search engine result analyzer 110 is optionally provided, to analyze the results of the search of index 108 by search engine 106 and to provide analyzed results to user 102. Optionally, as will be described below, analyzer 110 also executes particular searches on search engine 106. Although result analyzer 110 is optionally configured to work best with a particular search engine 106, a same analyzer can work with a plurality of search engines.

An analysis of search results is generally desired, as search engines do not typically provide a single or a small number of exactly matching sites; rather, based on keywords or subject fields, a large number of sites that might be suitable are provided. Wading through a long list of sites is extremely time consuming. One reason for this required wading is the lack of suitable software for determining if a particular site is really relevant to user 102. Also, valuable sites are often missed. Even indexing sites, such as Yahoo!, which use human indexers, often do not supply a suitable site, for several reasons, including (a) not being up to date; (b) lack of coverage over much of the Internet; (c) lack of suitable manpower and/or time for such manpower to cover all the myriad subjects on the Internet; and (d) lack of a suitable index structure.

Typical reasons that a user browses the Internet for information include:
(a) searching for an answer to a particular question, optionally answered by an authority on that question;
(b) looking for an overview of a particular field; and
(c) searching for a set of sites, from which the user can derive his or her own conclusions.
The inventors of the present invention have realized that in many fields there are interested people who have compiled their own listing of relevant sites and an analysis of each relevant site; such listing sites are known as hubs. Thus, it is generally useful to provide a user with a short list of such hubs. The inventors have also realized that reviewing such hubs may be a better way for the user to find a dependable and knowledgeable authority in a field than merely relying on an automated program that analyses links between sites. Neglecting a search for potential authorities, in accordance with some embodiments of the invention, can allow a faster method to be used for finding hubs. Once these hubs are determined, there are other, further types of analysis that can be usefully presented to a user and that answer other information gathering questions the user might have.

Following are several methods of analyzing search results to assist an interested user 102 in finding one or a small number of relevant sites or hub sites. Although not explicitly described in each of the below-described methods, additional filtering steps for rejecting certain sites as being unsuitable may be provided. Also, it is noted that a block portion of one method may be suitable for inclusion, as is, in another method, as is also described in the exemplary implementation method described herein.

A user may have a particular question to which he desires an answer. However, the words used in the question often do not match the words used in the field, or in the particular site that holds the answer to the user's question. In some cases, there is no common way of describing the subject of the question. Each hub can be considered, among other things, to be a dictionary of synonyms. Once a user finds one hub, the common usage of names to describe the subject of the question generally becomes clear.
`
removing sites with low ratings or based on the site size or creation/revision date. After filtering, there are N sites, where N may be a result of the filtering or the filtering may be adapted to achieve a desired value for N.
The filtered search results are then grouped into groups (126) of size K, for example K=4. In some embodiments of the invention, K is a function of N. Alternatively or additionally, K is a function of the results in the group, for example, one group may have a small number of high-ranking results while another group has a large number of low-ranking results. It is not, however, required that all groups be the same size. Various grouping methods may be used, for example, randomly selecting sites; grouping based on order in the search results (either selecting blocks of sites, or selecting evenly or non-evenly spaced sites from the search results); grouping based on a ranking method, to create groups with balanced ranks or search order (e.g., two high and two low ranks); and/or grouping similar sites together. Optionally, the size of the group may be inversely related to the ranking of sites in the groups.
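
Two of the grouping options named above can be sketched as follows: consecutive blocks taken in search order, and evenly spaced selection that mixes high and low positions in each group. This is a minimal sketch under the assumption that the N filtered sites arrive as an ordered list; the function names and site names are hypothetical.

```python
from typing import List, Sequence


def group_in_blocks(sites: Sequence[str], k: int) -> List[List[str]]:
    """Split sites into consecutive blocks of at most k sites, in search order."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return [list(sites[i:i + k]) for i in range(0, len(sites), k)]


def group_evenly_spaced(sites: Sequence[str], k: int) -> List[List[str]]:
    """Build groups of at most k evenly spaced sites, mixing high and low ranks."""
    if k < 1:
        raise ValueError("k must be at least 1")
    n_groups = -(-len(sites) // k)  # ceiling division: about N/K groups
    return [list(sites[g::n_groups]) for g in range(n_groups)]


# Hypothetical input: N = 8 filtered sites, grouped with K = 4.
sites = [f"www.site{i}.com" for i in range(1, 9)]
blocks = group_in_blocks(sites, 4)
spaced = group_evenly_spaced(sites, 4)
print(blocks)
print(spaced)
```

With these inputs each method yields N/K = 2 groups; the block method keeps sites 1 to 4 together, while the evenly spaced method places sites 1, 3, 5 and 7 in one group.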
A plurality of potential hubs are determined in step 128, by searching for sites that include links to any site in one of the groups. Only N/K searches are required. In an exemplary search engine 106, for each group the search is for sites that include a reference to or a link to the http address of at least one of the sites in the group, e.g., the search term being: "www.site1.com OR www.site2.com OR www.site3.com/stuff OR www.site4.com". Optionally, this search includes a request for ranking of search results by the search engine.
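
The per-group disjunctive search term can be generated mechanically. The sketch below only builds the query strings in the form quoted above; how a particular search engine expresses a link search is engine-specific and not modeled here, and the second group of addresses is hypothetical.

```python
from typing import Iterable, List


def build_group_query(group: Iterable[str]) -> str:
    """Join the addresses of one group into a single disjunctive (OR) query."""
    return " OR ".join(group)


def build_queries(groups: Iterable[Iterable[str]]) -> List[str]:
    """One query per group, so N sites in groups of K need about N/K searches."""
    return [build_group_query(group) for group in groups]


groups = [
    ["www.site1.com", "www.site2.com", "www.site3.com/stuff", "www.site4.com"],
    # Hypothetical second group, for illustration only.
    ["www.site5.com", "www.site6.com", "www.site7.com", "www.site8.com"],
]
queries = build_queries(groups)
print(queries[0])
```

Because each query does not distinguish which site of the group a result links to, the number of searches stays at one per group rather than one per site.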
The results of all the searches are collated and then optionally ranked (130). In an exemplary ranking scheme, a two digit number is used, the tens digit being the number of searches the site came up on and the ones digit reflecting the existence and number of special keywords that appear in the potential hub (to be described below). A four digit scheme may also be used. Also, different ranking methods or different weights for the different factors may be used. Exemplary special keywords are words that indicate that a site is more likely to be a hub (described below) or words from the subject topic or from the original search. In some cases, such topic words can be gleaned from the original search results (122), for example from the page topics, or can be provided by a user.
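
The two digit ranking scheme can be sketched as below. The sketch assumes the result lists of the N/K searches and the text of each potential hub are already available, caps each digit at 9 (an assumption made to keep the score to two digits), and uses the hub-typical words "links", "index" and "resource" mentioned in the summary; the function name and sample pages are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

HUB_WORDS = ("links", "index", "resource")  # hub-typical words named in the text


def rank_potential_hubs(
    result_lists: Iterable[Iterable[str]],
    page_text: Dict[str, str],
    keywords: Iterable[str] = HUB_WORDS,
) -> List[Tuple[int, str]]:
    """Score = 10 * (searches the page appeared in) + (special keywords found)."""
    appearances: Counter = Counter()
    for results in result_lists:
        appearances.update(set(results))  # count each search at most once
    lowered = [word.lower() for word in keywords]
    ranked = []
    for url, searches in appearances.items():
        text = page_text.get(url, "").lower()
        hits = sum(1 for word in lowered if word in text)
        # Assumption: cap both digits at 9 so the score stays two digits.
        ranked.append((10 * min(searches, 9) + min(hits, 9), url))
    return sorted(ranked, key=lambda item: (-item[0], item[1]))


# Hypothetical results of three group searches and the fetched page texts.
result_lists = [
    ["hub-a.example", "hub-b.example"],
    ["hub-a.example"],
    ["hub-a.example", "hub-c.example"],
]
page_text = {
    "hub-a.example": "An index of links",
    "hub-b.example": "Resources and links",
    "hub-c.example": "My home page",
}
ranking = rank_potential_hubs(result_lists, page_text)
print(ranking)
```

Here `hub-a.example` scores 32 (three searches, two keywords), ahead of `hub-b.example` at 12 and `hub-c.example` at 10.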
In a step 132, a small number of hubs are selected for further consideration, for example based on the ranking.

In an optional step 134, the selected hubs are filtered to remove sites that are not desirable, for example based on an analysis of their content. Exemplary analysis rules that can be applied are: counting the number of links from the site; counting the number of links which also appear on other potential hubs; and eliminating potential hubs which are almost identical to other potential hubs. Typically, but not necessarily, a larger number of links indicates a more desirable hub. If, however, the number of links is too high, this may indicate an omnibus hub that may be too difficult to use if it is not organized. A later optional step of analyzing the amount of content associated with each link and/or the organization of the links may be used to determine if such an omnibus hub is suitable for the user.
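
The content-based filtering rules of step 134 can be sketched as follows. The sketch counts links by extracting `href` targets with a regular expression, drops pages with too few or too many links, and drops pages whose link set is almost identical to an already kept hub. The thresholds and the set-overlap (Jaccard) measure of "almost identical" are assumptions for illustration; the text does not fix either.

```python
import re
from typing import Dict, List, Set

# Simple pattern for href targets; a full HTML parser would be more robust.
HREF = re.compile(r'href\s*=\s*["\']?([^"\'\s>]+)', re.IGNORECASE)


def extract_links(html: str) -> Set[str]:
    """Return the distinct link targets found in a page."""
    return set(HREF.findall(html))


def overlap(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two link sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def filter_hubs(
    pages: Dict[str, str],
    min_links: int = 3,       # assumed lower bound on links
    max_links: int = 500,     # assumed bound for an "omnibus" hub
    max_overlap: float = 0.9  # assumed "almost identical" threshold
) -> List[str]:
    """Keep hubs with a usable number of links that do not duplicate a kept hub."""
    kept: List[str] = []
    kept_links: List[Set[str]] = []
    for url, html in pages.items():
        links = extract_links(html)
        if not min_links <= len(links) <= max_links:
            continue
        if any(overlap(links, other) >= max_overlap for other in kept_links):
            continue
        kept.append(url)
        kept_links.append(links)
    return kept


# Hypothetical pages: "b" mirrors "a", and "c" has too few links.
pages = {
    "a": '<a href="x1">1</a> <a href="x2">2</a> <a href="x3">3</a>',
    "b": '<a href="x1">1</a> <a href="x2">2</a> <a href="x3">3</a>',
    "c": '<a href="y1">only one</a>',
}
print(filter_hubs(pages))
```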
`The filtered hubs are then presented to user 102 (136). In
`an exemplary embodiment of the i