(12) United States Patent
`Koppel et al.
`
`(10) Patent No.: US 7,257,766 B1
`(45) Date of Patent: Aug. 14, 2007
`
`(54) SITE FINDING
`
`(75) Inventors: Moshe Koppel, Efrat (IL); Eyal
`Lanxner, Jerusalem (IL)
`
`(73) Assignee: Egocentricity Ltd., Efrat (IL)
`
`( * ) Notice: Subject to any disclaimer, the term of this
`patent is extended or adjusted under 35
`U.S.C. 154(b) by 652 days.
`
`(21) Appl. No.: 09/605,987
`
`(22) Filed: Jun. 29, 2000
`
`(51) Int. Cl.
`G06F 17/30 (2006.01)
`
`(52) U.S. Cl. ..................................... 715/501.1; 715/513
`(58) Field of Classification Search .............. 715/501.1,
`715/513; 703/3, 5, 104.1
`See application file for complete search history.
`
`(56) References Cited
`
`U.S. PATENT DOCUMENTS
`
`5,884,305 A      3/1999  Kleinberg et al.
`5,890,172 A *    3/1999  Borman et al. .......... 715/501.1
`5,960,429 A *    9/1999  Peercy et al. .............. 707/5
`6,189,018 B1 *   2/2001  Newman et al. .......... 715/501.1
`6,211,874 B1 *   4/2001  Himmel et al. ............ 345/781
`6,226,655 B1 *   5/2001  Borman et al. .......... 715/501.1
`6,421,675 B1 *   7/2002  Ryan et al. .............. 707/100
`6,517,587 B2 *   2/2003  Satyavolu et al. ....... 715/501.1
`
`OTHER PUBLICATIONS
`
`Carriere, J. et al.; "WebQuery: Searching and Visualizing the
`Web through Connectivity"; http://www.cgl.uwaterloo.ca/
`Projects/Vanish/webquery_l.html.
`"Google Search-Search the Web with Google!"; http://
`www.enablelink.com/google.htm.
`Chakrabarti, S. et al.; "Mining the Web's Link Structure";
`IEEE Computer; Aug. 1999; pp. 60-67.
`Kleinberg, J.M.; "Authoritative Sources in a Hyperlinked
`Environment"; IBM Research Report RJ10076(91892),
`topic area "Computer Science"; May 29, 1997; pp. 1-31.
`Bharat, K. et al.; "Improved Algorithms for Topic Distillation
`in a Hyperlinked Environment;" 1998; Research and
`Development in Information Retrieval; pp. 104-111; Retrieved
`from Internet: <http://citeseer.nj.nec.com/
`bharat98improved.html>.
`Chakrabarti, S. et al.; "Automatic Resource Compilation by
`Analyzing Hyperlink Structure and Associated Text;" 1998;
`Proceeding of the 7th International World Wide Web
`Conference; Retrieved from Internet: <http://citeseer.nj.nec.
`com/chakrabarti98automatic.html>.
`Kleinberg, J.M.; "Authoritative Sources in a Hyperlinked
`Environment;" Sep. 5, 1999; Journal of the ACM; vol. 46,
`No. 5; pp. 604-632; Retrieved from Internet:
`<http://citeseer.nj.nec.com/klienberg97authoritative.html>.
`Pirolli, P. et al.; "Silk from a Sow's Ear: Extracting Usable
`Structures from the Web;" 1996; Proc. (ACM) Conf. Human
`Factors in Computing Systems; ACM Press; Retrieved from
`Internet: <http://citeseer.nj.nec.com/pirolli96silk.html>.
`
`* cited by examiner
`
`Primary Examiner-Stephen S. Hong
`Assistant Examiner-Gregory J. Vaughn
`
`(57) ABSTRACT
`
`A method of finding WWW pages, each of which includes
`at least one list of links to desired Internet resources,
`comprising:
`
`providing a list of URLs;
`
`automatically generating at least one query for an Internet
`search tool for WWW pages that include links to at least one
`URL of said list of URLs;
`
`executing said at least one generated query to provide search
`results that include at least one of said searched for WWW
`pages; and
`
`generating a response comprising at least one indication of
`one of said WWW pages, responsive to said search results.
`
`73 Claims, 2 Drawing Sheets
`
`
`

`

`U.S. Patent    Aug. 14, 2007    Sheet 1 of 2    US 7,257,766 B1
`
`[FIG. 1: configuration 100, showing user 102, Internet 104,
`search engine 106, index database 108 and result analyzer 110]
`
`[FIG. 2: flowchart of method 120]
`122 KEY WORD SEARCH
`124 FILTER RESULTS TO OBTAIN N SITES
`126 GROUP RESULTS INTO GROUPS OF SIZE K
`128 PERFORM N/K SEARCHES TO FIND POTENTIAL HUBS
`130 RANK POTENTIAL HUBS
`132 SELECT SMALL NUMBER OF HUBS
`134 FILTER SELECTED HUBS
`136 PRESENT HUBS TO USER
`
`

`

`U.S. Patent    Aug. 14, 2007    Sheet 2 of 2    US 7,257,766 B1
`
`[FIG. 3: flowchart of method 160]
`162 PROVIDE LIST OF LINKS
`164 FIND POTENTIAL HUBS FOR LINKS
`166 FILTER POTENTIAL HUBS
`
`

`

`SITE FINDING
`
`FIELD OF THE INVENTION
`
`The present invention relates to searching for information
`on a data network, and especially to searching utilizing an
`analysis of the results of search engines.
`
`BACKGROUND OF THE INVENTION
`
`It is known in the art to analyze data networks, such as
`journals and journal citations, to determine meta knowledge
`about the field.
`IBM Inc. described a method of determining hubs and
`authorities on the Internet in U.S. Pat. No. 5,884,305, in
`U.S. patent application No. 08/813,749, filed Mar. 7, 1997,
`mentioned in the patent, and in "Authoritative Sources in a
`Hyperlinked Environment", by Jon M. Kleinberg, IBM
`research report RJ10076(91892), topic area "Computer
`Science", May 29, 1997, the disclosures of which are
`incorporated herein by reference. Hubs are Internet sites that
`contain links to many other sites in a same field and
`authorities are sites that are pointed to by a significant
`number of relevant sites in a field. An iterative process was
`suggested to determine, from among a predetermined set of
`sites, a kernel of sites that match a hub or authority
`definition. In the Kleinberg paper, it is noted that the Internet
`is to be considered a different type of data network than
`journal articles.
`A paper entitled "Mining the Web's Link Structure", by S.
`Chakrabarti et al., in IEEE Computer, August 1999, the
`disclosure of which is incorporated herein by reference,
`describes analyzing link structures of WWW pages to
`determine hubs and authorities. At a site
`"http://www.google.com", available on Feb. 1, 2000 and for some
`time before, a tool "googlescout" is generated for detecting
`WWW sites that are similar to a shown site, for example for
`finding competition.
`A WWW page "www.cgl.uwaterloo.ca/Project/Vanish/
`webquery_l.html", apparently available at least from Dec.
`11, 1996, the disclosure of which is incorporated herein by
`reference, describes the "webquery" project, in which a
`quality of a site that turns up in a search is evaluated based
`on the number of sites linked to the site and the number of
`site links in the site.
`
`SUMMARY OF THE INVENTION
`
`An object of some embodiments of the invention is
`finding one or more hub sites or lists of WWW pages that
`cover a topic presented by a set of input sites. In an
`embodiment of the invention, the hubs or page lists are
`selected by virtue of their including links to a significant
`number of the sites in the set of the input sites. An expected
`advantage of using hubs is that each may concentrate in it a
`large number of links to relevant sites, beyond those
`provided in the input set, and also include additional
`information which can help a human user select certain sites
`for browsing.
`An aspect of some embodiments of the invention relates
`to selecting a potential hub based on a statistical analysis of
`an Internet link structure, for example, using an
`approximation of a number of links from the potential hub to
`a set of input sites; rather than determining which sites from
`the input set are actually pointed to. In one embodiment of the
`invention, this determination is made by searching for
`potential hubs that include links to groups of input sites and
`then ranking the resulting potential hubs, based on the
`number of groups pointed to by each potential hub. As a
`potential hub might include links to more than one site in an
`input group, the approximation may be significantly
`different from the actual number of links between a potential
`hub and individual member sites of input groups. It is noted
`that in some embodiments, there is no final determination of
`which particular site is pointed to by the potential hub.
`An aspect of some embodiments of the invention relates
`to a method of automatically determining a hub-potential of
`a site, for example for ranking hubs in a set of potential hubs
`or for finding potential hubs in a search. In one embodiment
`of the invention, a hub potential of a site is determined based
`on structural properties of the site, for example, the
`existence of a list of links and/or the existence of a text
`paragraph (e.g., a review or description) of many of the links.
`Optionally, the number of links is determined by counting
`the occurrence of the phrases which indicate the presence of
`links, such as "http:" or "href". Alternatively or additionally,
`a hub-potential may be determined based on the usage of key
`terms of the topic in the site in general and/or in anchor
`portions of the site in particular, such as a main title or a
`section heading. Alternatively or additionally, a
`hub-potential may be determined based on a usage of hub-typical
`words or phrases, such as "list of links", "links", "index",
`"list", "compilation" and/or "resources". Optionally, these
`words or phrases receive a higher scoring based on their
`location in the site, for example in a title or before a long list
`of links.
`In one embodiment of the invention, the potential hubs are
`ranked and/or filtered before being analyzed in greater
`depth. Alternatively or additionally, the hub generation
`process may create a small set of potential hubs to begin
`with, for example using a threshold setting. Such ranking
`may include, for example, selecting only a subset of those
`sites that point to the input set of sites, for example based on
`the existence of a topic word in those sites, prior to analyzing
`the sites for hub-potential. In another example, potential
`hubs that are found using a search engine are required to
`both include a topic word and at least one link to one group
`of sites from the input set.
`In an embodiment of the invention, hub potential is
`characterized by rules, which may be phrased in a search
`engine command language, so a search for the hubs using
`the search engine returns sites with a higher potential of
`being desired hubs. In an embodiment of the invention, the
`particular features of a search engine, for example, searching
`for URLs or links, disjunctive search and/or pipes, are used
`to perform one or more of the above activities, for example,
`group comparison, rule application and/or thresholding of
`potential hubs, more efficiently.
`In one embodiment of the invention, an input set of sites
`is generated by a user providing a topic or topic words and
`generating, for example by one or more search engine(s)
`and/or internet indexes, a list of sites relevant to that topic.
`Optionally, the list of sites is filtered prior to being used as
`a basis for finding hubs, for example by removing redundant
`and/or mirroring sites.
`Alternatively or additionally, an input set of sites is
`generated from a user provided site. The user provided site
`can be analyzed to find a second set of sites that is similar
`to the provided site. One exemplary method of determining
`similarity is by finding hubs as defined above which point to
`the site and selecting links from those hubs as similar sites.
`Another exemplary method is to receive a short list of
`examples of such similar sites. Another exemplary method is
`finding sites that contain text similar to that of the provided
`site. Optionally, the user provides a set of sites, rather than a
`single site.
`Optionally, hubs that point to the similar sites and not to
`the provided sites are determined. In some embodiments,
`these hubs are treated as hubs to which a link to the provided
`site should be added, for example by suggesting this to the hub
`operators.
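
The selection of hubs that point to similar sites but not to the provided sites can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name `hubs_missing_site` and the dictionary layout (hub URL mapped to the links found on that hub) are hypothetical and do not come from the patent.

```python
def hubs_missing_site(hub_links, similar_sites, provided_sites):
    """Return hubs that link to at least one similar site
    but to none of the provided sites.

    hub_links: dict mapping a hub URL to the list of URLs it links to
    (hypothetical input format, assumed for this sketch).
    """
    similar = set(similar_sites)
    provided = set(provided_sites)
    return sorted(
        hub
        for hub, links in hub_links.items()
        if similar & set(links) and not provided & set(links)
    )


if __name__ == "__main__":
    hubs = {
        "hubA": ["similar1", "mysite"],    # already links to the provided site
        "hubB": ["similar1", "similar2"],  # candidate for a new link
        "hubC": ["unrelated"],             # not relevant
    }
    print(hubs_missing_site(hubs, ["similar1", "similar2"], ["mysite"]))
```

The returned hubs are the ones whose operators might be approached with a suggestion to add the provided site.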
`Alternatively or additionally, an input set of sites is
`generated by analyzing a user provided hub or a hub
`obtained from previous use of hub-finder or a hub
`constructed by combining search results/analysis of existing
`hubs or other user provided information.
`Alternatively or additionally to providing a hub as an
`input, a list of a user's favorite bookmarks or recently or
`frequently traveled sites may be used as an input instead.
`Such lists may be considered to comprise a profile of a user,
`for example for advertisement targeting or for finding
`friends or partners. Such a user profiling tool can be used, in
`some embodiments of the invention, to extrapolate from an
`existing, studied group of users to a large group which is not
`studied in detail but whose browsing habits are known.
`A set of sites may be filtered, manually or automatically,
`prior to being used as an input set, for example, a user
`manually selecting a subset of links or a topic word for use
`in analyzing the suitability of the links.
`Optionally, the resulting hubs are considered a set of hubs
`which are similar to the input hub or at least an aspect of the
`input hub, and may thus be presented to a user.
`In one embodiment of the invention, a set of similar hubs
`is analyzed, to harvest information which may be useful, for
`example to the owner of the provided hub. In one example,
`the links of the similar hubs are collated, filtered and/or
`ranked, to detect links or textual descriptive material of links
`that are missing from the input hub and might be desirable.
`In another example, links that exist in the provided hub are
`ranked based on the particulars of the appearance of such
`links in the similar hubs. In another example, a new hub is
`created, possibly ad-hoc, based on the analyzed similar
`hubs.
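
As a minimal sketch of collating the links of similar hubs to detect links missing from the input hub, the following Python example counts, for each link absent from the input hub, the number of similar hubs on which it appears. The function name `suggest_missing_links` and the `min_hubs` threshold are assumptions made for illustration; the text does not prescribe a specific ranking rule.

```python
from collections import Counter


def suggest_missing_links(input_hub_links, similar_hubs, min_hubs=2):
    """Rank links that appear on similar hubs but not on the input hub.

    similar_hubs: dict mapping a hub URL to its list of links
    (hypothetical format). min_hubs is an assumed threshold.
    """
    present = set(input_hub_links)
    counts = Counter()
    for links in similar_hubs.values():
        # Count each link at most once per hub.
        counts.update(set(links) - present)
    ranked = [(link, n) for link, n in counts.items() if n >= min_hubs]
    # Most widely listed links first; ties broken alphabetically.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


if __name__ == "__main__":
    similar = {
        "h1": ["a", "c", "d"],
        "h2": ["c", "b"],
        "h3": ["c", "d", "e"],
    }
    print(suggest_missing_links(["a", "b"], similar))
```

Counting hubs rather than raw occurrences keeps a single hub that repeats a link from dominating the ranking.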
`The similar hubs that are found may be real hubs searched
`for in the Internet. Alternatively to finding Internet hubs,
`interest hubs of users may be determined. A database of
`users' browsing habits or favorite links may be considered
`as hubs, one for each user. The search for hubs then
`comprises searching in this database for users whose
`interest hubs are relevant to a provided set of input hubs. The
`expansion of sites into hubs may be performed on the
`Internet, in which case the found hubs reflect the common
`association of links. These hubs may be used to find links
`that exist in the database of user habits. Alternatively also
`the expansion of sites into hubs is performed in the user
`habits database, in which case the found hubs reflect the
`preferences of the particular user in the database. A
`similarity between user browsing habits (or favorite links) and
`hub sites, which may be noted, is that both are lists of links
`that are organized by a thinking being to reflect a particular
`thought, topic or personality.
`An aspect of some embodiments of the invention relates
`to a method of presenting a list of hub sites. Alternatively or
`additionally to providing a list of sites, the sites may be
`provided along with auxiliary information, for example,
`information about link structure, such as number of links,
`number of unique links (not in other pages), number of
`popular links (on at least k pages), amount of explanation for
`each link, method of ordering of links in the page
`(alphabetic, topical, regional, ranked, etc.), information
`copied from the target pages, such as the links themselves and/or
`explanations about the links. Copied information may be
`collated, for example, by target link (or equivalent links), or
`grouped according to other criteria, such as length,
`alphabetic, topic, rank, region and/or repetition.
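
The link-structure statistics named above (number of links, number of unique links, number of popular links on at least k pages) can be sketched in Python as follows. The function name `link_statistics` and the input layout are hypothetical, chosen only for this illustration.

```python
from collections import Counter


def link_statistics(hub_links, k=2):
    """Compute per-hub link statistics.

    hub_links: dict mapping a hub URL to its list of links
    (assumed format). A link is "unique" if it appears on no other
    hub, and "popular" if it appears on at least k hubs.
    """
    # Number of hubs on which each link appears.
    page_counts = Counter()
    for links in hub_links.values():
        page_counts.update(set(links))

    stats = {}
    for hub, links in hub_links.items():
        distinct = set(links)
        stats[hub] = {
            "links": len(distinct),
            "unique_links": sum(1 for l in distinct if page_counts[l] == 1),
            "popular_links": sum(1 for l in distinct if page_counts[l] >= k),
        }
    return stats


if __name__ == "__main__":
    hubs = {"h1": ["a", "b", "c"], "h2": ["b", "c", "d"], "h3": ["c"]}
    for hub, row in link_statistics(hubs).items():
        print(hub, row)
```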
`There is thus provided in accordance with an exemplary
`embodiment of the invention, a method of finding WWW
`pages, each of which includes at least one list of links to
`desired Internet resources, comprising:
`providing a list of URLs;
`automatically generating at least one query for an Internet
`search tool for WWW pages that include links to at least one
`URL of said list of URLs;
`executing said at least one generated query to provide
`search results that include at least one of said searched for
`WWW pages; and
`generating a response comprising at least one indication
`of one of said WWW pages, responsive to said search
`results. Optionally, the method comprises displaying said
`response to a user. Alternatively or additionally, said at least
`one URL comprises a plurality of URLs. Alternatively or
`additionally, said response is generated using a single search
`step and no iterations. Alternatively or additionally, said
`method comprises ranking said search results. Optionally,
`ranking of a WWW page is responsive to a number of groups
`of URLs pointed to by said WWW page.
`In an exemplary embodiment of the invention, said
`generating at least one search query, comprises:
`dividing said list of URLs into a plurality of groups and
`generating at least a single query for each group, wherein
`said at least a single query does not differentiate which URL
`in said group is pointed to by the results of the search,
`wherein said executing comprises executing said
`generated at least one query for a plurality of said groups,
`generating a plurality of result lists. Optionally, all of said
`groups have a same number of members. Alternatively, at
`least three of said groups have a different number of
`members from each other.
`In an exemplary embodiment of the invention, the method
`comprises collating said result lists into a single list of search
`results. Optionally, the method comprises ranking the
`contents of at least one of said result lists. Optionally, said
`collating is responsive to said ranking of said at least one of
`said result lists. Alternatively or additionally, said ranking is
`applied to said result list after it is generated. Optionally, the
`method comprises filtering said at least one result list
`responsive to said ranking.
`In an exemplary embodiment of the invention, said
`ranking is applied to said result list during said execution.
`Optionally, said ranking is applied by adding at least one
`limitation to said at least one generated search query.
`In an exemplary embodiment of the invention, said
`ranking comprises ranking responsive to a number of said URLs
`pointed to by said result list. Alternatively or additionally,
`said ranking comprises ranking responsive to a
`morphological property of pages of said at least one result list.
`Optionally, said morphological property comprises the
`existence of a link list.
`In an exemplary embodiment of the invention, said
`ranking indicates a probability of a ranked page being a hub.
`Alternatively or additionally, said ranking comprises
`ranking responsive to the presence of at least one key word in
`pages of said at least one result list. Optionally, said key
`word comprises a word that is related to a content of said list
`of URLs. Alternatively or additionally, said key word
`comprises a word that serves as a statistical indicator that the
`page is a hub. Optionally, said key word is selected from the
`group "links", "index" and "resource".
`In an exemplary embodiment of the invention, said
`providing comprises a user providing a list of URLs.
`Optionally, said user provided list of URLs comprises at least
`a part of a URL bookmark file.
`In an exemplary embodiment of the invention, a method
`according to claim 1, wherein said providing comprises a
`user providing a WWW page including a list of URLs.
`Alternatively or additionally, said providing comprises:
`a user providing one or more topic words; and
`executing a preliminary search to find a list of URLs
`related to said one or more topic words. Alternatively or
`additionally, said providing comprises:
`a user providing a WWW page; and
`executing a preliminary search to find a list of URLs that
`point to pages similar to the provided WWW page.
`Optionally, said executing said at least one generated query
`comprises executing said at least one query to ignore WWW
`pages that include links to said user provided URL.
`In an exemplary embodiment of the invention, the method
`comprises filtering said search results before said
`generating. Alternatively or additionally, said search tool comprises
`a search engine. Optionally, said executing said at least one
`query comprises executing using a pipe feature of said
`search engine to limit a second search step to a list of sites
`found in a first search step using said search engine.
`In an exemplary embodiment of the invention, said
`response comprises a list of said WWW pages. Optionally,
`said response includes link statistics for said WWW pages.
`Optionally, said link statistics include a number of links in
`each WWW page. Alternatively or additionally, said link
`statistics include an indicator of a uniqueness of links in each
`WWW page. Alternatively or additionally, said link statistics
`include an indicator of an amount of information associated
`with links in each WWW page.
`In an exemplary embodiment of the invention, said
`response comprises a list of links listed in at least one of said
`WWW pages. Optionally, said response comprises a list of
`links listed in at least a given number of said WWW pages.
`Optionally, said given number is greater than 1.
`Alternatively, said given number is greater than 2.
`In an exemplary embodiment of the invention, said list is
`arranged by WWW pages. Alternatively or additionally, said
`list comprises information associated with a link in its
`corresponding WWW page. Alternatively or additionally,
`said list indicates pages not including a link to any URL in
`a predetermined list of URLs. Alternatively or additionally,
`said list indicates pages not including a link from the
`contents of any URL in a predetermined list of URLs.
`Optionally, said predetermined list is provided by a user.
`There is also provided in accordance with an exemplary
`embodiment of the invention, a method of finding WWW
`pages, each of which includes at least one list of links to
`desired Internet resources, comprising:
`providing at least one URL;
`generating a list of URLs related to said at least one URL;
`determining at least one WWW page that includes links to
`at least one URL of said list of URLs but not to said provided
`at least one URL; and
`generating a response comprising at least one indication
`of one of said at least one WWW page. Optionally, the
`method comprises displaying said response to a user.
`Alternatively or additionally, said at least one WWW page
`comprises a plurality of WWW pages. Optionally, said
`providing comprises providing a WWW page
`having a link to said at least one URL.
`In an exemplary embodiment of the invention, said
`providing comprises providing a list of a plurality of URLs.
`Alternatively or additionally, generating a list of related
`URLs, comprises generating a list of competition URLs.
`Alternatively or additionally, generating a list of related
`URLs, comprises generating a list of similar URLs.
`Alternatively or additionally, generating a list of related URLs,
`comprises finding WWW pages characterized in that a
`common WWW page includes links to at least one of said
`WWW pages and at least one of said at least one URL.
`Alternatively or additionally, said determining comprises
`executing a query on a search engine.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`Particular embodiments of the invention will be described
`with reference to the following description of some
`embodiments of the invention in conjunction with the figures,
`wherein identical structures, elements or parts which appear
`in more than one figure are optionally labeled with a same
`or similar number in all the figures in which they appear, in
`which:
`FIG. 1 is a schematic illustration of a configuration of a
`search engine in accordance with an exemplary embodiment
`of the invention;
`FIG. 2 is a flowchart of a method for finding hubs, in
`accordance with an exemplary embodiment of the invention;
`and
`FIG. 3 is a flowchart of a method of finding sites similar
`to a provided list of sites, in accordance with an exemplary
`embodiment of the invention.
`
`DETAILED DESCRIPTION OF SOME
`EMBODIMENTS
`
`GENERAL
`
`FIG. 1 is a schematic illustration of a configuration 100 of
`a search engine 106 in accordance with an exemplary
`embodiment of the invention. A user 102 uses search engine
`106 for finding sites of interest on an Internet 104. The
`connection to search engine 106 is typically also through
`Internet 104, but this is not required. Typically, search engine
`106 utilizes a database 108 that contains indexes and other
`information relating to WWW pages known to search engine
`106. In a typical search engine, a user provides terms and the
`engine responds with a list of sites that include some of the
`terms. Some more advanced search engines also provide
`sites that appear to be related for various reasons. In some
`embodiments of the invention, a directory including an
`index of which sites link to which other sites is used as a
`search tool.
`A search engine result analyzer 110 is optionally
`provided, to analyze the results of the search of index 108 by
`search engine 106 and to provide analyzed results to user
`102. Optionally, as will be described below, analyzer 110
`also executes particular searches on search engine 106.
`Although result analyzer 110 is optionally configured to
`work best with a particular search engine 106, a same
`analyzer can work with a plurality of search engines.
`An analysis of search results is generally desired as search
`engines do not typically provide a single or small number of
`exactly matching sites; rather, based on keywords or subject
`fields, a large number of sites that might be suitable are
`provided. Wading through a long list of sites is extremely
`time consuming. One reason for this required wading is the
`lack of suitable software for determining if a particular site
`is really relevant to user 102. Also, valuable sites are often
`missed. Even indexing sites, such as Yahoo!, which use
`human indexers, often do not supply a suitable site, for
`several reasons, including, (a) not being up to date; (b) lack
`of coverage over much of the Internet; (c) lack of suitable
`manpower and/or time for such manpower to cover all the
`myriad subjects on the Internet; and (d) lack of a suitable
`index structure.
`Typical reasons that a user browses the Internet for
`information include:
`(a) searching for an answer to a particular question,
`optionally answered by an authority on that question;
`(b) looking for an overview of a particular field; and
`( c) searching for a set of sites, from which the user can
`derive his or her own conclusions.
`The inventors of the present invention have realized that
`in many fields there are interested people who have
`compiled their own listing of relevant sites and an analysis of
`each relevant site; such listing sites are known as hubs. Thus,
`it is generally useful to provide a user with a short list of
`such hubs. The inventors have also realized that reviewing
`such hubs by a user may be a better way for the user to find
`a dependable and knowledgeable authority in a field, than by
`merely relying on an automated program that analyses links
`between sites. Neglecting a search for potential authorities,
`in accordance with some embodiments of the invention, can
`allow a faster method to be used for finding hubs. Once these
`hubs are determined, there are other, further types of
`analysis that can be usefully presented to a user and answer other
`information gathering questions the user might have.
`Following are several methods of analyzing search results
`to assist an interested user 102 in finding one or a small
`number of relevant sites or hub sites. Although not explicitly
`described in each of the below-described methods,
`additional filtering steps for rejecting certain sites as being
`unsuitable may be provided. Also it is noted that a block
`portion of one method may be suitable for inclusion, as is,
`in another method, as is also described in the exemplary
`implementation method, described herein.
`A user may have a particular question to which he desires
`an answer. However, the words used in the question often do
`not match the words used in the field, or in the particular site
`that holds the answer to the user's question. In some cases,
`there is no common way of describing the subject of the
`question. Each hub can be considered, among other things,
`to be a dictionary of synonyms. Once a user finds one hub,
`the common usage of names to describe the subject of the
`question generally becomes clear.
`
`removing sites with low ratings based on the site size or
`creation/revision date. After filtering, there are N sites,
`where N may be a result of the filtering or the filtering may
`be adapted to achieve a desired value for N.
`The filtered search results are then grouped into groups
`(126), of size K, for example K=4. In some embodiments of
`the invention, K is a function of N. Alternatively or
`additionally, K is a function of the results in the group, for
`example, one group may have a small number of high
`ranking results while another group has a large number of
`low-ranking results. It is not, however, required that all
`groups be the same size. Various grouping methods may be
`used, for example, randomly selecting sites, based on order
`in search results (either selecting blocks of sites, or selecting
`evenly or non-evenly spaced sites from the search results),
`based on a ranking method, to create groups with balanced
`ranks or search order (e.g., two high and two low ranks)
`and/or grouping similar sites together. Optionally, the size of
`the group may be inversely related to the ranking of sites in
`the groups.
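
Two of the grouping methods mentioned above, selecting consecutive blocks of sites and selecting evenly spaced sites so that each group mixes high and low positions, can be sketched in Python. This is only one possible reading; the function names `group_sites` and `group_sites_balanced` are hypothetical, and the input is assumed to be a list of site addresses in search-result order.

```python
def group_sites(sites, k=4):
    """Split sites into consecutive blocks of at most k members."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return [sites[i:i + k] for i in range(0, len(sites), k)]


def group_sites_balanced(ranked_sites, k=4):
    """Build groups from evenly spaced sites, so each group mixes
    high-ranked and low-ranked entries (an assumed interpretation of
    'balanced ranks')."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if not ranked_sites:
        return []
    n_groups = -(-len(ranked_sites) // k)  # ceiling of N/K
    return [ranked_sites[j::n_groups] for j in range(n_groups)]


if __name__ == "__main__":
    sites = ["site%d" % i for i in range(1, 11)]  # N = 10
    print(group_sites(sites, 4))           # blocks of 4, 4 and 2
    print(group_sites_balanced(sites, 4))  # interleaved groups
```

Both functions yield the same number of groups, the ceiling of N/K, so the number of follow-up searches is the same whichever grouping is used.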
`A plurality of potential hubs are determined in step 128,
`by searching for sites that include links to any site in one of
`the groups. Only N/K searches are required. In an exemplary
`search engine 106, for each group the search is for sites that
`include reference to or link to the http address of at least one
`of the sites in the group, e.g. the search term being:
`"www.site1.com OR www.site2.com OR www.site3.com/stuff OR
`www.site4.com". Optionally, this search includes a request
`for ranking of search results by the search engine.
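
A minimal Python sketch of generating one disjunctive search term per group is given here. The function name `build_hub_queries` is hypothetical, and the plain `OR` syntax follows the example search term in the text; an actual search engine may require a different link-search syntax.

```python
def build_hub_queries(sites, k=4):
    """Return one OR-query per group of at most k sites.

    For N sites this yields ceil(N/K) queries, so only about N/K
    searches are needed instead of N.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    groups = [sites[i:i + k] for i in range(0, len(sites), k)]
    return [" OR ".join(group) for group in groups]


if __name__ == "__main__":
    sites = [
        "www.site1.com",
        "www.site2.com",
        "www.site3.com/stuff",
        "www.site4.com",
    ]
    for query in build_hub_queries(sites):
        print(query)
```

Because each query is a disjunction, a result only shows that a page links to some member of the group, not which one.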
`The results of all the searches are collated and then
`optionally ranked (130). In an exemplary ranking scheme, a
`two digit number is used, the tens being the number of
`searches the site came up on and the ones being the existence
`and number of special keywords that appear in the potential
`hub (to be described below). A four digit scheme may also
`be used. Also, different ranking methods or different weights
`for the different factors may be used. Exemplary special
`keywords are words that indicate that a site is more likely to
`be a hub (described below) or words from the subject topic
`or from the original search. In some cases, such topic words
`can be gleaned from the original search results (122), for
`example from the page topics or provided by a user.
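
The two digit ranking scheme can be sketched in Python as follows. Several details are assumptions made for illustration only: both digits are capped at 9, keywords are matched as lowercase substrings, the keyword list reuses the hub-indicating words named elsewhere in the text, and the function name `rank_potential_hubs` is hypothetical.

```python
from collections import Counter

# Hub-indicating words taken from the examples in the text.
HUB_WORDS = ("links", "index", "resource")


def rank_potential_hubs(result_lists, page_text, keywords=HUB_WORDS):
    """Collate the result lists and score each potential hub.

    Tens digit: number of group searches the page came up in.
    Ones digit: number of special keywords found in the page.
    result_lists: one list of page URLs per group search.
    page_text: dict mapping a page URL to its text (assumed input).
    """
    hits = Counter()
    for results in result_lists:
        hits.update(set(results))  # count a page once per search

    scores = {}
    for url, n_searches in hits.items():
        text = page_text.get(url, "").lower()
        n_keywords = sum(1 for word in keywords if word in text)
        # Cap each digit at 9 so the score stays two digits (assumption).
        scores[url] = 10 * min(n_searches, 9) + min(n_keywords, 9)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    results = [["h1", "h2"], ["h1", "h3"], ["h1", "h2"]]
    text = {
        "h1": "Index of links and resources",
        "h2": "my links",
        "h3": "nothing here",
    }
    print(rank_potential_hubs(results, text))
```

With this encoding the number of searches always dominates, and the keyword count only breaks ties among pages found by the same number of searches.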
`In a step 132, a small number of hubs are selected for
`further consideration, for example based on the ranking.
`In an optional step 134, the selected hubs are filtered to
`remove sites that are not desirable, for example based on an
`analysis of their content. Exemplary analysis rules that can
`be applied are: counting the number of links from the site;
`counting the number of links which also appear on other
`potential hubs; eliminating potential hubs which are almost
`identical to other potential hubs. Typically, but not
`necessarily, a larger number of links indicates a more
`desirable hub. If however, the number of links is too high,
`this may indicate an omnibus hub that may be too difficult
`to use, if it is not organized. A later optional step of
`analyzing the amount of content associated with each link
`and/or the organization of the links may be used to
`determine if such an omnibus hub is suitable for the user.
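
A possible Python sketch of the filtering rules is shown below. The thresholds `min_links`, `max_links` and `duplicate_threshold` are arbitrary illustrative values, and the use of Jaccard overlap of link sets to detect almost identical hubs is an assumption; the text does not specify how near-identity is measured.

```python
def filter_hubs(hub_links, min_links=3, max_links=200,
                duplicate_threshold=0.9):
    """Drop hubs with too few or too many links and near-duplicates.

    hub_links: dict mapping a hub URL to its list of links
    (hypothetical format). Hubs with more links are considered first,
    so the larger of two near-identical hubs is the one kept.
    """
    kept = []
    ordered = sorted(hub_links, key=lambda h: (-len(set(hub_links[h])), h))
    for hub in ordered:
        links = set(hub_links[hub])
        if not (min_links <= len(links) <= max_links):
            continue  # too sparse, or a possible omnibus hub
        duplicate = False
        for other in kept:
            other_links = set(hub_links[other])
            union = links | other_links
            # Jaccard overlap of the two link sets (assumed measure).
            if union and len(links & other_links) / len(union) >= duplicate_threshold:
                duplicate = True
                break
        if not duplicate:
            kept.append(hub)
    return kept


if __name__ == "__main__":
    hubs = {
        "h1": ["a", "b", "c", "d"],
        "h2": ["a", "b", "c", "d"],         # same links as h1
        "h3": ["a", "e", "f"],
        "h4": ["a"],                        # too few links
        "big": [str(i) for i in range(10)]  # too many for max_links=5
    }
    print(filter_hubs(hubs, max_links=5))
```

Hubs rejected for having too many links could instead be passed to the later organization analysis rather than discarded outright.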
`The filtered hubs are then presented to user 102 (136). In
`an exemplary embodiment of the i
