throbber
( * ) Notice:
`
`Subject to any disclaimer, the term of this
`patent is extended or adjusted under 35
`U.S.C. 154(b) byO days.
`
`(21) Appl. No.: 09/544,733
`
`(22) Filed:
`
`Apr. 6, 2000
`
`Related U.S. Application Data
`(60) Provisional application No. 60/155,277, filed on Sep. 20,
`1999.
`Int. Cl.7
`••••....••••••••.••••.••.. G06F 17/00; G06F 17/30
`(51)
`(52) U.S. CI ..................... 715/501.1; 715/514; 715/513;
`707/3; 707/4; 707/5
`(58) Field of Search .............................. 715/501.1, 514,
`715/513; 707/4, 5, 3
`
`(56)
`
`References Cited
`
`U.S. PATENT DOCUMENTS
`
`5,895,470 A • 4/1999 Pirolli et al. ................ 707/102
`6,285,999 Bl • 9/2001 Page ............................. 707/5
`6,457,028 Bl • 9/2002 Pitkow et al. .............. 715/513
`6,591,261 Bl • 7/2003 Arthurs ......................... 707/2
`6,665,837 Bl
`12/2003 Dean et al ............... 715/501.1
`
`OTHER PUBUCATIONS
`
`Bharat & Henzinger, "Improved Algorithms for Topic Dis(cid:173)
`tillation in a Hyperlinked Environment" 1998 http:/ I
`216.239.37.104/search?q=
`
`(12) United States Patent
`Law et al.
`
`(10) Patent No.:
`(45) Date of Patent:
`
`US 6, 754,873 Bl
`Jun.22,2004
`
`111111111111111111111111111111111111111111111111111111111111111111111111111
`US006754873Bl
`
`(54) TECHNIQUES FOR FINDING RELATED
`HYPERLINKED DOCUMENTS USING LINK(cid:173)
`BASED ANALYSIS
`
`cache:PllPOZ9wxQsJ:gatekeeper.dec.com/pub/DEC/SRC/
`publicatioos/monika!sigir98/pdf+lmoved+Algorilhms+for+
`topic&hl=en&ie-UIF-8. *
`
`(75)
`
`Inventors: Kin Lun Law, Redwood City, CA
`Dean & Henzinger, "Finding related pages in the World
`---- _(US);.Georges_R. Harlk,Mountain ___ -- --- Wide-Web" Mar.-1;-1999; http://216~239c39cl04/search?q=
`View, CA (US)
`cache:tWTI2B22VcEJ:www.unizh.ch/home/mazzo/re-
`(73) Assignee: Google Inc., Moutain View, CA (US)
`ports/www8conf/2148/pdf/pdl.pdf+Finding+Related+
`Pages+in+the+ World+ Wide+ Web&hl=en&ie=UIF-8. *
`
`Bharat & Henzinger, "Improved Algorithms for Topic Dis(cid:173)
`tillation in a Hyperlioked Environment" 1998 http://
`216.239.37.104/search?q=
`cache:PJIPOZ9wxQsJ:gatekeeper.dec.com/pub/DEC/SRC/
`publicatioos/moriika/sigir98/pdf+lmoved+Algorithms+for+
`topic&hl=en&ie-UIF-8. *
`
`* cited by examiner
`
`Primary Examiner-stephen S. Hong
`Assistant Examiner-Adam L. Basehoar
`(74) Attorney, Agent, or Finn-Harrity & Snyder LLP
`
`(57)
`
`ABSTRACT
`
`Techniques for finding related hyperliriked documents using
`link-based analysis are provided. Backlirik and forwardliok
`sets can be utilized to find web pages that are related to a
`selected web page. The scores for links from web pages that
`are from the same host and links from web pages with
`numerous links can be reduced to achieve a better list of
`related web pages. The list of related web pages can be
`utilized as a feature to a word-based search engine or an
`addition to a web browser.
`
`19 Claims, 14 Drawing Sheets
`
`EXHIBIT 2107
`Facebook, Inc. et al.
`v.
`Software Rights Archive, LLC
`CASE IPR2013-00479
`
`

`

`U.S. Patent
`
`Jun.22,2004
`
`Sheet 1 of 14
`
`US 6,754,873 Bl
`
`..,..---1
`
`11
`
`FIG. 1
`
`51
`
`53
`
`55
`
`57
`
`..,..---1
`
`DISPLAY
`ADAPTER
`
`59
`
`3
`
`SOUND CARD
`
`61
`
`63
`
`65
`
`NETWORK
`INTERFACE
`
`9
`
`11
`
`FIG. 2
`
`

`

`U.S. Patent
`
`Jun.22,2004
`
`Sheet 2 of 14
`
`US 6,754,873 Bl
`
`1
`
`NETWORK
`
`1
`
`D
`
`-
`~o:
`
`-
`
`1
`
`D
`
`-
`I oJo:
`
`-
`
`FIG. 3
`
`

`

`U.S. Patent
`US. Patent
`
`Jun.22,2004
`Jun. 22, 2004
`
`Sheet 3 0f 14
`Sheet 3 of 14
`
`US 6,754,873 Bl
`US 6,754,873 B1
`
`-------- - -- -
`
`FIG.‘{
`
`,--
`
`I-~
`,<""'
`I
`I
`I
`
`I
`I
`
`-------
`
`'=I
`c+
`
`()o -
`"9|?
`~
`
`.
`~
`
`

`

`U.S. Patent
`US. Patent
`
`Jun.22,2004
`Jun. 22, 2004
`
`Sheet 4 0f 14
`Sheet 4 of 14
`
`US 6,754,873 Bl
`US 6,754,873 B1
`
`- - -
`
`-- (
`
`
`
`I ' I
`
`--
`
`l
`
`\
`
`\
`
`\
`
`\
`
`I
`\
`
`

`

`U.S. Patent
`US. Patent
`
`Jun.22,2004
`Jun. 22, 2004
`
`Sheet 5 0f 14
`Sheet 5 of 14
`
`US 6,754,873 Bl
`US 6,754,873 B1
`
`.\"
`
`~I
`:4
`0
`
`I
`
`If?
`
`I
`
`K
`
`6
`~· ----
`:9;
`‘L
`
`

`

`U.S. Patent
`
`Jun.22,2004
`
`Sheet 6 of 14
`
`US 6,754,873 Bl
`
`SCAN WEB PAGES AND SAVE
`FORWARDLINK AND BACKLINK
`SETS
`
`CALCULATE RELATED LISTS FOR
`THE WEB PAGES
`
`GENERATE A POPULAR LIST OF
`WEB PAGES THAT OCCUR MOST
`OFTEN IN THE RELATED LISTS
`
`301
`
`303
`
`305
`
`DONE
`
`FIG. 7
`
`

`

`U.S. Patent
`
`Jun.22,2004
`
`Sheet 7 of 14
`
`US 6,754,873 Bl
`
`GENERATING A
`RELATED LIST
`
`+
`PROVIDE A BACKLINK SET OF WEB V
`PAGES FOR A SELECTED WEB
`PAGE
`
`+
`PROVIDE A FORWARDLINK SET OF V
`WEB PAGES FROM THE BACKLINK
`SET
`
`+
`
`ASSIGN A VALUE TO EACH
`FORWARD LINK IN THE WEB PAGES
`OF THE BACKLINK SET
`
`~
`GENERATE A SCORE FOR EACH
`WEB PAGE IN THE FORWARD LINK
`SET
`
`/
`
`/
`
`351
`
`353
`
`355
`
`357
`
`+
`
`GENERATE A LIST OF RELATED
`WEB PAGES
`
`+
`
`DISPLAY THE LIST OF RELATED
`WEB PAGES
`
`v 359
`
`361
`
`/
`
`+
`
`DONE
`
`FIG. 8
`
`

`

`U.S. Patent
`
`Jun.22,2004
`
`Sheet 8 of 14
`
`US 6,754,873 Bl
`
`GET A FORWARD LINK FROM A
`WEB PAGE IN THE BACKLINK SET
`
`401
`
`403
`
`>----YES------+1
`
`REDUCE VALUE ASSIGNED TO
`FORWARD LINK
`
`407
`
`>----YES,------+1
`
`REDUCE VALUE ASSIGNED TO
`FORWARD LINK
`
`405
`
`409
`
`YES
`
`NO
`
`NO
`
`ASSIGN THE FORWARD LINK A
`VALUE
`
`411
`
`413
`
`NO +
`
`(_Do -NE )
`
`FIG. 9
`
`

`

`U.S. Patent
`
`Jun.22,2004
`
`Sheet 9 of 14
`
`US 6,754,873 Bl
`
`DISPLAYING THE
`LIST
`
`GENERATE SYMMETRIC AND
`UNSYMMETRIC LISTS
`
`DISPLAY LINKS TO WEB PAGES IN
`SYMMETRIC LIST
`
`DISPLAY LINKS TO WEB PAGES IN
`UNSYMMETRIC LIST IF NOT IN
`POPULAR LIST
`
`451
`
`453
`
`455
`
`DONE
`
`FIG. 10
`
`

`

`U.S. Patent
`
`Jun.22,2004
`
`Sheet 10 of 14
`
`US 6,754,873 Bl
`
`Try our special searches:
`
`Uncle Sam Search milllons of US govemmeni documents
`
`Linux Search the web's Linux resources
`
`F I &-. ! f
`
`http://v.n.vw.google.com/
`
`11/J/W
`
`

`

`U.S. Patent
`
`Jun.22,2004
`
`Sheet 11 of 14
`
`US 6,754,873 Bl
`
`About 104143 matches for New York Times
`S hm,.j ng re.sults 1-10, Search took 0. 08 seconds
`How do I interpret the results?
`
`~~S"li"i
`~ ')1
`ww.nytim.es.co-m/
`Ncm! Tty out Goog!eScom ~ ·
`
`www .. nytimg,&2m/~~ght!ttml
`New! Try out GoogleScout
`
`www.nvtim.es.comJinfotwntents/~
`New, Tryout.~~
`
`\.VWW.nytimes.oomlyr/mo/dayJ
`New! Try out GoogleScout
`
`www.nytimes.oomllearningl
`Newt Try out GoogleScout
`
`The New York Times Syndicate
`... a. ne11· era, The New York Thnl!!l Syndicate is ready to ...
`.. mism, either. The New York TimllS Magazine is celebrating ...
`nytl!Jin.oomf Cached (Sk} New! Try out GooglcScout
`
`New York Today citv guide and event calendar
`... Q,d~ home delivery ofThe New York Times JUNE 30- JULY 6 ...
`... doors in New Yorit," says William Grimes in roday's Times ...
`www.nytoday.roml Cacl;ed ($91::) New! Try out GoogleSoout
`
`www.nytcoml
`New! Try uut GoogleSoout
`
`Hom~ Page
`nytimeofax.coml Cached Ok) Newl Try out GoosleSrout
`
`n:vtime.s. i;QJti/
`Newt Try out ~.ikS.~\tt
`
`f f G. I J A
`
`GotHJooo~:)ovog l e ~.-..
`
`1 ~ :J. 1 l'i
`
`!'!
`
`'1.
`
`!l. Ei
`
`:til: ~
`
`Resui1Page:
`
`. Eew Yo~::k Time~
`
`,
`
`ewquery,
`N
`
`http:l/w"'w. goo gle. cornlsearch?q= New+ York:+' l'Imes
`
`11/3JW
`
`'fry your query on other mgines
`
`

`

`U.S. Patent
`
`Jun.22,2004
`
`Sheet 12 of 14
`
`US 6,754,873 Bl
`
`Alta Vista Excite HQW.Qt Info seek~ ~ fib.oo!. Amazon Open Directory e<iro\q!s
`
`Copyright 01999 Google Inc. - AbQyt
`
`

`

`U.S. Patent
`
`Jun.22,2004
`
`Sheet 13 of 14
`
`US 6,754,873 Bl
`
`G. '
`
`-'1·-·e_· __ .,,
`
`. ·,
`.·•'·'
`.·.
`
`,:•
`
`GoogleS&9ill has automatically scouted the web for pages related to ~11$ime:>,oomf
`!Jf:::,'!!e ('TOogieScaut ~?..!f!.~t! list of a company~5 r:ompetitors . .. ---~------------
`24 related pages for www..nytimeli.com/
`Showing related pages 1-10, Search took 0_05 seconds
`How do I intemr~ltlll=<_I:e.@!ts!
`
`www.nytirus.cgmt
`Newt Try out GOQ.SIP.~llt
`
`CNN Intemc.tiY~
`category books flowers music office travel video MA1N PAGE WORLD US_ LOCAL POLITICS WEATHER
`BUSINESS SPORTS SCI· TECH NATURE ENTERTAINMENT BOOKS TRAVEL FOOD HEALTH STYLE IN-__ _
`www.cMooml Ca.cl:wd (61kl Newt Try oot ~!S9om
`
`U.SATODAY
`USA iODAYdeskTOPnew~ Swcll the web: Nationline Wasbington World Stocks Scores Ba!eball NBA NHL Tech
`Books Ca~ Small Business Travel Mi!J~rmium Hot Site$ W~b tech Politic .. _
`www.WU!t<Jday.com/ C~ N!!Wl Try out GooWeSoout
`
`washlngt~.~om: News Front
`-----;---About !he $lte
`Advertising lnfo_ --:::-:::----- Send us feedback and
`correctio!lll
`Olb.e~ Po:rt Co. W<ID sires Wednesday, .fune 30, ...
`www . .....-ashingtQnpoot.rom/ Newl Try out Goo¢eSoout
`
`Los Angeles Tima Web Site
`Wedn~y. Jun~ 30, 1999 06:59PM News--~--- Main Page News Wite:<J AP The Wire--------- Today's Paper
`----~--- Co)QQm~ C<;JiliJnelliMJi Food Gov't & Polmcs ...
`WWI\'.Iatime5.coml Cached (35k) New! 'fry oot Googh:Soout
`
`~
`New! Try out Qpog{e!kou:t
`
`W'OCi'/&_hlgtgo.tnbune.com/
`New! fry out Gm!deSoout
`
`www.abcnews.coml
`New! Try out QQ.ggl~:SJmrt
`
`The Christian ~en~ MQnitor Electronic Edition
`This page uses frames, but your browser doesn'l suppo-rt them.
`www.(smonitor.com.l Cached (Ok) New, Try out ~~ut
`
`:W.!ll~<QJlle to Mercury Cente~
`Silicon Valley news and information from the PWi~ Prize.winning San Jo-se M~rcmy News. ·-Last updEit~ Ell S:21 p.m.
`PDT Wednesday, June 30, 1999 Fed raises rate.s as expei)L
`http:ltwv.w. google.cornlsearch'fslte=sea:rch&nrun~ I O&q=related~bJ91 U&. ... :www.nytunes.com
`
`l 1 /319'1
`
`

`

`U.S. Patent
`
`Jun.22,2004
`
`Sheet 14 of 14
`
`US 6,754,873 Bl
`
`Gooogle~ ..... ~
`R>:sult Page;
`
`~- ~
`
`1
`
`;!
`
`Copyright .01999 Coogle Inc. -~
`
`·""'-'--"~~--=--~---·---'~~-···-·---·~-==
`
`prG./38
`
`http:/ /w'A'W. googl e. com/search'lstte=s-earch&num= l O&q=related:oJYl u&_ .. : www. nyt1mes.com
`
`ll/3!9Y
`
`

`

`US 6,754,873 Bl
`
`1
`TECHNIQUES FOR FINDING RELATED
`HYPERLINKED DOCUMENTS USING LINK(cid:173)
`BASED ANALYSIS
`
`2
`Therefore, what is needed are innovative techniques for
`finding related hyperlinked documents without requiring
`human categorization of the information.
`
`This application claims the benefit of U.S. Provisional 5
`Application No. 60/155,277, filed Sep. 20, 1999, which is
`hereby incorporated by reference.
`
`SUMMARY OF THE INVENTION
`
`The present invention provides innovative techniques for
`finding related hyperlinked ocuments using link-based
`analysis. The link structure of the hyperlinked documents is
`analyzed in order to find hyperlinked documents that are
`related to and at the same level of generality of a hyperlinked
`document. The invention can be utilized any number of
`ways including as an additional feature for a word-based
`search engine or as an addition on a web browser. Some
`specific embodiments of the invention are described below.
`In one embodiment, the invention provides a computer
`implemented method of generating lists of hyperlinked
`documents that are related to a given or selected hyperlinked
`document. A first set of hyperlinked documents that have a
`20 forward link to the selected hyperlinked document is pro(cid:173)
`vided. Additionally, a second set of hyperlinked documents
`that are pointed to by the forward links in the hyperlinked
`documents in the first set is provided. A value is assigned to
`each forward link in each of the hyperlinked documents in
`25 the first set, with the value being reduced for a forward link
`if there are multiple hyperlinked documents from the same
`host as the hyperlinked document that includes the forward
`link. A score is generated for each hyperlinked document in
`the second set according to the values of the forward links
`30 pointing to the hyperlinked document. Accordingly, a list of
`related hyperlinked documents is generated from the second
`set according to the score of the hyperlinked documents. In
`a preferred embodiment, the related hyperlinked documents
`are displayed in an order based on their score.
`In another embodiment, the invention provides a com(cid:173)
`puter implemented method of generating lists of related
`hyperlinked documents. A first set of hyperlinked documents
`that have a forward link to a selected hyperlinked document
`is provided. A second set of hyperlinked documents that are
`pointed to by the forward links in the hyperlinked docu(cid:173)
`ments of the first set is also provided. A value is assigned to
`each forward link in each of the hyperlinked documents in
`the first set, with the value being reduced according to the
`number of forward links in the hyperlinked document that
`includes the forward link. A score is generated for each
`hyperlinked document in the second set according to the
`values of the forward links pointing to the hyperlinked
`document. Lastly, a list of related hyperlinked documents is
`generated from the second set according to the score of the
`hyperlinked documents.
`Other features and advantages of the invention will
`become readily apparent upon review of the following
`description in association with the accompanying drawings,
`where the same or similar structures are designated with the
`55 same reference numerals.
`
`CROSS-REFERENCE TO RELATED
`APPLICATIONS
`
`This application is related to U.S. patent application No.
`08/655,149, filed May 30, 1996, now U.S. Pat. No. 5,754,
`760, which is hereby incorporated by reference.
`
`BACKGROUND OF THE INVENTION
`
`10
`
`15
`
`35
`
`The present invention relates to hyperlinked document
`systems. More specifically, the invention relates to tech(cid:173)
`niques for finding related hyperlinked documents using
`link-based analysis.
`The Internet, and more specifically the World Wide Web,
`provides users all over the world with virtually unlimited
`amounts of information in the form of hyperlinked docu(cid:173)
`ments. As new information is added to the Web, more
`hyperlinked documents are added that include links to the
`existing web of information.
`One of the reasons for the almost explosive growth of
`information on the Web is that virtually anyone can add
`hyperlinked documents, which will be immediately avail(cid:173)
`able to users around the world. For better or worse, the Web
`is virtually unstructured, meaning that users are free to add
`information to the Web in almost any way they desire.
`Although this provides great flexibility in adding inform a(cid:173)
`tion to the Web, it can significantly increase the difficulty in
`finding information that is desired.
`Probably the most popular mechanism for finding infor(cid:173)
`mation on the Web is to use word-based search engines.
`Word-based search engines allow a user to enter words,
`phrases, and other search criteria so that the search engine
`can retrieve the hyperlinked documents that best match the 40
`user's search criteria.
`Word-based search engines have been tremendously suc(cid:173)
`cessful in allowing users to find the information they desire
`on the Web. There are times, however, when a user wants to
`find hyperlinked documents that are related to and at the 45
`same level of generality to a selected hyperlinked document.
`For example, a user may be viewing a company's web site
`and wish to see other web sites for competitive companies.
`As another example, a user may have found a university's
`computer science department web site and the user may 50
`desire to see computer science department web sites of other
`universities. Traditional word-based search engines may not
`provide satisfactory results for these types of desired infor(cid:173)
`mation.
`Some web sites have recognized this deficiency and have
`taken on the pain staking process of categorizing the infor(cid:173)
`mation on the Web. Although it is possible that the related
`hyperlinked documents that are desired are in a single
`category, it often happens that the related hyperlinked docu(cid:173)
`ments are spread throughout multiple categories. For 60
`example, if information regarding each university is placed
`in a separate category, one will not find a single category that
`includes information regarding the computer science depart(cid:173)
`ments of multiple universities. Additionally, categorizing the
`information on the Web takes a considerable amount of time 65
`and typically requires human decision making to categorize
`the information.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`FIG. 1 illustrates an example of a computer system that
`can be utilized to execute the software of an embodiment of
`the invention.
`FIG. 2 illustrates a system block diagram of the computer
`system of FIG. 1.
`FIG. 3 illustrates a network of multiple computer systems
`such as the Internet.
`FIG. 4 shows an example of linked web pages in order to
`demonstrate scoring techniques of the present invention.
`
`

`

`US 6,754,873 Bl
`
`3
`FIG. 5 shows linked web pages of FIG. 4 to more clearly
`show a scoring technique that reduces values of links if the
`web pages are from the same host.
`FIG. 6 shows a web page of FIG. 4 in order to more
`clearly indicate the scoring technique that reduces a value of 5
`a link if there are multiple links in a hyperlinked document.
`FIG. 7 shows a flowchart of a process of preprocessing a
`web of information.
`FIG. 8 shows a flowchart of a process of generating a list
`of related web pages.
`FIG. 9 shows a flowchart of a process of assigning values
`to forward links in a backlink set of web pages.
`FIG. 10 shows a flowchart of a process of displaying the
`list of related web pages.
`FIG. 11 shows an example of a web page for a word-based
`search engine that includes an embodiment of the invention.
`FIGS. 12A and 12B show a web page including word(cid:173)
`based search results and a link to find related web pages
`according to a link-based analysis.
`FIGS. 13A and 13B show a web page including related 20
`web pages from a link-based analysis.
`
`10
`
`4
`The system bus architecture of computer system 1 is
`represented by arrows 67. However, these arrows are illus(cid:173)
`trative of any interconnection scheme serving to link the
`subsystems. For example, a local bus could be utilized to
`connect the central processor to the system memory and
`display adapter. Computer system 1 shown in FIG. 2 is but
`an example of a computer system suitable for use with the
`invention. Other computer architectures having different
`configurations of subsystems can also be utilized.
`FIG. 3 shows a network of multiple computer systems. A
`network 101 provides communication between multiple
`computer systems 1. In a wide area network such as the
`Internet, some of the computer systems are servers (or hosts)
`and provide access to resources or services to client com(cid:173)
`puter systems on the network. With respect to web pages,
`15 there are multiple server computer systems that store the
`web pages that make up the Web. The web pages typically
`include links in the form of uniform resource locators
`(URLs) that are a link to another web page, whether it is on
`the same server or a different one.
`As described above, the Web is a distributed network of
`web pages. Networks of hyperlinked documents can also be
`present in local area networks (e.g., intranets). The operation
`of these intranets is very similar to the Internet except that
`it is not uncommon for all or a majority of the hyperlinked
`documents of an intranet to be stored on a single server
`computer system.
`Now that typical computer systems and networks have
`been described, it may be beneficial to show an example of
`30 related web pages. FIG. 4 shows an example of linked web
`pages to demonstrate how related web pages can be obtained
`from linked-based analysis. A web page 201 is the selected
`web page of interest in which it is desirable to find related
`web pages.
`Web pages 203-207 include one or more links to web
`page 201 as indicated by the arrows. A backlink set 209
`includes web pages 203-207 meaning that the backlink set
`of web pages is the set of web pages that include a link to
`a specific web page, shown here as web page 201. A host 211
`40 stores web pages 205-207 as indicated, the importance of
`which will be described in more detail below.
`Web pages 213-216 are web pages that are pointed to by
`at least one web page in backlink set 209. In other words,
`there is at least one web page in backlink set 209 that
`45 includes a link that points to one of web pages 213-216. A
`forwardlink set 218 includes web pages 213-216 and is
`called such because it is derived from forward links from the
`backlink set. In order to facilitate describing the invention,
`web page 201 will not be described as being a member of
`50 forwardlink set 218. However, web page 201 can be con(cid:173)
`sidered to be a ember of forwardlink set 218 in some
`embodiments as will be described below.
`For some very popular web pages, the backlink set can be
`quite large (e.g., a million links or more). Therefore, in a
`preferred embodiment, for backlink sets that have more than
`a predetermined number of links (e.g., 10,000), a random
`sampling of the links is utilized. In other words, if the
`backlink set has more than a predetermined number of links,
`fewer links can be selected at random (or selected in another
`manner) to be processed.
`The web pages of forwardlink set 218 can be thought of
`as being related to and at the same level of generality of web
`page 201. This is because there is at least one web page that
`includes a link to both web page 201 and each of the web
`pages in forwardlink set 218. Thus, a list of related web
`pages can be generated from the web pages and the backlink
`set.
`
`25
`
`DETAILED DESCRIPTION OF PREFERRED
`EMBODIMENTS
`In the description that follows, the present invention will
`be described in reference to embodiments that generate lists
`of related web pages from the Word Wide Web. More
`specifically, the embodiments will be described in reference
`to generating lists of related web pages utilizing link-based
`analysis requiring little or no human decision making.
`However, embodiments of the invention are not limited to
`any particular environment, application or specific imple(cid:173)
`mentation. For example, the embodiments described below
`are in reference to web pages but the invention can be
`advantageously applied to any type of hyperlinked docu- 35
`ment. Therefore, the description of the embodiments that
`follows is for purposes of illustration and not limitation.
`FIG. 1 illustrates an example of a computer system that
`can be used to execute the software of an embodiment of the
`invention. FIG. 1 shows a computer system 1 that includes
`a display 3, screen 5, cabinet 7, keyboard 9, and mouse 11.
`Mouse 11 can have one or more buttons for interacting with
`a graphical user interface. Cabinet 7 houses a CD-ROM
`drive 13, system memory and a hard drive (see FIG. 2)
`which can be utilized to store and retrieve software pro(cid:173)
`grams incorporating computer code that implements the
`invention, data for use with the invention, and the like.
`Although CD-ROM 15 is shown as an exemplary computer
`readable storage medium, other computer readable storage
`media including floppy disk, tape, flash memory, system
`memory, and hard drive can be utilized. Additionally, a data
`signal embodied in a carrier wave (e.g., in a network
`including the Internet) can be the computer readable storage
`medium.
`FIG. 2 shows a system block diagram of computer system 55
`1 used to execute the software of an embodiment of the
`invention. As in FIG. 1, computer system 1 includes monitor
`3 and keyboard 9, and mouse 11. Computer system 1 further
`includes subsystems such as a central processor 51, system
`memory 53, fixed storage 55 (e.g., hard drive), removable 60
`storage 57 (e.g., CD-ROM drive), display adapter 59, sound
`card 61, speakers 63, and network interface 65. Other
`computer systems suitable for use with the invention can
`include additional or fewer subsystems. For example,
`another computer system could include more than one 65
`processor 51 (i.e., a multi-processor system) or a cache
`memory.
`
`

`

`US 6,754,873 Bl
`
`6
`the number of links in the web page. As shown in FIG. 6,
`web page 204 includes 4 links and so the value of each link
`from the web page is V.. In this way, web pages that have
`relatively many links do not get more than their fair share of
`"votes" for related web pages. In a preferred embodiment,
`the value for each link is 1 divided by the number of links
`in the web page plus a predetermined amount (e.g., 10).
`Although the techniques described in reference to FIGS.
`5 and 6 can be used alone, in preferred embodiments, the
`10 techniques are used in conjunction to generate an accurate
`list of related web pages. The techniques can be combined
`by multiplying all the values for a given link in order to
`determine the final value for the link. As an example, the
`following table shows the scores that would be generated for
`each of the web pages of forwardlink set 218 utilizing this
`technique:
`
`Web Page
`
`Links
`
`213
`214
`215
`216
`
`(1/2) + (1/4)
`(1/4) + (1/3 * 1/2)
`(1/4) + (1/3 * 1/2)
`(1/4) + (1/3 * 1/2)
`
`Score
`
`0.75
`0.415
`0.415
`0.415
`
`25
`
`5
`One way that the web pages of forwardlink set 218 can be
`scored according to "relatedness" to web page 201 is accord(cid:173)
`ing to the number of links to each web page from backlink
`set 209. Utilizing this technique, each web page in forward(cid:173)
`link set 218 has a score of 2 because there are two links to 5
`each web page from the web pages of the backlink set.
`Accordingly, the same scores indicate that all of the web
`pages of web page 218 are equally related to web page 201.
`For at least the following reasons, this result may not be
`satisfactory.
`Within a single host, the web pages typically include
`many links to other web pages within the same host. For
`example, a company's web site may include many links that
`interlink different web pages on the same host (or domain or
`other grouping of web pages). Thus, referring back to FIG. 15
`4, web page 201 could describe a product of the company
`while web pages 214, 215, and 216 could correspond to
`other web pages of the company on the same host that are
`relatively unrelated to the product from web page 201.
`Assuming that web page 213 describes a competing product 20
`from a competitor company (and hence likely a different
`host), it would be desirable for web page 213 to get a higher
`score of "relatedness" than web pages 214-216 because it is
`more likely the user is interested in a competing product than
`a relatively unrelated web page from the same host.
`Another problem is that some web pages have relatively
`few links while others have relatively many links. If each
`link is counted equally, the web pages with relatively many
`links gets more "votes" for the "relatedness" of the web
`pages. For example, referring to FIG. 4, web pages 203 and
`205-207 all include two links. However, web page 204 has
`four links, which is twice as many as in the other web pages
`of backlink set 209. Because web page 204 has more links,
`the web page has a greater impact on selecting related web
`pages than the other web pages of the backlink set. It would 35
`be desirable if the web pages of the backlink set were
`considered relatively equal regardless of the number of links
`each web page includes.
`It should be noted that FIG. 4 shows a very simple 40
`example in order to illustrate some of the problems that can
`be encountered in selecting related web pages. As one can
`imagine, typically the web pages include many more links
`and the number of web pages involved in determining
`related web pages is far greater than that shown in FIG. 4.
`Nevertheless, FIG. 4 is useful in illustrating some problems
`related to generating a list of related web pages. Also, the
`links were shown as coming from the bottom of the web
`pages for simplicity, but links are typically embedded within
`the text of the web pages as will be shown in subsequent
`figures.
`FIG. 5 shows one technique that can be utilized to reduce
`the importance of web pages from the same host. As
`described above in reference to FIG. 4, web pages 205-207
`are in the same host. Instead of giving each link from these
`web pages a value of, for example, 1, the links are given a
`value of 1 divided by the number of web pages that are from
`the same host, which in this case is 3. Accordingly, each link
`from web pages 205-207 are given a value of Ij3 as shown.
`By dividing the value for a link by the number of web pages 60
`in the same host, the amalgam of web pages from the same
`host have a total of 6* lf3=2 "votes," which would be the
`same as a single web page with two links from another host.
`FIG. 6 shows a technique in which to reduce the impor(cid:173)
`tance of the individual links from a web page with relatively
`many links. In order to reduce the value for each link from
`a web page with many links, the value of one is divided by
`
`45
`
`One or more of these techniques can also be combined with
`a measure of text-based similarity of the web pages.
`FIG. 7 shows a flowchart of a process of preprocessing the
`Web in order to generate banklink sets. The Web does not
`30 include backlink sets so it is beneficial to scan the Web and
`identify backlink sets for each of the web pages on the Web.
`This allows related web pages to be identified in a much
`more efficient manner because the backlink sets are already
`calculated.
`At step 301, web pages are scanned and forwardlink and
`backlink sets are saved. As each web pages is processed, the
`forward links in the web page are identified and saved as a
`forwardlink set. Additionally, the current web page is added
`to the backlink set of each of the web pages that are pointed
`to by links in the current web page. The generation of
`forwardlink and backlink sets is well suited for an automated
`process and can be continually run to identify and save
`changes in the Web.
`In addition to calculating backlink sets, lists of related
`web pages can be calculated for each of the web pages on the
`Web at a step 303. The list of related web pages can be
`calculated as described below in reference to FIGS. 8 and 9.
`Although optional, the lists of related web pages can be
`utilized to generate a popular list of web pages that occur
`most often in the related lists at a step 305.
`The popular list includes web pages that occur most
`frequently in the lists of related web pages. For example, a
`popular site like the yahoo web page may occur very
`frequently in the lists of related web pages. The popular list
`55 may be a predetermined number (e.g., 2000) of web pages
`that occur most often in the lists of related web pages. The
`popular list can be utilized to more accurately display the list
`of related web pages as described in more detail in reference
`to FIG. 10.
`FIG. 8 shows a flowchart of a process of generating a list
`of related web pages. For example, the list of related web
`pages can be generated upon request from a user for a web
`page being displayed in a web browser or for a web page that
`is being displayed (e.g., as a link) in results from a web
`65 search.
`At a step 351, a backlink set of web pages is provided for
`a given or selected web page. The selected web page is the
`
`50
`
`

`

`US 6,754,873 Bl
`
`7
`web page for which the list of related web pages should be
`generated. The backlink set of web pages can be easily
`determined from the backlink set generated during prepro(cid:173)
`cessing of the web as described in FIG. 7.
`A forwardlink set of web pages is provided from the 5
`backlink set at a step 353. The forwardlink set of web pages
`are the web pages that are pointed to by the forward links in
`the backlink set for the selected web page. The forwardlink
`set can be generated during preprocessing but since the time
`to generate the forwardlink set is minimal, in some embodi- 10
`ments the forwardlink set is generated in real time when the
`user asks for a list of related web pages for the selected web
`page.
`At a step 355, a value is assigned to each forward link of
`the web pages of a backlink set. A process of assigning a 15
`value to the forward links will be described in more detail in
`reference to FIG. 9.
`A score is generated for each web page in the forwardlink
`set at a step 357. The score can be generated by adding
`together all the values for each of the forward links that
`points to each web page in the forwardlink set.
`Once the scores are generated for each of the web pages
`in the forwardlink set, a list of related web pages can be
`generated at a step 359. The list of related web pages can be
`generated from the forwardlink set according to the score of 25
`the web pages. In other words, the score is an indication of
`the relatedness to the selected web page and the higher the
`score, the more related the web page is.
`At a step 361, the list of related web pages is displayed.
`The displayed list of related web pages can be a predeter(cid:173)
`mined number of the most highly related web pag

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket