throbber
111111111111111111111111111111111111111111111111111111111111111111111111111
`US008516357Bl
`
`(12) United States Patent
`Harik
`
`(IO) Patent No.:
`(45) Date ofPatent:
`
`US 8,516,357 Bl
`*Aug. 20, 2013
`
`(54) LINK BASED CLUSTERING OF
`BYPERLINKED DOCUMENTS
`
`(75)
`
`Inventor: Georges R. Harik, Mountain View, CA
`(US)
`
`(73) Assignee: Google Inc., Mountain View, CA (US)
`( * ) Notice:
`
`Subject to any disclaimer, the term of this
`patent is extended or adjusted under 3 5
`U.S.C. 154(b) by 446 days.
`This patent is subject to a terminal dis(cid:173)
`claimer.
`
`(21) Appl. No.: 121959,141
`
`(22) Filed:
`
`Dec. 2,2010
`
`Related U.S. Application Data
`(63) Continuation of application No. 11/693,522, filed on
`Mar. 29, 2007, now abandoned, which
`is a
`continuation of application No. 09/636,052, filed on
`Aug. 10,2000, now Pat. No. 7,213,198.
`(60) Provisional application No. 60/148,854, filed on Aug.
`12, 1999.
`
`(51)
`
`(2006.01)
`(2006.01)
`
`Int. Cl.
`G06F 17100
`G06F 17120
`(52) U.S. CI.
`USPC ............ 715/200; 715/206; 715/207; 715/208
`(58) Field of Classification Search
`USPC .................................. 715/200,206,207,208
`See application file for complete search history.
`
`(56)
`
`References Cited
`
`U.S. PATENT DOCUMENTS
`5,761,436 A
`6/1998 Nielsen
`5,835,905 A + 1111998 Pirolli et al ........................... 1/1
`5,847,708 A + 1211998 Wolff ............................ 715/764
`5,875,446 A
`2/1999 Brown etal.
`
`401
`
`--
`
`5,890,172 A
`5,920,859 A
`5,941,944 A
`5,999,664 A "'
`6,038,574 A
`6,049,799 A *
`--6;078;913 A +
`6,112,203 A +
`6,122,647 A
`6,167,397 A
`6,182,066 Bl *
`6,272,505 B I *
`6,285,999 B I
`6,345,253 B1
`6,353,827 B1 •
`6,356,898 B2
`
`3/1999 Borman et al.
`7/1999 Li
`8/1999 Messerly
`1211999 Mahoney et al .............. 382/305
`3/2000 Pitkow et al.
`4/2000 Mangat et a!. ........................ 1/1
`612000 Aoki-et ai;· . .-.. : ............... ~ .. ~.-.. ; 1!1-
`8/2000 Bharat eta!. .......................... 1/1
`9/2000 Horowitz et at.
`12/2000 Jacobson et al.
`lf2001 Marques ............................... 1/1
`8/2001 DeLaHuerga .............. 715/235
`912001 Page
`2/2002 Viswanalhan
`3/2002 Davies eta!. ................. 707/769
`3/2002 Cohen et al.
`(Continued)
`
`OTHER PUBLICATIONS
`
`Judit Bar-IIan et al., "3"' International Workshop on Web Dynamics",
`May 18, 2004, 125 pages.
`
`(Continued)
`
`Primary Examiner- Doug Hutton, Jr.
`Assistant Examiner- Soumya Dasgupta
`(74) Attorney, Agent, or Firm- Harrity & Harrity, LLP
`
`(57)
`
`ABSTRACT
`
`Techniques for grouping hyperlinked documents are pro(cid:173)
`vided. Links near or in the neighborhood of the hyperlinked
`documents are analyzed in order to group the hyperlinked
`documents by topic. For example, links that are search results
`can be grouped by identi:fYing other hyperlinked documents
`that have multiple forward links to the search results. The
`search results can then be grouped according to the forward
`links of the other hyperlinked documents.
`
`20 Claims, 18 Drawing Sheets
`
`SATURN
`PLANET
`PAGE
`
`. .
`
`/
`/
`
`/
`
`401
`
`SATURN
`PLANET
`PAGE
`
`403
`
`EXHIBIT 2108
`Facebook, Inc. et at.
`v.
`Software Rights Archive, LLC
`CASE IPR2013-00480
`
`

`

`US 8,516,357 Bl
`Page 2
`
`(56)
`
`References Cited
`
`200110047375 A1
`200110049698 A1
`2002/0035611 A1
`
`1112001 Fest
`12/2001 Hsu eta!.
`3/2002 Dooley
`
`OTHER PUBLICATIONS
`
`Haffner eta!., "Request-Prediction and Hyperlink-Proposals", Nov.
`2000, 177 pages.
`Blank et al., "Interactive Gradebook: The Missing (Hyper)Link",
`1998, 5 pages.
`Rennie et al., "Using Reinforcement Learning to Spider the Web
`Efficiently", 1999, 10 pages.
`Rennie eta!., "Efficient Web Spidering with Reinforcement Learn(cid:173)
`ing", 1999, 13 pages.
`* cited by examiner
`
`U.S. PATENT DOCUMENTS
`6,385,602 B1
`5/2002 Tso et al.
`6,446,061 B1 * 9/2002 Doerre eta!.
`6,457,028 B1
`9/2002 Pitkow eta!.
`6,510,406 B1
`112003 Marchisio
`6,549,935 B1
`4/2003 Lapstun et a!.
`6,562,077 B2 * 5/2003 Bobrow eta!.
`6,574,644 B2 * 6/2003 Hsu eta!.
`6,665,715 B1
`12/2003 Houri
`6,675,170 B1
`112004 Flake
`6,728,752 B1
`4/2004 Chen eta!.
`6,990,628 B1 *
`112006 Palmer eta!.
`7,213,198 B1
`5/2007 Harik
`
`................. 707/738
`
`............... 715/204
`715/205
`
`................. 715/234
`
`

`

`U.S. Patent
`
`Aug. 20, 2013
`
`Sheet 1 of 18
`
`US 8,516,357 Bl
`
`,.--1
`
`11
`
`FIG.1
`
`51
`
`53
`
`55
`
`,.--1
`
`57
`
`67
`
`65
`
`61
`
`63
`
`59
`
`9
`
`11
`
`FIG. 2
`
`

`

`U.S. Patent
`US. Patent
`
`Aug. 20, 2013
`Aug. 20, 2013
`
`Sheet 2 of 18
`Sheet 2 of 18
`
`US 8,516,357 B1
`US 8,516,357 Bl
`
`1
`
`
`
`1
`
`D_
`
`~a:
`
`D_
`
`~o:
`
`FIG. 3
`HG. 3
`
`

`

`U.S. Patent
`
`Aug. 20, 2013
`
`Sheet 3 of 18
`
`US 8,516,357 Bl
`
`205
`
`CARS
`PAGE
`
`207
`
`CARS
`PAGE
`
`215
`
`217
`
`ASTRONOMY
`PAGE
`
`SCIENCE
`PAGE
`
`SATURN
`CAR
`PAGE
`
`SATURN
`CAR
`PAGE
`
`SATURN
`PLANET
`PAGE
`
`SATURN
`PLANET
`PAGE
`
`201
`
`203
`
`209
`
`211
`
`SftJURN
`PLANET
`PAGE
`
`213
`
`FIG.4
`
`

`

`U.S. Patent
`
`Aug. 20, 2013
`
`Sheet 4 of 18
`
`US 8,516,357 Bl
`
`RECEIVE LINKS TO HYPERLINKED
`DOCUMENTS
`
`IDENTIFY OTHER HYPERLINKED
`DOCUMENTS THAT HAVE MULTIPLE
`FORWARD LINKS TO THE
`HYPERLINKED DOCUMENTS
`
`GROUP THE HYPERLINKED
`DOCUMENTSACCORrnNGTOTHE
`FORWARD LINKS OF THE OTHER
`HYPERLINKEDDOCUMENTS
`
`301
`
`303
`
`305
`
`FIG. 5
`
`

`

`U.S. Patent
`
`Aug. 20, 2013
`
`Sheet 5 of 18
`
`US 8,516,357 Bl
`
`401
`
`403
`
`FIG. 6A
`
`401
`
`SATURN
`PLANET
`PAGE
`
`I
`
`I
`
`I
`I
`
`I
`
`I
`
`I
`
`I
`I
`
`;
`
`SATURN
`PLANET
`PAGE
`
`403
`
`401
`
`FIG. 68
`
`

`

`U.S. Patent
`
`Aug. 20, 2013
`
`Sheet 6 of 18
`
`US 8,516,357 Bl
`
`501
`
`503
`
`505
`
`507
`
`509
`
`511
`
`515
`
`RECEIVE LINKS TO HYPERLINKED
`DOCUMENTS AS THE SEARCH SET
`
`EXPAND THE SEARCH SET TO
`GENERATE AN EXPANDED SET
`
`GENERATE A BACK LINK LIST OF
`HYPERLINKED DOCUMENTS THAT
`POINT TO THE EXPANDED SET
`
`CALCULATE SIMILARITY MEASURE
`FOR THE HYPERLINKED
`DOCUMENTS IN THE EXPANDED
`SET
`
`MERGE HYPERLINKED
`DOCUMENTS (OR GROUPS) IN THE
`EXPANDED SET
`
`CALCULATE SIMILARITY MEASURE
`FOR THE MERGED GROUPS IN THE
`EXPANDED SET
`
`NO
`
`513
`
`YES
`
`DISPLAY SEARCH RESULTS IN
`GROUPS
`
`DONE
`
`FIG. 7
`
`

`

`U.S. Patent
`
`Aug. 20, 2013
`
`Sheet 7 of 18
`
`US 8,516,357 Bl
`
`jaguar
`
`I 0 results
`
`Google Searchi'm feeling lucky
`
`At least 9749 matches for jaguar
`Showing results 1-10, Search took 0.14 seconds
`Clickinz on a red bor searches tbr backlinks (cim:ions).
`How do I interpret the results?
`
`--=:::1 Jaguar Cars Global Home Page
`... Jaguar Cars Official Web Site. Jaguar Cars Official Web Site ...
`... Jaguar Cars Official Web Site. Jaguar Cars Official Web Site ...
`www.jaguarcars.com/ Cached (5k)
`
`t USS Jaguar
`-
`... to the USS Jaguar, the first and largest Star Trek site for kids. I am ...
`... be like a fictitious starship named Jaguar. Here anyone under 18 can join ...
`www.worldkids.net/jaguar/ Cached (5k)
`
`1 The Jaguar Collection - Official Web site
`-
`www.collection.co.ukf Cached (2k)
`
`• Welcome to Jag-lovers- the Jaguar Enthusiasts place on the Internet
`-
`... -INFORMATION Brochures 1 Buyer's Guide I Jaguar Specialists Web!.-
`... (Windows) I Windows Wallpaper I-JAGUAR MODEL RELATED E-Type I Lump ...
`www.jag-lovers.org/ Cached (l4k)
`
`1 Jaguar/Daimler used parts exchange
`-
`... Brochures Buyer's Guide Jaguar Specialists Web ...
`... (Windows) Windows Wallpaper -JAGUAR MODEL RELATED E-Type ...
`www.jag-lovers.org/HyperNews/get/jagswap.html Cached (87k)
`
`-
`
`• Jaguar Interactive II-- The Premier 24-Hour Atari Jaguar Discussion Board
`. Jaguar Interactive II- Hosted by Atari Gaming Headquarters- Other.. .
`.. .Jag-@·Zine I JagFest '98 1 JagFest '991 Jaguar, MA I Jaguar's ...
`www.atarihq.com/interactive/ Cached (30k)
`
`1 Classic Car Source -- Welsh Jaguar Classic Car Museum
`-
`... WELSH JAGUARCLASS1C CAR MUSEUM And JAGGIN' AROUND RESTAURANT AND PUB "A ...
`... CELEBRATION OF THE GLORY DAYS OF THE JAGUAR" The WELSH JAGUAR. ..
`www.ciassicar.com/museums/welshjag/welshjag.htm Cached (3k)
`
`1 Jaguar Club of Florida
`-
`... Welcome to the Jaguar Club of Florida! This i.~ nur place on the ...
`... meetings, sponsored events and activities! Jaguar Cooks 1 -A brand new ...
`www.magicnet.net/-jaguartjcof.html Cached (6k)
`
`-
`
`1 Tomb of the Jade Jaguar Slide Lecture, Maya treasures ofTikal, Guatemala
`
`FIG. 8A
`
`

`

`U.S. Patent
`
`Aug. 20, 2013
`
`Sheet 8 of 18
`
`US 8,516,357 Bl
`
`... the JADE JAGVAR, TIKAL, GUATEMALA an informative slide lecture on Vlaya ...
`... available. The Tomb of the Jade Jaguar at Tikal (Burial 196, Structure ...
`www.maya-art-books.org/html/jade_Jecture.html Cached (9k)
`
`• Jaguar Sovereign
`-
`... us] Alexandra a 1986 Jaguar XJ6 Sovereign 4.2 IMPORTANT.. .
`... way affiliated with or connected to Jaguar Cars Ltd. The official website ...
`www.bitcon.no/-gunnarisovereign.htmlCached (5k)
`GOOOOOOOOOGLE--+
`Resu!! Page: I 2 3 4 5 6 7 8 9 10 Next ;)age
`
`New query: .
`Jaguar Google Search
`Search within results?
`Try your query on other engines
`AltaVista Excite HotBot lnfoseek Lycos DejaNews Yahoo! Amazon Open Directory eGroups
`
`About Google!
`Copyright© 1999 Google Inc.
`
`FIG. 88
`
`

`

`U.S. Patent
`
`Aug. 20, 2013
`
`Sheet 9 of 18
`
`US 8,516,357 Bl
`
`603
`
`605
`
`607
`
`FORD MOTOR
`COMPANY
`PAGE
`
`100 HOT
`CAR& TRUCK
`PAGE
`
`PRN
`AUTOMOBILE
`LINKS PAGE
`
`JAGUAR CARS
`GLOBAL PAGE
`
`601
`
`JAGUAR
`CARS EUROPE
`PAGE
`
`JAGUAR
`CARS ASIA
`PAGE
`
`JAGUAR CARS
`SOUTH AMERICA
`PAGE
`
`609
`
`611
`
`613
`
`FIG. 9
`
`

`

`U.S. Patent
`
`Aug. 20, 2013
`
`Sheet 10 of 18
`
`US 8,516,357 Bl
`
`EXPANDED SEARCH
`SET
`
`FIG. 10
`
`

`

`U.S. Patent
`
`Aug. 20, 2013
`
`Sheet 11 of 18
`
`US 8,516,357 Bl
`
`801
`\
`
`HASHTABLE
`0
`
`2
`
`E,D
`
`A,B,C
`
`A
`
`• • •
`FIG. 11
`
`

`

`U.S. Patent
`
`Aug. 20, 2013
`
`Sheet 12 of 18
`
`US 8,516,357 Bl
`
`A
`
`B
`c
`
`D
`
`E
`
`CO-CITATION MATRIX
`A B C D E
`
`1
`
`1
`
`2
`
`851
`
`I
`
`1
`
`FIG. 12A
`
`901
`
`RELATIONSHIP VECTOR
`
`I
`U) (~) O) (n
`FIG. 128
`
`

`

`U.S. Patent
`
`Aug. 20, 2013
`
`Sheet 13 of 18
`
`US 8,516,357 Bl
`
`CO-CITATION (A, B)_ a NumBL(A) • NumBL(B)
`SIMILARITY (A, B) =
`w
`NumBL(A) • NumBL(B}
`
`EXAMPLE:
`CO-CITATION (A, B) = 1
`NumBL(A) = 2
`NumBL(B) = 2
`a= 10
`w = 100,000,000
`
`SIMILARITY (A, B) =
`
`10. 2. 2
`1 - - -
`100,000,000
`
`F
`
`~ 112
`
`FIG. 13
`
`

`

`U.S. Patent
`
`Aug. 20, 2013
`
`Sheet 14 of 18
`
`US 8,516,357 Bl
`
`AUGMENTED RELATIONSHIP
`VECTOR
`
`951
`/
`f
`
`(J.s) Us) Uo) Uo)
`FIG. 14
`
`SORTED RELATIONSHIP
`VECTOR
`
`1001
`/
`f
`
`Uo) Oo) (Js) Us)
`FIG. 15
`
`

`

`U.S. Patent
`
`Aug. 20, 2013
`
`Sheet 15 of 18
`
`US 8,516,357 Bl
`
`MERGE VECTOR
`A B C D E
`
`1051
`
`I F I F I G I G I ~
`FIG. 16
`
`1101 I
`
`CO-CITATION MATRIX AFTER
`THE MERGE
`A F G
`
`2
`
`0
`
`0
`
`A
`
`F
`
`G
`
`FIG. 17
`
`

`

`U.S. Patent
`
`Aug. 20, 2013
`
`Sheet 16 of 18
`
`US 8,516,357 Bl
`
`Clustering for 'saturn'
`
`Invoking Project Georges ... please wait!
`
`Running grouper ...
`
`Waiting for it to finish ...
`
`Looking for results ...
`
`Reading the results ...
`
`Cleaning up ...
`
`Waiting for cleanup ...
`
`~1201
`
`• Saturn
`pds.jplnasa.gov/planets/welcome/saturn.htm (C:0.866863 P:0.55lll)
`• Cassini: Voyage to Saturn
`www.jpl.nasa.gov/cassini/ (C:0.855673 P:O.S8622l)
`• NSSDC Photo Gallery: Saturn
`nssdc.gsfc.nasa.gov/photo_gallery/photogallery-satum.html (C:0.855624 P:0.492393)
`• Saturn
`ww w.hawastsoc.org/solar/eng/satum.h!Jn (C:0.834942 P:0.51 0658)
`• The 1995-6 Saturn Ring Plane Crossings
`rillgside.arc.nasa.gov/wwwlrpxlrpx.html (C:0.810368 P:0.473472)
`• Saturn
`www.seds.org/nineplanets/nineplanets/sarurn.html (C:0.805337 P:0.55l736)
`• Saturn
`nssdc.gsfc.nasa.gov/planetary/planets/satumpage.html (C:O. 774866 P:0.492622)
`• No Title
`ringside.arc.na~a.govfwwwfsaturn/saturn.html (C:0,755906 P:0.45388)
`• Saturn Fact Sheet
`nssdc.gsfc.nasa.gov/p!anetary/factsheet/satumfact.hnnl (C :0.7 49!32 P:0.4 7 4296)
`• No Tille
`www.jpl.nasa.gov/cassini (C:0.7l0059 P:0.575479)
`• No Title
`
`FIG. 18A
`
`

`

`U.S. Patent
`
`Aug. 20, 2013
`
`Sheet 17 of 18
`
`US 8,516,357 Bl
`
`newproducts.jpLnasa.govfsatuml (C:0.7055! P:0.483345)
`• Saturn Picture List
`www.seds.org/nineplanets/nineplanetsfpxsat.html (C:0.695906 P:0.49987)
`• Saturn
`spacer.com/nineplanetsfsaturn.html (C:0.688069 P:O .3 61 059)
`• Saturn's spokes in rings
`www.hawas:soc.org/solar!cap!satfspoke.htm (C:0.657886 P:O .416526)
`• Saturn Ring Plane Crossings of 1995-1996 (JPL)
`www.jpl.nnsn.gov/saturnf (C:0.656777 P:0.383i34)
`• Hubble Space Telescope Views Major Storm on Saturn
`www.hawa~tsoc.org/solar/cap/sat/vsatstrm.htm (C:0.65J!63 P:0.386542)
`• Saturn
`www.star.le.ac.uk!edu/nineplanetslsaturn.html (C:0.650382 P:0.341283)
`• Saturn Events
`www.isc.tamu.edu/-astrofsaturn.html (C:0.646718 P:0.464393)
`• Cassini UVIS Inernet Site
`lasp.colorado.edu/cassinif (C:0.645669 P:0.493858)
`• StarDate Online I Solar System Guide I Saturn
`stardate.utexas.edu/resourcesfssguidefsatum.html (C:0.64 P:0.482643)
`• SATURN IMAGES
`vraptor.jpl.nasa.govlvoyager/vgrsat_irr.g.html (C:0.639535 P:0.456519)
`• The Planet Saturn
`wwwflag.wr.usgs.gov/USGSF!ag/Space/wallisaturn.html (C:0.639053 P:0.452598)
`• No Title
`ringside.a:-c.nasa.gov/saturn/satrun.html (C:0.634215 P:0.43450 1)
`• Voyager Saturn Science Summary
`www.hawastosc.org/so lar/eng/v grsat.htm (C:0.628713 P:0.453529)
`• The Solar System - Saturn
`www.jpl.nasa.gov/solarsystem/satum/ (C:0.627907 P:0.523598)
`• Saturn Image & Animation Index
`www.hawastsoc.org/solar/capfsatJ (C:0.61 0045 P:0.386236)
`• Artwork of Saturn and Its Satellites
`www.jpLnasa.gov/cassini/Images/artwork! (C:0.608!87 P:0.446021)
`• Exploring The Planets - Pioneer Encounters Saturn
`www.nasm.edufceps/ETP/SATURN/satpioneer.html (C:0.606299 P:0.408835)
`• Saturn
`www.hpcc.astro. washiogton.edu/mirrors/nineplanets/saturn.html (C:O .587209 P:O .397482)
`• Saturn [OuluJ
`www.oulu.fi/-spaceweb/textbook/saturn.htm (C:0.581992 P:0.36847 5)
`
`• www.jpl.nasa.gov/sarurnfiau6!92.html (C:0.566929 P:0.355337)
`
`1201~
`
`• No Title
`www.sierranet.netf-elengyellastro.html (C:0.566929 P:0.3i5875)
`• No Title
`bang.lanl.govfsolarsys/saturn.htm (C:0.566929 P:0.430716)
`• No Title
`esther.la.asu.edu/asu _tes/TES _Editor/SOLAR_ SYST _ TOUR/Saturn.html (C:O .566929 P:0.442466)
`[nfo about Saturn and Outer Space
`www.jpl.nasa.gov/cassini/Morelnfofoutrchinfo.html (C:0.56 P:0.4!6022)
`• Pseudo-image of Saturn's Ring
`www.hawastsoc.org/solar/cap/sat/s-ring.htm ( C: 0. 5 56922 P:0.292241)
`• Saturn Without a Ring
`
`•
`
`FIG. 188
`
`

`

`U.S. Patent
`
`Aug. 20, 2013
`
`Sheet 18 of 18
`
`US 8,516,357 Bl
`
`www.hawastsoc.org/solar/cap/sat/satnr.htm (C:0.556922 P:0.29224l)
`• Saturn
`www.pitagoras.g 12.br/planetas/nineplanets/satum.html (C:0.554887 P:0.291173)
`• Saturn Images Menu
`emma.la.asu.edu/SOLAR~SYST_TOUR/Satum.html (C:0.551181 P:0.45784')
`• Saturn
`www.star.le.ac.uk/edu/solar/saturn.html (C:0.54 386 P:0.385916)
`• Saturn Viewer 1.1 Short Form
`ringside.arc.nasa.gov/www/rpx/viewer/shortform .html (C:O. 543 86 P:0.393 744)
`
`• No Title
`nssdc.gsfc.nasa.gov/photo _gallery/PhotoGallery-Saturn.html (C:0.535433 P:0.3 81491)
`• Saturn's Possible New Moons
`www.seds.org/nineplanets/nineplanets/newsats.html (C:0.535433 P:0.482292)
`
`~
`1201
`
`• Welcome to the Saturn Cars Web Site
`www.saturncars.com/ (C:0.997514 P:0.557076)
`
`• www.saturncars.com/communicationlindex3.htrnl (C:0.952756 P:0.508095)
`
`• The 1998 Saturn Line
`www.saturncars.com/car/models/ (C:0.666667 P:0.4589 15)
`e Saturn's Frequently Asked Questions
`www.saturncars.com/communication/faq.html (C:0.597633 P:0.432074)
`• Saturn of Arkansas
`www.saturnar.com/ (C:0.594143 P:0.31 I 772)
`• No Title
`www.gmcanada.com/eng!ishlvehicles/satu.html (C:0.584 795 P:O .418006)
`• Saturnalia -the Saturn Enthusiasts Site
`www.erols.com/core/ (C:0.581 395 P:0.353033)
`• Saturn of Fremeont Home Page
`www.saturnfr.com/ (C:0.579882 ?:0.392782)
`• About the SATURN web site
`www.saturncars.com/about/ {C:0.55118J P:0.411856)
`• Saturn Albany Home Pao-e
`www.saturnalbat;y.com/ (C:0.5?J 181 P:0.317662)
`
`1203
`
`• Sega Online- Products: Saturn
`www.sega.com/products/hardware/saturn/ (C:0.700085 P:0.402884
`• Game Revolution Sega Saturn Reviews
`www.game-revoluliou.com/games/saturn/saturn.htm (C:0.685039 P:0.465736
`• Phil's site--Sega Saturn World
`www.mcn.net'-montanasites/segaSaturn.htm (C:0.657798 P:0.345 I 74)
`e New and Used Video Games at Used GAMES!- SEGA SATURN:-)
`www.usedvideogame.com/segalsaturn/sat.htm (C:0.62992J P:0.343644)
`• Sega Saturn Index (Sega Force: Index: Systems: Saturn)
`www.cyberdrive.netf-gskalbalsatum.htm (C:0.543307 P:0.325 I 24)
`
`>- 1205
`
`FIG. 18C
`
`

`

`US 8,516,357 Bl
`
`1
`LINK BASED CLUSTERING OF
`HYPERLINKED DOCUMENTS
`
`RELATED APPLICATIONS
`
`This application is a continuation of U.S. patent applica(cid:173)
`tion Ser. No. 11/693,522, filed Mar. 29, 2007, which is a
`continuation of U.S. patent application Ser. No. 09/636,052,
`filed Aug. 10, 2000 (now U.S. Pat. No. 7,213,198), which
`claims the benefit of U.S. Provisional Application No.
`60/148,854, filed Aug. 12, 1999, the disclosures of which are
`hereby incorporated herein by reference.
`
`BACKGROUND OF THE INVENTION
`
`2
`forces the user to go through extra hoops in specifying her
`query. The user is in fact forced to think like a textual search
`engine and attempt to guess what words would be included in
`the topic she is interested in and not in other topics matching
`5 her original query. Guessing at the search engine's behavior
`can become an iterative and counterproductive nightmare.
`As the size of the Web continues to increase, it becomes
`increasingly more desirable to have innovative techniques for
`efficiently grouping hyperlinked documents by topic. By dis-
`10 playing the links to web pages grouped by topic, a search
`engine can provide some coherence (i.e., not jump from one
`topic to another and then back again) to the search results.
`Additionally, it would be beneficial to have techniques that
`are efficient at grouping search results by topic by analyzing
`15 the link structure of the Web.
`
`The present invention relates to grouping (or clustering)
`hyperlinked documents. More specifically, the invention
`relates to techniques for grouping hyperlinked documents
`from a search so that the links to hyperlinked documents
`about the same (or similar) topic can be displayed together.
`The World Wide Web (or "Web") contains a vast amount of
`information in the form ofhyperlinked documents (e.g., web
`pages). One of the reasons for the virtually explosive growth
`in the number ofhyperlinked documents on the Web is that
`just about anyone can upload hyperlinked documents, which 25
`can include links to other hyperlinked documents. Although
`there is no doubt that there is a vast amount of useful infor(cid:173)
`mation on the Web, the unstructured nature of the Web can
`make it difficult to find the information that is desired.
`A search engine attempts to return relevant information in
`response to requests from users. These requests usually come
`in the form of queries (e.g., sets of words that are related to the
`desired topic). Search engines typically return a number of
`links to web pages, with a brief description of those pages.
`Because the number of pages on the Web is huge, ensuring 35
`that the returned pages are relevant to the topic the user had in
`mind is a central problem in web searching. Possibly the
`simplest and most prevalent way of doing this is to find web
`pages containing all or many of the words included in the
`query, which can be called text-based searching.
`Text-based searching over the Web can be notoriously
`imprecise and several problems arise in its use. To begin with,
`usually a large number of web pages match the user's query.
`Displaying all of these pages to the user becomes impractical
`and some method of ordering these results should be used. 45
`Methods that assess and use the quality of the pages, returning
`only well-linked pages for example, can significantly
`improve the quality of the returned results. However, the
`returned results can still range over a number of different
`topics, only one of which the user had in mind.
`Consider, for example, a search query including only the
`word "Saturn." This query can refer to the Saturn brand of car,
`the planet Saturn, the Sega Saturn game system, or the Roman
`god Saturn. Most likely, the user is interested in only one of
`the above topics. However, a search engine searching for this 55
`word would come up with a mishmash of results from among
`all of these topics. A user interested in the Saturn car would
`have to wade through many irrelevant search results to get the
`information she desires.
`One solution to this problem utilizes user feedback. After 60
`the user notices that the results include links from a number of
`different topics, the user can narrow or focus the search to
`only the one topic in which the user is interested. For example,
`the user could add the word "car" to the query. This solution
`has at least two negative aspects. First, it can exclude high
`quality relevant pages, which include the word Saturn but not
`"car" (e.g., uses "automobile" instead). More importantly, it
`
`SUMMARY OF THE INVENTION
`
`20
`
`The present invention provides innovative techniques for
`grouping hyperlinked documents. In general, links near or in
`the neighborhood of the hyperlinked documents are analyzed
`in order to group the hyperlinked documents by topic. For
`example, links that are search results can be grouped by
`identifying other hyperlinked documents that have multiple
`forward links to the search results. The search results can then
`be grouped according to the forward links of the other hyper-
`linked documents. Embodiments of the invention can there(cid:173)
`fore utilize the existing link structure of the Web to group the
`search results according to topic. Some specific embodiments
`30 of the invention are described below.
`In one embodiment, the invention provides a computer
`implemented method of grouping hyperlinked documents.
`Links to hyperlinked documents are received and other hyper(cid:173)
`linked documents are identified that have forward links to the
`initial hyperlinked documents. The hyperlinked documents
`are grouped according to the forward links of the other hyper(cid:173)
`linked documents. Where the hyperlinked documents are
`results from a search, the links to the hyperlinked documents
`can be displayed in groups separated graphically on the
`40 screen.
`In another embodiment, the invention provides a computer
`implemented method of grouping hyperlinked documents.
`An initial set of links to hyperlinked documents is received.
`The initial set is expanded to include links to additional
`hyperlinked documents that have a forward link to or are
`pointed to by a hyperlinked document in the initial set,
`thereby generating an expanded set oflinks. A back link list of
`links to hyper! inked documents is identified that have forward
`links to the hyperlinked documents in the expanded set. The
`50 hyperlinked documents in the initial set are then grouped
`according to the forward links of the hyperlinked documents
`in the back link list of links. In other cases, the hyperlinked
`documents may be grouped according to link information that
`is more than one level back from those documents.
`In another embodiment, the invention provides a computer
`implemented method of grouping hyperlinked documents.
`Groups of hyperlinked documents are received, each group
`including at least one hyperlinked document. A similarity
`measure is calculated between a pair of groups ofhyperlinked
`documents, the similarity measure based on the number of
`hyperlinked documents that have forward links to each of the
`pair of groups ofhyperlinked documents. The pair of groups
`ofhyperlinked documents are merged if the similarity mea(cid:173)
`sure crosses a threshold. In some embodiments, the similarity
`65 measure can be reduced by the product of the number of
`forward links to each of the pair of groups of hyperlinked
`documents.
`
`

`

`US 8,516,357 Bl
`
`3
`In another embodiment, the invention provides a computer
`implemented method of grouping hyperlinked documents.
`Links to hyperlinked documents are received. The hyper(cid:173)
`linked documents are grouped according to forward links to
`the hyperlinked documents in other hyperlinked documents.
`The links to the hyperlinked documents are displayed in
`groups.
`Other features and advantages of the invention will become
`readily apparent upon review of the following description and
`association with the accompanying drawings, where the same 10
`or similar structures are designated with the same reference
`numerals.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`4
`geously applied to grouping hyperlinked documents on a
`local intranet for a number of diverse uses, such as for orga(cid:173)
`nizing hyperlinked documents at an institution. Therefore,
`the description of the embodiments that follows is for pur(cid:173)
`poses of illustration and not limitation.
`FIG.1 illustrates an example of a computer system that can
`be used to execute the software of an embodiment of the
`invention. FIG. 1 shows a computer system 1 that includes a
`display 3, screen 5, cabinet 7, keyboard 9, and mouse 11.
`Mouse 11 can have one or more buttons for interacting with a
`graphical user interface. Cabinet 7 houses a CD-ROM drive
`13, system memory andaharddrive(seeFIG. 2)whichcan be
`utilized to store and retrieve software programs incorporating
`computer code that implements the invention, data for use
`15 with the invention, and the like. Although CD-ROM 15 is
`shown as an exemplary computer readable storage medium,
`other computer readable storage media including floppy disk,
`tape, flash memory, system memory, and hard drive can be
`utilized. Additionally, a data signal embodied in a carrier
`20 wave (e.g., in a network including the Internet) can be the
`computer readable storage medium.
`FIG. 2 shows a system block diagram of computer system
`1 used to execute the software of an embodiment of the
`invention. As in FIG. 1, computer system 1 includes monitor
`3 and keyboard 9, and mouse 11. Computer system 1 further
`includes subsystems such as a central processor 51, system
`memory 53, fixed storage 55 (e.g., hard drive), removable
`storage 57 (e.g., CD-ROM drive), display adapter 59, sound
`card 61, speakers 63, and network interface 65. Other com-
`30 puter systems suitable for use with the invention can include
`additional or fewer subsystems. For example, another com(cid:173)
`puter system could include more than one processor 51 (i.e.,
`a multi-processor system) or a cache memory.
`The system bus architecture of computer system 1 is rep-
`35 resented by arrows 67. However, these arrows are illustrative
`of any interconnection scheme serving to link the subsystems.
`For example, a local bus could be utilized to connect the
`central processor to the system memory and display adapter.
`Computer system 1 shown in FIG. 2 is but an example of a
`40 computer system suitable for use with the invention. Other
`computer architectures having different configurations of
`subsystems can also be utilized.
`FIG. 3 shows a network of multiple computer systems. A
`network 101 provides communication between multiple
`45 computer systems 1. In a wide area network such as the
`Internet, some of the computer systems are servers (or hosts)
`and provide access to resources or services to client computer
`systems on the network. With respect to web pages, there are
`multiple server computer systems that store the web pages
`50 that make up the Web. The web pages typically include links
`in the form of uniform resource locators (URLs) that are a link
`to another web page, whether it is on the same server or a
`different one.
`As described above, the Web is a distributed network of
`55 web pages. Networks ofhyperlinked documents can also be
`present in local area networks (e.g., intranets ). The operation
`of these intranets is very similar to the Internet except that it
`is not uncommon for all or a majority of the hyperlinked
`documents of an intranet to be stored on a single server
`60 computer system.
`Clustering is the general term for a technology that identi(cid:173)
`fies groups of similar objects among a given set. For example,
`consider the words {cat, dog, airplane, bus, automobile,
`giraffe }.A clustering of this set could divide it into two parts.
`65 One part, {cat, dog, giraffe}, would describe animals, while
`the other, {automobile, bus, airplane} would describe
`vehicles of travel. This is essentially what the invention can
`
`FIG. 1 illustrates an example of a computer system that can
`be utilized to execute the software of an embodiment of the
`invention.
`FIG. 2 illustrates a system block diagram of the computer
`system of FIG. 1.
`FIG. 3 illustrates a network of multiple computer systems
`including wide area networks and local area networks.
`FIG. 4 shows an example of links between web pages
`relating to Saturn cars and the planet Saturn.
`FIG. 5 shows a flow chart of a process of grouping hyper- 25
`linked documents.
`FIG. 6A shows of a web page on the planet Saturn having
`a forward link to another web page on the planet Saturn and
`FIG. 6B shows how the links can be considered to enhance
`grouping the web pages by topic.
`FIG. 7 shows a flow chart of a process of grouping hyper(cid:173)
`linked documents that have been obtained from a search and
`utilizes a similarity measure to penalize popular web pages
`and groups.
`FIGS. SA and SB show a web page example of the search
`results for the query ')aguar."
`FIG. 9 shows additional web pages in the neighborhood of
`a Jaguar cars global page.
`FIG. 10 shows an example of a back link list ofhyperlinked
`documents that have forward links to an expanded search set
`ofhyperlinked documents.
`FIG. 11 shows a hash table that can be utilized to organize
`the relationships between the web pages in FIG. 10.
`FIG. 12A shows a co-citation matrix and FIG. 12B shows
`a relationship vector.
`FIG. 13 shows an example of an equation that can be
`utilized to measure the similarity of hyperlinked documents
`or groups ofhyperlinked documents.
`FIG. 14 shows an augmented relationship vector.
`FIG. 15 shows a sorted relationship vector.
`FIG. 16 shows a merge vector.
`FIG. 17 shows a co-citation matrix after the merge.
`FIGS.18A-18C show a web pageexampleofsearchresults
`being grouped by topic.
`
`DETAILED DESCRIPTION OF PREFERRED
`EMBODIMENTS
`
`In the description that follows, the present invention will be
`described in reference to embodiments that group hyper(cid:173)
`linked documents (e.g., web pages). More specifically, the
`embodiments will be described in reference to embodiments
`that group the web pages from a search by topic so that the
`user can better find the desired information. However,
`embodiments of the invention are not limited to any particular
`environment, application or specific implementation. For
`example, the embodiments described below can be advanta-
`
`

`

`US 8,516,357 Bl
`
`5
`provide with respect to web pages, grouping together sets of
`web pages that are on the same topic. In our example however,
`we used our background knowledge of the given entities to do
`this grouping ourselves. In the case of web pages, it would be
`desirable to provide a suitable alternative to this knowledge 5
`base that lends itself to being automated. One aspect of the
`invention is that the link structure of the Web itself provides
`such information.
`Consider that pages on the same general topic are likely to
`be linked to from the same set of pages. Take, for example, the 10
`Saturn query described above. It makes sense that two pages
`on the Saturn car could be linked to from a common page
`(e.g., a page reviewing cars). However, it is extremely
`unlikely that any page on the Web links to one page talking
`about the Saturn automobile and another about the Sega Sat- 15
`urn game system. Although this can happen, statistical meth(cid:173)
`ods can be used to discount the effects of such an unlikely
`occurrence. Thus, by utilizing the link structure of the Web,
`pages that are on the same topic can be grouped together.
`Standard clustering techniques usually require the defini- 20
`tion of a distance between pairs of the objects to be clustered.
`Pages that are close together are then more likely to be about
`the same topic than those that are far apart. One way to define
`this distance is to consider the co-citation number of a pair of
`web pages. The co-citation number is the number of pages 25
`that link to both pages simultaneously. The higher this num(cid:173)
`ber, the more related are the pair of pages, and thus the smaller
`the distance between them should be. Therefore, the inverse
`of the co-citation number can be utilized as a definition of
`distance.
`FIG. 4 shows an example of links between web pages
`relating to Saturn cars and the planet Saturn.

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket