US006144962A
Weinberg et al.
[11] Patent Number: 6,144,962
[45] Date of Patent: Nov. 7, 2000

[54] VISUALIZATION OF WEB SITES AND HIERARCHICAL DATA STRUCTURES

[75] Inventors: Amir Weinberg, Zoran; Michael Pogrebisky, Herzliya, both of Israel

[73] Assignee: Mercury Interactive Corporation, Sunnyvale, Calif.

[21] Appl. No.: 08/843,265
[22] Filed: Apr. 11, 1997

Related U.S. Application Data
[60] Provisional application No. 60/028,474, Oct. 15, 1996.

[51] Int. Cl.7: G06F 17/30
[52] U.S. Cl.: 707/10; 707/104; 345/356; 345/357
[58] Field of Search: 707/10, 104; 395/200.48, 395/200.49, 200.54; 345/356, 357

[56] References Cited

U.S. PATENT DOCUMENTS
5,295,261    3/1994    Simonetti
5,388,255    2/1995    Pytlik et al.
5,515,488    5/1996    Hoppe et al.
5,524,202    6/1996    Yokohama
5,544,310    8/1996    Forman et al.
5,546,529    8/1996    Bowers et al.
5,590,250    12/1996   Lamping et al.
5,619,632    4/1997    Lamping et al.    395/141
5,870,559    2/1999    Lesham et al.     395/200.54

OTHER PUBLICATIONS
Product Brochure for Graph Layout Toolkit from Tom Sawyer Software, 8 pages (undated).
“Getting Started” Manual for NetCarta WebMapper 1.0 for Windows NT/95, dated 1996.
User’s Guide for NetCarta WebMapper 1.0 for Windows NT/95, dated 1996.

(List continued on next page.)

Primary Examiner: Paul R. Lintz
Attorney, Agent, or Firm: Knobbe, Martens, Olson & Bear LLP

[57] ABSTRACT

A visual Web site analysis program, implemented as a collection of software components, provides a variety of features for facilitating the analysis and management of Web sites and Web site content. A mapping component scans a Web site over a network connection and builds a site map which graphically depicts the URLs and links of the site. Site maps are generated using a unique layout and display methodology which allows the user to visualize the overall architecture of the Web site. Various map navigation and URL filtering features are provided to facilitate the task of identifying and repairing common Web site problems, such as links to missing URLs. A dynamic page scan feature enables the user to include dynamically-generated Web pages within the site map by capturing the output of a standard Web browser when a form is submitted by the user, and then automatically resubmitting this output during subsequent mappings of the site. The Web site analysis program is implemented using an extensible architecture which includes an API that allows plug-in applications to manipulate the display of the site map. Various plug-ins are provided which utilize the API to extend the functionality of the analysis program, including an action tracking plug-in which detects user activity and behavioral data (link activity levels, common site entry and exit points, etc.) from server log files and then superimposes such data onto the site map.

67 Claims, 24 Drawing Sheets
Microfiche Appendix Included (1 Microfiche, 45 Pages)

Discovering Web Access Patterns and Trends by Applying OLAP and Data Mining Technology on Web Logs, Zaiane, Xin and Han, Proceedings of the IEEE Forum on Research and Technology Advances in Digital Libraries (IEEE ADL ’98), pp. 19-29 (1998).
Tree Visualization with Tree-Maps: 2-d Space-Filling Approach, Shneiderman, Ben, ACM Transactions on Graphics vol. 11, No. 1, Jan. 1992, pp. 92-99.

Exhibit 1038
Samsung v. DoDots
IPR2023-00701

OTHER PUBLICATIONS (continued)
Product packaging (front and back) for NetCarta WebMapper 1.0, dated 1996.
“Getting Started” Guide for InContext WebAnalyzer 1.0, dated 1996.
Print-out of online help manual for InContext WebAnalyzer 1.0 product (taken from CD-ROM dated 1996).
Product packaging (front and back) for InContext WebAnalyzer 1.0, dated 1996.
Article titled “Top Tools to Manage your Web Site,” Netguide Magazine, Apr., 1997.
Selected documents from Microsoft.com Website describing Microsoft FrontPage 1.1, downloaded and printed on Oct. 8, 1996 (5 printed pgs.).
Document titled “XSoft Licenses Information Visualization Technology to NetCarta Corp.”, dated Sep. 17, 1996, printed from NetCarta.com Website.
Press release titled “Ararat Software Announces Availability of Inwebstigator 1.0”, dated Sep. 23, 1996, printed from Ararat.com Website.
Document titled “Incontext Announces Webanalyzer for Windows 95”, dated Feb. 15, 1996.
Press release titled “Mercury Interactive Announces Industry’s First Testing Technology for the Web”, dated Jan. 1, 1996.
Press release titled “Mercury Interactive Make Stress Testing a Web Site Simple and Affordable; Astra SiteTest Generates over 4 Million Hits Using a Single Windows 95 or NT Work Station”, dated Dec. 9, 1996.
Press release titled “Out-of-Control Web Sites Made More Effective, Profitable with New Web Management Tool; Mercury Interactive’s Astra Instantly Pinpoints Problems and Displays User-Behavior Patterns for Higher Quality Web Sites”, dated Oct. 9, 1996.
Press release titled “Tool For Webmasters Provides Open API For Java, C++ and Visual Basic Developers; Mercury Interactive Announces Astra-Industry’s First, Comprehensive, Graphical Web Management Tool”, dated Oct. 9, 1996.
Press release titled “Webmasters Eager for Site Management Solutions Rally Around New Web Management Tool; Mercury Interactive Ships Astra SiteManager, Developers Begin Leveraging Open API”, dated Dec. 9, 1996.
Product brochure for Astra SiteManager, dated 1997.
Product brochure for net-Analysis product of net.Genisis (undated).
Product brochure for CyberPilot Pro Product of NetsCarta (undated).
Article entitled “Web site management betas show major advances,” dated Nov. 11, 1996 in PC Week.

U.S. Patent    Nov. 7, 2000    Sheet 1 of 24    6,144,962

FIG. 1 (screen display of an example Web site map; drawing not reproduced)

Sheet 2 of 24: FIG. 2 (zoomed-in view of the site map of FIG. 1; drawing not reproduced)

Sheet 3 of 24: FIG. 3 (zoomed-in view of the site map of FIG. 1; drawing not reproduced)

Sheet 4 of 24: FIG. 4 (split-screen display mode; drawing not reproduced)

Sheet 5 of 24: FIG. 5 (navigational aid of the Astra graphical user interface; drawing not reproduced)

Sheet 6 of 24: FIG. 6 (hierarchical view of the outbound links of a URL; drawing not reproduced)

Sheet 7 of 24: FIG. 7 (block diagram of the general architecture of Astra; drawing not reproduced)

Sheet 8 of 24: FIG. 8 (object model used by Astra)
Objects shown: Astra object; site graph object; edges object; nodes object; edge object; node object.
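
To illustrate how a plug-in could change display attributes through an object model like the one shown in FIG. 8, the sketch below keeps node and edge objects inside a site graph object and lets a plug-in recolor and annotate nodes. The nesting of the objects and every class, attribute, and method name here (`SiteGraph`, `add_link`, `HighlightPlugin`, and so on) are assumptions made for illustration; the actual API listing is Appendix B of the microfiche appendix and is not reproduced in this text.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Node:                      # one content object (URL) of the site
    url: str
    color: str = "default"
    visible: bool = True
    annotation: Optional[str] = None


@dataclass
class Edge:                      # one link between two URLs
    source: str
    target: str
    color: str = "default"
    visible: bool = True
    annotation: Optional[str] = None


@dataclass
class SiteGraph:                 # holds the node and edge collections
    nodes: dict[str, Node] = field(default_factory=dict)
    edges: dict[tuple[str, str], Edge] = field(default_factory=dict)

    def add_link(self, source: str, target: str) -> None:
        """Record a link, creating node objects for URLs not seen before."""
        for url in (source, target):
            self.nodes.setdefault(url, Node(url))
        self.edges.setdefault((source, target), Edge(source, target))


class HighlightPlugin:
    """Hypothetical plug-in that recolors and annotates matching nodes."""

    def __init__(self, predicate: Callable[[str], bool], color: str = "red") -> None:
        self.predicate = predicate
        self.color = color

    def run(self, graph: SiteGraph) -> None:
        for node in graph.nodes.values():
            if self.predicate(node.url):
                node.color = self.color
                node.annotation = "matched"


graph = SiteGraph()
graph.add_link("/", "/index.html")
graph.add_link("/", "/logo.gif")
HighlightPlugin(lambda url: url.endswith(".gif")).run(graph)
```

Because the plug-in only changes attributes on the shared objects, the main program could redraw the map afterward without knowing what the plug-in computed.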

Sheet 9 of 24: FIG. 9 (multi-threaded process for scanning and mapping Web sites; drawing not reproduced)

Sheet 10 of 24: FIG. 10 (general decision process used by Astra to scan a URL)
SCAN URL
URL PREVIOUSLY MAPPED?
REQUEST URL HEADER FROM SERVER
SEND COMMAND "GET URL HEADER IF MODIFIED AFTER <DATE/TIME>" TO SERVER
SEND COMMAND "GET URL IF LAST MODIFIED AFTER <DATE/TIME>" TO SERVER
REQUEST HTML FILE FROM SERVER
WAIT FOR SERVER RESPONSE; GENERATE STATUS CODE IF TIMEOUT OR IF ERROR CODE RETURNED
HTML FILE RETURNED? YES: PARSE HTML
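
The requests in FIG. 10 correspond to ordinary HTTP: a header-only request is an HTTP HEAD, and the "if modified after <date/time>" variants match the standard If-Modified-Since request header, to which a server can answer 304 (Not Modified). The sketch below only plans requests and classifies responses; it assumes that the previous-mapping test decides whether a date is sent and that only HTML pages are requested in full. The function names and outcome strings are hypothetical.

```python
from email.utils import formatdate
from typing import Optional


def plan_scan_request(is_html: bool, last_mapped: Optional[float]) -> tuple[str, dict[str, str]]:
    """Pick the HTTP method and headers used to scan one URL.

    Only HTML pages are fetched in full, because only they must be parsed
    for further links; other content objects are checked with HEAD.
    `last_mapped` is the POSIX time of the previous mapping, or None.
    """
    method = "GET" if is_html else "HEAD"
    headers: dict[str, str] = {}
    if last_mapped is not None:
        # Ask the server to send the object only if it changed since then.
        headers["If-Modified-Since"] = formatdate(last_mapped, usegmt=True)
    return method, headers


def classify_response(status: Optional[int], content_type: str = "") -> str:
    """Turn a server response (None on timeout) into a scan outcome."""
    if status is None:
        return "timeout"
    if status == 304:
        return "unchanged"         # Not Modified since the previous mapping
    if status >= 400:
        return "error"
    if status == 200 and content_type.startswith("text/html"):
        return "parse html"
    return "ok"                    # e.g. a GIF checked with HEAD, or a redirect
```

For example, `plan_scan_request(False, 0.0)` yields a HEAD request carrying `If-Modified-Since: Thu, 01 Jan 1970 00:00:00 GMT`.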

Sheet 11 of 24: FIG. 11 (block diagram of the method for scanning dynamically-generated Web pages; drawing not reproduced)

Sheet 12 of 24: FIG. 12 (flow diagram of the method for scanning dynamically-generated Web pages; drawing not reproduced)

Sheet 13 of 24: FIG. 13 (dynamic page scanning screen display; drawing not reproduced)

Sheet 14 of 24: FIG. 14 (dynamic page scanning screen display; drawing not reproduced)

Sheet 15 of 24: FIG. 15 (dynamic page scanning screen display; drawing not reproduced)

Sheet 16 of 24: FIG. 16 (filtered site map; drawing not reproduced)

Sheet 17 of 24: FIG. 17 (general program sequence for generating filtered maps)
APPLY FILTER (OR COMBINATION OF FILTERS)
SET VISIBILITY ATTRIBUTE OF ALL NODES AND LINKS TO "HIDDEN"
FOR EACH NODE (URL) WHICH SATISFIES FILTER, SET VISIBILITY ATTRIBUTE TO "ON" AND SET COLOR ATTRIBUTE TO HIGHLIGHTED COLOR
SET VISIBILITY ATTRIBUTE TO "ON" FOR ALL NODES AND LINKS OF VISUAL WEB DISPLAY MAP THAT ARE NEEDED TO MAINTAIN CONNECTIVITY TO HOME PAGE NODE
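
The sequence of FIG. 17 can be sketched over a tree-shaped map in which each URL records its parent (the home page has none). The dictionary representation, the color name, and the URL-based predicate are assumptions for illustration; a real filter would more likely test node attributes such as status. Link visibility is not stored separately here: in a tree each non-root node has exactly one incoming link, so that link is shown exactly when its child node is shown.

```python
from typing import Callable, Optional


def apply_filter(parent: dict[str, Optional[str]], keep: Callable[[str], bool],
                 highlight: str = "yellow") -> tuple[dict[str, bool], dict[str, str]]:
    """Return (visibility, highlight colors) for a filtered tree map.

    `parent` maps each URL to its parent URL in the displayed tree,
    with None for the home page.
    """
    # Step 1: hide every node (and therefore every link).
    visible = {url: False for url in parent}
    colors: dict[str, str] = {}
    # Step 2: show and highlight each node that satisfies the filter.
    matches = [url for url in parent if keep(url)]
    for url in matches:
        visible[url] = True
        colors[url] = highlight
    # Step 3: show the ancestors needed to connect each match to the home page.
    # Climbing stops at an already-visible node, whose own path to the home
    # page is handled when that node is processed.
    for url in matches:
        up = parent[url]
        while up is not None and not visible[up]:
            visible[up] = True
            up = parent[up]
    return visible, colors


tree_map = {"/": None, "/a": "/", "/a/x.gif": "/a", "/b": "/", "/b/y.html": "/b"}
visible, colors = apply_filter(tree_map, lambda url: url.endswith(".gif"))
```

Here only the GIF is highlighted, and its ancestors `/a` and `/` are shown unhighlighted so that it stays connected to the home page.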

Sheet 18 of 24: FIG. 18 (filtered map of FIG. 16 in Visual Web Display format; drawing not reproduced)

Sheet 19 of 24: FIG. 19 (activity monitoring screen display; drawing not reproduced)

Sheet 20 of 24: FIG. 20 (decision process for generating link activity data from a server access log file)
PROCESS LOG FILE TO DETERMINE LINK ACTIVITY LEVELS
READ NEXT LINE OF LOG FILE
HAS USER PREVIOUSLY VISITED SITE?
DOES LINK EXIST BETWEEN PRESENT URL AND PRIOR URL?
INCREMENT "HITS" ATTRIBUTE OF LINK
RETURN
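
The loop of FIG. 20 can be sketched as follows. It assumes each server log line has already been reduced to a (visitor, URL) pair in time order, with the visitor identified by something like the client address; parsing a real access log and deciding what counts as one visitor are left out, and the names are hypothetical. A link's hit count grows only when the same visitor's previous request was for the page at the other end of an existing link, which approximates that visitor's navigation path.

```python
from collections import Counter
from typing import Iterable


def link_hits(entries: Iterable[tuple[str, str]],
              links: set[tuple[str, str]]) -> Counter:
    """Count hits per link from time-ordered (visitor, url) log entries.

    `links` holds the (source_url, target_url) pairs present in the site map.
    """
    hits: Counter = Counter()
    last_url: dict[str, str] = {}        # most recent URL requested by each visitor
    for visitor, url in entries:
        prior = last_url.get(visitor)    # None: visitor not seen before
        if prior is not None and (prior, url) in links:
            hits[(prior, url)] += 1      # increment the link's "hits" count
        last_url[visitor] = url
    return hits


log = [("10.0.0.1", "/"), ("10.0.0.2", "/"), ("10.0.0.1", "/a.html"),
       ("10.0.0.1", "/b.html"), ("10.0.0.2", "/a.html")]
site_links = {("/", "/a.html"), ("/a.html", "/b.html")}
hits = link_hits(log, site_links)
```

Mapping the resulting counts to node and link colors (for example, by thresholds) would then be a separate display step.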

Sheet 21 of 24: FIG. 21 (map comparison tool screen display; drawing not reproduced)

Sheet 22 of 24: FIG. 22 (link repair screen display; drawing not reproduced)

Sheet 23 of 24: FIG. 23 (partial screen display of layout features; drawing not reproduced)

Sheet 24 of 24: FIG. 24 (partial screen display of layout features; drawing not reproduced)

VISUALIZATION OF WEB SITES AND HIERARCHICAL DATA STRUCTURES

PRIORITY CLAIM

This application claims the benefit of U.S. Provisional Application No. 60/028,474 titled SOFTWARE SYSTEM AND ASSOCIATED METHODS FOR FACILITATING THE ANALYSIS AND MANAGEMENT OF WEB SITES, filed Oct. 15, 1996, which is hereby incorporated by reference.

MICROFICHE APPENDIX

This specification includes a microfiche appendix (consisting of 1 sheet with 45 frames) which contains a partial source code listing and an API (application program interface) listing of a preferred embodiment of the invention, as Appendices A and B, respectively. These materials form part of the disclosure of the specification.

FIELD OF THE INVENTION

The present invention relates generally to database management, analysis and visualization tools. More particularly, the present invention relates to software tools for facilitating the management and analysis of World Wide Web sites and other types of database systems which utilize hyperlinks to facilitate user navigation.

BACKGROUND OF THE INVENTION

With the increasing popularity and complexity of Internet and intranet applications, the task of managing Web site content and maintaining Web site effectiveness has become increasingly difficult. Company Webmasters and business managers are routinely faced with a wide array of burdensome tasks, including, for example, the identification and repair of large numbers of broken links (i.e., links to missing URLs), the monitoring and organization of large volumes of diverse, continuously-changing Web site content, and the detection and management of congested links. These problems are particularly troublesome for companies that rely on their respective Web sites to provide mission-critical information and services to customers and business partners.

Several software companies have developed software products which address some of these problems by generating graphical maps of Web site content and providing tools for navigating and managing the content displayed within the maps. Examples of such software tools include WebMapper™ from NetCarta Corporation and WebAnalyzer™ from InContext Corporation. Unfortunately, the graphical site maps generated by these products tend to be difficult to navigate, and fail to convey much of the information needed by Webmasters to effectively manage complex Web sites. As a result, many companies continue to resort to the burdensome task of manually generating large, paper-based maps of their Web sites. In addition, many of these products are only capable of mapping certain types of Web pages, and do not provide the types of analysis tools needed by Webmasters to evaluate the performance and effectiveness of Web sites.

The present invention addresses these and other limitations in existing products and technologies.

SUMMARY OF THE INVENTIVE FEATURES

In accordance with the present invention, a software package (“Web site analysis program”) is provided which includes a variety of features for facilitating the management and analysis of Web sites. In the preferred embodiment, the program runs on a network-connected PC under the Windows® 95 or Windows® NT operating system, and utilizes the standard protocols and conventions of the World Wide Web (“Web”). In other embodiments, the program may be adapted to provide for the analysis of other types of hypertextual-content sites, including sites based on non-standard protocols.

In the preferred embodiment, the program includes Web site scanning routines which use conventional web crawling techniques to gather information about the content objects (HTML documents, GIF files, etc.) and links of a Web site via a network connection. Mapping routines of the program in turn use this information to generate, on the computer’s display screen, a graphical site map that shows the overall architecture (i.e., the structural arrangement of content objects and links) of the Web site. A user interface of the program allows the user to perform actions such as initiating and pausing the scanning/mapping of a Web site, zooming in and out on portions of a site map, applying content filters to the site map to filter out content objects of specific types, and saving and retrieving maps to/from disk. A map comparison tool allows the user to generate a comparison map which highlights changes that have been made to the Web site since a previous mapping of the site.
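
One way to picture the map comparison tool is as a set difference between two scans. The sketch below assumes, purely for illustration, that each saved map can be reduced to a dictionary from URL to a last-modified stamp; the `compare_maps` name and that representation are hypothetical, since the text does not say how a change is detected.

```python
from dataclasses import dataclass


@dataclass
class MapComparison:
    added: set[str]      # URLs present only in the new scan
    removed: set[str]    # URLs present only in the previous scan
    modified: set[str]   # URLs whose last-modified stamp changed
    unchanged: set[str]  # URLs present in both scans with the same stamp


def compare_maps(previous: dict[str, str], current: dict[str, str]) -> MapComparison:
    """Classify every URL of two scans of the same site."""
    old_urls, new_urls = set(previous), set(current)
    common = old_urls & new_urls
    modified = {url for url in common if previous[url] != current[url]}
    return MapComparison(
        added=new_urls - old_urls,
        removed=old_urls - new_urls,
        modified=modified,
        unchanged=common - modified,
    )


# One page changed, one appeared, and one disappeared since the last scan.
diff = compare_maps(
    {"/": "t1", "/a.html": "t1", "/old.html": "t1"},
    {"/": "t1", "/a.html": "t2", "/new.html": "t2"},
)
```

A comparison map could then give each node a display color according to the set it falls in.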

In accordance with one aspect of the invention, the Web site analysis program implements a map generation method which greatly facilitates the visualization by the user of the overall architecture of the Web site, and allows the user to navigate the map in an intuitive manner to explore the content of the Web site. To generate the site map, a structural representation of the Web site (specifying the actual arrangement of content objects and links) is initially reduced, for purposes of generating the site map, to a hierarchical tree representation in which each content object of the Web site is represented as a node of the tree. A recursive layout method is then applied which uses the parent-child node relationships, as such relationships exist within the tree, to spatially position the nodes (represented as respective icons within the map) on the display screen such that children nodes are positioned around and connected to their respective immediate parents. (This layout method can also be used to display other types of hierarchical data structures, such as the tree structure of a conventional file system.) The result is a map which comprises a hierarchical arrangement of parent-child node (icon) clusters in which parent-child relationships are immediately apparent.
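
A minimal sketch of the two steps just described follows. It assumes, which the text does not state, that the graph is reduced to a tree by breadth-first search from the home page (so each URL's parent is the first page found linking to it) and that each node's children are spread evenly on a circle around it whose radius shrinks at every level. The function names and the `radius` and `shrink` parameters are illustrative only.

```python
import math
from collections import deque
from typing import Optional


def reduce_to_tree(links: dict[str, list[str]], root: str) -> dict[str, list[str]]:
    """Reduce a site graph to a tree by breadth-first search from the home page.

    Each URL becomes a child of the first page on which the search finds it;
    later links to an already-placed URL are left out of the tree.
    """
    children: dict[str, list[str]] = {root: []}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in children:  # first discovery wins
                children[target] = []
                children[page].append(target)
                queue.append(target)
    return children


def layout(children: dict[str, list[str]], node: str, x: float = 0.0, y: float = 0.0,
           radius: float = 100.0, shrink: float = 0.4,
           pos: Optional[dict[str, tuple[float, float]]] = None) -> dict[str, tuple[float, float]]:
    """Recursively place each node's children evenly on a circle around it.

    The circle radius shrinks by `shrink` at every level, so each parent and
    its children form a compact cluster.
    """
    if pos is None:
        pos = {}
    pos[node] = (x, y)
    kids = children.get(node, [])
    for i, kid in enumerate(kids):
        angle = 2 * math.pi * i / len(kids)
        layout(children, kid,
               x + radius * math.cos(angle), y + radius * math.sin(angle),
               radius * shrink, shrink, pos)
    return pos


site = {"/": ["/a", "/b"], "/a": ["/b", "/c"], "/b": ["/"], "/c": []}
tree = reduce_to_tree(site, "/")   # {"/": ["/a", "/b"], "/a": ["/c"], "/b": [], "/c": []}
positions = layout(tree, "/")      # "/c" is placed around "/a", not around "/"
```

This sketch ignores overlap between neighboring clusters, which a practical layout would also have to manage.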

As part of the layout method, the relative sizes of the node icons are preferably adjusted such that nodes with relatively large numbers of outgoing links have a relatively large icon size, and thus stand out in the map. In addition, the node and link display sizes are automatically adjusted such that the entire map is displayed on the display screen, regardless of the size of the Web site. As the user zooms in on portions of the map, additional details of the Web site’s content objects are automatically revealed within the map.
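
The two sizing rules in the paragraph above can be sketched as follows. The linear growth of icon size with the number of outgoing links, the size cap, and the use of one uniform scale factor to fit the map's bounding box onto the screen are assumptions chosen for illustration, and the default parameter values are arbitrary.

```python
def icon_size(outgoing_links: int, base: float = 8.0, per_link: float = 1.5,
              max_size: float = 40.0) -> float:
    """Give pages with many outgoing links a larger icon, up to a cap."""
    return min(base + per_link * outgoing_links, max_size)


def fit_to_screen(pos: dict[str, tuple[float, float]], width: float, height: float,
                  margin: float = 10.0) -> tuple[dict[str, tuple[float, float]], float]:
    """Scale and shift node positions so the whole map fits in width x height.

    Returns the new positions and the scale factor, which can also be applied
    to icon sizes and link widths.
    """
    xs = [x for x, _ in pos.values()]
    ys = [y for _, y in pos.values()]
    min_x, min_y = min(xs), min(ys)
    span_x = (max(xs) - min_x) or 1.0  # avoid dividing by zero for a one-node map
    span_y = (max(ys) - min_y) or 1.0
    scale = min((width - 2 * margin) / span_x, (height - 2 * margin) / span_y)
    fitted = {node: (margin + (x - min_x) * scale, margin + (y - min_y) * scale)
              for node, (x, y) in pos.items()}
    return fitted, scale


fitted, scale = fit_to_screen({"a": (-100.0, 0.0), "b": (100.0, 50.0)}, 220.0, 220.0)
```

Using a single scale factor for both axes preserves the map's shape; the narrower dimension simply leaves unused space.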

In accordance with another aspect of the invention, the Web site analysis program is based on an extensible architecture that allows software components to be added that make extensive use of the program’s mapping functionality. Specifically, the architecture includes an API (application program interface) which includes API procedures (“methods”) that allow other applications (“plug-ins”) to, among other things, manipulate the display attributes of the nodes and links within a site map. Using these methods, a plug-in application can be added which dynamically superimposes data onto the site map by, for example, selectively modifying display colors of nodes and links, selectively hiding nodes and links, and/or attaching alphanumeric annotations to the nodes and links. The API also includes methods for allowing plug-in components to access Web site data (both during and following the Web site scanning process) retrieved by the scanning routines, and for adding menu commands to the user interface of the main program.

In accordance with another aspect of the invention, software routines (preferably implemented within a plug-in application) are provided for processing a Web site’s server access log file to generate Web site usage data, and for displaying the usage data on a site map. This usage data may, for example, be in the form of the number of “hits” per link, the number of Web site exit events per node, or the navigation paths taken by specific users (“visitors”). This usage data is preferably generated by processing the entries within the log file on a per-visitor basis to determine the probable navigation path taken by each respective visitor to the Web site. (Standard-format access log files which record each access to any page of the Web site are typically maintained by conventional Web servers.) In a preferred implementation, the usage data is then superimposed onto the site map (using the API methods) using different node and link display colors to represent different respective levels of user activity. Using this feature, Webmasters can readily detect common “problem areas” such as congested links and popular Web site exit points. In addition, by looking at individual navigation paths on a per-visitor basis, Webmasters can identify popular navigation paths taken by visitors to the site.

In accordance with yet another aspect of the invention, the Web site analysis program includes software routines and associated user interface controls for automatically scanning and mapping dynamically-generated Web pages, such as Web pages generated “on-the-fly” in response to user-specified database queries. This feature generally involves the two-step process of capturing and recording a dataset manually entered by the user into an embedded form of a Web page (such as a page of a previously-mapped Web site), and then automatically resubmitting the dataset (within the form) when the Web site is later re-scanned. As will be appreciated, this feature of the invention can also be applied to conventional Internet search engines.

To effectuate the capture of one or more datasets in the preferred implementation, the user initiates a capture session from the user interface; this causes a standard Web browser to be launched and temporarily configured to use the Web site analysis program as an HTTP-level proxy to communicate with Web sites. Thereafter, until the capture session is terminated by the user, any pages retrieved with the browser, and any forms (datasets) submitted from the browser, are automatically recorded by the Web site analysis program into the site map. When the site map is subsequently updated (using an “automatic update” option of the user interface), the scanning routines automatically re-enter the captured datasets into the corresponding forms and recreate the form submissions. The dynamically-generated Web pages returned in response to these automatic form submissions are then added to the updated site map as respective nodes.

A related aspect of the invention involves the associated method of locally capturing the output of the Web browser to generate a sequence that can subsequently be used to automatically evaluate a Web site.

BRIEF DESCRIPTION OF THE DRAWINGS

The various features of the invention will now be described in greater detail with reference to the drawings of a preferred software package referred to as the Astra™ SiteManager™ Web site analysis tool (“Astra”), its screen displays, and various related components. In these drawings, reference numbers are re-used, where appropriate, to indicate a correspondence between referenced items.

FIG. 1 is a screen display which illustrates an example Web site map generated by Astra, and which illustrates the menu, tool and filter bars of the Astra graphical user interface.
FIGS. 2 and 3 are screen displays which illustrate respective zoomed-in views of the site map of FIG. 1.
FIG. 4 is a screen display which illustrates a split-screen display mode, wherein a graphical representation of a Web site is displayed in an upper window and a textual representation of the Web site is displayed in a lower window.
FIG. 5 is a screen display which illustrates a navigational aid of the Astra graphical user interface.
FIG. 6 is a screen display illustrating a feature which allows a user to selectively view the outbound links of a URL in a hierarchical display format.
FIG. 7 is a block diagram which illustrates the general architecture of Astra, which is shown in the context of a client computer communicating with a Web site.
FIG. 8 illustrates the object model used by Astra.
FIG. 9 illustrates a multi-threaded process used by Astra for scanning and mapping Web sites.
FIG. 10 illustrates the general decision process used by Astra to scan a URL.
FIG. 11 is a block diagram which illustrates a method used by Astra to scan dynamically-generated Web pages.
FIG. 12 is a flow diagram which further illustrates the method for scanning dynamically-generated Web pages.
FIGS. 13-15 are a sequence of screen displays which further illustrate the operation of Astra’s dynamic page scanning feature.
FIG. 16 is a screen display which illustrates the site map of FIG. 1 following the application of a filter which filters out all URLs (and associated links) having a status other than “OK.”
FIG. 17 illustrates the general program sequence followed by Astra to generate filtered maps of the type shown in FIG. 16.
FIG. 18 illustrates the filtered map of FIG. 16 redisplayed in Astra’s Visual Web Display format.
FIG. 19 is a screen display which illustrates an activity monitoring feature of Astra.
FIG. 20 illustrates a decision process used by Astra to generate link activity data (of the type illustrated in FIG. 19) from a server access log file.
FIG. 21 is a screen display which illustrates a map comparison tool of Astra.
FIG. 22 is a screen display which illustrates a link repair feature of Astra.
FIGS. 23 and 24 are partial screen displays which illustrate layout features in accordance with another embodiment of the invention.

The screen displays included in the figures were generated from screen captures taken during the execution of the Astra code. In order to comply with patent office standards, the original screen captures have been modified to reduce shading and to replace certain color-coded regions with appropriate cross hatching. All copyrights in these screen displays are hereby reserved.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The description of the preferred embodiments is arranged within the following sections:
I. Glossary of Terms and Acronyms
II. Overview
III. Map Layout and Display Methodology
IV. Astra Graphical User Interface
V. Astra Software Architecture
VI. Scanning Process
VII. Scanning and Mapping of Dynamically-Generated Pages
VIII. Display of Filtered Maps
IX. Tracking and Display of Visitor Activity
X. Map Comparison Tool
XI. Link Repair Plug-in
XII. Conclusion

I. Glossary of Terms and Acronyms

The following definitions and explanations provide background information pertaining to the technical field of the present invention, and are intended to facilitate an understanding of both the invention and the preferred embodiments thereof. Additional definitions are provided throughout the detailed description.

Internet. The Internet is a collection of interconnected public and private computer networks that are linked together by a set of standard protocols (such as TCP/IP, HTTP, FTP and Gopher) to form a global, distributed network.

Document. Generally, a collection of data that can be viewed using an application program, and that appears or is treated as a self-contained entity. Documents typically include control codes that specify how the document content is displayed by the application program. An “HTML document” is a special type of document which includes HTML (HyperText Markup Language) codes to permit the document to be viewed using a Web browser program. An HTML document that is accessible on a World Wide Web site is commonly referred to as a “Web document” or “Web page.” Web documents commonly include embedded components, such as GIF (Graphics Interchange Format) files, which are represented within the HTML coding as links to other URLs. (See “HTML” and “URL” below.)
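
As an illustration of embedded components being represented in HTML as links to other URLs, the sketch below uses Python's standard `html.parser` module to collect `href` and `src` attribute values and resolve them against the page's own URL with `urllib.parse.urljoin`. Treating exactly these two attributes as links is a simplifying assumption, and the example URL is a placeholder.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect hyperlinks (href) and embedded components (src) from one page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, Optional[str]]]) -> None:
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative references against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


collector = LinkCollector("http://www.example.com/docs/index.html")
collector.feed('<a href="about.html">About</a> <img src="/images/logo.gif">')
# collector.links == ["http://www.example.com/docs/about.html",
#                     "http://www.example.com/images/logo.gif"]
```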

Hyperlink. A navigational link from one document to another, or from one portion (or component) of a document to another. Typically, a hyperlink is displayed as a highlighted word or phrase that can be clicked on using the mouse to jump to the associated document or document portion.

Hypertext System. A computer-based informational system in which documents (and possibly other types of data entities) are linked together via hyperlinks to form a user-navigable “web.” Although the term “text” appears within “hypertext,” the documents and hyperlinks of a hypertext system may (and typically do) include other forms of media. For example, a hyperlink to a sound file may be represented within a document by a graphic image of an audio speaker.

World Wide Web. A distributed, global hypertext system, based on a set of standard protocols and conventions (such as HTTP and HTML, discussed below), which uses the Internet as a transport mechanism. A software program which allows users to request and view World Wide Web (“Web”) documents is commonly referred to as a “Web browser,” and a program which responds to such requests by returning (“serving”) Web documents is commonly referred to as a “Web server.”

Web Site. As used herein, “web site” refers generally to a database or other collection of inter-linked hypertextual documents (“web documents”) and associated data entities, which is accessible via a computer network, and which forms part of a larger, distributed informational system. Depending upon its context, the term may also refer to the associated hardware and/or software server components used to provide access to such documents. When used herein with initial capitalization (i.e., “Web site”), the term refers more specifically to a web site of the World Wide Web. (In general, a Web site corresponds to a particular Internet domain name, such as “merc-int.com,” and includes the content of or associated with a particular organization.) Other types of web sites may include, for example, a hypertextual database of a corporate “intranet” (i.e., an internal network which uses standard Internet protocols), or a site of a hypertext system that uses document retrieval protocols other than those of the World Wide Web.

Content Object. As used herein, a data entity (document, document component, etc.) that can be selectively retrieved from a web site. In the context of the World Wide Web, common types of content objects include HTML documents, GIF files, sound files, video files, Java applets and aglets, and downloadable applications, and each object has a unique identifier (referred to as the “URL”) which specifies the location of the object. (See “URL” below.)

URL (Uniform Resource Locator). A unique address which fully specifies the location of a content object on the Internet. The general format of a URL is protocol://machine-address/path/filename. (As will be apparent from the context in which it is used, the term “URL” is also used herein to refer to the corresponding content object itself.)
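
The protocol://machine-address/path/filename layout can be taken apart with Python's standard `urllib.parse.urlsplit`. Splitting the last path segment off as the filename is this sketch's own convention, and in general `netloc` may also carry a port or login information, which the simple format above omits.

```python
import posixpath
from urllib.parse import urlsplit


def url_parts(url: str) -> dict[str, str]:
    """Split a URL into protocol, machine-address, path and filename parts."""
    parts = urlsplit(url)
    path, filename = posixpath.split(parts.path)
    return {
        "protocol": parts.scheme,
        "machine_address": parts.netloc,
        "path": path,
        "filename": filename,
    }


print(url_parts("http://www.example.com/products/index.html"))
# {'protocol': 'http', 'machine_address': 'www.example.com', 'path': '/products', 'filename': 'index.html'}
```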

Graph/Tree. In the context of database systems, the term “graph” (or “graph structure”) refers generally to a data structure that can be represented as a collection of interconnected nodes. As described below, a Web site can conveniently be represented as a graph in which each node of the graph corresponds to a content object of the Web site, and in which each interconnection between two nodes represents a link within the Web site. A “tree” is a specific type of graph structure in which exactly one path exists from a main or “root” node to each additional node of the structure. The terms “parent” and “child” are commonly used to refer to the interrelationships of nodes within a tree structure (or other hierarchical graph structure), and the term “leaf” or “leaf node” is used to refer to nodes that have no children. For additional information on graph and tree data structures, see Alfred V. Aho et al., Data Structures and Algorithms, Addison-Wesley, 1982.
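
The definition of a tree given above (exactly one path from the root to each other node) can be checked mechanically. The sketch below treats links as directed (parent, child) pairs and uses the equivalent test that the root has no parent, every other node has exactly one, and every node is reachable from the root. A typical Web site graph fails this test because pages are linked from several places, which is consistent with the site graph being reduced to a tree before layout.

```python
from collections import deque


def is_tree(root: str, edges: list[tuple[str, str]]) -> bool:
    """Return True if exactly one path leads from root to every other node."""
    children: dict[str, list[str]] = {}
    parent_count: dict[str, int] = {}
    nodes = {root}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)
        parent_count[child] = parent_count.get(child, 0) + 1
        nodes.update((parent, child))
    if parent_count.get(root, 0) != 0:
        return False                      # a link back to the root forms a cycle
    if any(parent_count.get(n, 0) != 1 for n in nodes - {root}):
        return False                      # some node has zero or several parents
    seen, queue = {root}, deque([root])
    while queue:                          # every node must be reachable from root
        for child in children.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen == nodes


def leaves(root: str, edges: list[tuple[str, str]]) -> set[str]:
    """Leaf nodes: nodes of the structure that have no children."""
    nodes = {root} | {n for edge in edges for n in edge}
    return nodes - {parent for parent, _ in edges}


site_tree = [("/", "/a"), ("/", "/b"), ("/a", "/c")]
```

For `site_tree`, `is_tree("/", site_tree)` holds and the leaves are `/b` and `/c`; adding a second link into `/b` would break the tree property.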

TCP/IP (Transmission Control Protocol/Internet Protocol). A standard Internet protocol which specifies how computers exchange data over the Internet. TCP/IP is the lowest level data transfer protocol of the standard Internet protocols.

HTML (HyperText Markup Lan