(19) United States
`(12) Patent Application Publication (10) Pub. No.: US 2002/0178217 A1
`
` Nguyen et al. (43) Pub. Date: Nov. 28, 2002
`
`
`US 20020178217A1
`
`(54) METHOD AND APPARATUS FOR SCANNING
`A WEBSITE IN A DISTRIBUTED DATA
`PROCESSING SYSTEM FOR PROBLEM
`DETERMINATION
`
`(76) Inventors: Andrew Quoc Anh Nguyen, Austin,
`TX (US); John Joseph Edward Turek,
`South Nyack, NY (US); Menachem
`Shtalhaim, Haifa (IL)
`
`Correspondence Address:
`Duke W. Yee
`Carstens, Yee & Cahoon, L.L.P.
`P.O. Box 802334
`Dallas, TX 75380 (US)
`
`(21) Appl. No.: 10/145,445
`
`(22) Filed: May 14, 2002
`
`Related U.S. Application Data
`
`(62) Division of application No. 09/315,611, filed on May
`20, 1999,
`
`Publication Classification
`
`(51) Int. Cl. ........ G06F 15/16
`(52) U.S. Cl. ........ 709/203; 709/201
`
`(57) ABSTRACT
`
`A method and apparatus for identifying problems associated
`with a web site. A scan of a web site is initiated by a plurality
`of agents, wherein each of the plurality of agents are at a
`different location in the distributed data processing system.
`Results of the scan are obtained from the plurality of agents.
`The results of the scan are analyzed to determine if a
`problem is associated with the website.
`
`[Representative drawing: flowchart. Make connection to agents (600); send policy to agents (602); receive results of scan (604); store results (606); receive all results of scan? (608), if yes: server analyze for differences (610); identify problems (612); identify and initiate action (614).]
`
`Data Co Exhibit 1029
`Data Co v. Bright Data
`
`

`

`Patent Application Publication Nov. 28, 2002 Sheet 1 of 4 US 2002/0178217 A1
`
`[Drawing sheet 1 of 4: block diagrams, including FIG. 2. Legible labels: client; server; processor 202; system bus; local memory; I/O bridge; PCI bus bridge; PCI bus 216; modem 218; network adapter 220; graphics adapter; hard disk. Other reference numerals are not recoverable from this extraction.]
`

`

`Patent Application Publication Nov. 28, 2002 Sheet 2 of 4 US 2002/0178217 A1
`
`[Drawing sheet 2 of 4: block diagram printed rotated on the page; most text is not recoverable from this extraction. Partially legible labels: audio adapter; graphics adapter; expansion bus interface; SCSI host bus adapter; reference numeral 300.]
`
`
`
`

`

`Patent Application Publication Nov. 28, 2002 Sheet 3 of 4 US 2002/0178217 A1
`
`[Drawing sheet 3 of 4: FIG. 4, FIG. 5, FIG. 6, and FIG. 7. FIG. 4: reference numerals 400, 402, 404, 406, 408, 410; legible label: maximum number of files. FIG. 7: reference numerals 700, 702, 704, 706. FIG. 6: flowchart with legible labels: make connection to agents; send policy to agents; receive results of scan; store results; receive all results of scan?; yes; server analyze for differences; identify problems; identify and initiate action.]
`
`

`

`Patent Application Publication Nov. 28, 2002 Sheet 4 of 4 US 2002/0178217 A1
`
`[Drawing sheet 4 of 4: flowcharts, including FIG. 8 and FIG. 9. Legible reference numerals: 800, 802, 804, 900, 902, 1102, 1104. Legible labels: analyze DNS resolution; differences, normal result and abnormal results; classification of problem; look at policy for link depth and number of documents; number of documents equal to policy?; number of links equal to policy?; look at next URL; statistical analysis on data streams; unexpected variation from different agents?; classification of common set of attributes.]
`
`

`

`
`METHOD AND APPARATUS FOR SCANNING A
`WEBSITE IN A DISTRIBUTED DATA
`PROCESSING SYSTEM FOR PROBLEM
`DETERMINATION
`
`BACKGROUND OF THE INVENTION
`
`[0001] 1. Technical Field
`
`[0002] The present invention relates generally to an
`improved distributed data processing system and in particu-
`lar to a method and apparatus for identifying problems
`within a distributed data processing system. Still more
`particularly, the present invention provides a method and
`apparatus for scanning computers within a distributed data
`processing system for problem determination.
`
`[0003] 2. Description of Related Art
`
`[0004] The Internet, also referred to as an “internetwork”,
`is a set of computer networks, possibly dissimilar, joined
`together by means of gateways that handle data transfer and
`the conversion of messages from the sending network to the
`protocols used by the receiving network (with packets if
`necessary). When capitalized, the term “Internet” refers to
`the collection of networks and gateways that use the TCP/IP
`suite of protocols.
`
`[0005] The Internet has become a cultural fixture as a
`source of both information and entertainment. Many busi-
`nesses are creating Internet sites as an integral part of their
`marketing efforts, informing consumers of the products or
`services offered by the business or providing other informa-
`tion seeking to engender brand loyalty. Many federal, state,
`and local government agencies are also employing Internet
`sites for informational purposes, particularly agencies which
`must interact with virtually all segments of society such as
`the Internal Revenue Service and secretaries of state. Pro-
`viding informational guides and/or searchable databases of
`online public records may reduce operating costs. Further,
`the Internet is becoming increasingly popular as a medium
`for commercial transactions.
`
`[0006] Currently, the most commonly employed method
`of transferring data over the Internet is to employ the World
`Wide Web environment, also called simply “the Web”. Other
`Internet resources exist for transferring information, such as
`File Transfer Protocol (FTP) and Gopher, but have not
`achieved the popularity of the Web. In the Web environment,
`servers and clients effect data transaction using the Hyper-
`text Transfer Protocol (HTTP), a known protocol for han-
`dling the transfer of various data files (e.g., text, still graphic
`images, audio, motion video, etc.). Information is formatted
`for presentation to a user by a standard page description
`language, the Hypertext Markup Language (HTML). In
`addition to basic presentation formatting, HTML allows
`developers to specify “links” to other Web resources iden-
`tified by a Uniform Resource Locator (URL). A URL is a
`special syntax identifier defining a communications path to
`specific information. Each logical block of information
`accessible to a client, called a “page” or a “Web page”, is
`identified by a URL. The URL provides a universal, con-
`sistent method for finding and accessing this information,
`not necessarily for the user, but mostly for the user’s Web
`“browser”. A browser is a program capable of submitting a
`request for information identified by a URL at the client
`machine. Retrieval of information on the Web is generally
`accomplished with an HTML-compatible browser that
`browses web sites. A web site is a group of related HTML
`documents and associated files, scripts, and databases that is
`served up by an HTTP server on the World Wide Web. The
`HTML documents in a web site generally cover one or more
`related topics and are interconnected through hyperlinks.
`Most web sites have a home page as their starting point,
`which frequently functions as a table of contents for the site.
`Many large organizations, such as corporations, will have
`one or more HTTP servers dedicated to a single web site.
`However, an HTTP server can also serve several small web
`sites, such as those owned by individuals.
`
`[0007] The Internet also is widely used to transfer appli-
`cations to users using browsers. With respect to commerce
`on the Web, individual consumers and businesses use the Web
`to purchase various goods and services. In offering goods
`and services, some companies offer goods and services
`solely on the Web while others use the Web to extend their
`reach.
`
`[0008] Users exploring the Web have discovered that the
`content supported by HTML document format on the Web
`was too limited. Users desire an ability to access applica-
`tions and programs, but applications were targeted towards
`specific types of platforms. As a result, not everyone could
`access applications or programs. This deficiency has been
`minimized through the introduction and use of programs
`known as “applets”, which may be embedded as objects in
`HTML documents on the Web. Applets are Java programs
`that may be transparently downloaded into a browser sup-
`porting Java along with HTML pages in which they appear.
`These Java programs are network and platform independent.
`Applets run the same way regardless of where they originate
`or what data processing system onto which they are loaded.
`
`[0009] Java™ is an object oriented programming language
`and environment focusing on defining data as objects and
`the methods that may be applied to those objects. Java
`provides a mechanism to distribute software and extends the
`capabilities of a web browser because programmers can
`write an applet once and the applet can be run on any Java
`enabled machine on the Web.
`
`[0010] With these features on the Internet and especially
`on the Web, E-commerce activities are becoming more and
`more important to various companies. Extended enterprises
`are becoming more common in which an extended enter-
`prise is made up of customers, suppliers, distributors, and
`other business partners with whom a company conducts
`online business. With this increased e-commerce activity on
`distributed data processing systems, such as the Internet, it
`is important to ensure that web resources are available and
`to enable applications and information be distributed and
`maintained across the extended enterprise.
`
`[0011] Companies control the resources, systems, net-
`works, and applications within their own enterprises. Busi-
`ness practices, however, have changed. Enterprises are
`increasingly becoming extended enterprises. The advent of the
`Internet has enabled companies to open their e-commerce
`“doors” to allow customers, suppliers, and distributors to
`share critical information online, in order to more efficiently
`conduct business with a wider range of partners. As a
`consequence, conducting business on the Internet means that
`companies must rely on a myriad of relationships with not
`only their trading partners, but also upon multiple Internet
`service providers (ISP) to conduct business transactions. It
`is important to identify problems with particular servers or
`web sites located on servers to ensure that e-commerce
`activities can be conducted without failure or without
`delays.
`
`[0012] Therefore, it would be advantageous to have an
`improved method and apparatus for identifying problems on
`servers and web sites in a distributed data processing system.
`
`SUMMARY OF THE INVENTION
`
`[0013] A method and apparatus for identifying problems
`associated with a web site. A scan of a web site is initiated
`by a plurality of agents, wherein each of the plurality of
`agents are at a different location in the distributed data
`processing system. Results of the scan are obtained from the
`plurality of agents. The results of the scan are analyzed to
`determine if a problem is associated with the web site.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`[0014] The novel features believed characteristic of the
`invention are set forth in the appended claims. The invention
`itself, however, as well as a preferred mode of use, further
`objectives and advantages thereof, will best be understood
`by reference to the following detailed description of an
`illustrative embodiment when read in conjunction with the
`accompanying drawings, wherein:
`
`[0015] FIG. 1 depicts a pictorial representation of a dis-
`tributed data processing system in which the present inven-
`tion may be implemented;
`
`[0016] FIG. 2 is a block diagram depicting a data pro-
`cessing system, which may be implemented as a server, in
`accordance with a preferred embodiment of the present
`invention;
`
`[0017] FIG. 3 is a block diagram illustrating a data
`processing system in which the present invention may be
`implemented;
`
`[0018] FIG. 4 is an illustration of a scan policy in accor-
`dance with a preferred embodiment of the present invention;
`
`[0019] FIG. 5 is a diagram illustrating a data structure
`returning results from a scan, one per URL, in accordance
`with a preferred embodiment of the present invention;
`
`[0020] FIG. 6 is a flowchart of a process used by a server
`to initiate a scan of a web site or other target server in
`accordance with a preferred embodiment of the present
`invention;
`
`[0021] FIG. 7 is a high-level flowchart of a process
`employed by an agent to scan a target web site for a server
`in accordance with a preferred embodiment of the present
`invention;
`
`[0022] FIG. 8 is a flowchart of a process for analyzing
`problems on web sites in accordance with a preferred
`embodiment of the present invention;
`
`[0023] FIG. 9 is a flowchart of a process for statistical
`analysis of a data stream in accordance with a preferred
`embodiment of the present invention;
`
`[0024] FIG. 10 is a flowchart of a process for analyzing
`DNS performance in accordance with a preferred embodi-
`ment of the present invention; and
`
`[0025] FIG. 11 is a flowchart of a process employed by an
`agent to scan a web site in accordance with a preferred
`embodiment of the present invention.
`
`DETAILED DESCRIPTION OF THE
`PREFERRED EMBODIMENT
`
`[0026] The present invention provides a method, appara-
`tus, and instructions for conducting a scan of a server or web
`site from at least two different locations so that the results
`can be correlated and the network impacts on the scan can
`be taken into account. When a problem is detected on one
`scan but not on the other scan, heuristics may be employed
`to determine whether the problem is a network effect or a
`real problem. When scans are performed from a fair number
`of locations, a majority rules policy may be implemented in
`making determinations and coming to conclusions. Alterna-
`tively, a discrepancy in scan results may indicate that a
`second more thorough scan should be initiated or that scans
`from other specific network locations would be helpful in
`problem determination. Additionally, multiple scans over
`time may be used to create a historical database to identify
`intermittent problems originating from one or more network
`locations.
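`
The "majority rules" policy mentioned above can be sketched as follows. This is a minimal illustration and not the method claimed in the application: the function names, the agent labels, and the status strings are all hypothetical, and a strict majority (more than half of the reporting agents) is an assumed interpretation of "majority rules".

```python
from collections import Counter


def majority_verdict(observations):
    """Return the status reported by a strict majority of agents, or None.

    `observations` maps a (hypothetical) agent name to the status that
    agent observed for the same target resource.
    """
    if not observations:
        return None
    counts = Counter(observations.values())
    status, votes = counts.most_common(1)[0]
    # Assumption: "majority" means more than half of the reporting agents.
    return status if votes * 2 > len(observations) else None


def dissenting_agents(observations):
    """List agents whose observation differs from the majority verdict."""
    verdict = majority_verdict(observations)
    if verdict is None:
        return []
    return sorted(a for a, s in observations.items() if s != verdict)
```

With three agents where two report `"ok"` and one reports `"timeout"`, the verdict is `"ok"` and the dissenting agent is a candidate for a network effect local to its location. A tie yields no verdict, which corresponds to the case where a second, more thorough scan may be warranted.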
`
`[0027] With reference now to the figures, FIG. 1 depicts
`a pictorial representation of a distributed data processing
`system in which the present invention may be implemented.
`Distributed data processing system 100 is a network of
`computers in which the present invention may be imple-
`mented. Distributed data processing system 100 contains a
`network 102, which is the medium used to provide commu-
`nications links between various devices and computers
`connected together within distributed data processing sys-
`tem 100. Network 102 may include permanent connections,
`such as wire or fiber optic cables, or temporary connections
`made through telephone connections.
`
`[0028] In the depicted example, a server 104 is connected
`to network 102 along with storage unit 106. In addition,
`clients 108, 110, and 112 also are connected to a network
`102. These clients 108, 110, and 112 may be, for example,
`personal computers or network computers. For purposes of
`this application, a network computer is any computer,
`coupled to a network, which receives a program or other
`application from another computer coupled to the network.
`In the depicted example, server 104 provides data, such as
`boot files, operating system images, and applications to
`clients 108-112. Clients 108, 110, and 112 are clients to
`server 104. Server 104 may be a server or computer that is
`used to initiate scans of a target resource using clients 108,
`110, and 112. These clients are also referred to as “agents”
`when used to initiate scans of a target resource. In this
`example, server 114 also is connected to network 102 and
`may be a target resource that is scanned by clients 108, 110,
`and 112 acting as agents. Server 114 as illustrated is a web
`server. A web server, also referred to as an HTTP server, is
`server software that uses HTTP to serve up HTML documents
`and any associated files and scripts when requested by a
`client, such as a web browser. The connection between client
`and server is usually broken after the requested document or
`file has been served. HTTP servers are used on Web and
`Intranet sites. A target resource that is scanned by agents may
`include other resources in addition to a web server. For example,
`without limitation, a target resource may be a web site, a file
`transfer protocol (FTP) site, or a domain name system
`(DNS) server. The results from these scans may be returned
`to server 104 for analysis or alternatively to another server
`or other computer. In other words, the computer initiating
`the scan is not necessarily the computer that will perform the
`analysis of the results returned by the agents.
`
`[0029] Distributed data processing system 100 may
`include additional servers, clients, and other devices not
`shown. In the depicted example, distributed data processing
`system 100 is the Internet with network 102 representing a
`worldwide collection of networks and gateways that use the
`TCP/IP suite of protocols to communicate with one another.
`At the heart of the Internet is a backbone of high-speed data
`communication lines between major nodes or host comput-
`ers, consisting of thousands of commercial, government,
`educational and other computer systems that route data and
`messages. Of course, distributed data processing system 100
`also may be implemented as a number of different types of
`networks, such as for example, an intranet, a local area
`network (LAN), or a wide area network (WAN). FIG. 1 is
`intended as an example, and not as an architectural limita-
`tion for the present invention.
`
`[0030] Referring to FIG. 2, a block diagram depicts a data
`processing system, which may be implemented as a server,
`such as server 104 in FIG. 1, in accordance with a preferred
`embodiment of the present invention. This data processing
`system is an example of a computer that may be used to
`initiate scans of a resource by two or more agents and to
`analyze results returned by the agents scanning the resource.
`More specifically, data processing system 200 is used to
`collect and analyze data collected from scans of selected
`resources, such as web sites, by two or more agents at
`different locations in the network. Server 104 is able to
`identify problems associated with resources by analyzing the
`data from the scans. For example, broken links, slow links,
`slow response times, and authorization failures are examples
`of problems that may be identified through scans. Although
`in this example, the computer used to initiate the scan and
`analyze the results is a server, other types of data processing
`systems also may be used to initiate and/or analyze results.
`
`[0031] Data processing system 200 may be a symmetric
`multiprocessor (SMP) system including a plurality of pro-
`cessors 202 and 204 connected to system bus 206. Alterna-
`tively, a single processor system may be employed. Also
`connected to system bus 206 is memory controller/cache
`208, which provides an interface to local memory 209. I/O
`bus bridge 210 is connected to system bus 206 and provides
`an interface to I/O bus 212. Memory controller/cache 208
`and I/O bus bridge 210 may be integrated as depicted.
`
`[0032] Peripheral component interconnect (PCI) bus
`bridge 214 connected to I/O bus 212 provides an interface to
`PCI local bus 216. Typical PCI bus implementations will
`support four PCI expansion slots or add-in connectors.
`Communications links to network computers 108-112 in
`FIG. 1 may be provided through modem 218 and network
`adapter 220 connected to PCI local bus 216 through add-in
`boards.
`
`[0033] Additional PCI bus bridges 222 and 224 provide
`interfaces for additional PCI buses 226 and 228, from which
`additional modems or network adapters may be supported.
`In this manner, server 200 allows connections to multiple
`network computers. A memory-mapped graphics adapter
`230 and hard disk 232 may also be connected to I/O bus 212
`as depicted either directly or indirectly.
`
`[0034] Those of ordinary skill in the art will appreciate
`that the hardware depicted in FIG. 2 may vary. For example,
`other peripheral devices, such as optical disk drives and the
`like, also may be used in addition to or in place of the
`hardware depicted. The depicted example is not meant to
`imply architectural limitations with respect to the present
`invention.
`
`[0035] The data processing system depicted in FIG. 2 may
`be, for example, an IBM RISC/System 6000 system, a
`product of International Business Machines Corporation in
`Armonk, N.Y., running the Advanced Interactive Executive
`(AIX) operating system.
`
`[0036] With reference now to FIG. 3, a block diagram
`illustrates a data processing system in which the present
`invention may be implemented. Data processing system 300
`is an example of a client computer, which may be used as an
`agent to perform scans on a target resource. In this example,
`data processing system 300 may be used to scan a target
`resource by performing various tests or sending various
`requests to the target resource. For example, data processing
`system 300 may be used to access a web site, traverse
`various links located within the web site, and retrieve
`documents or other resources from the web site. The data
`and statistics gathered from a scan are returned to a server
`or other computer for analysis.
`
`[0037] Data processing system 300 employs a peripheral
`component interconnect (PCI) local bus architecture.
`Although the depicted example employs a PCI bus, other
`bus architectures such as Micro Channel and ISA may be
`used. Processor 302 and main memory 304 are connected to
`PCI local bus 306 through PCI bridge 308. PCI bridge 308
`also may include an integrated memory controller and cache
`memory for processor 302. Additional connections to PCI
`local bus 306 may be made through direct component
`interconnection or through add-in boards. In the depicted
`example, local area network (LAN) adapter 310, SCSI host
`bus adapter 312, and expansion bus interface 314 are con-
`nected to PCI local bus 306 by direct component connection.
`In contrast, audio adapter 316, graphics adapter 318, and
`audio/video adapter 319 are connected to PCI local bus 306
`by add-in boards inserted into expansion slots. Expansion
`bus interface 314 provides a connection for a keyboard and
`mouse adapter 320, modem 322, and additional memory
`324. SCSI host bus adapter 312 provides a connection for
`hard disk drive 326, tape drive 328, and CD-ROM drive 330.
`Typical PCI local bus implementations will support three or
`four PCI expansion slots or add-in connectors.
`
`[0038] An operating system runs on processor 302 and is
`used to coordinate and provide control of various compo-
`nents within data processing system 300 in FIG. 3. The
`operating system may be a commercially available operating
`system such as OS/2, which is available from International
`Business Machines Corporation. “OS/2” is a trademark of
`International Business Machines Corporation. An object
`oriented programming system such as Java may run in
`conjunction with the operating system and provides calls to
`the operating system from Java programs or applications
`executing on data processing system 300. “Java” is a trade-
`mark of Sun Microsystems, Inc. Instructions for the oper-
`ating system, the object-oriented operating system, and
`applications or programs are located on storage devices,
`such as hard disk drive 326, and may be loaded into main
`memory 304 for execution by processor 302.
`
`

`

`
`[0039] Those of ordinary skill in the art will appreciate
`that the hardware in FIG. 3 may vary depending on the
`implementation. Other internal hardware or peripheral
`devices, such as flash ROM (or equivalent nonvolatile
`memory) or optical disk drives and the like, may be used in
`addition to or in place of the hardware depicted in FIG. 3.
`Also, the processes of the present invention may be applied
`to a multiprocessor data processing system.
`
`[0040] For example, data processing system 300, if
`optionally configured as a network computer, may not
`include SCSI host bus adapter 312, hard disk drive 326, tape
`drive 328, and CD-ROM 330, as noted by dotted line 332 in
`FIG. 3 denoting optional inclusion. In that case, the com-
`puter, to be properly called a client computer, must include
`some type of network communication interface, such as
`LAN adapter 310, modem 322, or the like. As another
`example, data processing system 300 may be a stand-alone
`system configured to be bootable without relying on some
`type of network communication interface, whether or not
`data processing system 300 comprises some type of network
`communication interface. As a further example, data pro-
`cessing system 300 may be a Personal Digital Assistant
`(PDA) device which is configured with ROM and/or flash
`ROM in order to provide non-volatile memory for storing
`operating system files and/or user-generated data.
`
`[0041] The depicted example in FIG. 3 and above-de-
`scribed examples are not meant to imply architectural limi-
`tations.
`
`[0042] With reference now to FIG. 4, an illustration of a
`scan policy is depicted in accordance with a preferred
`embodiment of the present invention. Scan policy 400 is a
`data structure sent to an agent to initiate a scan of a resource.
`Scan policy 400 identifies the type of scan that an agent is
`to make. For example, scan policy 400 includes target field
`402 in which scan policy 400 contains a URL of the target
`that is to be scanned. A maximum number of files field 404
`identifies how many files should be retrieved from the
`resource. These files may be for example, web pages,
`programs, video files, and audio files. Maximum number of
`levels field 406 identifies the maximum number of levels in
`the web site to scan. More specifically, this field identifies
`the number of levels within a web site that should be
`accessed by an agent.
`
`[0043] Additionally, a maximum timeout may be set in
`scan policy 400 through max time out field 408 to terminate
`the scan if too much time is being taken. This is especially
`useful in the instance in which a file or level does not result
`in a response. In addition, scheduling information also may
`be present within the scan policy. The information may be
`specified in schedule field 410 to indicate that a scan should
`occur at a set time. The time may be immediately, periodi-
`cally, or at a number of times pre-set within this field. In
`addition, start dates may be identified within the scan policy.
`Further, the server to whom the scan result is to be returned
`also may be placed within the scan policy. Of course, scan
`policy 400 may include other fields in addition to or in place
`of those illustrated. For example, a return field identifying
`the computer to which the results should be returned may be
`specified in scan policy 400. This field may be for example,
`a DNS name or an IP address.
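`
The scan policy fields described above map naturally onto a small record type. The following is a sketch only: the attribute names, the units (seconds), the string-valued schedule, and the validation rules are assumptions made for illustration; the application specifies the fields 402 through 410 and an optional return field but not their encoding.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScanPolicy:
    """Hypothetical in-memory form of scan policy 400."""

    target_url: str                  # target field 402
    max_files: int                   # maximum number of files field 404
    max_levels: int                  # maximum number of levels field 406
    max_timeout_s: float             # max time out field 408 (seconds assumed)
    schedule: str = "immediate"      # schedule field 410 (encoding assumed)
    return_to: Optional[str] = None  # return field: DNS name or IP address

    def __post_init__(self):
        # Assumed sanity checks; the source does not state valid ranges.
        if self.max_files < 1:
            raise ValueError("max_files must be at least 1")
        if self.max_levels < 0:
            raise ValueError("max_levels must not be negative")
        if self.max_timeout_s <= 0:
            raise ValueError("max_timeout_s must be positive")
```

For example, `ScanPolicy("http://www.example.com/", max_files=50, max_levels=3, max_timeout_s=30.0)` describes an immediate scan limited to 50 files and 3 link levels; the URL is a placeholder.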
`
`[0044] Turning next to FIG. 5, a diagram illustrating a
`data structure returning results from a scan is depicted in
`accordance with a preferred embodiment of the present
`invention. Results 500 is a data structure, which is returned
`by an agent for analysis. Results 500 includes a URL field
`502, which contains the URL scanned by the agent. In
`addition, a file size field 504 is used to indicate the size of
`the file retrieved from the URL. Download time field 506 is
`used to identify the amount of time needed to download the
`file. Type of file field 508 is used to identify the file type,
`such as whether the file is an executable file, a text file, a
`graphics file, or an audio file.
`
`[0045] Results 500 also includes a DNS resolution time
`field 510, which identifies the amount of time taken to
`resolve the DNS name when the agent accessed the web site.
`TCP/IP header field 512 identifies the TCP/IP header. Some
`information is extracted from the TCP and IP headers of
`each response from the server, and is saved in TCP/IP header
`field 512. The IP header information saved includes: the
`time-to-live (TTL) value; the source IP address; and the
`destination IP address. The TCP header information saved
`includes: the source port number; and the destination port
`number. Type of service field 514 indicates the TCP/IP based
`service. It contains the TCP/IP port number of service.
`Connection type field 518 provides information as to the
`connection used by the agent to scan the web site. For
`example, this field may include the type of connection along
`with the speed of the connection (e.g., Ethernet, 64 k).
`
`[0046] The illustrated fields in results 500 are not intended
`as limitations as to the type of information that may be
`returned by an agent for analysis. For example, other infor-
`mation, such as network roundtrip time, HTTP errors, DNS
`errors, and modem failures may be recorded by an agent and
`returned in results 500.
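`
The per-URL result record described above could be represented as shown below. This is an illustrative sketch, not the format used by the application: attribute names, units (bytes, seconds), and the `throughput_bytes_per_s` helper are assumptions added here; only the field numbers in the comments come from the text.

```python
from dataclasses import dataclass, asdict
from typing import Optional, Tuple


@dataclass
class TcpIpHeader:
    """Values saved in TCP/IP header field 512."""

    ttl: int                 # IP time-to-live value
    source_ip: str           # IP source address
    destination_ip: str      # IP destination address
    source_port: int         # TCP source port number
    destination_port: int    # TCP destination port number


@dataclass
class ScanResult:
    """Hypothetical in-memory form of results 500 (one record per URL)."""

    url: str                       # URL field 502
    file_size_bytes: int           # file size field 504 (bytes assumed)
    download_time_s: float         # download time field 506 (seconds assumed)
    file_type: str                 # type of file field 508
    dns_resolution_time_s: float   # DNS resolution time field 510
    header: TcpIpHeader            # TCP/IP header field 512
    service_port: int              # type of service field 514
    connection_type: str           # connection type field 518
    errors: Tuple[str, ...] = ()   # optional extras, e.g. HTTP or DNS errors

    def throughput_bytes_per_s(self) -> Optional[float]:
        """Derived value (not in the source): bytes per second, if defined."""
        if self.download_time_s <= 0:
            return None
        return self.file_size_bytes / self.download_time_s
```

`asdict(result)` flattens a record, including the nested header, into plain dictionaries, which is one convenient way for an agent to serialize it before returning it for analysis.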
`
`[0047] With reference now to FIG. 6, a flowchart of a
`process used by a server to initiate a scan of a web site or other
`target server is depicted in accordance with a preferred
`embodiment of the present invention. The process begins by
`making a connection to the selected agent (step 600). In this
`step, the server establishes a communications link with
`agent computers that are to initiate scans. These connections
`may be made using TCP/IP, but could be through any
`communications protocol. After making connection to the
`selected agent computers, then a scan policy, such as scan
`policy 400 in FIG. 4, is sent to the agent (step 602). In
`accordance with a preferred embodiment of the present
`invention, the scan may be made from agents from as many
`different locations as possible. The number of agents that are
`sent a scan policy are at least two and may be more
`depending on the number of available agents and on the type
`of data to be returned from the scan.
`
`[0048] Thereafter, results from a scan are received (step
`604), and the results are stored (step 606). A determination
`is then made as to whether all of the results from scans made
`by the agents have been received (step 608). If all of the
`results have not been received, the process returns to step
`604. Otherwise, the results are analyzed to identify differ-
`ences between the various scans (step 610). Then, problems,
`if any, are identified (step 612). An action is then identified
`and initiated (step 614) with the process terminating there-
`after. This action may take a number of different forms from
`providing an error message to an administrator to initiating a
`corrective action. In addition, the action taken may be to
`send a policy to one or more agents to perform a scan to
`collect additional or different information from the resource
`identified as having a problem. This additional information
`may be used to further identify the problem or to identify
`corrective action needed to resolve the problem.
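`
The server-side flow of steps 600 through 614 can be sketched as a single function. This is an assumed outline for illustration, not the implementation in the application: the transport, the analysis, and the action are injected as callables with hypothetical names so the control flow can be read (and run) without any network code, and timeout handling for an agent that never reports is deliberately left out.

```python
def run_scan(agents, policy, send_policy, receive_result, analyze, act):
    """Drive one scan round and return the list of identified problems.

    agents         -- iterable of agent identifiers (at least two expected)
    send_policy    -- callable(agent, policy): connect and send (steps 600, 602)
    receive_result -- callable() -> (agent, result): blocks for next result (604)
    analyze        -- callable(results_by_agent) -> list of problems (610, 612)
    act            -- callable(problem): identify and initiate action (614)
    """
    agents = list(agents)
    for agent in agents:
        send_policy(agent, policy)          # steps 600 and 602

    results = {}
    while len(results) < len(agents):       # step 608: all results received?
        agent, result = receive_result()    # step 604
        results[agent] = result             # step 606

    problems = analyze(results)             # steps 610 and 612
    for problem in problems:
        act(problem)                        # step 614
    return problems
```

Passing the collaborators in as parameters keeps the loop independent of the protocol used between server and agents, which the text leaves open ("TCP/IP, but could be through any communications protocol").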
`
`[0049] With reference now to FIG. 7, a high-level flow-
`chart of a process employed by an agent to scan a target web
`
`

`

`
tion thresholds from either the mean or the median of a type
`of data. Transactions with values for that data type that
`fall outside of those thresholds will then be considered
`anomalous and will be further analyzed. Whether a
`variation is significant is typically determined
`by using statistical analysis of the variation and comparing
`the variation with historical data. Alternatively, the variation
`may be compared to a threshold, which also may be based
`on historical data. A significant variation may include a
`broken link in which no communication or result is returned
`for a scan from an agent. Alternatively, a significant
`variation may occur if a download from one agent takes one half
`of a second while a download from another agent takes
`forty-five seconds.
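`
As a rough illustration of the threshold test described above, the sketch below flags any agent whose value lies too far from the median of all reported values, and treats a missing result as a broken link. The function name `find_anomalies`, the `max_deviation` parameter, and the agent names are assumptions made for this example. The text also allows the mean in place of the median and thresholds derived from historical data, neither of which is shown here.

```python
import statistics
from typing import Dict, Optional

def find_anomalies(samples: Dict[str, Optional[float]],
                   max_deviation: float) -> Dict[str, str]:
    """Flag agents whose value is missing or far from the median."""
    present = [value for value in samples.values() if value is not None]
    if not present:
        # No agent returned anything: every scan counts as a broken link.
        return {agent: "no result returned" for agent in samples}
    center = statistics.median(present)
    anomalies: Dict[str, str] = {}
    for agent, value in samples.items():
        if value is None:
            anomalies[agent] = "no result returned"
        elif abs(value - center) > max_deviation:
            anomalies[agent] = "outside threshold around the median"
    return anomalies

# Download times in seconds from four made-up agent locations.
download_seconds = {"austin": 0.5, "haifa": 0.6, "nyack": 45.0, "tokyo": None}
anomalies = find_anomalies(download_seconds, max_deviation=5.0)
```

With these sample values the median of the reported times is 0.6 seconds, so `nyack` (45.0 seconds) and `tokyo` (no result) are flagged while `austin` and `haifa` are not.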
`
`[0054] Thereafter, a classification of the results is made
`using a common set of attributes that have been identified for
`different situations (step 906), with the process terminating
`thereafter. This classification uses attributes that have been
`identified for different problems, such as a failed link, heavy
`traffic, access rights failures, and server malfunctions. Based
`on which attributes are matched to the scans with significant
`or unexpected variations, the problem may be identified and
`possible corrective action may be taken.
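`
One way to picture this attribute-based classification is a lookup table that maps each problem type to the attributes associated with it, then ranks problem types by how many of those attributes were observed. The problem names come from the paragraph above. The attribute names (`no_response`, `slow_download`, and so on), the table `PROBLEM_ATTRIBUTES`, and the function `classify` are hypothetical and chosen only for this sketch.

```python
from typing import Dict, List, Set

# Assumed attribute sets; the application does not list specific attributes.
PROBLEM_ATTRIBUTES: Dict[str, Set[str]] = {
    "failed link": {"no_response"},
    "heavy traffic": {"slow_download", "slow_connect"},
    "access rights failure": {"http_403"},
    "server malfunction": {"http_500"},
}

def classify(observed: Set[str]) -> List[str]:
    """Return problem types matching the observed attributes, best match first."""
    scored = [
        (len(attributes & observed), problem)
        for problem, attributes in PROBLEM_ATTRIBUTES.items()
    ]
    # sorted() is stable, so ties keep the order of PROBLEM_ATTRIBUTES.
    ranked = sorted(scored, key=lambda item: -item[0])
    return [problem for score, problem in ranked if score > 0]

candidates = classify({"slow_download", "slow_connect", "http_500"})
```

In this example `candidates` lists "heavy traffic" first (two matching attributes) and "server malfunction" second (one matching attribute). A corrective action could then be chosen per problem type.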
`
`[0055] With reference again to step 904, if the variation is
`not significant, the process also terminates. The process
`likewise terminates if no variation among the results from
`different agents is present in step 902.
`
`[0056] DNS resolution tables are an example of a target
`resource that may be scanned by agents located at different
`locations in a network. A DNS server keeps a database, e.g.,
`a DNS resolution table, of host computers and the
`corresponding IP addresses. When presented with a name, such as
`IBM.Com, for example, th
