Thomas

(10) Patent No.: US 6,401,118 B1
(45) Date of Patent: Jun. 4, 2002

(54) METHOD AND COMPUTER PROGRAM PRODUCT FOR AN ONLINE MONITORING SEARCH ENGINE

(75) Inventor: Jason B. Thomas, Arlington, VA (US)

(73) Assignee: Online Monitoring Services, Alexandria, VA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.

(21) Appl. No.: 09/133,374
(22) Filed: Aug. 13, 1998

Related U.S. Application Data
(60) Provisional application No. 60/091,164, filed on Jun. 30, 1998.
(51) Int. Cl.7 ........ G06F 11/30; G06F 17/00
(52) U.S. Cl. ........ 709/224; 709/203; 709/217; 709/219; 709/226; 707/5; 707/4; 709/223
(58) Field of Search ........ 709/222, 225, 226, 203, 217, 219; 707/5, 4, 3, 2, 1
`
(56) References Cited

U.S. PATENT DOCUMENTS

5,864,845 A * 1/1999 Voorhees et al. ........ 707/5
5,864,846 A * 1/1999 Voorhees et al. ........ 707/501
5,873,107 A * 2/1999 Borovoy et al. ........ 707/3
5,913,208 A * 6/1999 Brown et al. ........ 707/10
5,913,215 A * 6/1999 Rubinstein et al. ........ 707/5
5,920,859 A * 7/1999 Li ........ 707/5
5,924,090 A * 7/1999 Krellenstein
5,933,822 A * 8/1999 Braden-Harder et al. ........ 707/5
5,983,216 A * 11/1999 Kirsch et al. ........ 707/2
5,987,446 A * 11/1999 Corey et al. ........ 707/3
5,991,751 A * 11/1999 Rivette et al. ........ 707/1
5,995,961 A * 11/1999 Levy et al. ........ 707/4
6,006,217 A * 12/1999 Lumsden ........ 707/2
6,009,422 A * 12/1999 Ciccarelli ........ 707/4
6,009,459 A * 12/1999 Belfiore et al. ........ 709/203
6,018,733 A * 1/2000 Kirsch et al. ........ 707/3
6,092,182 A * 2/2000 Nehab et al. ........ 707/523
6,041,326 A * 3/2000 Am t l. ........ 707/10
6,078,914 A * 6/2000 Redfern ........ 707/3
6,078,917 A * 6/2000 Paulsen, Jr. et al. ........ 707/6
6,094,649 A * 7/2000 B t l. ........ 707/3
6,098,065 A * 8/2000 Z121. ........ 707/3
6,102,969 A * 8/2000 Christianson et al. ........ 717/8
6,112,202 A * 8/2000 Kleinberg ........ 707/5

* cited by examiner
Primary Examiner: Glenton B. Burgess
Assistant Examiner: Abdullah E. Salad
(74) Attorney, Agent, or Firm: Piper Marbury Rudnick & Wolfe, LLP; Steven B. Kelber
(57) ABSTRACT

An online monitoring search engine. The invention is a system, method and computer program product that allows an organization, company, or the like to monitor the Internet (or any computer network) for violations of their intellectual property (e.g., patent, trademark or copyright infringement), or monitor how persons on the Internet view their business, products and/or services. The system includes a Web server for receiving search requests and criteria from users on a Web client and a server for searching the Internet for URL’s that contain contents matching the search criteria, thereby compiling a list of offending URL’s. The system also includes a file system for storing contents from each of the offending URL’s and a relational database for allowing the server to perform queries of the content in order to produce a report. The method involves receiving search criteria from a user, searching the Internet, downloading offending contents, and then archiving and scoring the contents. The method also obtains contact information for each registrant of the offending URL’s and produces a report for the user.
`
`18 Claims, 12 Drawing Sheets
`
[Representative drawing (excerpt of FIG. 5): construct new search terms with unused related topic keywords; collect hits into preliminary list of URLs; query with new terms]
`
`Plaid Technologies Inc.
`Exhibit 1009
`
`Ex. 1009 Page 1
`
`
`
`U.S. Patent
`
`Jun. 4, 2002
`
Sheet 1 of 12

US 6,401,118 B1

[FIG. 1: block diagram illustrating the system architecture of monitoring system 100, showing network connectivity among the various components]
`
`
`
[FIG. 2: software architecture 200. IPIS 106 (back end, C++) communicates via ODBC with the relational database product 102 and via the native Windows file system with the physical file system disk 104; Web server 108 communicates via HTTP with Web clients 202. Technologies labeled: Active Server Pages, JavaScript, Dynamic HTML, VB Script, Java.]
`
`
`
[FIG. 3: flowchart 300. Start (302) → define search criteria (304) → get probable URL’s, search (306) → download pages (308) → score pages (310), with full archive (312) → group pages (314) → prioritize sites (316) → get contact information (318) → generate report (320) → client action (322)]
`
`
`
[FIG. 4: software architecture of IPIS 106: queue thread 402, URL thread 404, database thread 406, archive thread 408, contact thread 410]
`
`
`
[FIG. 5: meta search engine flowchart. Select search engine(s) (504); translate search criteria into keywords (506); identify main topic keyword (508); identify set of related topic keywords (510); query for {topic} (512); test hits against the engine limit (514); check for unused related topic keywords; construct new search terms with unused related topic keywords (518); query with new terms; collect hits into preliminary list of URLs (522); end]
`
`
`
[FIGS. 6-10: exemplary output report pages; FIG. 11: block diagram of an exemplary computer system (drawing sheets 6-12)]
`
`
`
`
`
METHOD AND COMPUTER PROGRAM PRODUCT FOR AN ONLINE MONITORING SEARCH ENGINE

CROSS-REFERENCE TO RELATED APPLICATION

This application claims priority to U.S. Provisional Patent Application No. 60/091,164, filed Jun. 30, 1998, which is incorporated herein by reference in its entirety.
`
BACKGROUND OF THE INVENTION

1. Field of the Invention
The present invention relates generally to computer network search engines, and more particularly to search engines for performing online monitoring activities.
2. Related Art
Over the past several years, there has been an explosion of computers, and thus people, connected to the global Internet and the World-Wide Web. This increase in connectivity has allowed computer users to access various types of information, disseminate information, and be exposed to electronic commerce activities, all with a great degree of freedom. Participants in electronic commerce include large corporations, small businesses, individual entrepreneurs, organizations, and the like who offer their information, products, and/or services to people all over the world via the Internet.
The rise in the usage of the Internet, however, has also had a negative side. Given the Internet’s vastness and freedom, many unscrupulous individuals have taken the opportunity to profit by violating the intellectual property of others. For example, it has been estimated that billions of dollars in profits are lost each year due to piracy of copyrighted materials over the Internet. These lost profits result from unscrupulous individuals making copyrighted materials such as music, movies, magazines, software, and pictures available through the Internet, either free or for their own profit. Also, an individual, a company, an organization, or the like may be concerned with other intellectual property violations, such as the illegal sale of their products or the sale of inferior products using their brand names (that is, patent and trademark infringements). Furthermore, an individual, a company, an organization, or the like may be concerned with false information (i.e., “rumors”) that originates and spreads quickly over the Internet, resulting in the disparagement of the individual, company, organization, or the like. Such entities may also be interested in gathering data about how they and their products and/or services are perceived on the Internet (i.e., a form of market research).
Individual artists, writers, and other owners of intellectual property are currently forced to search Internet Web sites, File Transfer Protocol (FTP) sites, chat rooms, etc. by visiting thousands of sites in order to detect piracy or disparagement at offending sites. Such searching is currently done either by hand or using commercial search engines. Each of these methods is costly because a great amount of time is required to do such searching, time that detracts from positive, profit-earning activities. Adding to the frustration of detecting infringements is the fact that commercial search engines are infrequently updated and typically limit the number of sites (“hits”) that a search request returns. Furthermore, the task of visiting each site to determine whether there is indeed an infringement or disparagement and, if so, its extent and character also demands a great deal of time.
Therefore, in view of the above, what is needed is a system, method and computer program product for an online (i.e., Internet or intranet) monitoring search engine. Such online monitoring would enhance the ability of intellectual property owners and business owners to detect and prioritize their response to infringements and disparagements. Further, what is needed is a system, method and computer program product that searches the Internet’s Web pages, FTP sites, FSP sites, Usenet newsgroups, chat rooms, etc. for data relevant to the intellectual property and goodwill owned by an entity and produces a detailed, customized report of offending sites.
SUMMARY OF THE INVENTION
The present invention is a system, method and computer program product for an online monitoring search engine that satisfies the above-stated needs. The method involves receiving search criteria from a user, where the search criteria reflect the user’s intellectual property infringement or disparagement concerns. Then a search of the Internet (or intranet) is done for uniform resource locators (URL’s) (i.e., addresses) that specify sites which contain contents matching the search criteria. After a list of URL’s containing probable infringements or disparagements is obtained, the pages of each URL are downloaded, archived, and scored. The method also obtains contact information for each registrant of the offending URL’s. The method then produces a report listing the offending URL’s and the score for each of the URL’s. The report may then be utilized by the user to plan intellectual property infringement or disparagement enforcement activities. In a preferred embodiment of the present invention, before generating a report, the pages are also grouped into “actual sites” to reduce the magnitude of information contained in the report. The method may also list the highest scoring page for each of the actual sites, as well as the highest ranking actual site.
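The grouping and ranking step can be pictured with a short Python sketch. It is an illustration under stated assumptions, not the patented method: it assumes, hypothetically, that an “actual site” is identified by the host name in each page’s URL and that sites are ranked by their best page score. This section defines neither rule, and the function name `group_into_sites` is illustrative only.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_into_sites(scored_pages: dict[str, float]) -> tuple[dict[str, str], list[str]]:
    """Group scored pages into hypothetical "actual sites" keyed by host name.

    Returns (highest scoring page per site, sites ranked by best page score).
    """
    sites: dict[str, list[tuple[float, str]]] = defaultdict(list)
    for url, score in scored_pages.items():
        host = (urlsplit(url).hostname or "").lower()  # assumption: one site per host
        sites[host].append((score, url))

    # Highest scoring page for each site (ties fall to the larger URL string).
    best_page = {host: max(pages)[1] for host, pages in sites.items()}

    # Assumed ranking rule: order sites by their best page score, highest first.
    ranking = sorted(sites, key=lambda host: max(sites[host])[0], reverse=True)
    return best_page, ranking
```

With this rule, the first entry of the returned ranking is the highest ranking actual site, and the report can list one page per site instead of every page.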
The online monitoring system of the present invention includes a Web server for receiving search criteria, search setup, and management inputs from users, and an intellectual property infringement server (IPIS) for searching the Internet (or any computer network) for URL’s that contain contents matching the search criteria to thereby compile a list of offending URL’s. The system also includes a file system for storing pages from each of the offending URL’s and a relational database for allowing the IPIS to perform queries of the pages in order to produce a report. In a preferred embodiment, the system also includes a plurality of Web clients that provide a graphical user interface (GUI) for users to enter their search criteria, as well as view pages of the offending URL’s by communicating with the Web server.
One advantage of the present invention is that intellectual property owners may quickly and efficiently search and find infringements and disparagements contained on Web, FTP, and FSP sites, as well as chat rooms and Usenet newsgroups within the Internet.
Another advantage of the present invention is that detailed and customizable reports listing offending sites and associated metrics are produced, allowing intellectual property owners to focus their enforcement activities.
Another advantage of the present invention is that its back-end (search engine) and front-end (user interface) are designed to operate independently of each other, thus allowing greater throughput and availability of the system as a whole.
Yet another advantage of the present invention is that lists of probable offending URL’s may be grouped and prioritized, both in an automated and manual fashion, in order to arrive at a manageable set of data on which to focus intellectual property enforcement activities.
Further features and advantages of the invention as well as the structure and operation of various embodiments of the present invention are described in detail below with reference to the accompanying drawings.
`
`BRIEF DESCRIPTION OF THE FIGURES
`
The features and advantages of the present invention will become more apparent from the detailed description set forth below when taken in conjunction with the drawings in which like reference numbers indicate identical or functionally similar elements. Additionally, the left-most digit of a reference number identifies the drawing in which the reference number first appears.
FIG. 1 is a block diagram illustrating the system architecture of an embodiment of the present invention, showing network connectivity among the various components;
FIG. 2 is a block diagram illustrating the software architecture of an embodiment of the present invention, showing communications among the various components;
FIG. 3 is a flowchart showing the overall operation of an embodiment of the present invention;
FIG. 4 is a block diagram illustrating the software architecture of an intellectual property infringement server according to an embodiment of the present invention;
FIG. 5 is a flowchart showing the operation of a meta search engine, according to an embodiment of the present invention;
FIGS. 6-10 are exemplary output report pages according to an embodiment of the present invention; and
FIG. 11 is a block diagram of an exemplary computer system useful for implementing the present invention.
`
`DETAILED DESCRIPTION OF THE
`PREFERRED EMBODIMENTS
`
`Table of Contents
`
I. Overview
II. System Architecture
III. Software Architecture
IV. Overall Monitoring System Operation
A. Inputs and Searching
B. Web Crawling
C. FTP Crawling
D. Processing
E. Output
F. Downloading Non-FTP and Non-HTTP Contents
`V. Graphical User Interface (Front-End)
`VI. Search Engine (Back-End)
`A. Multi-Threaded Execution Environment
`B. Meta Search Engine Mode
`C. Standard Search Engine Mode
`VII. Output Reports
`VIII. Front-End and Back-End Severability
`IX. Environment
`X. Conclusion
I. Overview
The present invention is directed to a system, method, and computer program product for an online monitoring search engine. In a preferred embodiment of the present invention, an organization provides monitoring services for clients that would include, for example, individuals, companies, consortiums, organizations, and the like who are interested in protecting their intellectual property and/or goodwill from infringement or disparagement on the Internet.
Such a monitoring organization would employ an intelligent search engine that spans the entire Internet (Web pages, FTP sites, FSP sites, chat rooms, Usenet newsgroups, etc.) and returns links to Internet sites that, with a high probability, contain infringing or disparaging content. The input of the monitoring organization’s search engine would be customized for each client based on, for example, their products, services, business activity, and/or the form of intellectual property owned. The monitoring organization’s search engine would also provide detailed reports, also customized to fit each client’s monitoring needs, so that the client’s legal personnel may prioritize their enforcement activities. In a preferred embodiment, the monitoring organization also provides a Web server so that clients may remotely utilize the search engine.
While the present invention is described in terms of the above example, this is for convenience only and is not intended to limit its application. In fact, after reading the following description, it will be apparent to one skilled in the relevant art(s) how to implement the following invention in alternative embodiments (e.g., providing online monitoring for a corporate intranet or extranet).
Furthermore, while the following description focuses on the monitoring of Web sites and FTP sites, and thus employs such terms as URL’s (addresses) and Web pages (contents), it is not intended to limit the application of the present invention. It will be apparent to one skilled in the relevant art how to implement the following invention, where appropriate, in alternative embodiments. For example, the present invention may be applied to monitoring Internet addresses (URL’s, URN’s, and the like) that specify the contents of chat rooms, Usenet newsgroups, FSP sites, etc.
II. System Architecture
FIG. 1 is a block diagram illustrating the physical architecture of a monitoring system 100, according to an embodiment of the present invention, showing network connectivity among the various components. It should be understood that the particular monitoring system 100 in FIG. 1 is shown for illustrative purposes only and does not limit the invention. As will be apparent to one skilled in the relevant art(s), all of the components “inside” of the monitoring system 100 are connected and communicate via a local area network (LAN) 101.
The monitoring system 100 includes an intellectual property infringement server 106 (shown as “IPIS” 106) that serves as the “back-end” (i.e., search engine) of the present invention. Connected to the IPIS 106 are a relational database 102 (shown as “DB” 102), a file system 104, and a Web server 108. As is well-known in the relevant art(s), a Web server is a server process running at a Web site which sends out Web pages in response to Hypertext Transfer Protocol (HTTP) requests from remote browsers. The Web server 108 serves as the “front end” of the present invention. That is, the Web server 108 provides the graphical user interface (GUI) to users of the monitoring system 100 in the form of Web pages. Such users may access the Web server 108 at the monitoring organization’s site via a plurality of internal search workstations 110 (shown as workstations 110a-n).
A firewall 112 (shown as “FW” 112) serves as the connection and separation between the LAN 101, which includes the plurality of network elements (i.e., elements 102-110) “inside” of the LAN 101, and the global Internet 103 “outside” of the LAN 101. Generally speaking, a firewall, which is well-known in the relevant art(s), is a dedicated gateway machine with special security precaution software. It is typically used, for example, to service Internet 103 connections and dial-in lines, and protects a cluster of more loosely administered machines hidden behind it from an external invasion.
The global Internet 103, outside of the LAN 101, includes a plurality of various FTP sites 114 (shown as sites 114a-n) and the WWW 116. Within the WWW 116 are a plurality of Web sites 120 (shown as sites 120a-n). The search space for the IPIS 106 includes the WWW 116 and the plurality of FTP sites 114. As mentioned above, it will be apparent to one skilled in the relevant art(s) that the search space (i.e., Internet 103) of the monitoring system 100, although not shown, will also include chat rooms, Usenet newsgroups, FSP sites, etc.
A plurality of external search workstations 118 (shown as workstations 118a-n) are also located within the WWW 116. The external search workstations 118 allow clients of the monitoring organization to remotely perform searches using their own personnel and equipment.
While only one database 102, file system 104, and IPIS 106 computer are shown in FIG. 1, it will be apparent to one skilled in the relevant art(s) that monitoring system 100 may be run in a distributed fashion over a plurality of the above-mentioned network elements connected via LAN 101. For example, both the IPIS 106 “back-end” application and the Web server 108 “front-end” may be distributed over several computers, thereby increasing the overall execution speed of the monitoring system 100. More detailed descriptions of the monitoring system 100 components, as well as their functionality, are provided below.
III. Software Architecture
Referring to FIG. 2, a block diagram illustrating a software architecture 200 according to an embodiment of monitoring system 100, showing communications among the various components, is shown. The software architecture 200 of monitoring system 100 includes software code that implements the IPIS 106 in a high level programming language such as the C++ programming language. Further, in an embodiment, the IPIS 106 software code is an application running on an IBM™ (or compatible) personal computer (PC) in the Windows NT™ operating system environment.
In a preferred embodiment of the present invention, the database 102 is implemented using a high-end relational database product (e.g., Microsoft™ SQL Server, IBM™ DB2, ORACLE™, INGRES™, etc.). As is well-known in the relevant art(s), relational databases allow the definition of data structures, storage and retrieval operations, and integrity constraints, where data and relations between them are organized in tables.
In a preferred embodiment of the present invention, the IPIS 106 application communicates with the database 102 using the Open Database Connectivity (ODBC) interface. As is well-known in the relevant art(s), ODBC is a standard for accessing different database systems from high level programming language applications. It enables these applications to submit statements to ODBC using an ODBC structured query language (SQL), which ODBC then translates to the particular SQL commands the underlying database product employs.
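As a hedged illustration of the kind of statements the IPIS 106 might submit, the sketch below uses Python’s built-in `sqlite3` module as a self-contained stand-in for an ODBC data source; ODBC itself is not shown. The `pages` table, its columns, and the helper names are hypothetical, since the text here does not specify a schema.

```python
import sqlite3
from typing import Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical page table; sqlite3 stands in for an ODBC source."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS pages (
               url          TEXT PRIMARY KEY,
               site         TEXT NOT NULL,
               score        REAL NOT NULL,
               archive_path TEXT
           )"""
    )
    return conn


def record_page(conn: sqlite3.Connection, url: str, site: str,
                score: float, archive_path: Optional[str] = None) -> None:
    # Parameterized SQL with "?" markers, the same marker style ODBC uses.
    conn.execute(
        "INSERT OR REPLACE INTO pages (url, site, score, archive_path) VALUES (?, ?, ?, ?)",
        (url, site, score, archive_path),
    )
    conn.commit()


def report_rows(conn: sqlite3.Connection, min_score: float = 0.0) -> list[tuple[str, str, float]]:
    """Rows for a report, highest scores first."""
    return conn.execute(
        "SELECT site, url, score FROM pages WHERE score >= ? ORDER BY score DESC, url",
        (min_score,),
    ).fetchall()
```

Keeping the SQL in plain, parameterized statements is what lets a report query stay the same whichever relational product sits behind the connection.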
The physical file system 104, in a preferred embodiment of the present invention, is any physical memory device that includes a storage medium and a cache (e.g., the hard drive and primary cache, respectively, of the same PC that runs the IPIS 106 application). In an alternative embodiment, the file system 104 may be a memory device external to the PC hosting the IPIS 106 application. In yet another alternative embodiment, the file system 104 may encompass a storage medium physically separate from the cache, where the storage medium may also be distributed over several elements within LAN 101. Further, in a preferred embodiment of the present invention, the file system 104 communicates with the IPIS 106 application and Web server 108 using the native file commands of the operating system in use (e.g., Windows NT™).
The Web server 108 provides the GUI “front-end” for monitoring system 100. In a preferred embodiment of the present invention, it is implemented using the Active Server Pages (ASP), Visual Basic (VB) Script, and JavaScript™ server-side scripting environments that allow the creation of dynamic Web pages. The Web server 108 communicates with the plurality of external search workstations 118 and the plurality of internal search workstations 110 (collectively shown as “Web Clients” 202) using the Hypertext Transfer Protocol. The Web clients 202 user interface is a browser implemented using Java, JavaScript™, and Dynamic Hypertext Markup Language (DHTML). In a preferred embodiment of the present invention, as will be described in detail below in Section VIII, the Web clients 202 may also communicate directly with the IPIS 106 application via HTTP.
IV. Overall Monitoring System Operation
A. Inputs and Searching
Referring to FIG. 3, a flowchart 300 showing the overall operation of the monitoring system 100, according to an embodiment of the present invention, is shown. Flowchart 300 begins at step 302 with control passing immediately to step 304. In step 304, a user (on one of the Web client 202 workstations) defines the search criteria. The search criteria, as explained in detail below in Section V, are customized according to a particular client’s intellectual property infringement or disparagement concerns. In step 306, a search of the Internet 103 is performed. This search returns a list of probable uniform resource locators (URL’s). As is well-known in the relevant art(s), a URL is the standard for specifying the location of an object on the Internet 103. The URL standard addressing scheme is specified as “protocol://hostname” (e.g., “http://www.a_company.com”, “ftp://organization/pub/files” or “news:alt.topic”). A URL beginning with “http” specifies a Web site 120, a URL beginning with “ftp” specifies an FTP site 114, and a URL beginning with “news” specifies a Usenet newsgroup, etc. The probable URL’s indicate a first (preliminary) set of locations (i.e., addresses) on the Internet 103, based on the search criteria, where infringements or disparagements may occur. The search in step 306 is described in detail below in Section V.
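The protocol test described above can be sketched in Python with the standard `urllib.parse.urlsplit` function. The three-entry table mirrors the examples in the text; the names `site_kind` and `partition_preliminary_set` are hypothetical, and routing each group to its own crawler is an assumption consistent with the Web and FTP crawling steps described next.

```python
from urllib.parse import urlsplit

# Mapping taken from the examples in the text; any other scheme is "other".
SITE_KINDS = {
    "http": "Web site",
    "ftp": "FTP site",
    "news": "Usenet newsgroup",
}


def site_kind(url: str) -> str:
    """Classify a probable URL by the protocol that begins it."""
    return SITE_KINDS.get(urlsplit(url).scheme.lower(), "other")


def partition_preliminary_set(urls: list[str]) -> dict[str, list[str]]:
    """Split the preliminary set so each kind can be handed to a matching crawler."""
    groups: dict[str, list[str]] = {}
    for url in urls:
        groups.setdefault(site_kind(url), []).append(url)
    return groups
```

For example, `site_kind("news:alt.topic")` returns `"Usenet newsgroup"`, even though that URL has no `//hostname` part.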
B. Web Crawling
In step 308, each of the probable URL’s is visited and its contents downloaded locally to the cache of the file system 104. The aim of the download step 308 is that subsequent processing steps of the monitoring system 100 may be performed on “local” copies of the visited URL’s. This eliminates the need for revisiting (and thus re-establishing a connection to) each of the URL’s Web servers, thus increasing the overall performance of the monitoring system 100.
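One way to keep such local copies addressable is sketched below in Python. The layout (one directory per host, one file per URL named by the SHA-256 hash of the URL) is an assumption made for illustration; the text only says that contents are downloaded to the cache of file system 104. All function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Optional
from urllib.parse import urlsplit


def cache_path(cache_root: Path, url: str) -> Path:
    """Deterministic local path for a visited URL: <root>/<host>/<sha256 of URL>."""
    host = urlsplit(url).hostname or "unknown-host"
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return cache_root / host / digest


def store_page(cache_root: Path, url: str, content: bytes) -> Path:
    """Save downloaded contents once so later steps never re-contact the server."""
    path = cache_path(cache_root, url)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(content)
    return path


def load_page(cache_root: Path, url: str) -> Optional[bytes]:
    """Return the local copy, or None if the URL was never downloaded."""
    path = cache_path(cache_root, url)
    return path.read_bytes() if path.exists() else None
```

Because the path is derived from the URL alone, scoring, archiving, and reporting steps can locate a page without any lookup table.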
If any of the URL’s within the preliminary set contains files, those files may contain potentially infringing materials (e.g., a “*.mp3” music file, or a “*.gif” or “*.jpg” image file). This is in contrast to actual text located on a Web page of a particular Web site 120. The files may be located (1) on a different Web site 120 accessible via a hyperlink on the Web page the monitoring system 100 is currently accessing; (2) on a different Web page of the same Web site 120 the monitoring system 100 is currently accessing; or (3) in a different directory of the FTP site 114 than the one the monitoring system 100 is currently accessing. In these instances, the monitoring system 100 employs a Web crawling technique in order to locate the files. After the original URL is visited and the link to the file is identified, the monitoring system 100 truncates the link URL at the rightmost slash (“/”), thus generating a new link URL. This process is repeated until a reachable domain is generated. This technique takes advantage of the fact that most designers of Web sites 120 allow “default” documents to be returned by their Web servers in response to such URL (via HTTP) requests. An example of the IPIS 106 Web crawling technique is shown in Table 1 below.
`
`TABLE 1
`
`EXAMPLE OF IPIS 106 WEB CRAWLING TECHNIQUE
`
Original Web Page URL:
http://www.links-to-interesting-files-all-over-the-net.com
Interesting Links Found on the Original Web Page Identified by Client’s Search Criteria:
http://www.really-good-music-not-yet-released.com/future-hit.mp3
ftp://www.company-trades-secrets.com/july/tradeseceret.doc
Truncated URL’s:
http://www.really-good-music-not-yet-released.com/
ftp://www.company-trades-secrets.com/july/
ftp://www.company-trades-secrets.com/
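The truncation rule can be sketched in Python as follows. The function returns the candidate parent URLs in the order they would be tried; deciding which candidate is actually reachable (the stopping condition in the text) is left to the caller. The function name is hypothetical, and the sketch assumes link URLs without query strings or fragments.

```python
from urllib.parse import urlsplit


def truncated_urls(link: str) -> list[str]:
    """Successively truncate `link` at its rightmost slash, ending at the bare domain."""
    parts = urlsplit(link)
    root = f"{parts.scheme}://{parts.netloc}/"
    candidates = []
    current = link
    while True:
        # Ignore a trailing slash (it already marks a directory URL), then cut
        # everything after the rightmost remaining "/".
        cut = current.rstrip("/").rfind("/")
        current = current[: cut + 1]
        if len(current) < len(root):
            break  # nothing left above the domain
        candidates.append(current)
        if current == root:
            break  # reached the domain itself
    return candidates
```

Applied to the two interesting links in Table 1, it yields exactly the three truncated URL’s listed there.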
`
For any Web site 120 whose server is not currently responding (i.e., “down” or “off-line”), the IPIS 106 application implements a “re-try” timer and mechanism before removing the URL corresponding to the site from the preliminary set.
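A minimal sketch of such a re-try mechanism follows. It assumes a caller-supplied `fetch` function that reports whether a server answered, plus hypothetical defaults of three attempts spaced 30 seconds apart; the text gives no attempt count or interval, and both function names are illustrative.

```python
import time
from typing import Callable


def fetch_with_retry(url: str, fetch: Callable[[str], bool],
                     max_attempts: int = 3, delay_s: float = 30.0) -> bool:
    """Call fetch(url) up to max_attempts times; True as soon as the server answers."""
    for attempt in range(1, max_attempts + 1):
        if fetch(url):
            return True
        if attempt < max_attempts:
            time.sleep(delay_s)  # the "re-try" timer: wait before the next attempt
    return False


def prune_unreachable(preliminary: list[str], fetch: Callable[[str], bool],
                      max_attempts: int = 3, delay_s: float = 30.0) -> list[str]:
    """Drop URLs whose servers stayed down for every attempt; keep the rest in order."""
    return [url for url in preliminary
            if fetch_with_retry(url, fetch, max_attempts, delay_s)]
```

Passing `fetch` in as a function keeps the timer logic independent of HTTP or FTP details and easy to exercise with a fake server.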
C. Nice FTP Crawling
When any of the URL’s within the preliminary set is an FTP site 114 (or FSP site), the normal steps of visiting and downloading the sites are not practical and thus are not used. Therefore, the present invention contemplates a method for “FTP crawling” in order to accomplish step 308 for such URL’s. First, the IPIS 106 application attempts to log into the FTP site 114 specified by the URL. As is well known in the relevant art(s), there are two types of FTP sites 114: password-protected sites and anonymous sites. If the site 114 is password protected and the password is not published in a reference linked page, it is passed over and the URL is removed from the preliminary set. If the FTP site 114 has a published password, the IPIS 106 attempts to log in using that password. If the FTP site 114 is an anonymous site, the IPIS 106 application attempts to log in. As is well known in the relevant art(s), an anonymous FTP site allows a user to log in using a user name such as “ftp” or “anonymous” and then use their electronic mail address as the password.
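The login rules above can be sketched with Python’s standard `ftplib` module. The `FtpSite` record and its `published_login` field are hypothetical stand-ins for whatever the IPIS 106 learns about a site, and preferring an anonymous login when both options exist is an assumption; the text does not rank them.

```python
import ftplib
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FtpSite:
    host: str
    anonymous: bool  # site accepts "anonymous"/"ftp" logins
    # Hypothetical: user name and password published on a referencing page, if any.
    published_login: Optional[Tuple[str, str]] = None


def choose_login(site: FtpSite, email: str) -> Optional[Tuple[str, str]]:
    """Pick credentials per the rules above, or None to remove the URL."""
    if site.anonymous:
        return ("anonymous", email)  # e-mail address serves as the password
    if site.published_login is not None:
        return site.published_login
    return None  # password protected and nothing published: pass the site over


def connect(site: FtpSite, email: str, timeout: float = 30.0) -> Optional[ftplib.FTP]:
    """Open a single FTP session, or None if no login is possible or it is refused.

    Network errors (OSError) are left to the caller.
    """
    creds = choose_login(site, email)
    if creds is None:
        return None
    ftp = ftplib.FTP(site.host, timeout=timeout)
    try:
        ftp.login(user=creds[0], passwd=creds[1])
    except ftplib.error_perm:
        ftp.close()
        return None
    return ftp
```

Separating `choose_login` from `connect` lets the skip decision be made without opening any connection.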
In any event, if a connection can be established, the IPIS 106 application has access to the directory hierarchy containing the publicly accessible files (e.g., a “pub” subdirectory). The IPIS 106 application may then “nicely” crawl the relevant portions of the FTP site 114 by mapping the directory structure and then visiting certain directories based on keywords derived from the defined search criteria (step 304).
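The keyword-based choice of directories can be illustrated with a small Python sketch that filters an already-mapped directory listing. Case-insensitive substring matching on each directory’s own name is an assumption made for illustration, the function name is hypothetical, and the sample paths simply echo the example in the following paragraph. The sketch deliberately leaves out the counter-based descent described there.

```python
from pathlib import PurePosixPath


def interesting_dirs(mapped_dirs: list[str], keywords: list[str]) -> list[str]:
    """Return mapped directories whose own name contains one of the search keywords.

    Case-insensitive substring matching is an illustrative assumption.
    """
    wanted = [k.lower() for k in keywords]
    return [
        d for d in mapped_dirs
        if any(k in PurePosixPath(d).name.lower() for k in wanted)
    ]
```

For a music-related search, keywords such as `"music"` select `/~user/music` while skipping unrelated directories like `/~user/photos`.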
The purpose of nice FTP crawling is to capture the relevant contents of the FTP site 114 as they relate to the client without burdening the host’s resources by crawling the entire FTP site 114. This is especially important due to the large size of a typical FTP site 114 (e.g., a university’s site or someone’s entire PC hard disk drive), and due to the lack of crawl restriction standards like the “robots.txt” file commonly found on Web sites 120.
Suppose the IPIS 106 is searching for the directory “ftp://ftp.stuff.com/~user/music/famous_artist” in the context of a music and copyright infringement related search. First, the nice FTP crawling technique involves establishing a single connection to the FTP site 114 (even if multiple content is needed from the site) and then going to the root directory. Second, a counter is set to zero and a directory listing and snapshot of the current directory are taken. For each directory, if the directory name is “interesting,” then the IPIS 106 enters the directory, sets the counter to a positive number (e.g., C=2), then repeats the listing and snapshot step. If the counter is greater than zero or the directory is on the way to the destination directory, then the directory is entered and then