Freivald et al.

US005898836A
[11] Patent Number: 5,898,836
[45] Date of Patent: Apr. 27, 1999
`
`[54] CHANGE-DETECTION TOOL INDICATING
`DEGREE AND LOCATION OF CHANGE OF
`INTERNET DOCUMENTS BY COMPARISON
`OF CYCLIC-REDUNDANCY-CHECK(CRC)
`SIGNATURES
`
[75] Inventors: Matthew P. Freivald, Sunnyvale;
Mark S. Richards, San Jose; Alan C.
Noble, Santa Cruz, all of Calif.
`
`[73] Assignee: Netmind Services, Inc., Campbell,
`Calif.
`
`[21] Appl. No.: 08/783,625
`
[22] Filed: Jan. 14, 1997

[51] Int. Cl.6 ................. H04L 12/00
[52] U.S. Cl. .................. 395/200.48; 707/513
[58] Field of Search ........... 395/200.48, 400.49;
707/10, 203, 511, 513
[56] References Cited

U.S. PATENT DOCUMENTS

5,388,255   2/1995   Pytlik et al. ........... 395/600
5,630,116   5/1997   Takaya et al. ........... 395/617
5,813,007   9/1998   Nielsen ................. 707/10
`
`Primary Examiner-Lance Leonard Barry
`Attorney, Agent, or Firm-Stuart T. Auvinen
`
[57] ABSTRACT

A change-detection web server automatically checks web-page
documents for recent changes. The server retrieves and
compares documents one or more times a week. The user is
notified by electronic mail when a change is detected. The
user registers a web-page document by submitting his e-mail
address and the uniform-resource locator (URL) of the
desired document. The document is fetched and the user can
select text on the page of interest. Non-selected text is
ignored; only changes in the selected text are reported back
to the user. Thus changes to less relevant parts of the
document are ignored. The document is divided into sections
bounded by hyper-text markup-language (HTML) tags. A
checksum is generated and stored for each HTML-bound
section. Storage requirements are reduced since only
checksums are stored rather than the original documents. During
periodic comparisons a fresh copy of the document is
retrieved, divided into HTML-bound sections and checksums
generated for each section. The freshly-generated
checksums are compared to the archived checksums. Sections
with non-matching checksums are highlighted as
changed, and the percentage of changed sections is reported.
The user-defined selection is also stored as a checksum and
compared to a freshly-generated checksum. Changed checksums
outside the user-defined selection do not generate a
change notification. Re-ordering of sections does not generate
a change notification when the checksums otherwise
match. Thus format and layout changes do not generate
change notifications, and the frequency of notices to the user is
reduced.
`
`19 Claims, 10 Drawing Sheets
`
[Front-page drawing: source document server 12 (WWW server), user
client 14 (WWW browser), and change-detection server 20 containing
minder 22 and responder 24, shown for USER SETUP (confirmation
returned to the client) and for WEEKLY COMPARE.]
`SAP Exhibit 1005, Page 1 of 21
`
`
`
[Sheet 1 of 10. FIG. 1: block diagram showing a source document
server (WWW server), a user client (WWW browser), and the
change-detection tool web server 20 containing the minder and
responder 24.]
`
`
`
[Sheet 2 of 10. FIG. 2 (USER SETUP): user client 14 (WWW browser)
sends the URL, sections, and e-mail address to responder 24 in
change-detection tool web server 20 and receives a confirmation;
minder 22 and source document server 12 (WWW server) are also shown.]
`
`
`
[Sheet 3 of 10. FIG. 3 (WEEKLY COMPARE): the minder in
change-detection tool web server 20 sends the URL to source document
server 12 (WWW server), receives the document, and uses the old
CRC's; user client 14 (WWW browser) and responder 24 are also shown.]
`
`
`
[Sheet 4 of 10. FIG. 4 (REPORT CHANGE): a notice is sent from
change-detection tool web server 20 to user client 14 (WWW browser);
source document server 12 (WWW server), minder 22, responder 24, and
the element labeled 16 are also shown.]
`
`
`
[Sheet 5 of 10. FIG. 5: the responder applies the selection (START,
LEN1) to the source document, generates CRC1, and stores the URL,
e-mail address, LEN1, CRC1, LEN2, and CRC2 fields (labeled 40).]
`
`
`
[Sheet 6 of 10. FIG. 6: the minder fetches the new source document by
URL and steps through it (NEXT CHAR), using the stored URL, e-mail
address, LEN1, CRC1, LEN2, and CRC2 fields (labeled 40). Outcomes
shown: FOUND EXACT STRING (NO CHANGE); NO MORE CHAR'S, STRING NOT
FOUND, leading to CHANGE NOTICE.]
`
`
`
[Sheet 7 of 10. FIG. 7: an HTML document divided into tag-bounded
sections, with one checksum per section:

SECTION   CONTENT                                   CRC
1         <TAG1> TEXT IN SECTION 1 ... </TAG1>      CRC1
2         <TAG2> TEXT IN SECTION 2 ... </TAG2>      CRC2
3         <TAG3> TEXT IN SECTION 3 ... </TAG3>      CRC3
4         <TAG4> TEXT IN SECTION 4 ... </TAG4>      CRC4
...]
`
`
`
[Sheet 8 of 10. FIG. 8: the responder processes the source document
and stores the URL, e-mail address, and a per-section table (labeled
52) of CRC's (CRC1 to CRC4) with enable/disable (EN/DIS SECTION)
flags, in the fields labeled 40'.]
`
`
`
[Sheet 9 of 10. FIG. 9: the minder fetches the new source document by
URL; a parser/divider generates new CRC's (NCRC1 to NCRC4, labeled
80), which are checked against the stored section CRC's and flags
(labeled 52, in the fields labeled 40'). Outcome shown: FOUND EXACT
SECTION (NO CHANGE).]
`
`
`
[Sheet 10 of 10. FIG. 10: an HTML document with sections bounded by
<TAG1>, <TAG2>, and <TAG3>, a USER SELECTION marked within it, and
two tables.

Table 90:
SECTION #   CRC    ENA/DIS SECTION
1           CRC1   0
2           CRC2   1
3           CRC3   1
4           CRC4   0

Table 92:
USER SELECTION #   STARTING SECTION #   LENGTH   CRC
1                  2                    LEN1     CRC1A]
`
`
`
CHANGE-DETECTION TOOL INDICATING
DEGREE AND LOCATION OF CHANGE OF
INTERNET DOCUMENTS BY COMPARISON
OF CYCLIC-REDUNDANCY-CHECK (CRC)
SIGNATURES

BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention relates to software retrieval tools for
networks, and more particularly for a change-detection and
highlighting tool for the Internet.

2. Description of the Related Art

Today's society is sometimes referred to as an information
society. Technology has increased the ease of generating
and disseminating information. The widespread acceptance
of the global network known as the Internet allows huge
amounts of information to be instantly transmitted to persons
around the world.

Explosive growth is occurring in the part of the Internet
known as the World-Wide Web, or simply the "web". The
web is a collection of millions of files or "web pages" of text,
graphics, and other media which are connected by hyper-links
to other web pages. These may physically reside on a
computer system anywhere on the Internet: on a computer
in the next room or on the other side of the world.

These hyper-links often appear in the browser as a graphical
icon or as colored, underlined text. A hyper-link contains
a link to another web page. Using a mouse to click on the
hyper-link initiates a process which locates and retrieves the
linked web page, regardless of the physical location of that
page. Hovering a mouse over a hyperlink or clicking on the
link often displays in a corner of the browser a locator for the
linked web page. This locator is known as a Universal
Resource Locator, or URL.

The vast amount of information available on the Internet
has created an overload of information which the casual user
cannot digest. Internet search tools or search engines allow
users to find desired information by searching for keywords
through an index of the millions of documents posted on the
Internet. Search engines such as Excite of Mountain View,
Calif. and Digital Equipment's "ALTAVISTA" help users
quickly sift through huge amounts of information to find the
desired information.

A characteristic of the Internet is that it is relatively easy
to change or update information. The user may wish to know
when updates are made to the desired information he found
with a search. For example, the information found may
describe a bug fix or other revision in a software program.
Initially a crude work-around or even just a notice of the bug
may be posted on the Internet. Later, this posting may be
updated with a more robust fix or other useful information.
The information could also be a list of phone numbers or
other contact information, or it could be a product list or a
competitor's web site, advertising, or press releases.

The user could frequently re-access the information on the
Internet to see if changes have occurred, but this is
time-consuming. Frequently re-accessing the information is
tedious, particularly when the information is contained in a
long document, or when many documents must be checked
for changes.

Software tools have been developed to automate the task
of detecting updates to information on the Internet. Early
tools such as America Online's News Profiles allow users to
specify keywords which are periodically searched for in a
news database. News articles containing the specified
keywords are sent to the user by electronic mail (email).
These automated software tools are sometimes known as
"netbots", a network robot which automatically performs
some task for a user. Netbots allow users to better manage
the information on the Internet and reduce the amount of
information that a user must read. Filtering down the amount
of information is critical to making good use of the
overwhelming amount of information available on the Internet.

More recent change-detection tools allow users to register
a document or web page on the Internet and be notified when
any change to that document occurs. The user "registers" a
document by specifying the URL of the document, and
providing the user's e-mail address. The change-detection
tool stores a local copy of the document together with the
user's e-mail address. Once every day or week the
change-detection tool accesses the source document at the specified
URL, and compares the retrieved source document to the
local copy of the document. If a difference between the older
local copy and the just-retrieved source document is
detected, then a message is sent to the user's e-mail address,
perhaps with a copy of the new document or a copy of the
changes.

The document-change tool could store an actual copy of
the entire document at the tool's web site for comparison.
However, storing the whole document at the
document-change tool's web site is expensive because large
amounts of storage are needed. For example, if 500,000
documents were registered, and each document averages 50
Kbytes, then 25 GigaBytes of storage are needed to store
copies of the registered documents.

Instead of storing the entire document, the revision date or
time-stamp of the document could be stored. U.S. Pat. No.
5,388,255 shows a database which compares time stamps to
determine when data has changed. Since the time-stamp is
much smaller than the entire document, storage space is
reduced at the tool's web site.

The inventors have a change-detection tool which stores
a checksum or CRC of the document rather than the
time-stamp or the entire document. When the document is
initially registered, a checksum is generated for the entire
source document. This checksum is stored at the tool's web
site. Each week when the source document is retrieved,
another checksum is generated and compared to the stored
checksum. If the stored checksum matches the newly-generated
checksum, then no change is detected. When the
checksums do not match, then the user is notified of a change
by e-mail. The user can optionally have a copy of the new
document attached to the e-mail notification.

Such a change-detection tool called a "URL-minder" has
been available for free public use at the inventor's web site,
www.netmind.com, for more than a year before the filing
date of the present application. Over 150,000 documents or
URL's are registered at that site for 1.4 million users.
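
The whole-document checksum comparison and the storage example in
the background can be illustrated with a minimal sketch. It assumes
Python's zlib.crc32 as a stand-in checksum (the text does not name the
CRC used by the URL-minder), and the function names and sample
documents are hypothetical.

```python
import zlib

def document_checksum(document: bytes) -> int:
    """Checksum of the whole document (CRC-32 used here as a stand-in)."""
    return zlib.crc32(document)

def has_changed(stored_checksum: int, fresh_document: bytes) -> bool:
    """A change is reported when the fresh checksum differs from the stored one."""
    return document_checksum(fresh_document) != stored_checksum

# Storage estimate from the example in the text: 500,000 documents of 50 Kbytes.
documents = 500_000
full_copy_bytes = documents * 50_000   # 25,000,000,000 bytes, i.e. 25 GigaBytes
checksum_bytes = documents * 4         # one 4-byte CRC-32 each (assumed size)

stored = document_checksum(b"<html>price list v1</html>")
unchanged = has_changed(stored, b"<html>price list v1</html>")
changed = has_changed(stored, b"<html>price list v2</html>")
print(unchanged, changed)               # False True
print(full_copy_bytes, checksum_bytes)  # 25000000000 2000000
```

Because only one checksum is kept per document, this form can say
that something changed but not where, which is the limitation the
following sections address.
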
`
MINOR CHANGES NOT FILTERED OUT

While such a change-detection tool is useful, the existing
tool has several drawbacks. Since minor changes are frequently
made to Internet documents, users are notified of
many insignificant changes. The users can quickly become
irritated with frequent e-mail notices of the minor, irrelevant
changes. Statistics taken for the URL-minder tool in May,
1996, showed that over 100,000 change notices were
e-mailed in just four days to the 500,000 registered users.
Internet documents change every few weeks on the average.
Thus a user with a few dozen registered documents receives
notices almost every day. This is an undesirably high
frequency of notices for many users.
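
As a rough check on the "almost every day" estimate, assume (these
figures are illustrative and are not taken from the statistics above)
that each document changes once every T = 21 days and that a user has
N = 24 registered documents:

```latex
\[
\text{notices per day} \approx \frac{N}{T}
  = \frac{24\ \text{documents}}{21\ \text{days}}
  \approx 1.1
\]
```

Under those assumptions the user would receive roughly one notice per
day, consistent with the statement in the text.
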
`
LOCATION OF CHANGE DESIRABLE

When the entire document is stored rather than a
checksum, the location of the change in the document can be
found and highlighted to the user since the original document
is available for comparison. However, when a single
checksum is stored for each registered document, the
changes within that document cannot be determined or
identified. Thus the user is left to determine the location of
the change within the document, and the relevance of that
change.

With the existing URL-minder which stores only
checksums, when a change is detected, the user is simply
notified that there was a change. The user can optionally
receive a copy of the changed document, but the changes are
not highlighted. Thus the user must re-read the entire
document to determine what the change was. Often the
changes are minor and even hard to detect, such as a spelling
change of a word, or a date change. Sometimes the order or
arrangement of text has changed but not the content. These
minor changes are not always significant to the user.

Thus the user is plagued with frequent notices of minor
changes, and the user must re-read the entire document to
determine what the change was. Having to re-read the
documents increases the burden on the user, which is the
opposite intent of an automated tool or netbot.
`
`LONG, COMPLEX DOCUMENTS COMMON
`
The change-detection tool allows a user to register a
document by specifying the uniform-resource-locator
(URL) of that document. A unique URL is specified for each
web page on the Internet's world-wide-web. Other information
sometimes embedded in the URL includes passwords or
search text that the user types in, or name and address
information typed in. Internet documents are usually web
pages containing several individual files such as for
graphics, text, and motion video and sound. Sometimes
these files include small programs such as CGI (common
gateway interface) scripts. Thus the documents registered
are fairly complex and often lengthy.

Often the user is only interested in a small part of a
document, rather than the whole document. A user might be
interested only in one contact or phone number on a list of
hundreds of phone numbers for an office, or only one
product line in a long list of products. It is desirable to allow
the user to specify only the portion of a document or web
page which is of interest.

What is desired is a storage-efficient change-detection
tool which detects when changes occur to a registered
document on the Internet. It is desired that minor changes to
the document be filtered by the change-detection tool to
reduce the number of change notifications sent to the user.
It is also desired to give the user an indication of how
significant the change is. It is desired to allow the user to
identify relevant portions of a document so that the user is
not notified of changes to other portions of the document. It
is further desired to reduce storage requirements for the
change-detection tool by storing a condensed checksum or
signature of the registered document rather than storing the
entire document.

SUMMARY OF THE INVENTION

A change-detection web server has a network connection
for transmitting and receiving packets from a remote client
and a remote document server. A responder is coupled to the
network connection. The responder communicates with the
remote client to register a document for change detection by
receiving from the remote client a uniform-resource-locator
(URL) identifying the document. The responder fetches the
document from the remote document server and generates
an original checksum for a checked portion of the document.
The checked portion is less than the entire document.

A database is coupled to the responder. It receives the
URL and the original checksum from the responder when
the document is registered by the remote client. The database
stores a plurality of records each containing a URL and
a checksum for a registered document. A periodic minder is
coupled to the database and the network connection. It
periodically re-fetches the document from the remote document
server by transmitting the URL from the database to
the network connection. The periodic minder receives a
fresh copy of the document from the remote document
server. The periodic minder generates a fresh checksum of a
portion of the fresh copy of the document and compares the
fresh checksum to the original checksum. A detected change
is signaled to the remote client when the fresh checksum
does not match the original checksum.

Thus a change in the document is detected by comparing
a checksum for the checked portion of the document.
Changes in portions of the document outside the checked
portion are not signaled to the remote client.

In further aspects the database does not store the document.
The database stores a checksum for the document.
Thus storage requirements for the database are reduced by
archiving checksums and not entire documents.

In other aspects of the invention a selection means is
coupled to the responder. It receives a selection from the
remote client. The selection identifies boundaries of the
checked portion of the document. A parsing means is
coupled to the periodic minder. It parses the fresh copy and
generates checksums for a plurality of portions of the fresh
copy. A compare means is coupled to the parsing means. It
signals a match when any of the checksums generated by the
parsing means matches the original checksum from the
database. Thus a change in the document is detected when
the match is not signaled by the compare means. The parsing
means generates a plurality of checksums for the plurality of
portions of the fresh copy.

In still further aspects of the invention a length field
indicates a size of the checked portion. The length field is
written by the selection means. The parsing means generates
each checksum for portions having the size of the checked
portion. Thus the size of the checked portion is stored and
used by the parsing means.

In further aspects the document is a hyper-text markup-language
(HTML) document containing HTML tags. The
HTML tags indicate formatting, layout, and hyper-links
specifying URLs of other servers. The change-detection web
server also has divider means coupled to the responder, for
dividing the document into portions bound by the HTML
tags. A checksum means generates original checksums. An
original checksum is generated for each portion bound by
HTML tags. The database stores the original checksums for
the portions bound by the HTML tags. The periodic minder
also has a second divider means which divides the fresh
copy of the document into portions bound by the HTML
tags. A second checksum means generates fresh checksums
for portions of the fresh copy bound by HTML tags in the
fresh copy of the document. A compare means receives the
fresh checksums of the fresh copy from the second checksum
means. It compares the fresh checksums to the original
checksums from the database. A report means signals a
`
`
`
change in the document when an original checksum for the
document has no matching fresh checksum. Thus checksums
are generated and stored for portions of the document
bound by the HTML tags.
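
The tag-bounded checksum scheme summarized above can be sketched as
follows. This is a minimal illustration, not the patented
implementation: it assumes a simple regular expression to find tags,
zlib.crc32 as a stand-in checksum, and hypothetical function names and
sample pages.

```python
import re
import zlib

TAG = re.compile(r"<[^>]+>")

def section_checksums(html: str) -> list[int]:
    """Split the document at HTML tags and checksum each non-empty section."""
    sections = [s.strip() for s in TAG.split(html)]
    return [zlib.crc32(s.encode("utf-8")) for s in sections if s]

def changed_sections(original: list[int], fresh: list[int]) -> list[int]:
    """Indexes of original checksums that have no matching fresh checksum."""
    fresh_set = set(fresh)
    return [i for i, crc in enumerate(original) if crc not in fresh_set]

old = "<h1>Contacts</h1><p>Alice 555-0100</p><p>Bob 555-0101</p>"
new = "<h1>Contacts</h1><p>Bob 555-0199</p><p>Alice 555-0100</p>"

archived = section_checksums(old)      # only checksums are kept, not the text
missing = changed_sections(archived, section_checksums(new))
percent_changed = 100 * len(missing) / len(archived)
print(missing, round(percent_changed))  # [2] 33
```

Because each original checksum is looked up in the set of fresh
checksums, moving the "Alice" section does not count as a change; only
the edited "Bob" section is reported.
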
In further aspects the report means has a mailer means
coupled to the network connection. It sends a change
notification message to the remote client when the change is
signaled. The responder receives an electronic-mail address
from the remote client and stores the electronic-mail address
of the remote client in the database. The mailer means reads
the electronic-mail address from the database. The change
notification message is sent to the remote client as an
electronic-mail message addressed to the electronic-mail
address. Thus the remote client is notified of the change by
electronic mail.
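
A minimal sketch of composing such a notification is shown below. It
uses Python's standard email.message.EmailMessage; the sender address,
subject wording, and function name are hypothetical and are not taken
from the text.

```python
from email.message import EmailMessage

def build_change_notice(email_address: str, url: str) -> EmailMessage:
    """Compose the change notification for one registered document."""
    msg = EmailMessage()
    msg["To"] = email_address              # address read from the database
    msg["From"] = "minder@example.com"     # hypothetical sender address
    msg["Subject"] = f"Change detected: {url}"
    msg.set_content(f"The registered document at {url} has changed.")
    return msg

notice = build_change_notice("user@example.com",
                             "http://www.example.com/page.html")
print(notice["To"], "|", notice["Subject"])
```

The message could then be handed to a mail server, for example with
smtplib.SMTP(...).send_message(notice); that step is omitted here
because it depends on the local mail setup.
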
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
FIG. 1 is a diagram of a change detection tool on a server
on the Internet.
FIG. 2 shows a user registering a web page document for
change detection.
FIG. 3 shows a periodic comparison of a registered web
page document to determine if the document has changed.
FIG. 4 shows a document-change notice being generated
and sent to the user.
FIG. 5 illustrates the operation of responder 24 of FIG. 1
when the registered document is an arbitrary, unstructured
file.
FIG. 6 illustrates operation of minder 22 of FIG. 1 when
the registered document has an arbitrary, unstructured format.
FIG. 7 is a diagram of an HTML document and a table of
checksums for the HTML-delineated sections.
FIG. 8 illustrates the operation of responder 24 of FIG. 1
when the registered document is an HTML file.
FIG. 9 illustrates the operation of minder 22 of FIG. 1
when an HTML document is checked for recent changes.
FIG. 10 is a diagram illustrating an alternate embodiment
which archives separate checksums for HTML-defined sections
and checksums for user-defined sections.

DETAILED DESCRIPTION

The present invention relates to an improvement in
Internet-document change-detection tools. The following
description is presented to enable one of ordinary skill in the
art to make and use the invention as provided in the context
of a particular application and its requirements. Various
modifications to the preferred embodiment will be apparent
to those with skill in the art, and the general principles
defined herein may be applied to other embodiments.
Therefore, the present invention is not intended to be limited
to the particular embodiments shown and described, but is to
be accorded the widest scope consistent with the principles
and novel features herein disclosed.

OVERVIEW OF CHANGE-DETECTION WEB SERVER

FIG. 1 is a diagram of a change detection tool on a server
on the Internet. The user operates client 14 from a remote
site on Internet 10. The user typically is operating a browser
application, such as Netscape's Navigator or Microsoft's
Internet Explorer. Client 14 communicates through Internet
10 by sending and receiving TCP/IP packets to establish
connections with remote servers, typically using the hyper-text
transfer protocol (http) of the world-wide web.

Client 14 retrieves web pages of files from document
server 12 through Internet 10. These web pages are identified
by a unique URL (uniform resource locator) which
specifies a document file containing the text and graphics of
a desired web page. Often additional files are retrieved when
a document is retrieved. The "document" returned from
document server 12 to client 14 is thus a composite document
composed of several files of text, graphics, and perhaps
sound or animation. The physical appearance of the web
page on the user's browser on client 14 is specified by layout
information embedded in non-displayed tags, as is well-known
for HTML (hyper-text markup language) documents.
Often these HTML documents contain tags with URL's that
specify other web pages, perhaps on other web servers
which may be physically located in different cities or countries.
These tags create hyper-links to these other web
servers allowing the user to quickly jump to other servers.
These hyper-links form a complex web of linked servers
across the world; hence the name "world-wide web".

The user may frequently retrieve files from remote document
server 12. Often the same file is retrieved. The user
may only be interested in differences in the file, or learning
when the file is updated, such as when a new product or
service is announced. The inventors have developed a software
tool which automatically retrieves files and compares
the retrieved files to an archived checksum of the file to
determine if a change in the file has occurred. When a
change is detected, the user is notified by an electronic mail
message (e-mail). A copy of the new file may be attached to
the e-mail notification, allowing the user to review the
changes.

Rather than archive the source files from remote document
server 12, the invention archives a checksum or CRC
of the source files. These CRC's and the e-mail address of
the user are stored in database 16 of change-detection server
20. Comparison is made of the stored or archived CRC of the
document and a fresh CRC of the currently-available document.
The CRC is a condensed signature or fingerprint of the
document. Any change to the document changes the CRC.
Aliasing of CRC's can be reduced to a very small probability
by using sufficiently large CRC's, such as an 8-byte CRC.
With an 8-byte CRC it is extremely improbable that a
change to a document results in the same CRC being
generated. If an identical CRC is generated, then the user is
not notified of any change.

Change-detection server 20 performs three basic functions:
1. Register (setup) a web page document for change
detection.
2. Periodically re-fetch the document and compare for
changes.
3. E-mail a change notice to the registered user if a change
is detected.

Change-detection server 20 contains three basic components.
Database 16 stores the archive of CRC's for registered
web-page documents. The URL identifying the web page
and the user's e-mail address are also stored with the
archived CRC's. Responder 24 communicates with the user
at client 14 to setup or register a web page document for
change detection. Minder 22 periodically fetches registered
documents from document server 12 through Internet 10.
Minder 22 compares the archived CRC's in database 16 to
new CRC's of the fetched documents to determine if a
change has occurred. When a change is detected, minder 22
sends a notice to the user at client 14 that the document has
changed.
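
The three components described for FIG. 1 (database 16, responder 24,
and minder 22) can be pictured with a small data-structure sketch. The
record layout, the in-memory dictionary standing in for the database,
and the use of zlib.crc32 in place of the suggested 8-byte CRC are all
assumptions made for illustration; fetching is left out and the
document bytes are passed in directly.

```python
from dataclasses import dataclass
import zlib

@dataclass
class Record:
    url: str
    email: str
    crc: int   # archived checksum; the document itself is not stored

# In-memory stand-in for database 16, keyed by (URL, e-mail address).
database: dict[tuple[str, str], Record] = {}

def register(url: str, email: str, document: bytes) -> None:
    """Responder role: archive a checksum for a newly registered document."""
    database[(url, email)] = Record(url, email, zlib.crc32(document))

def recheck(record: Record, fresh_document: bytes) -> bool:
    """Minder role: True when the fresh checksum differs from the archive."""
    return zlib.crc32(fresh_document) != record.crc

register("http://www.example.com/", "user@example.com", b"old text")
rec = database[("http://www.example.com/", "user@example.com")]
same = recheck(rec, b"old text")
different = recheck(rec, b"new text")
print(same, different)   # False True
```
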
`
OVERVIEW OF OPERATION - FIGS. 2, 3, 4

FIG. 2 shows a user registering a web page document for
change detection. The user on client 14 registers a web page
document by specifying the URL which identifies the web
page. A portion of the URL is translated into an IP address
of a server by a domain-name server. The user also sends his
e-mail address to responder 24. Responder 24 fetches the
web page and displays the page to the user. The user then
selects which portions of the web page document are to be
compared for changes. The user can select paragraphs of text
by dragging a highlight across the text. Responder 24 then
stores the location of the selected text and generates one or
more CRC's for the selected text. Responder 24 then stores the
CRC(s), URL, and e-mail address in database 16. A confirmation
that the web page document has been registered is
finally sent to the user on client 14.
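
The step in which the responder stores the location of the selected
text and a CRC for it might look like the following sketch. The
dictionary fields, the function name, and the use of zlib.crc32 are
assumptions for illustration; the sample page is hypothetical.

```python
import zlib

def register_selection(document: str, selected_text: str) -> dict:
    """Locate the user's selection and return the fields to archive."""
    start = document.find(selected_text)
    if start < 0:
        raise ValueError("selection not found in document")
    return {
        "start": start,                  # offset of the selection in the page
        "length": len(selected_text),    # size of the checked portion
        "crc": zlib.crc32(selected_text.encode("utf-8")),
    }

page = "Office directory. Sales: 555-0100. Support: 555-0101."
entry = register_selection(page, "Support: 555-0101")
print(entry["start"], entry["length"])   # 35 17
```

Only the offset, length, and checksum are kept; the selected text
itself does not need to be stored.
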
FIG. 3 shows a periodic comparison of a registered web
page document to determine if the document has changed.
Each registered document is compared for changes on a
periodic basis which depends on the number of registered
documents and the speed of operation of change-detection
server 20. Typically each document is compared every few
days, although more frequent comparisons are possible.
Minder 22 reads the URL of the registered document from
database 16. Minder 22 automatically fetches from document
server 12 a fresh copy of the web-page document
pointed to by the URL. Client 14 is not involved in this
transaction. Occasionally the URL is deleted or does not
respond, and a change is then signaled indicating that the
URL could not be fetched. Change-detection server 20 may
try to fetch the document again after several hours so that
temporary shutdowns do not generate spurious change
notices.
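
The retry behavior described for FIG. 3 can be sketched as below. The
fetch function is passed in so the example runs without a network; the
function names, the single retry, and the zero delay are assumptions
(the text suggests waiting several hours before the second attempt).

```python
import time
from typing import Callable, Optional

def fetch_with_retry(fetch: Callable[[str], bytes], url: str,
                     retries: int = 1,
                     delay_seconds: float = 0.0) -> Optional[bytes]:
    """Try to fetch `url`; wait and retry before giving up."""
    for attempt in range(retries + 1):
        try:
            return fetch(url)
        except OSError:
            if attempt < retries:
                time.sleep(delay_seconds)   # several hours in the text
    return None   # caller signals that the URL could not be fetched

calls = []

def flaky(url: str) -> bytes:
    """Hypothetical fetcher that fails once, then succeeds."""
    calls.append(url)
    if len(calls) == 1:
        raise OSError("server temporarily down")
    return b"<html>ok</html>"

def always_down(url: str) -> bytes:
    """Hypothetical fetcher for a URL that never responds."""
    raise OSError("no response")

result = fetch_with_retry(flaky, "http://www.example.com/")
gone = fetch_with_retry(always_down, "http://www.example.com/")
print(result, gone)   # b'<html>ok</html>' None
```

A real fetcher could be something like
`lambda u: urllib.request.urlopen(u).read()`, whose URLError
exceptions are a subclass of OSError.
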
Once a fresh copy of the registered document has been
fetched from the Internet, one or more CRC's of the fresh
document are generated. These CRC's are compared to
archived CRC's stored in database 16. A mis-compare of one
or more CRC's indicates that the document changed.

FIG. 4 shows a document-change notice being generated
and sent to the user. When a change has been detected by
minder 22, a change notice is sent by e-mail to the registered
user at client 14. The user's e-mail address is fetched from
database 16 by minder 22. The new CRC's generated from
the fresh copy of the registered document are written to
database 16 so that future comparisons reflect the recent
changes.

When the change that was detected is in a portion of the
document not selected by the user when registering the
document, a change notice is not sent. Thus changes to
non-selected portions of a registered document do not generate
change notices. This allows the user to filter out
irrelevant changes, such as date changes or access counters
which are frequently updated.
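
One way to obtain this filtering for an arbitrary document, following
the length-field approach in the summary, is to checksum every portion
of the fresh copy that has the stored length and report no change if
any of them matches the archived checksum. The sketch below assumes
zlib.crc32 and hypothetical names and sample pages; it recomputes each
window from scratch for clarity rather than speed.

```python
import zlib

def selection_unchanged(fresh: str, length: int, archived_crc: int) -> bool:
    """True if any window of `length` characters has the archived checksum."""
    for start in range(len(fresh) - length + 1):
        window = fresh[start:start + length]
        if zlib.crc32(window.encode("utf-8")) == archived_crc:
            return True
    return False

selected = "Support: 555-0101"
length = len(selected)
archived = zlib.crc32(selected.encode("utf-8"))

# The date line and section order changed, but the selection is intact.
moved = "Updated May 3. Office directory. Support: 555-0101. Sales: 555-0100."
# The selected phone number itself was edited.
edited = "Office directory. Sales: 555-0100. Support: 555-0199."

print(selection_unchanged(moved, length, archived))    # True, no notice
print(selection_unchanged(edited, length, archived))   # False, send notice
```
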
`
FIG. 5 illustrates the operation of responder 24 of FIG. 1
when the registered document is an arbitrary, unstructured
file. The user initiates registration of a document by providing
the URL identifying the document and the user's e-mail
address. These can be provided by typing or pasting them
into fields on a registration web page at change-detection
server 20.

Change-detection server 20 uses the URL to fetch a copy
of source document 30 from document server 12 of FIG. 1.
Source document 30 could be any one of millions of
documents on the thousands of web servers connected to the
Internet. Source document 30 is displayed to the user,
allowing the user to select portions of source document 30
for registration. The user can select portions of source
document 30 by dragging a highlight with a mouse over the
text to be selected. Alternately, the user can select whole
paragraphs by triple-clicking anywhere inside these
sections, or a single word or numeric value by double-clicking
on the word. Changes which occur in unselected
portions of source document 30 do not generate change
notifications.

The selection information from the user is encoded as a
string of length LEN1, with a starting location START.
Parser 32 reads characters from source document 30 one at
a time until the first character in the string at the starting
location START is found. START can simply be an offset in
bytes or in characters from the beginning of the file to the
beginning of the user's selection. Characters following
START are sent from parser 32 to CRC generator 34 until
the number of characters indicated by LEN1 is reached,
indicating that the end of the selecti