United States Patent [19]
Freivald et al.

[54] CHANGE-DETECTION TOOL INDICATING DEGREE AND LOCATION OF CHANGE OF
     INTERNET DOCUMENTS BY COMPARISON OF CYCLIC-REDUNDANCY-CHECK (CRC)
     SIGNATURES

[75] Inventors: Matthew P. Freivald, Sunnyvale; Mark S. Richards,
     San Jose; Alan C. Noble, Santa Cruz, all of Calif.

[73] Assignee: Netmind Services, Inc., Campbell, Calif.

[21] Appl. No.: 08/783,625

[22] Filed: Jan. 14, 1997

[51] Int. Cl.6: H04L 12/00
[52] U.S. Cl.: 395/200.48; 707/513
[58] Field of Search: 395/200.48, 400.49; 707/10, 203, 511, 513

[56] References Cited

U.S. PATENT DOCUMENTS

5,388,255    2/1995    Pytlik et al.    395/600
5,630,116    5/1997    Takaya et al.    395/617
5,813,007    9/1998    Nielsen          707/10

Primary Examiner: Lance Leonard Barry
Attorney, Agent, or Firm: Stuart T. Auvinen
`
US005898836A
[11] Patent Number: 5,898,836
[45] Date of Patent: Apr. 27, 1999
`
[57] ABSTRACT

A change-detection web server automatically checks web-page documents
for recent changes. The server retrieves and compares documents one or
more times a week. The user is notified by electronic mail when a
change is detected. The user registers a web-page document by
submitting his e-mail address and the uniform-resource locator (URL)
of the desired document. The document is fetched and the user can
select text on the page of interest. Non-selected text is ignored;
only changes in the selected text are reported back to the user. Thus
changes to less relevant parts of the document are ignored. The
document is divided into sections bounded by hyper-text
markup-language (HTML) tags. A checksum is generated and stored for
each HTML-bound section. Storage requirements are reduced since only
checksums are stored rather than the original documents. During
periodic comparisons a fresh copy of the document is retrieved,
divided into HTML-bound sections and checksums generated for each
section. The freshly-generated checksums are compared to the archived
checksums. Sections with non-matching checksums are highlighted as
changed, and the percentage of changed sections is reported. The
user-defined selection is also stored as a checksum and compared to a
freshly-generated checksum. Changed checksums outside the user-defined
selection do not generate a change notification. Re-ordering of
sections does not generate a change notification when the checksums
otherwise match. Thus format and layout changes do not generate change
notifications, and the frequency of notices to user is reduced.
`
`19 Claims, 10 Drawing Sheets
`
`Oracle Exhibit 1005, pg 1
`
`
`
[Sheet 1 of 10, FIG. 1: block diagram. Change-detection tool web
server 20 (minder, database, responder 24) communicates with the
source document server (WWW server) and the user client (WWW browser).]
`
`
`
[Sheet 2 of 10, FIG. 2, USER SETUP: user client 14 (WWW browser) sends
URL, SECTIONS, and E-MAIL ADDR to responder 24 of change-detection tool
web server 20 and receives a CONFIRMED reply. Source document server 12
(WWW server), minder 22, and the database are also shown.]
`
`
`
[Sheet 3 of 10, FIG. 3, WEEKLY COMPARE: the minder of change-detection
tool web server 20 sends the URL to source document server 12 (WWW
server), receives the DOCUMENT, and uses the OLD CRC'S from the
database. User client 14 (WWW browser) and responder 24 are also
shown.]
`
`
`
[Sheet 4 of 10, FIG. 4, REPORT CHANGE: minder 22 of change-detection
tool web server 20 sends a NOTICE to user client 14 (WWW browser).
Source document server 12 (WWW server) and responder 24 are also
shown.]
`
`
`
[Sheet 5 of 10, FIG. 5, RESPONDER: the user's selection in the SOURCE
DOC, located by START and LEN1, feeds the CRC SELECTION block to
produce CRC1. Record 40 holds URL, E-MAIL ADDR, LEN1, CRC1, LEN2, and
CRC2.]
`
`
`
[Sheet 6 of 10, FIG. 6, MINDER: the NEW SOURCE DOC is fetched using the
URL from record 40 (URL, E-MAIL ADDR, LEN1, LEN2, CRC1, CRC2) and fed
one character at a time (NEXT CHAR) to the CRC SELECTION block. A Y/N
comparison leads either to FOUND EXACT STRING (NO CHANGE) or, when
there are NO MORE CHAR'S (STRING NOT FOUND), to a CHANGE NOTICE.]
`
`
`
[Sheet 7 of 10, FIG. 7: an HTML document and the table of checksums
for its HTML-delineated sections.]

Section    HTML-bound text                          CRC
1          <TAG1> TEXT IN SECTION 1... </TAG1>      CRC1
2          <TAG2> TEXT IN SECTION 2... </TAG2>      CRC2
3          <TAG3> TEXT IN SECTION 3... </TAG3>      CRC3
4          <TAG4> TEXT IN SECTION 4... </TAG4>      CRC4
...
`
`
`
[Sheet 8 of 10, FIG. 8, RESPONDER: the SOURCE DOC passes through the
PARSER/DIVIDER and the CRC SECTION block, and SELECT SECTIONS sets the
EN/DIS SECTION flags. Record 40' holds URL, E-MAIL ADDR, and, for
sections 1 to 4, CRC1 to CRC4 with the enable/disable flags in
field 52.]
`
`
`
[Sheet 9 of 10, FIG. 9, MINDER: the NEW SOURCE DOC, fetched using the
URL, passes through the PARSER/DIVIDER to produce table 80 of NEW CRC'S
(NCRC1 to NCRC4 for sections 1 to 4) with a two-bit field 82 per
section. Comparison 76 against CRC1 to CRC4 and field 52 of record 40'
(which also holds E-MAIL ADDR) yields FOUND EXACT SECTION (NO CHANGE)
on a match.]
`
`
`
[Sheet 10 of 10, FIG. 10: an HTML document with sections TAG1 to TAG3
and a USER SELECTION marked on it. Table 90 lists SECTION #, CRC (CRC1
to CRC4), and EN/DIS SECTION flags (0, 1, 1, 0) for sections 1 to 4.
Table 92 lists, for USER SELECTION #1, STARTING SECTION # 2, LENGTH
LEN1, and CRC CRC1A.]
`
`
`
CHANGE-DETECTION TOOL INDICATING DEGREE AND LOCATION OF CHANGE OF
INTERNET DOCUMENTS BY COMPARISON OF CYCLIC-REDUNDANCY-CHECK (CRC)
SIGNATURES

These automated software tools are sometimes known as "netbots":
network robots which automatically perform some task for a user.
Netbots allow users to better manage the information on the Internet
and reduce the amount of information that a user must read. Filtering
down the amount of information is critical to making good use of the
overwhelming amount of information available on the Internet.
More recent change-detection tools allow users to register a document
or web page on the Internet and be notified when any change to that
document occurs. The user "registers" a document by specifying the URL
of the document, and providing the user's e-mail address. The
change-detection tool stores a local copy of the document together
with the user's e-mail address. Once every day or week the
change-detection tool accesses the source document at the specified
URL, and compares the retrieved source document to the local copy of
the document. If a difference between the older local copy and the
just-retrieved source document is detected, then a message is sent to
the user's e-mail address, perhaps with a copy of the new document or
a copy of the changes.
The document-change tool could store an actual copy of the entire
document at the tool's web site for comparison. However, storing the
whole document at the document-change tool's web site is expensive
because large amounts of storage are needed. For example, if 500,000
documents were registered, and each document averages 50 Kbytes, then
25 GigaBytes of storage are needed to store copies of the registered
documents.
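
The storage estimate above can be checked with a few lines of
arithmetic. The sketch below assumes decimal units (1 Kbyte = 1,000
bytes and 1 GigaByte = 10**9 bytes), which the text does not state; the
variable names are illustrative only.

```python
# Back-of-envelope check of the storage estimate in the text.
# Assumption: decimal units (1 Kbyte = 1,000 bytes, 1 GigaByte = 10**9 bytes).
registered_documents = 500_000
average_document_bytes = 50 * 1_000  # 50 Kbytes per document

total_bytes = registered_documents * average_document_bytes
total_gigabytes = total_bytes / 10**9

print(f"{total_gigabytes:.0f} GigaBytes")  # prints "25 GigaBytes"
```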
Instead of storing the entire document, the revision date or
time-stamp of the document could be stored. U.S. Pat. No. 5,388,255
shows a database which compares time stamps to determine when data has
changed. Since the time-stamp is much smaller than the entire
document, storage space is reduced at the tool's web site.
The inventors have a change-detection tool which stores a checksum or
CRC of the document rather than the time-stamp or the entire document.
When the document is initially registered, a checksum is generated for
the entire source document. This checksum is stored at the tool's web
site. Each week when the source document is retrieved, another
checksum is generated and compared to the stored checksum. If the
stored checksum matches the newly-generated checksum, then no change
is detected. When the checksums do not match, then the user is
notified of a change by e-mail. The user can optionally have a copy of
the new document attached to the e-mail notification.
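
The whole-document comparison just described might be sketched as
follows. This is an illustration rather than the tool's actual code:
Python's `zlib.crc32` (a 32-bit CRC) stands in for whatever checksum
the tool uses, and the function names and sample documents are
hypothetical.

```python
import zlib


def document_checksum(document: bytes) -> int:
    """Return a CRC of the entire document (zlib's CRC-32 is an assumption)."""
    return zlib.crc32(document)


def has_changed(stored_checksum: int, fresh_document: bytes) -> bool:
    """Compare the archived checksum with one computed from a fresh copy."""
    return document_checksum(fresh_document) != stored_checksum


# Registration: only the checksum is archived, not the document itself.
original = b"<html><body>Version 1.0 released</body></html>"
stored = document_checksum(original)

# Periodic check: an identical copy matches, an edited copy does not.
unchanged = has_changed(stored, original)
edited = has_changed(stored, b"<html><body>Version 1.1 released</body></html>")
```

Because only one checksum covers the whole page, this form can say
that something changed but not where, which is the drawback the
following sections discuss.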
Such a change-detection tool called a "URL-minder" has been available
for free public use at the inventor's web site, www.netmind.com, for
more than a year before the filing date of the present application.
Over 150,000 documents or URL's are registered at that site for 1.4
million users.
`
`BACKGROUND OF THE INVENTION
`1. Field of the Invention
This invention relates to software retrieval tools for networks, and
more particularly to a change-detection and highlighting tool for the
Internet.
`2. Description of the Related Art
Today's society is sometimes referred to as an information society.
Technology has increased the ease of generating and disseminating
information. The widespread acceptance of the global network known as
the Internet allows huge amounts of information to be instantly
transmitted to persons around the world.
Explosive growth is occurring in the part of the Internet known as the
World-Wide Web, or simply the "web". The web is a collection of
millions of files or "web pages" of text, graphics, and other media
which are connected by hyper-links to other web pages. These may
physically reside on a computer system anywhere on the Internet,
whether on a computer in the next room or on the other side of the
world.
These hyper-links often appear in the browser as a graphical icon or
as colored, underlined text. A hyper-link contains a link to another
web page. Using a mouse to click on the hyper-link initiates a process
which locates and retrieves the linked web page, regardless of the
physical location of that page. Hovering a mouse over a hyper-link or
clicking on the link often displays in a corner of the browser a
locator for the linked web page. This locator is known as a Uniform
Resource Locator, or URL.
`The vast amount of information available on the Internet
`has created an overload of information which the casual user
`cannot digest. Internet search tools or search engines allow
`users to find desired information by searching for keywords
`through an index of the millions of documents posted on the
`Internet. Search engines such as Excite of Mountain View,
`Calif. and Digital Equipment's "ALTAVISTA" help users
`quickly sift through huge amounts of information to find the
`desired information.
`A characteristic of the Internet is that it is relatively easy
`to change or update information. The user may wish to know
`when updates are made to the desired information he found
with a search. For example, the information found may
describe a bug fix or other revision in a software program.
`Initially a crude work-around or even just a notice of the bug
`may be posted on the Internet. Later, this posting may be
`updated with a more robust fix or other useful information.
`The information could also be a list of phone numbers or
`other contact information, or it could be a product list or a
`competitor's web site, advertising, or press releases.
The user could frequently re-access the information on the Internet to
see if changes have occurred, but this is time-consuming. Frequently
re-accessing the information is tedious, particularly when the
information is contained in a long document, or when many documents
must be checked for changes.
Software tools have been developed to automate the task of detecting
updates to information on the Internet. Early tools such as America
Online's News Profiles allow users to specify keywords which are
periodically searched for in a news database. News articles containing
the specified keywords are sent to the user by electronic mail
(e-mail).
`
`MINOR CHANGES NOT FILTERED OUT
While such a change-detection tool is useful, the existing tool has
several drawbacks. Since minor changes are frequently made to Internet
documents, users are notified of many insignificant changes. The users
can quickly become irritated with frequent e-mail notices of the
minor, irrelevant changes. Statistics taken for the URL-minder tool in
May, 1996, showed that over 100,000 change notices were e-mailed in
just four days to the 500,000 registered users. Internet documents
change every few weeks on the average. Thus a user with a few dozen
registered documents receives notices almost every day. This is an
undesirably high frequency of notices for many users.
`
`LOCATION OF CHANGE DESIRABLE
`
When the entire document is stored rather than a checksum, the
location of the change in the document can be found and highlighted to
the user since the original document is available for comparison.
However, when a single checksum is stored for each registered
document, the changes within that document cannot be determined or
identified. Thus the user is left to determine the location of the
change within the document, and the relevance of that change.
With the existing URL-minder which stores only checksums, when a
change is detected, the user is simply notified that there was a
change. The user can optionally receive a copy of the changed
document, but the changes are not highlighted. Thus the user must
re-read the entire document to determine what the change was. Often
the changes are minor and even hard to detect, such as a spelling
change of a word, or a date change. Sometimes the order or arrangement
of text has changed but not the content. These minor changes are not
always significant to the user.
Thus the user is plagued with frequent notices of minor changes, and
the user must re-read the entire document to determine what the change
was. Having to re-read the documents increases the burden on the user,
which is the opposite of the intent of an automated tool or netbot.
`
`LONG, COMPLEX DOCUMENTS COMMON
`
`remote client to register a document for change detection by
`receiving from the remote client a uniform-resource-locator
`(URL) identifying the document. The responder fetches the
`document from the remote document server and generates
`an original checksum for a checked portion of the document.
`The checked portion is less than the entire document.
A database is coupled to the responder. It receives the URL and the
original checksum from the responder when the document is registered
by the remote client. The database stores a plurality of records each
containing a URL and a checksum for a registered document. A periodic
minder is coupled to the database and the network connection. It
periodically re-fetches the document from the remote document server
by transmitting the URL from the database to the network connection.
The periodic minder receives a fresh copy of the document from the
remote document server. The periodic minder generates a fresh checksum
of a portion of the fresh copy of the document and compares the fresh
checksum to the original checksum. A detected change is signaled to
the remote client when the fresh checksum does not match the original
checksum.
Thus a change in the document is detected by comparing a checksum for
the checked portion of the document. Changes in portions of the
document outside the checked portion are not signaled to the remote
client.
In further aspects the database does not store the document. The
database stores a checksum for the document. Thus storage requirements
for the database are reduced by archiving checksums and not entire
documents.
In other aspects of the invention a selection means is coupled to the
responder. It receives a selection from the remote client. The
selection identifies boundaries of the checked portion of the
document. A parsing means is coupled to the periodic minder. It parses
the fresh copy and generates checksums for a plurality of portions of
the fresh copy. A compare means is coupled to the parsing means. It
signals a match when any of the checksums generated by the parsing
means matches the original checksum from the database. Thus a change
in the document is detected when the match is not signaled by the
compare means. The parsing means generates a plurality of checksums
for the plurality of portions of the fresh copy.
In still further aspects of the invention a length field indicates a
size of the checked portion. The length field is written by the
selection means. The parsing means generates each checksum for
portions having the size of the checked portion. Thus the size of the
checked portion is stored and used by the parsing means.
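
One way to picture the parsing and compare means is as a window of the
stored length that slides over the fresh copy, with a checksum taken
at each position. The sketch below is only an illustration under
stated assumptions: `zlib.crc32` stands in for the checksum, lengths
are counted in characters, each window is recomputed from scratch for
clarity, and all names and sample texts are hypothetical.

```python
import zlib


def crc_of(text: str) -> int:
    """CRC-32 of the UTF-8 bytes of `text` (a stand-in for the tool's CRC)."""
    return zlib.crc32(text.encode("utf-8"))


def selection_still_present(fresh_copy: str, length: int, original_crc: int) -> bool:
    """Checksum every window of `length` characters in the fresh copy and
    report whether any of them matches the archived checksum."""
    for start in range(len(fresh_copy) - length + 1):
        if crc_of(fresh_copy[start:start + length]) == original_crc:
            return True   # match signaled: the checked portion is unchanged
    return False          # no match: a change is detected


# Registration stores only the length and checksum of the checked portion.
selection = "Support line: 555-0100"
length, original_crc = len(selection), crc_of(selection)

# The header grew, so the selection moved, but its text is intact.
moved = "A much longer header\nSupport line: 555-0100\nFooter"
# The selected text itself was edited.
edited = "Header\nSupport line: 555-0199\nFooter"

moved_found = selection_still_present(moved, length, original_crc)
edited_found = selection_still_present(edited, length, original_crc)
```

Because the match may occur at any offset, edits before the selection
do not cause a change to be signaled.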
In further aspects the document is a hyper-text markup-language (HTML)
document containing HTML tags. The HTML tags indicate formatting,
layout, and hyper-links specifying URLs of other servers. The
change-detection web server also has divider means coupled to the
responder, for dividing the document into portions bound by the HTML
tags. A checksum means generates original checksums. An original
checksum is generated for each portion bound by HTML tags. The
database stores the original checksums for the portions bound by the
HTML tags. The periodic minder also has a second divider means which
divides the fresh copy of the document into portions bound by the HTML
tags. A second checksum means generates fresh checksums for portions
of the fresh copy bound by HTML tags in the fresh copy of the
document. A compare means receives the fresh checksums of the fresh
copy from the second checksum means. It compares the fresh checksums
to the original checksums from the database. A report means signals a
`
The change-detection tool allows a user to register a document by
specifying the uniform-resource-locator (URL) of that document. A
unique URL is specified for each web page on the Internet's
world-wide-web. Other information sometimes embedded in the URL
includes passwords or search text that the user types in, or name and
address information typed in. Internet documents are usually web pages
containing several individual files such as for graphics, text, and
motion video and sound. Sometimes these files include small programs
such as CGI (common gateway interface) scripts. Thus the documents
registered are fairly complex and often lengthy.
`Often the user is only interested in a small part of a
`document, rather than the whole document. A user might be
`interested only in one contact or phone number on a list of
`hundreds of phone numbers for an office, or only one
`product line in a long list of products. It is desirable to allow
`the user to specify only the portion of a document or web
`page which is of interest.
What is desired is a storage-efficient change-detection
tool which detects when changes occur to a registered
document on the Internet. It is desired that minor changes to
`the document be filtered by the change-detection tool to
`reduce the number of change notifications sent to the user.
`It is also desired to give the user an indication of how
`significant the change is. It is desired to allow the user to
`identify relevant portions of a document so that the user is
`not notified of changes to other portions of the document. It
`is further desired to reduce storage requirements for the
`change-detection tool by storing a condensed checksum or
`signature of the registered document rather than storing the
`entire document.
`
`SUMMARY OF THE INVENTION
`
`A change-detection web server has a network connection
`for transmitting and receiving packets from a remote client
`and a remote document server. A responder is coupled to the
`network connection. The responder communicates with the
`
change in the document when an original checksum for the document has
no matching fresh checksum. Thus checksums are generated and stored
for portions of the document bound by the HTML tags.
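
The per-section comparison can be illustrated with a short sketch. It
makes several simplifying assumptions that are not taken from the
text: every HTML tag is treated as a section boundary, `zlib.crc32`
stands in for the checksum, and the function names and sample pages
are hypothetical. A section counts as changed only when its archived
checksum has no match anywhere in the fresh copy, so re-ordering alone
is not reported.

```python
import re
import zlib

TAG = re.compile(r"<[^>]+>")  # simplification: any tag is a section boundary


def section_checksums(html: str) -> list:
    """Split the document at HTML tags and checksum each non-empty section."""
    sections = (part.strip() for part in TAG.split(html))
    return [zlib.crc32(s.encode("utf-8")) for s in sections if s]


def changed_sections(original_crcs: list, fresh_crcs: list) -> list:
    """Return the 1-based numbers of original sections whose checksum has no
    match anywhere in the fresh copy."""
    fresh = set(fresh_crcs)
    return [n for n, crc in enumerate(original_crcs, start=1) if crc not in fresh]


original = "<h1>Price list</h1><p>Widget: $5</p><p>Gadget: $9</p>"
reordered = "<p>Gadget: $9</p><h1>Price list</h1><p>Widget: $5</p>"
edited = "<h1>Price list</h1><p>Widget: $6</p><p>Gadget: $9</p>"

archived = section_checksums(original)  # only these are stored
after_reorder = changed_sections(archived, section_checksums(reordered))
after_edit = changed_sections(archived, section_checksums(edited))
percent_changed = 100 * len(after_edit) / len(archived)
```

Here the re-ordered page yields no changed sections, while the edited
page flags section 2, which is one of three sections.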
In further aspects the report means has a mailer means coupled to the
network connection. It sends a change notification message to the
remote client when the change is signaled. The responder receives an
electronic-mail address from the remote client and stores the
electronic-mail address of the remote client in the database. The
mailer means reads the electronic-mail address from the database. The
change notification message is sent to the remote client as an
electronic-mail message addressed to the electronic-mail address. Thus
the remote client is notified of the change by electronic mail.
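
As a rough illustration of the mailer means, the sketch below builds a
change notification with Python's standard `email.message.EmailMessage`
class. The sender address, subject wording, and function name are
hypothetical, and the message is only constructed here, not sent.

```python
from email.message import EmailMessage


def build_change_notice(user_email: str, url: str) -> EmailMessage:
    """Build a change notification addressed to the registered user."""
    message = EmailMessage()
    message["From"] = "minder@example.com"  # hypothetical sender address
    message["To"] = user_email              # address read from the database
    message["Subject"] = f"Change detected: {url}"
    message.set_content(f"The registered document at {url} has changed.")
    return message


notice = build_change_notice("user@example.com", "http://www.example.com/page.html")
```

Delivering the message would additionally require a mail server, for
example through `smtplib.SMTP(...).send_message(notice)`.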
`
Client 14 retrieves web pages of files from document server 12 through
Internet 10. These web pages are identified by a unique URL (uniform
resource locator) which specifies a document file containing the text
and graphics of a desired web page. Often additional files are
retrieved when a document is retrieved. The "document" returned from
document server 12 to client 14 is thus a composite document composed
of several files of text, graphics, and perhaps sound or animation.
The physical appearance of the web page on the user's browser on
client 14 is specified by layout information embedded in non-displayed
tags, as is well-known for HTML (hyper-text markup language)
documents. Often these HTML documents contain tags with URL's that
specify other web pages, perhaps on other web servers which may be
physically located in different cities or countries. These tags create
hyper-links to these other web servers allowing the user to quickly
jump to other servers. These hyper-links form a complex web of linked
servers across the world; hence the name "world-wide web".
The user may frequently retrieve files from remote document server 12.
Often the same file is retrieved. The user may only be interested in
differences in the file, or learning when the file is updated, such as
when a new product or service is announced. The inventors have
developed a software tool which automatically retrieves files and
compares the retrieved files to an archived checksum of the file to
determine if a change in the file has occurred. When a change is
detected, the user is notified by an electronic mail message (e-mail).
A copy of the new file may be attached to the e-mail notification,
allowing the user to review the changes.
Rather than archive the source files from remote document server 12,
the invention archives a checksum or CRC of the source files. These
CRC's and the e-mail address of the user are stored in database 16 of
change-detection server 20. Comparison is made of the stored or
archived CRC of the document and a fresh CRC of the
currently-available document. The CRC is a condensed signature or
fingerprint of the document. Any change to the document changes the
CRC. Aliasing of CRC's can be reduced to a very small probability by
using sufficiently large CRC's, such as an 8-byte CRC. With an 8-byte
CRC it is extremely improbable that a change to a document results in
the same CRC being generated. If an identical CRC is generated, then
the user is not notified of any change.
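
To give a sense of scale for the aliasing probability, the sketch
below assumes an idealized checksum whose values are uniformly
distributed, so that a changed document produces the archived value
with probability 2**-n for an n-bit checksum. Real CRC's only
approximate this model, and the function name is illustrative.

```python
# Assumption: an ideal n-bit checksum maps a changed document to any of the
# 2**n possible values with equal probability, so a change goes unnoticed
# with probability 2**-n.
def alias_probability(checksum_bytes: int) -> float:
    """Chance that a changed document yields the same checksum value."""
    return 2.0 ** -(8 * checksum_bytes)


p_4_byte = alias_probability(4)  # about 2.3e-10
p_8_byte = alias_probability(8)  # about 5.4e-20
```

Under this model, doubling the checksum from 4 to 8 bytes squares the
already small chance of a missed change.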
Change-detection server 20 performs three basic functions:
1. Register (setup) a web page document for change detection.
2. Periodically re-fetch the document and compare for changes.
3. E-mail a change notice to the registered user if a change is
detected.
Change-detection server 20 contains three basic components. Database
16 stores the archive of CRC's for registered web-page documents. The
URL identifying the web page and the user's e-mail address are also
stored with the archived CRC's. Responder 24 communicates with the
user at client 14 to set up or register a web page document for change
detection. Minder 22 periodically fetches registered documents from
document server 12 through Internet 10. Minder 22 compares the
archived CRC's in database 16 to new CRC's of the fetched documents to
determine if a change has occurred. When a change is detected, minder
22 sends a notice to the user at client 14 that the document has
changed.
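
How the three components fit together might be sketched as below. This
is a toy model under stated assumptions, not the server's actual
design: the database is an in-memory dictionary keyed by URL (so one
user per URL), `zlib.crc32` stands in for the CRC, document retrieval
is passed in as a `fetch` function so no network access is needed, and
the archived CRC is refreshed after a notice so that one change is
reported once. All class, method, and variable names are hypothetical.

```python
import zlib
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Record:
    """One database entry: what is archived for a registered document."""
    url: str
    email: str
    crc: int


class ChangeDetectionServer:
    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self.fetch = fetch                      # retrieves a document by URL
        self.database: Dict[str, Record] = {}   # database 16 (in memory here)
        self.notices: List[str] = []            # stand-in for outgoing e-mail

    def register(self, url: str, email: str) -> None:
        """Responder 24: fetch the page and archive its CRC, not its text."""
        self.database[url] = Record(url, email, zlib.crc32(self.fetch(url)))

    def check_all(self) -> None:
        """Minder 22: re-fetch each page and compare CRCs."""
        for record in self.database.values():
            fresh_crc = zlib.crc32(self.fetch(record.url))
            if fresh_crc != record.crc:
                self.notices.append(record.email)
                # Assumption: archive the new CRC so a change is reported once.
                record.crc = fresh_crc


# A dictionary stands in for the remote document server.
pages = {"http://www.example.com/news.html": b"Release 1"}
server = ChangeDetectionServer(fetch=lambda url: pages[url])
server.register("http://www.example.com/news.html", "user@example.com")

server.check_all()                  # nothing changed yet
notices_before = list(server.notices)

pages["http://www.example.com/news.html"] = b"Release 2"
server.check_all()                  # the edit is detected
server.check_all()                  # already archived, so no second notice
```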
`
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`FIG. 1 is a diagram of a change detection tool on a server
`on the Internet.
`FIG. 2 shows a user registering a web page document for
`change detection.
`FIG. 3 shows a periodic comparison of a registered web
`page document to determine if the document has changed.
`FIG. 4 shows a document-change notice being generated
`and sent to the user.
`FIG. 5 illustrates the operation of responder 24 of FIG. 1
`when the registered document is an arbitrary, unstructured
`file.
FIG. 6 illustrates operation of minder 22 of FIG. 1 when
the registered document has an arbitrary, unstructured format.
`FIG. 7 is a diagram of an HTML document and a table of
`checksums for the HTML-delineated sections.
`FIG. 8 illustrates the operation of responder 24 of FIG. 1
`when the registered document is an HTML file.
`FIG. 9 illustrates the operation of minder 22 of FIG. 1
`when an HTML document is checked for recent changes.
FIG. 10 is a diagram illustrating an alternate embodiment
which archives separate checksums for HTML-defined sections
and checksums for user-defined sections.
`
`DETAILED DESCRIPTION
`
The present invention relates to an improvement in Internet-document
change-detection tools. The following description is presented to
enable one of ordinary skill in the art to make and use the invention
as provided in the context of a particular application and its
requirements. Various modifications to the preferred embodiment will
be apparent to those with skill in the art, and the general principles
defined herein may be applied to other embodiments. Therefore, the
present invention is not intended to be limited to the particular
embodiments shown and described, but is to be accorded the widest
scope consistent with the principles and novel features herein
disclosed.
`
`OVERVIEW OF CHANGE-DETECTION WEB
`SERVER
`FIG. 1 is a diagram of a change detection tool on a server
`on the Internet. The user operates client 14 from a remote
`site on Internet 10. The user typically is operating a browser
`application, such as Netscape's Navigator or Microsoft's
`Internet Explorer. Client 14 communicates through Internet
`10 by sending and receiving TCP/IP packets to establish
connections with remote servers, typically using the hyper-text
transfer protocol (http) of the world-wide web.
`
OVERVIEW OF OPERATION: FIGS. 2, 3, 4
file. The user initiates registration of a document by providing the
URL identifying the document and the user's e-mail address. These can
be provided by typing or pasting them into fields on a registration
web page at change-detection server 20.
Change-detection server 20 uses the URL to fetch a copy of source
document 30 from document server 12 of FIG. 1. Source document 30
could be any one of millions of documents on the thousands of web
servers connected to the Internet. Source document 30 is displayed to
the user, allowing the user to select portions of source document 30
for registration. The user can select portions of source document 30
by dragging a highlight with a mouse over the text to be selected.
Alternately, the user can select whole paragraphs by triple-clicking
anywhere inside these sections, or a single word or numeric value by
double-clicking on the word. Changes which occur in unselected
portions of source document 30 do not generate change notifications.
The selection information from the user is encoded as a string of
length LEN1, with a starting location START. Parser 32 reads
characters from source document 30 one at a time until the first
character in the string at the starting location START is found. START
can simply be an offset in bytes or in characters from the beginning
of the file to the beginning of the user's selection. Characters
following START are sent from parser 32 to CRC generator 34 until the
number of characters indicated by LEN1 is reached, indicating that the
end of the selection has been reached. CRC generator 34 calculates the
cyclic-redundancy-check (CRC) of these characters selected by the user
from source document 30. Methods of generating CRC's and other
checksums are well-known in the art and any of several methods can be
used.
The CRC is typically generated by exclusive-ORing bits from a current
character with a running checksum to generate a new checksum, which is
then exclusive-ORed with bits from the next character. The final value
of the running checksum, CRC1, is written to record 40 in database 16
of FIG. 1. The URL and the e-mail address from the user are also
written to record 40. The length of the selection, LEN1, is also
written to record 40, but the starting location is not. The starting
location can change when changes are made to the web page document in
the non-selected region before the selection, such as in a document
header. Thus the starting location can change even when the selection
has not changed, and changes in the header should be ignored.
The user may make several selections on the same source document 30,
and each selection has its length and CRC stored in record 40. For
example, the second user-selection stores LEN2 and CRC2 in record 40.
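
The registration steps above might look like the following sketch. The
text leaves the CRC method open, so the widely used 32-bit CRC-32
(reflected polynomial 0xEDB88320) is assumed here purely for
illustration; an 8-byte CRC would follow the same pattern with a wider
polynomial. The record layout, URL, e-mail address, and function names
are hypothetical.

```python
def running_crc32(data: bytes) -> int:
    """Bitwise CRC-32 (the common reflected polynomial 0xEDB88320)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte                      # fold the next character in
        for _ in range(8):               # then process it one bit at a time
            if crc & 1:
                crc = (crc >> 1) ^ 0xEDB88320
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


def register_selection(source: str, start: int, length: int) -> dict:
    """Checksum `length` characters beginning at offset `start`.
    START is used to locate the selection but is not stored."""
    selected = source[start:start + length]
    return {"LEN": length, "CRC": running_crc32(selected.encode("utf-8"))}


source_document = "Page header\nPhone: 555-0100\nPage footer"
selection = "Phone: 555-0100"
start = source_document.index(selection)

record_40 = {
    "URL": "http://www.example.com/contacts.html",   # hypothetical
    "E-MAIL": "user@example.com",                     # hypothetical
    "selections": [register_selection(source_document, start, len(selection))],
}
```

Each further selection would append another length and CRC pair to the
record, in the manner of LEN2 and CRC2.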
FIG. 6 illustrates operation of minder 22 of FIG. 1 when the
registered document has an arbitrary, unstructured format. The minder
performs change-detection on each of the thousands of documents having
their URL's registered. Checking is preferably performed once for all
users registering the same URL since this saves re-fetching documents
for different users.
The minder begins by reading record 40 from database 16 of FIG. 1. The
URL in record 40 is used to access the remote document server on the
Internet and retrieve a fresh document copy 30' of source document 30
which was registered as described for FIG. 5. Fresh document copy 30'
is parsed by parser 42 and each successive character of document copy
30' is sent to CRC generator 44 until the stored length LEN1 is
reached. A new CRC for this string from document
`
`FIG. 2 shows a user registering a web page document for
`change detection. The user on client 14 registers a web page
`document by specifying the URL which identifies the web
`page. A portion of the URL is translated into an IP address
`of a server by a domain-name server. The user also sends his
`e-mail address to responder 24. Responder 24 fetches the
`web page and displays the page to the user. The user then
`selects which portions of the web page document are to be
`compared for changes. The user can select paragraphs of text
`by dragging a highlight across the text. Responder 24 then
stores the location of the selected text and generates one or
more CRC's for the selected text. Responder 24 then stores the
CRC(s), URL, and e-mail address in database 16. A confirmation
that the web page document has been registered is
finally sent to the user on client 14.
`FIG. 3 shows a periodic comparison of a registered web
`page document to determine if the document has changed.
Each registered document is compared for changes on a
periodic basis which depends on the number of registered
documents and the speed of operation of change-detection
server 20. Typically each document is compared every few
days, although more frequent comparisons are possible.
Minder 22 reads the URL of the registered document from
database 16. Minder 22 automatically fetches from document
server 12 a fresh copy of the web-page document
pointed to by the URL. Client 14 is not involved in this
`transaction. Occasionally the URL is deleted or does not
`respond, and a change is then signaled indicating that the
`URL co