United States Patent [19]
Freivald et al.

[11] Patent Number: 6,012,087 (US006012087A)
[45] Date of Patent: Jan. 4, 2000

[54] UNIQUE-CHANGE DETECTION OF DYNAMIC WEB PAGES USING HISTORY TABLES OF SIGNATURES

[75] Inventors: Matthew P. Freivald, Sunnyvale; Alan C. Noble, Santa Cruz, both of Calif.

[73] Assignee: NetMind Technologies, Inc., Campbell, Calif.

[21] Appl. No.: 09/081,991

[22] Filed: May 20, 1998

Related U.S. Application Data

[63] Continuation-in-part of application No. 08/783,625, Jan. 14, 1997.

[51] Int. Cl. .......... H04L 12/00
[52] U.S. Cl. .......... 709/218; 709/229; 709/201
[58] Field of Search .......... 709/218, 229, 224, 226, 201; 395/705; 380/21, 25, 29, 4; 705/8

[56] References Cited

U.S. PATENT DOCUMENTS

5,109,486            4/1992     Seymour        709/224
5,204,897            4/1993     Wyman          380/4
5,249,261            9/1993     Natarajan      706/46
[number illegible]   11/199[?]  Wyman          380/4
[number illegible]   8/199[?]   Wyman          705/8
5,574,906            11/1996    Morris         707/1
5,596,750            1/1997     Li et al.      709/101
5,666,502            9/1997     Capps          345/352
5,671,282            9/1997     Wolff et al.   380/25
5,715,453            2/1998     Stewart        707/104
5,835,726            11/1998    Shwed et al.   709/229
5,898,836            4/1999     Freivald       709/218
5,923,880            7/1999     Rose et al.    395/705

Primary Examiner: Zarni Maung
Assistant Examiner: Khanh Quang Dinh
Attorney, Agent, or Firm: Stuart T. Auvinen

[57] ABSTRACT

An improved change-detection tool detects only relevant
changes within Internet web pages on the world-wide-web.
Changes back to an earlier version of a web page are not
relevant and do not cause the user to be notified. Only
changes to a new, unique version of the web page generate
a user notification. After the user finishes registering the web
page by specifying the URL and the user’s e-mail address,
the change-detection tool periodically retrieves the web
page at the specified URL and generates a checksum or
signature to determine when to send a notification to the
user. Signatures from several older versions of the web page
are stored in a history table. When a new signature for a
re-fetched page matches the most-recent signature at the top
of the stack in the history table, no change has occurred.
When the new signature matches any of the older signatures
in the history table, the detected change is not unique and
notification is not made even though a change has occurred.
When the new signature matches one of the older, not-most-recent
signatures in the history table, the signature is moved
into a permanent history table. Signatures in the permanent
history table are for recurring versions of the web page and
are likely to appear again. Error pages displayed when a web
server is down for routine maintenance can be screened out
using the history table. The frequency of notifications is
tracked. When too many notifications are being sent for a
web page, the last-modified header is used rather than
signature-matching to reduce the frequency of notifications.

20 Claims, 14 Drawing Sheets

[Representative drawing (flowchart, see FIG. 8A): PERIODIC MINDER; READ URL FROM DB; FETCH DOC AT URL; GENERATE NEW SIGNATURE FOR DOC; READ ALL SIG'S FROM HISTORY TABLE IN DB; DO ANY SIG'S IN HISTORY TABLE MATCH? If YES: NO CHANGE, NEXT URL. If NO: READ LAST_MOD FROM DB.]

Clouding Exhibit 2001, pg. 1

`
U.S. Patent    Jan. 4, 2000    Sheet 1 of 14    6,012,087

[FIG. 1: registered web page titled "37 C.F.R. RULES" listing 37 C.F.R. 1.8, 37 C.F.R. 1.62, and 37 C.F.R. 1.136; DOC SIGNATURE = 5A7.]

[FIG. 2: updated "37 C.F.R. RULES" web page showing a modified rule and a deleted rule; DOC SIGNATURE = D6F; CHANGE DETECTED.]
`
`
`
[Sheet 2 of 14]

[FIG. 3: error page reading "SERVER IS TEMPORARILY UNAVAILABLE FOR ROUTINE MAINTENANCE. SORRY FOR THE INCONVENIENCE."; DOC SIGNATURE = E89; CHANGE DETECTED IS NOT RELEVANT.]

[FIG. 4: dynamic web page with <HTML>, <CONTENT_LEN = 37,428>, <LAST_MODIFIED = 3.15.98 13:42>, and <END_HTML> headers, plus stock quotes (HWP, INTC) as dynamic content.]
`
`
`
[Sheet 3 of 14]

[FIG. 5: source document on a server (WWW SERVER), user's client (WWW BROWSER), and the CHANGE-DETECTION TOOL WEB SERVER containing the MINDER, DATABASE, and RESPONDER.]
`
`
`
[Sheet 4 of 14]

[FIG. 6: database record with fields E-MAIL ADDR, URL (WWW ADDR), LAST-MOD, and SIGNATURE HISTORY TABLE 40; reference numerals 32, 34, 36, 38 label the remaining fields.]
`
`
`
[Sheet 5 of 14]

[FIGS. 7A-7D: history table of signatures as new signatures arrive (for example NEW SIG = D6F); matches against stored signatures are labeled NO CHANGE NOTIFICATION.]
`
`
`
[Sheet 6 of 14]

[FIG. 8A (flowchart): PERIODIC MINDER; READ URL FROM DB (60); FETCH DOC AT URL (62); GENERATE NEW SIGNATURE FOR DOC (64); READ ALL SIG'S FROM HISTORY TABLE IN DB (66); DO ANY SIG'S IN HISTORY TABLE MATCH? (68). If YES: NO CHANGE, NEXT URL (67). If NO: READ LAST_MOD FROM DB (69).]
`
`
`
[Sheet 7 of 14]

[FIG. 8B (flowchart): DOES DOC HAVE LAST_MOD HEADER? (70); IS LAST_MOD FROM DOC SAME AS IN DB? (72); FETCH DOC AT URL AGAIN; RE-GENERATE SIG FOR RE-FETCHED DOC; ANY SIG'S FROM HISTORY TABLE MATCH? (78). If YES: FALSE DETECT, IGNORE (79). Otherwise: NOTIFY (80).]
`
`
`
[Sheet 8 of 14]

[FIG. 9 (flowchart): NOTIFY; ADD NEW SIG TO HISTORY TABLE (82); READ E-MAIL ADDR FROM DB (84); SEND NOTIFICATION MESSAGE TO EMAIL ADDR (86).]
`
`
`
[Sheet 9 of 14]

[FIG. 10: TEMP HIST TABLE (50) and PERM HIST TABLE (52).]

[FIG. 11: TEMP HIST TABLE and PERM HIST TABLE before and after (50', 52') a NEW SIG = E89 arrives; NO CHANGE NOTIFICATION.]
`
`
`
[Sheet 10 of 14]

[FIG. 12 (flowchart): DUAL-TABLE ADD-ON; SIG MATCHES IN PERMANENT HISTORY TABLE? (130); SIG MATCHES MOST-RECENT IN TEMP TABLE? (132); CONTINUE; REMOVE MATCHING SIG FROM TEMP TABLE (134); WRITE NEW SIG TO PERM TABLE (136).]
`
`
`
[Sheet 11 of 14]

[FIG. 13: change-detection record with fields E-MAIL ADDR, URL (WWW ADDR), LAST_MOD, SIGNATURE HISTORY TABLE, # DETECTS, IGNORE SIG, and PERM SIGS.]
`
`
`
[Sheet 12 of 14]

[FIG. 14 (flowchart): FREQUENCY CHECK; READ # DETECT FROM DB; IS # DETECTS > THRESHOLD?; DOES DOC HAVE LAST_MOD?; SET IGNORE_SIG FLAG IN DB FOR URL; CLEAR # DETECTS IN DB; NEXT URL RECORD.]
`
`
`
[Sheet 13 of 14]

[FIG. 15 (flowchart): PERIODIC MINDER; READ URL FROM DB; FETCH DOC AT URL; IS IGNORE_SIG FLAG SET?; IS LAST_MOD FROM DOC SAME AS IN DB?; NOTIFY; GENERATE NEW SIGNATURE FOR DOC; READ ALL SIG'S FROM HISTORY TABLE IN DB; DO ANY SIG'S IN HISTORY TABLE MATCH?; NO CHANGE; READ LAST_MOD FROM DB.]
`
`
`
[Sheet 14 of 14]

[FIG. 16 (flowchart): CONTENT-LENGTH REFETCHING; IS THERE CONTENT-LENGTH FOR DOC IN HEADER?; DOES HEADER LEN MATCH TAG?; REFETCH WEB-PAGE DOC; CONTINUE.]
`
`
`
`6,012,087
`
`1
`UNIQUE-CHANGE DETECTION OF
`DYNAMIC WEB PAGES USING HISTORY
`TABLES OF SIGNATURES
`
`RELATED APPLICATION
`
This application is a continuation-in-part of the application
for “Change-Detection Tool Indicating Degree and
Location of Change of Internet Documents by Comparison
of CRC Signatures”, U.S. Ser. No. 08/783,625, filed Jan. 14,
1997, now U.S. Pat. No. 5,898,836.
`
`FIELD OF THE INVENTION
`
This invention relates to software retrieval tools for
networks, and more particularly to improved accuracy for a
change-detection tool for the Internet.
`
`BACKGROUND OF THE INVENTION
`
`Fast, inexpensive distribution of information has been
`promoted by the widespread acceptance of the Internet and
`especially the world-wide-web (www). This information can
`be easily updated or changed. However, users may not be
`aware of the changes. Unless the user frequently re-reads the
`information, many days or weeks may pass before users
`realize that the information has changed.
`Documents on the web are known as web pages. These
`web pages are frequently changed. Users often wish to know
`when changes are made to certain web pages. The parent
`application disclosed a change-detection tool that allows
`users to register web pages. Each registered web page is
`periodically fetched and compared to a stored checksum or
`signature for the registered page to determine if a change has
`occurred. When a change is detected, the user is notified by
e-mail. The change-detection tool of the parent application
allows the user to select portions of a web-page document for
change detection while other portions are ignored.
Such a change-detection tool as described in detail in the
parent application is indeed useful and has gained popularity
with Internet users, as several hundred thousand web pages
have been registered. For example, patent professionals can
register the federal regulations and procedures (37 C.F.R.
and the M.P.E.P.) posted at the PTO’s web site and be
notified when any changes are made. The change-detection
tool is currently free for public use at the www.netmind.com
web site.
`
FIG. 1 illustrates a web page registered for change detection.
This web page contains a copy of one or more sections of the
Code of Federal Regulations, specifically the patent office
regulations at 37 C.F.R. § 1.x. A patent attorney registers this
web page that contains a copy of the patent rules at 37 C.F.R.
§ 1.8 to 1.136. The rules may be located on one large web
page, or spread across many web pages that are each
registered.
`The user registers this page by using a user-interface for
`the change-detection tool. The user enters his e-mail address
`and the URL for the web page. The change-detection tool
`fetches a copy of this page and generates a signature. The
`signature is a highly-condensed data word that is produced
`by using a cyclical-redundancy-check (CRC) or other algo-
`rithm that produces unique outputs. For the initial page of
`FIG. 1, the signature 5A7 (hex) is generated and stored in a
`database with the user’s e-mail address and the web page’s
`URL.
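
The registration step described above can be sketched as follows. This is a minimal illustration, assuming CRC-32 from Python's standard zlib module as the signature algorithm; the names make_signature, register, and registrations are hypothetical and not taken from the patent. The three-digit hex values in the figures (such as 5A7) are illustrative, whereas CRC-32 yields eight hex digits. A CRC condenses the page, so two different pages can in principle share a signature.

```python
import zlib

def make_signature(page_bytes: bytes) -> str:
    """Condense a fetched page into an 8-digit hex CRC-32 signature."""
    return format(zlib.crc32(page_bytes) & 0xFFFFFFFF, "08X")

# Hypothetical in-memory stand-in for the tool's database, keyed by URL.
registrations = {}

def register(url: str, email: str, page_bytes: bytes) -> None:
    """Store the user's e-mail address and the page's initial signature."""
    registrations[url] = {
        "email": email,
        "signature": make_signature(page_bytes),
    }

register("http://example.com/rules", "attorney@example.com", b"37 C.F.R. 1.8 ...")
```

Only the signature is stored, not the page itself, which keeps each record small.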
`
The change-detection tool periodically fetches this web
page to see if a change has occurred. A new signature is
generated for the re-fetched page, and the new signature is
compared with the old signature stored in the database. A
mismatch indicates that a change is detected.
`FIG. 2 shows an updated web page that has a different
`signature that triggers a change notification. Occasionally,
`the patent regulations are updated. Web pages containing a
`copy of these regulations are eventually updated to reflect
`the changed rules. For example, FIG. 2 shows that rule 37
`C.F.R. § 1.62 has been deleted while rule 37 C.F.R. § 1.136
`has been updated, as they were in late 1997.
`The change detection tool re-fetches each registered page
`every few hours or days. Once the rules on the web page are
`updated, a different signature is generated for the updated
`web page. In FIG. 2, the new signature of D6F is generated,
`which does not match the old signature of 5A7 stored in the
`change-detection tool’s database. Thus a change is detected.
`The new signature is stored in the database and the patent
`attorney user is notified by e-mail.
`The user is notified within a few days after the web page
`is updated, allowing the patent attorney to rest easy, not
`having to frequently surf over to the rules page to see if any
`changes have been made.
`False Change Detections—FIG. 3
`The change-detection tool is only useful when it saves
`time and effort for the user. One problem is that false
`notifications can be made, annoying the user with changes
`that are not relevant. The inventors have discovered that the
`world-wide-web itself can trigger false change detections.
`These false detections should be filtered out.
`
FIG. 3 shows a false change detection caused by a
non-relevant change in an Internet server. Web pages are
stored on computer servers. These servers are sometimes
disconnected from the Internet for maintenance such as
program or hardware updates, or for security threats such as
hacker attacks.
`
`The web server containing the web page with the 37
`C.F.R. patent rules is disconnected from the Internet for
`maintenance. Often such maintenance occurs during low-
`usage times such as weekend nights. Most users do not
`notice that the web pages are offline during these hours.
`Unfortunately, automated software programs such as the
`change-detection tool continue to operate during these
`times, and may perform more fetching during off hours since
`network response times decrease. The change-detection tool
`may find that the web page is not available.
`When no connection can be made with the server, the
`change-detection tool can simply skip the web page until a
`later time. Since TCP/IP packets are not returned from the
`server, the change-detection tool can easily determine that
`the page is not available due to a network problem. The
`change-detection tool does not notify the user, but instead
`tries again later.
`Completely disconnecting servers from the Internet is
`frowned upon since users do not know what is causing the
`errors. Thus many web sites use another server to return a
`message page to the user when the server is down for
`maintenance. This message or error page lets the user know
`that the web page is only temporarily unavailable and the
`user should try back later.
The error page of FIG. 3 is returned when a user tries to
retrieve the web page containing the 37 C.F.R. patent rules.
This same error page is returned to change-detection software
trying to fetch the web page. However, since no packet
or network error is signaled, the change-detection tool
assumes that the error page is the registered web page and
generates a new signature. The new signature for the error
page is EB9, which does not match the old signature (D6F)
that was stored in the database after the last change was
detected.
`
`The change-detection tool then generates a change notice
`that is emailed to the user. The next day when the patent
`attorney reads the change notice, he browses over to the web
`page. By now the server is back up, showing the same web
`page as in FIG. 2. Although the user reads the web page
`carefully, he cannot find any changes.
A few days later, the change-detection tool again retrieves
the web page and generates the new signature. Since this
new signature does not match the error page’s signature that
was stored, another change notice is generated. The user
again looks at the web page but finds no changes. At this
point, after receiving two false change notices, the user
cancels his change-detection service to avoid getting the
false notifications.
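
The false-detection sequence just described can be replayed in a few lines. This is a sketch under stated assumptions: a detector that keeps only the single most-recent signature, CRC-32 as the signature, and made-up page contents. The function name run_single_signature is hypothetical.

```python
import zlib

def sig(page: bytes) -> str:
    """CRC-32 signature of a page, as an 8-digit hex string."""
    return format(zlib.crc32(page), "08X")

def run_single_signature(fetches):
    """Return the indices of fetches that would trigger a notification
    when only the most-recent signature is stored."""
    stored = sig(fetches[0])          # signature saved at the first fetch
    notified = []
    for i, page in enumerate(fetches[1:], start=1):
        new = sig(page)
        if new != stored:             # any mismatch counts as a change
            notified.append(i)
            stored = new              # overwrite the only stored signature
    return notified

rules_page = b"37 C.F.R. rules, updated version"
error_page = b"Server is temporarily unavailable for routine maintenance."

# Real page, then the maintenance error page, then the same real page again.
false_alarms = run_single_signature([rules_page, error_page, rules_page])
```

Here false_alarms lists both later fetches, although the user never sees any new content: the error page triggers one notice and the return of the unchanged rules page triggers another.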
`HTML Headers—FIG. 4
`
FIG. 4 shows a dynamic web page with HTML headers.
A content-length HTML header <CONTENT_LEN> specifies
the length of the web-page document in bytes. A
last-modified header <LAST_MODIFIED> contains a date
and time of the last modification of the web page. Dynamic
content 15 is frequently updated, often by a database or
search-engine server. Stock quotes are an example of
dynamic content that appears in a dynamic frame. Dynamic
images or JAVA programs are often used as dynamic content.
`
`Some change-detection software relies solely on the last-
`modified header in the HTTP response from a Web server.
`For example, Microsoft Internet Explorer 4.0 has a feature
`called “Subscriptions” under the “Favorites” menu, which
`detects changes in web pages. This feature relies on the
`last-modified header to determine when a web page has
`changed. Unfortunately, many web pages do not return a
`last-modified header, and Internet Explorer generates false
`change notifications each time it checks a web page lacking
`the last-modified header.
Not all documents contain a last-modified header. The
last-modified header may or may not reflect changes in
dynamic content 15. Some web servers update the last-modified
header only when the static content changes. Thus
change notifications are not generated when the dynamic
content changes. This may be undesirable when the dynamic
content is what the user desires to have checked. For
example, when the user wants to search newsgroups for the
appearance of a specific product or company name, the
result of the search is dynamic content. If the web server
does not return a Last-Modified header, the user is notified
by an unsophisticated change-detection tool every time the
search result is checked. If the web server returns a
Last-Modified header based only on the static content, the user is
not notified when the results of the search (the dynamic
content) change.
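
The two failure modes above can be made concrete with a small sketch of a detector that relies on the last-modified header alone. The function name and the treatment of a missing header as None are assumptions made for illustration; this is not code from Internet Explorer or from the patent.

```python
from typing import Optional

def last_modified_only_changed(stored: Optional[str], header: Optional[str]) -> bool:
    """Naive detector: report a change whenever the header is absent
    or differs from the stored value."""
    if header is None:
        return True           # no header: every check looks like a change
    return header != stored   # dynamic content is never examined

# A page without the header is reported as changed on every check.
always_notifies = last_modified_only_changed("3.15.98 13:42", None)

# A header based only on static content hides dynamic-content changes.
never_notifies = last_modified_only_changed("3.15.98 13:42", "3.15.98 13:42")
```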
The last-modified header may also be updated when the
HTML headers are changed, but not the visible document.
`This can also cause false changes to be reported. Even if the
`change detection tool is intelligent enough to analyze the
`content for changes, rather than relying solely on the Last-
`Modified header, false changes can be reported when the
`server returns only a portion of the web page due to some
`kind of error. The inventors, with the benefit of the experi-
`ence involved in running a change detection tool for hun-
`dreds of thousands of different documents on the Internet,
`have recognized these problems. Without this level of expe-
`rience these problems are not easily recognized.
What is desired is an improved automated change-detection
tool that detects when changes occur to a registered
document on the Internet. It is desired that the user not
have to check the web page to see if any changes have
occurred. A change-detection tool adapted to filter out false
change notifications is desired. A change-detection tool that
does not report changes that are not relevant to the user is
desirable. Identification of temporary error pages is desirable
so that they are not reported to the user. A more
sophisticated and more robust change-detection tool is
desired.
`
`SUMMARY OF THE INVENTION
`
`A change-detection web server detects unique changes in
`web pages. A network connection transmits and receives
`packets from a remote client and a remote web-page server.
`A responder is coupled to the network connection. It com-
`municates with the remote client. The responder registers a
`web page for change detection by receiving from the remote
`client a uniform-resource-locator (URL) identifying the web
`page. The responder fetches the web page from the remote
`web-page server.
`A database is coupled to the responder. It receives the
`URL from the responder when the web page is registered by
`the remote client. The database stores a plurality of records
`each containing a URL.
A history table in each of the records in the database stores
`a most-recent signature and a plurality of older-version
`signatures for a registered web page identified by the URL.
`The older-version signatures are condensed checksums for
`earlier versions of the registered web page previously
`fetched by the change-detection web server. The most-recent
`signature is a condensed checksum for a most-recently-
`fetched copy of the registered web page. A periodic minder
`is coupled to the database and the network connection. It
`periodically re-fetches the web page from the remote web-
`page server by transmitting the URL from the database to the
`network connection. The periodic minder receives a fresh
`copy of the web page from the remote web-page server. The
`periodic minder generates a new signature from the fresh
`copy of the web page. The periodic minder notifies the
`remote client of a unique change when the new signature
`does not match the most-recent signature and does not match
`any of the older-version signatures in the record.
`Thus the unique change in the web page is detected by
`comparing the new signature to the most-recent signature
`and to older-version signatures for the web page. Changes in
`the web page that are not unique but match an earlier version
`of the web page do not notify the remote client.
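
The comparison rule summarized above can be sketched as a small function. The list layout (most-recent signature first) and the names classify and the three result labels are assumptions for illustration; the signature values reuse those from the background example.

```python
def classify(new_sig, history):
    """Classify a freshly generated signature against a history list.

    history[0] is assumed to be the most-recent signature; the remaining
    entries are older-version signatures.
    """
    if history and new_sig == history[0]:
        return "no_change"            # page is the same as at the last fetch
    if new_sig in history[1:]:
        return "non_unique_change"    # page reverted to an earlier version
    return "unique_change"            # never seen before: notify the user

history = ["D6F", "5A7"]              # most-recent first
results = [classify(s, history) for s in ("D6F", "5A7", "EB9")]
```

Only the "unique_change" outcome leads to a notification; a reversion to an earlier version is a change but is not reported.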
`In further aspects the database does not store the web
`page. The database stores the most-recent signature and
`earlier-version signatures for the web page. Thus storage
`requirements for the database are reduced by archiving the
`most-recent signature and not entire web pages.
`In still further aspects a permanent history table stores
`new signatures that match one of the older-version signa-
`tures. Thus older-version signatures that are matched are
`copied to the permanent history table.
`In other aspects the history table is a temporary history
`table organized as a first-in-first-out stack. A least-recent
`signature in the history table is replaced by a new signature
`when notification is made. Thus signatures in the permanent
`history table are not deleted by new signatures written to the
`temporary history table.
`In further aspects the older-version signatures are stored
`in both the permanent history table and the history table. The
`periodic minder compares the new signature to older-version
`signatures from both the history table and from the perma-
`nent history table.
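
One way to picture the temporary and permanent tables working together is the sketch below. It assumes a bounded deque for the temporary table (most-recent entry at index 0) and a set for the permanent table; the class name SignatureHistory and the default size of three are hypothetical, and the move-on-match behavior follows the summary above rather than any code in the patent.

```python
from collections import deque

class SignatureHistory:
    def __init__(self, temp_size=3):
        self.temp = deque(maxlen=temp_size)  # temporary table, index 0 = most recent
        self.perm = set()                    # permanent table of recurring signatures

    def check(self, new_sig) -> bool:
        """Return True when new_sig is unique and the user should be notified."""
        if new_sig in self.perm:
            return False                     # known recurring version
        if self.temp and new_sig == self.temp[0]:
            return False                     # same as the most-recent fetch
        if new_sig in self.temp:
            self.temp.remove(new_sig)        # older version has recurred:
            self.perm.add(new_sig)           # keep it permanently
            return False
        self.temp.appendleft(new_sig)        # maxlen discards the least-recent entry
        return True

h = SignatureHistory()
outcomes = [h.check(s) for s in ["5A7", "D6F", "EB9", "D6F", "EB9", "D6F"]]
```

Because recurring signatures live in the permanent set, they are not pushed out when new signatures fill the bounded temporary table.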
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`FIG. 1 illustrates a web page registered for change detec-
`tion.
`
`FIG. 2 shows an updated web page that has a different
`signature that triggers a change notification.
`FIG. 3 shows a false change detection caused by a
`non-relevant change in an Internet server.
`FIG. 4 shows a dynamic web page with HTML headers.
FIG. 5 is a diagram of a change detection tool on a server
on the Internet.
FIG. 6 shows a record with a history table of past
signatures in the database for the change-detection web
server.
FIGS. 7A-7D illustrate how a history table of signatures
solves the error-page problem of FIGS. 1-3.
`FIGS. 8A, 8B are a flowchart for the periodic minder
`using history tables and last-modified headers to avoid
`non-relevant change notifications.
`FIG. 9 is a flowchart of notification once a unique change
`is detected.
`
`FIG. 10 shows a history table with both temporary and
`permanent signatures.
`FIG. 11 illustrates how the permanent history table is
`loaded for detected changes when any of the older signatures
`in the temporary history table are matched.
`FIG. 12 shows a modification for loading the permanent
`history table when a non-unique change is detected.
`FIG. 13 shows a change-detection record that tracks a
`number of times that change is detected for a registered web
`page.
`
`FIG. 14 is a flowchart for a frequency-check routine that
`stops signature comparison when too many changes are
`being detected for a web page.
`FIG. 15 is a flowchart for change detection that uses
`signatures and last-modified headers.
`FIG. 16 shows re-fetching when the content length is
`incorrect.
`
`DETAILED DESCRIPTION
`
The present invention relates to an improvement in
change-detection software tools. The following description
is presented to enable one of ordinary skill in the art to make
and use the invention as provided in the context of a
particular application and its requirements. Various modifications
to the preferred embodiment will be apparent to
those with skill in the art, and the general principles defined
herein may be applied to other embodiments. Therefore, the
present invention is not intended to be limited to the particular
embodiments shown and described, but is to be
accorded the widest scope consistent with the principles and
novel features herein disclosed.
`
`Overview of Change-detection Web Server—FIG. 5
`FIG. 5 is a diagram of a change detection tool on a server
`on the Internet. The user operates client 14 from a remote
`site on Internet 10. The user typically is operating a browser
`application, such as Netscape’s Navigator or Microsoft’s
`Internet Explorer, or a browser mini-application such as an
`Internet toolbar in a larger program. Client 14 communicates
`through Internet 10 by sending and receiving TCP/IP pack-
`ets to establish connections with remote servers, typically
`using the hypertext transfer protocol (HTTP) of the world-
`wide web.
`
Client 14 retrieves web pages of files from document
server 12 through Internet 10. These web pages are identified
by a unique URL (uniform resource locator) which
specifies a document file containing the text and graphics of
a desired web page. Often additional files are retrieved when
a document is retrieved. The “document” returned from
document server 12 to client 14 is thus a composite document
composed of several files of text, graphics, and perhaps
sound or animation. The physical appearance of the web
page on the user’s browser on client 14 is specified by layout
information embedded in non-displayed headers, as is well-known
for HTML (hyper-text markup language) documents.
Often these HTML documents contain headers with URL’s
that specify other web pages, perhaps on other web servers
which may be physically located in different cities or countries.
These headers create hyper-links to these other web
servers allowing the user to quickly jump to other servers.
These hyper-links form a complex web of linked servers
across the world; hence the name “world-wide web”.
The user may frequently retrieve files from remote document
server 12. Often the same file is retrieved. The user
may only be interested in differences in the file, or learning
when the file is updated, such as when a new product or
service is announced. The inventors have developed a software
tool that automatically retrieves files and compares the
retrieved files to an archived signature of the file to determine
if a change in the file has occurred. When a change is
detected, the user is notified by an electronic mail message
(e-mail). A copy of the new file may be attached to the e-mail
notification, allowing the user to review the changes.
`Rather than archive the source files from remote docu-
`ment server 12, the invention archives a checksum CRC or
`signature of the source files. These signatures and the e-mail
`address of the user are stored in database 16 of change-
`detection server 20. Comparison is made of the stored or
`archived signature of the document and a fresh signature of
`the currently-available document. The signature is a con-
`densed checksum or fingerprint of the document. Any
`change to the document changes the signature.
Change-detection server 20 performs three basic functions:

1. Register (set up) a web page document for change
detection.
2. Periodically re-fetch the document and compare for
changes.
3. E-mail a change notice to the registered user if a change
is detected.
`
Change-detection server 20 contains three basic components.
Database 16 stores the archive of signatures for
registered web-page documents. The URL identifying the
web page and the user’s e-mail address are also stored with
the archived signature. Responder 24 communicates with
the user at client 14 to set up or register a web page document
for change detection. Minder 22 periodically fetches registered
documents from document server 12 through Internet
10. Minder 22 compares the archived signature in database
16 to a new signature of the fetched document to determine
if a change has occurred. When a change is detected, minder
22 sends a notice to the user at client 14 that the document
has changed.
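
The division of labor among the database, responder, and minder can be sketched as a single class. Everything here is an illustrative assumption: the class name ChangeDetectionServer, the injected fetch and send_mail callables (standing in for HTTP retrieval and e-mail delivery), the dictionary used as the database, and CRC-32 as the signature. It shows the basic single-signature flow, without the history-table improvement.

```python
import zlib
from typing import Callable, Dict, List

def signature(doc: bytes) -> str:
    """CRC-32 signature of a document, as an 8-digit hex string."""
    return format(zlib.crc32(doc), "08X")

class ChangeDetectionServer:
    def __init__(self, fetch: Callable[[str], bytes],
                 send_mail: Callable[[str, str], None]):
        self.fetch = fetch                    # stand-in for retrieving a URL
        self.send_mail = send_mail            # stand-in for e-mail delivery
        self.database: Dict[str, dict] = {}   # URL -> e-mail and archived signature

    def register(self, url: str, email: str) -> None:
        """Responder role: register a document for change detection."""
        self.database[url] = {"email": email,
                              "signature": signature(self.fetch(url))}

    def run_minder(self) -> List[str]:
        """Minder role: re-fetch every registered document and notify on change."""
        changed = []
        for url, rec in self.database.items():
            new = signature(self.fetch(url))
            if new != rec["signature"]:
                rec["signature"] = new
                self.send_mail(rec["email"], url)
                changed.append(url)
        return changed

# Usage with in-memory stand-ins for the network and the mail system.
pages = {"http://example.com/doc": b"v1"}
sent = []
server = ChangeDetectionServer(lambda u: pages[u], lambda e, u: sent.append((e, u)))
server.register("http://example.com/doc", "user@example.com")
first_pass = server.run_minder()
pages["http://example.com/doc"] = b"v2"
second_pass = server.run_minder()
```

Passing fetch and send_mail in as callables keeps the sketch runnable without a network connection.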
`Change-Detection of Web Pages
`This change-detection tool is disclosed in the co-pending
`parent application, “Change-Detection Tool Indicating
`Degree and Location of Change of Internet Documents by
`Comparison of CRC Signatures”, U.S. Ser. No. 08/783,625,
`filed Jan. 14, 1997, hereby incorporated by reference. A
`basic change-detection tool without the improved methods
`using the signature history tables has been available for free
`public use at the inventor’s web site, www.netmind.com, for
`more than a year before the filing date of the present
`application. The existing “URL-minder” has over 700,000
`documents or URL’s registered for 3.8 million users.
`
`Unique-content, not Mere Change, is Detected
`The inventors have realized that change detection must be
`accurate to be useful. False change detections must be
`avoided and non-relevant changes ignored. Often, the user
`does not want to be notified of all changes, but rather only
`for new content. Thus the inventors notify the user when
`“unique” content is detected; not when a mere “change” to
`old content is detected.
`Rather than just store the last signature, the inventors use
`a table of several older signatures. When any of the older
`signatures match the web page, the content is not unique
`even if it has changed since the last check. The web page
`may have reverted back to an older version.
Previous change-detection tools generate notifications for
any change, including changes back to an older version.
With the improvement, the user is not notified for the
older-version change, even though the web page has
changed. It is likely that the user has already seen the older
version of the web page. Only unique web pages that are
unlike any previous versions cause the user to be notified.
`Thus the improved invention is not a “change”-detection
`tool, but a “Unique-content” tool.
`Database Records Include History Table of Signatures—
`FIG. 6
`
`FIG. 6 shows a record with a history table of past
`signatures in the database for the change-detection web
`server. Database 16 of FIG. 5 contains many such records,
`one for each web page or URL. Multiple e-mail addresses
`can be stored for each web page by using a relational
`(multi-table) database, with a separate table linking e-mail
`addresses to registered web pages.
Each record has one or more e-mail addresses 32. When a
unique change is detected, a notification message is sent to
e-mail address 32. URL 36 is the world-wide-web address
that is used to locate the web page. This URL is translated
to an IP address of a server machine by Internet directories
when the page is fetched. Length field 34 stores the length
of the web page and can be used to ensure that the entire web
page has been fetched.
`Last-modified field 38 contains a copy of the last-
`modified header from the web server for the particular
`web-page. Although the change-detection tool is primarily
`signature-based, improved detection results when the last-
`modified header in the newly-fetched document is compared
`to last-modified field 38.
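
A record of this kind might be modeled as shown below. The field names, the use of a raw header string for the last-modified value, and the two helper functions are assumptions made for illustration. In particular, comparing the reported content length with the number of bytes actually received is only one plausible reading of how a length value could confirm a complete fetch.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Record:
    url: str                              # URL 36
    emails: List[str]                     # one or more e-mail addresses 32
    length: Optional[int] = None          # length field 34
    last_modified: Optional[str] = None   # last-modified field 38 (raw header text)
    history: List[str] = field(default_factory=list)  # signatures, most-recent first

def fetch_is_complete(reported_length: Optional[int], body: bytes) -> bool:
    """Treat the fetch as complete when no length was reported or it matches."""
    return reported_length is None or len(body) == reported_length

def last_modified_unchanged(record: Record, header: Optional[str]) -> bool:
    """True only when the document supplies a header equal to the stored one."""
    return header is not None and header == record.last_modified

rec = Record(url="http://example.com/rules", emails=["attorney@example.com"],
             last_modified="3.15.98 13:42", history=["2B9", "D6F", "5A7"])
```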
`
`Rather than store one signature for the most-recent ver-
`sion of the web page, a table of signatures for many older
`versions of the web page is stored. History table 40 contains
`signatures for the three most-recent versions of the web
`page. Signature 2B9 (hex) is the most-recent signature for
`the web page, and the change-detection tool of the parent
`application stores only this signature, or multiple signatures
`for each section of this one most-recent version of the web
`page.
`History table 40 also stores signature D6F, for the next-
`to-last version of the web page, and signature 5A7 for the
`next earlier version of the web page. Thus three signatures
`for the last three versions of the web page are stored in
history table 40. If a newly-fetched web page changes to either
of the two earlier versions, a notification is not made, even
though a change occurred.
The number of signatures stored in history table 40 can
vary; the three signatures of FIG. 6 are just for illustration.
`The size of history table 40 does not have to be fixed; it can
`vary under software control according to available storage in
`the database. The size of history table 40 could be adjusted
`to store all signatures in the last month or year rather than a
`fixed number of signatures.
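
A time-based history, as opposed to a fixed-size one, could be maintained along these lines. The tuple layout (signature, time first seen), the 30-day default, and the choice to always keep the most-recent entry are assumptions for this sketch, as are the example dates.

```python
from datetime import datetime, timedelta

def prune_history(entries, now, max_age=timedelta(days=30)):
    """Keep the most-recent entry plus any older entries within max_age.

    entries is a list of (signature, first_seen) tuples, most-recent first.
    """
    return entries[:1] + [e for e in entries[1:] if now - e[1] <= max_age]

entries = [
    ("2B9", datetime(1998, 5, 19)),
    ("D6F", datetime(1998, 5, 1)),
    ("5A7", datetime(1998, 1, 1)),
]
kept = prune_history(entries, now=datetime(1998, 5, 20))
```

With these example dates the oldest signature, 5A7, falls outside the 30-day window and is dropped, while 2B9 and D6F are kept.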
`
`8
`History