US006185614B1
`(12) United States Patent
`Cuomo et al.
`(10) Patent No.: US 6,185,614 B1
`(45) Date of Patent: Feb. 6, 2001
`
`
`(54) METHOD AND SYSTEM FOR COLLECTING
`USER PROFILE INFORMATION OVER THE
`WORLD-WIDE WEB IN THE PRESENCE OF
`DYNAMIC CONTENT USING DOCUMENT
`COMPARATORS
`
`(75) Inventors: Gennaro A. Cuomo, Apex; Binh Q.
`Nguyen, Cary; Sandeep K. Singhal,
`Raleigh, all of NC (US)
`
`(73) Assignee: International Business Machines
`Corp., Armonk, NY (US)
`
`(*) Notice: Under 35 U.S.C. 154(b), the term of this
`patent shall be extended for 0 days.
`
`(21) Appl. No.: 09/084,452
`(22) Filed: May 26, 1998
`(51) Int. Cl.: G06F 15/173; G06F 15/16;
`G06F 7/00
`(52) U.S. Cl.: 709/224; 709/203; 707/104
`(58) Field of Search: 709/203, 224;
`707/6, 10, 104, 501, 513, 3, 5
`
`(56) References Cited
`
`U.S. PATENT DOCUMENTS
`
`5,649,186 *  7/1997  Ferguson ............... 707/10
`5,732,218    3/1998  Bland .................. 709/204
`5,740,430 *  4/1998  Rosenberg et al. ....... 707/200
`5,745,900 *  4/1998  Burrows ................ 707/102
`5,813,007 *  9/1998  Nielsen ................ 707/10
`5,890,164 *  3/1999  Nielsen ................ 707/201
`5,892,917    4/1999  Myerson ................ 709/204
`5,893,908 *  4/1999  Cullen et al. .......... 707/5
`5,895,470 *  4/1999  Pirolli et al. ......... 707/102
`5,898,836 *  4/1999  Freivald et al. ........ 709/218
`
`oorong .
`rr909 proder et “ ”
`oe ma3
`5041044 * e190 Mowe aes
`“sys
`
`5,978,842 * 11/1999  Noble et al. ........... 709/218
`5,983,268 * 11/1999  Freivald et al. ........ 709/218
`5,987,480 * 11/1999  Donohue et al. ......... 707/501
`5,999,929 * 12/1999  Goodman ................ 707/7
`6,012,087 *  1/2000  Freivald et al. ........ 709/218
`
`FOREIGN PATENT DOCUMENTS
`
`9831155  7/1998  (WO).
`
`OTHER PUBLICATIONS
`
`Brin, S., et al., “Copy Detection Mechanisms for Digital
`Documents,” Proc. of the 1995 ACM SIGMOD Int’l. Conf.
`on Management of Data, ACM, pp. 398-409, May 1995.*
`Garcia-Molina, H., et al., “dSCAM: Finding Document
`Copies Across Multiple Databases,” Proc. of the 4th Int’l.
`Conf. on Parallel and Distributed Information Systems,
`IEEE, pp. 68-79, May 1995.*
`
`* cited by examiner
`
`Primary Examiner: Ahmad F. Matar
`Assistant Examiner: Andrew Caldwell
`(74) Attorney, Agent, or Firm: A. Bruce Clay
`
`(57) ABSTRACT
`
`Disclosed is a method and system for collecting profile
`information about users accessing dynamically generated
`content from one or more servers. In a specific embodiment,
`a server dynamically generates a web page in response to a
`user request. The server customizes the web page content
`based on the requested universal resource identifier (URI)
`and one or more of: the user’s identity, access permissions,
`demographic information, and previous behavior at the site.
`The web server then passes the URI, user identity, and
`dynamically generated web page to an access information
`collector. The access information collector generates document
`comparators from the current web page content and
`compares them to document comparators associated with
`previously retrieved web pages. If the current web page is
`sufficiently similar to some previously retrieved web page,
`the access information collector logs the URI, user identity,
`and a document key associated with the matching previously
`retrieved page. Otherwise, the access information collector
`generates a new key; stores the new key and the document
`comparators in a database; and logs the URI, user identity,
`and the newly generated document key.
`
`27 Claims, 4 Drawing Sheets
`
`
`[Front-page drawing: flowchart of FIG. 4]
`
`SAMSUNG 1022
`
`IPR2024-00145
`Apple EX1022 Page 1
`
`

`

`[FIG. 1: pictorial representation of data processing system 8]
`
`U.S. Patent
`
`Feb. 6, 2001
`
`Sheet 1 of 4
`
`US 6,185,614 B1
`
`
`

`

`U.S. Patent
`
`Feb. 6, 2001
`
`Sheet 2 of 4
`
`US 6,185,614 B1
`
`[Drawing: block diagram showing Clients 200, 201, 202; network 205;
`Access Information Collector 240; Web Servers 210, 211, 212; and, for
`each Web Server, a Static Content Database (220, 221, 222) and a
`Dynamic Content Generator (230, 231, 232)]
`
`
`FIG. 2
`
`
`

`

`U.S. Patent
`
`Feb. 6, 2001
`
`Sheet 3 of 4
`
`US 6,185,614 B1
`
`[FIG. 3: data structure drawing showing the Log File 300, the Retrieved
`Document Database 310, and the Document Comparator Index 320]
`
`
`
`
`
`
`
`
`
`
`
`
`

`

`U.S. Patent
`
`Feb. 6, 2001
`
`Sheet 4 of 4
`
`US 6,185,614 B1
`
`FIG. 4
`
`400  Receive URI, request time, client identity, and document content
`402  Compute Document Comparator
`404  Select Candidate Document and Comparator
`406  Candidate document found? No: go to 420. Yes: go to 408.
`408  Comparators differ by less than threshold? Yes: go to 410. No: return to 404.
`410  Retrieve Document Key for Candidate Document
`415  Add entry to Log File with Candidate Document's Key, then go to 490
`420  Generate new Key for retrieved document
`425  Add new entry to Retrieved Document Database
`430  Add new entry to Document Comparator Index Database
`435  Add entry to Log File with retrieved Document's Key
`490  Process terminates
`
`
`

`

`METHOD AND SYSTEM FOR COLLECTING
`USER PROFILE INFORMATION OVER THE
`WORLD-WIDE WEB IN THE PRESENCE OF
`DYNAMIC CONTENT USING DOCUMENT
`COMPARATORS
`
`FIELD OF THE INVENTION
`
`This invention relates in general to computer software,
`and in particular to a method and system for collecting
`profile information about users accessing Web pages from a
`plurality of Web servers. More particularly, the present
`invention relates to a method and system by which user
`profile information can be collected when the Web content
`is generated dynamically for each request at the Web server.
`
`BACKGROUND OF THE INVENTION
`
`In the World-Wide Web, a content provider deploys a
`plurality of Web servers that deliver Web pages to clients.
`When requesting a Web page, the client supplies a Uniform
`Resource Locator (URL) or Universal Resource Identifier
`(URI) to the server. The server associates this URI with a
`particular page of content and delivers that information to
`the requesting client.
`As the World-Wide Web is being used increasingly to
`support commerce and targeted advertising, content providers
`desire to collect information about which users are
`accessing the site and what site content those users are
`accessing. This information can be used to establish “profiles”
`for each site visitor and enable tuning of the Web site
`content to meet the visitors’ interests. Traditionally, this
`visitor information is collected by the Web server or a proxy
`server in the form of a log file. This log file contains, among
`other things, the requesting host address, the requested URI,
`and the time at which the request was received. Because
`each URI represents a particular piece of static content at the
`Web site, the URI is sufficient for a user profile analyzer to
`evaluate which content was received by each user and to
`detect similarities among the behavior of different users.
`Recent Web servers are providing support for server-side
`scripting, whereby the URI is associated with a program or
`script that is executed at the Web server. This script is
`responsible for receiving the URI and the user identity and
`using this information to dynamically generate the content
`that should be returned to the requesting user. This generated
`content may account for the user’s previous behavior at the
`site, his access permissions, his demographic information, or
`any number of other factors. Dynamic server content is
`supported by most Web servers today, including Microsoft’s
`Active Server Pages, Sun’s Dynamic Server Pages, industry-standard
`servlets, Common Gateway Interface (CGI)
`executables, and other mechanisms.
`As a result of this direction, a particular URI can no longer
`be associated with particular content at the Web site. On
`different requests, the URI may return wholly different
`content depending on the requesting user and the context in
`which the request was issued. Consequently, existing methods
`for capturing user information are insufficient for producing
`meaningful user profiles. More specifically, the reliance
`on URIs alone prevents the accurate characterization of
`which users are exhibiting similar access behavior.
`Therefore, a method is needed for efficiently collecting user
`access information in the presence of dynamically-generated
`content at a Web server, in order to support the accurate
`generation of user profiles.
`
`SUMMARY OF THE INVENTION
`
`One object of the present invention is to provide, within
`a networked environment, a method of associating each
`user’s request for World-Wide Web information to the
`content of the retrieved document when that document was
`generated dynamically.
`Another object of the present invention is to group
`together user requests that retrieve the same document
`content. Yet another object of the present invention is to
`ignore minor variations in document content as might occur
`when the documents differ only in the presence of the
`requesting user’s name. Still yet another object of the
`present invention is to enable the use of a range of metrics
`for comparing two documents for similarity.
`To achieve the foregoing objects and in accordance with
`the purpose of the invention as broadly described herein, a
`method and system are disclosed for collecting information
`about user accesses by analyzing the content of retrieved
`documents and associating Document Comparators with
`each document. These and other features, aspects, and
`advantages of the present
`invention will become better
`understood with reference to the following description,
`appended claims, and accompanying drawings.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`For a more complete understanding of the present inven-
`tion and for further advantages thereof, reference is now
`made to the following Detailed Description taken in con-
`junction with the accompanying Drawings, in which:
`FIG. 1 is a pictorial representation of a data processing
`system which may be utilized to implement a method and
`system of the present invention;
`FIG. 2 shows a block diagram of a World-Wide Web
`environment
`in which user access information may be
`generated in accordance with the present invention;
`FIG. 3 shows a sample data structure for representing the
`information collected by the Access Information Collector in
`accordance with the present invention; and
`FIG. 4 is a flowchart showing how an Access Information
`Collector analyzes a document retrieved from a Web server
`and updates its data structures.
`
`DETAILED DESCRIPTION OF THE
`INVENTION
`
`Referring to FIG. 1, there is depicted a graphical repre-
`sentation of a data processing system 8, which may be
`utilized to implement the present invention. As may be seen,
`data processing system 8 may include a plurality of
`networks, such as Local Area Networks (LAN) 10 and 32,
`each of which preferably includes a plurality of individual
`computers 12 and 30, respectively. Of course, those skilled
`in the art will appreciate that a plurality of Intelligent Work
`Stations (IWS) coupled to a host processor may be utilized
`for each such network. Each said network may also consist
`of a plurality of processors coupled via a communications
`medium, such as shared memory, shared storage, or an
`interconnection network. As is common in such data pro-
`cessing systems, each individual computer may be coupled
`to a storage device 14 and/or a printer/output device 16 and
`may be provided with a pointing device such as a mouse 17.
`The data processing system 8 may also include multiple
`mainframe computers, such as mainframe computer 18,
`which may be preferably coupled to LAN 10 by means of
`communications link 22. The mainframe computer 18 may
`also be coupled to a storage device 20 which may serve as
`remote storage for LAN 10. Similarly, LAN 10 may be
`coupled via communications link 24 through a sub-system
`control unit/communications controller 26 and communica-
`tions link 34 to a gateway server 28. The gateway server 28
`is preferably an IWS which serves to link LAN 32 to LAN
`10.
`
`With respect to LAN 32 and LAN 10, a plurality of
`documents or resource objects may be stored within storage
`device 20 and controlled by mainframe computer 18, as
`resource manager or library service for the resource objects
`thus stored. Of course, those skilled in the art will appreciate
`that mainframe computer 18 may be located a great geo-
`graphic distance from LAN 10 and similarly, LAN 10 may
`be located a substantial distance from LAN 32. For example,
`LAN 32 may be located in California while LAN 10 may be
`located within North Carolina and mainframe computer 18
`may be located in New York.
`Software program code which employs the present invention
`is typically stored in the memory of a storage device 14
`of a stand alone workstation or LAN server from which a
`developer may access the code for distribution purposes. The
`software program code may be embodied on any of a variety
`of known media for use with a data processing system such
`as a diskette or CD-ROM or may be distributed to users from
`a memory of one computer system over a network of some
`type to other computer systems for use by users of such other
`systems. Such techniques and methods for embodying soft-
`ware code on media and/or distributing software code are
`well-known and will not be further discussed herein.
`
`Referring now to FIG. 2, components of a World-Wide
`Web system are shown in which user information may be
`gathered in accordance with the present invention. A plu-
`rality of clients (generally indicated by reference numerals
`200, 201, and 202) access information over a network 205
`using World-Wide Web browsers such as NETSCAPE
`NAVIGATOR, a trademark of Netscape, Inc. or
`MICROSOFT INTERNET EXPLORER, a trademark of
`Microsoft, Inc. These clients access a plurality of Web
`servers (generally indicated by reference numerals 210, 211,
`and 212) such as LOTUS GO, a trademark of Lotus, Inc.,
`MICROSOFT INTERNET INFORMATION SERVICE
`(IIS), a trademark of Microsoft, Inc. or NETSCAPE
`FASTTRACK, a trademark of Netscape, Inc.
`In accessing these Web servers, the clients 200, 201 and
`202 specify a URI. Each of these Web servers 210, 211, and
`212 accesses a Static Content Database (generally indicated
`by reference numerals 220, 221, and 222) and a Dynamic
`Content Generator (generally indicated by reference numer-
`als 230, 231, and 232) that receives a URI and other
`information about the user and generates Web content suitable
`for display by the browsers at the clients 200, 201, and
`202. These Dynamic Content Generators 230, 231, and 232
`may take many forms, including Active Server Pages,
`servlets, Common Gateway Interface (CGI) binaries, or
`Dynamic Server Pages.
`Upon receiving a URI request from a client, the Web
`server 210, 211, or 212 either retrieves the content from the
`Static Content Database 220, 221, or 222 or from the
`Dynamic Content Generator 230, 231, or 232. An Access
`Information Collector 240 receives client requests and con-
`tent returned from the Static Content Database 220, 221, or
`222 or from the Dynamic Content Generator 230, 231, or
`232 and collects log information that can be used to analyze
`the access patterns of various users. It should be understood
`that the physical location of the components shown in FIG.
`2 may vary. In particular, the Access Information Collector
`240 may be embedded in the Web servers 210, 211, and 212.
`Moreover, the Dynamic Content Generators 230, 231, and
`232 and Static Content Databases 220, 221, and 222 may be
`co-located with the Web servers 210, 211, and 212.
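The hand-off described above, in which a Web server obtains content from either the static database or the dynamic generator and passes it to the Access Information Collector, can be sketched as follows. This is only an illustration under stated assumptions: the names `AccessInformationCollector`, `handle_request`, `static_db`, and `generate_dynamic` are hypothetical and do not appear in the specification, and the collector here merely records what it receives.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class AccessInformationCollector:
    """Hypothetical stand-in for collector 240: it only records what it is given."""
    received: List[Tuple[str, str, str]] = field(default_factory=list)

    def collect(self, uri: str, user: str, content: str) -> None:
        self.received.append((uri, user, content))


def handle_request(uri: str, user: str,
                   static_db: Dict[str, str],
                   generate_dynamic: Callable[[str, str], str],
                   collector: AccessInformationCollector) -> str:
    """Serve static content when the URI has any, otherwise generate it.

    Either way, the URI, user identity, and returned content are reported
    to the collector before the content goes back to the client.
    """
    if uri in static_db:
        content = static_db[uri]
    else:
        content = generate_dynamic(uri, user)
    collector.collect(uri, user, content)
    return content


# Example: the same URI yields different content for different users.
collector = AccessInformationCollector()
static_db = {"/about": "About us"}
greet = lambda uri, user: f"Hello {user}, welcome to {uri}"

page_a = handle_request("/home", "alice", static_db, greet, collector)
page_b = handle_request("/home", "bob", static_db, greet, collector)
page_c = handle_request("/about", "bob", static_db, greet, collector)
```

Because `page_a` and `page_b` differ even though both requests used `/home`, a log keyed on the URI alone could not tell them apart, which is the situation the collector is meant to address.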
`
`FIG. 3 illustrates the information collected by the Access
`Information Collector in accordance with the present inven-
`tion. A Log File 300 contains a sequence of Access Records.
`Each Access Record includes at least a time stamp 301, a
`requested URI 313, and a Document Key 312.
`A Retrieved Document Database 310 contains a reposi-
`tory of Document Records corresponding to documents
`retrieved by users. Each Document Record 311 is indexed by
`a Document Key 312 and contains an associated URI 313,
`document text 314, and a Document Comparator 315. The
`Document Key 312, when combined with the URI 313,
`serves to uniquely identify the Document Record 311. Document
`Keys may be assigned sequentially or by any other
`appropriate method.
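One way to picture the Log File and the Retrieved Document Database is the sketch below. It assumes in-memory Python containers and sequentially assigned integer keys; the class and field names are illustrative choices, not terms defined by the specification, and the comparator is stored as plain text purely for simplicity.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, List


@dataclass(frozen=True)
class AccessRecord:
    """One entry of Log File 300."""
    time_stamp: float     # time stamp 301
    uri: str              # requested URI 313
    document_key: int     # Document Key 312


@dataclass
class DocumentRecord:
    """Document Record 311, indexed by its Document Key."""
    document_key: int
    uri: str
    text: str             # document text 314
    comparator: str       # Document Comparator 315 (assumed here to be text)


class RetrievedDocumentDatabase:
    """Hypothetical in-memory version of database 310."""

    def __init__(self) -> None:
        self._records: Dict[int, DocumentRecord] = {}
        self._next_key = count(1)   # keys assigned sequentially

    def add(self, uri: str, text: str, comparator: str) -> DocumentRecord:
        record = DocumentRecord(next(self._next_key), uri, text, comparator)
        self._records[record.document_key] = record
        return record

    def by_uri(self, uri: str) -> List[DocumentRecord]:
        """All stored documents that were retrieved with the given URI."""
        return [r for r in self._records.values() if r.uri == uri]


db = RetrievedDocumentDatabase()
first = db.add("/home", "Hello alice", "Hello alice")
second = db.add("/home", "Site news", "Site news")
log = [AccessRecord(0.0, "/home", first.document_key)]
```

Note that two records can share the URI `/home` while holding different content, so the key and the URI together identify a record.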
`The Document Comparator 315 is a representation of the
`document’s contents and is used by a Document Comparator
`Function to determine whether there are substantial pre-
`defined similarities, as will be subsequently described in
`greater detail, between the current document and other
`previously retrieved documents. The Document Comparator
`Function receives the Document Comparators for two documents
`and determines whether the two documents are substantially
`similar. To make this determination, the Function
`may employ a Document Difference Threshold, a numeric
`value that indicates how much two documents may differ
`before they are no longer deemed to be substantially similar.
`The use of the Document Difference Threshold depends on
`the particular Document Comparator Function being used.
`The use of a Document Difference Threshold allows the
`Document Comparator Function to ignore minor differences
`between two documents. Such minor differences include
`timestamps, client name, or client-specific data.
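A minimal sketch of a Document Comparator Function with a Document Difference Threshold follows. It assumes, for illustration only, that each Comparator is the document text and that the difference measure is a character-level edit distance computed with the standard dynamic-programming recurrence; the function names and the threshold value are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Count the character insertions, deletions, or substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # delete ch_a
                               current[j - 1] + 1,       # insert ch_b
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]


def substantially_similar(comparator_a: str, comparator_b: str,
                          threshold: int) -> bool:
    """True when the two comparators differ by less than the threshold."""
    return edit_distance(comparator_a, comparator_b) < threshold


page_alice = "Welcome back, Alice. Today's offers: books."
page_bob = "Welcome back, Bob. Today's offers: books."
other_page = "Your order history is empty."

same_page = substantially_similar(page_alice, page_bob, threshold=10)
different_page = substantially_similar(page_alice, other_page, threshold=10)
```

With a threshold of 10 edits, the two greeting pages, which differ only in the user's name, are treated as the same document, while the unrelated page is not.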
`In the present embodiment of this invention, the Document
`Comparator 315 is the actual content of the document
`itself, and the Document Comparator Function for any two
`documents is defined to be the number of character
`insertions, deletions, or modifications required to convert
`one document to the other. This computation is well understood
`in the prior art (see, for example, the use of tries, as
`described in Chapter 11 of Alan Tharp, File Organization
`and Processing, Wiley, 1988) and will not be discussed
`further. Alternative embodiments of this invention may
`compute a Document Comparator 315 by mapping each
`word, paragraph, or section of the document to a binary
`token. In this case, the Document Comparator Function
`might count the number of matching binary tokens, and the
`Document Difference Threshold would designate what percentage
`of the tokens must match (see, for example, “Copy
`Detection Mechanisms for Digital Documents,” by Sergey
`Brin, James Davis, and Hector Garcia-Molina, in Proceedings
`of the 1995 SIGMOD International Conference on
`Management of Data, pages 398-409, May 1995). Yet
`another embodiment of this invention may define a Document
`Comparator 315 as a list of the most significant (as
`predefined) words or phrases in the document; the Document
`Comparator Function may simply count how many
`words or phrases occur in both documents, and the Document
`Difference Threshold would designate what percentage
`of words in each document must appear in the other. Other
`comparison methods are well established in the prior art.
`The essential element of a Document Comparator 315 is that
`a metric (i.e., the Document Comparator Function) must
`exist for comparing two different Document Comparators to
`determine by how much their respective documents differ.
`Indeed, a Document Comparator 315 may actually comprise
`multiple Comparators, one per each predefined section of
`the document, each having an associated Document Comparator
`Function.
`Finally, a Document Comparator Index 320 associates
`each Document Comparator 315 with the corresponding
`Document Key 312. The Index 320 is used to improve the
`performance of the Document Comparator 315 evaluations
`and the selection of Candidate Documents (see FIG. 4).
`However, it is a performance optimization that may be
`omitted by alternative embodiments of this invention.
`Though the data structures have been illustrated in FIG. 3
`with a particular embodiment, alternative representations of
`this information are possible. The essential attributes of
`these implementations is the association of each Document
`Comparator 315 to a Document Key 312, the association of
`each user URI 313 retrieval with a particular Document Key
`312, and the association of each Document Key 312 with
`particular document content. It should be noted that various
`optimizations are also possible. For example, instead of
`storing each document’s full content, the Retrieved Document
`Database 310 may store only a list of most significant
`words or phrases.
`When a document is accessed from the Web server (with
`a particular URI), the Access Information Collector 240
`analyzes the retrieved document (using the Document Comparator
`Function) to determine whether it is substantially
`similar to another document that has been previously
`retrieved from that Web server using the same URI. If a
`substantially similar document has already been generated
`by the Web server, then the user’s access is associated with
`that previous document; however, if a substantially similar
`document has not been previously generated by the Web
`server, then the user’s access is associated with this new
`document. In this way, the Access Information Collector 240
`distinguishes between different dynamically-generated
`documents retrieved using the same URI while also merging
`access information about documents that are nearly identical.
`Referring now to FIG. 4, a flowchart depicts the steps
`taken by the Access Information Collector 240 to analyze a
`document retrieved from a Web server and to update the Log
`File 300, Retrieved Document Database 310, and Document
`Comparator Index 320 (as shown in FIG. 3). At block 400,
`the Access Information Collector 240 receives the requested
`URI, the time of the request, the identity of the requesting
`client, and the content of the retrieved document. At block
`402, a Document Comparator 315 is computed for the
`retrieved document. At block 404, a Candidate Document
`and Candidate Document Comparator are selected from the
`Retrieved Document Database 310. The Candidate Document
`is a document in the Retrieved Document Database
`310 whose URI matches that of the retrieved document. (It
`should be understood that alternative embodiments of this
`invention may remove the restriction that the URI of the
`retrieved document and the URI of the Candidate Document
`match. Alternative embodiments of this invention may also
`introduce additional restrictions on what constitutes a Candidate
`Document.) At decision block 406, it is determined
`whether or not a Candidate Document has been found. If the
`answer to decision block 406 is yes, then at decision block
`408, the Document Comparator Function is invoked with the
`Document Comparators of the retrieved document and of the
`Candidate Document to determine whether or not the
`retrieved document and the Candidate Document are substantially
`similar.
`Continuing with FIG. 4, if the answer to decision block
`408 is yes, then it is determined that the retrieved document
`is sufficiently similar to the Candidate Document and no new
`entry is required to either the Retrieved Document Database
`310 or to the Document Comparator Index 320. At block
`410, the Document Key is retrieved for the Candidate
`Document. At block 415, a new entry is added to the Log
`File, including the time stamp, requested URI, and candidate
`document’s Document Key. The process then terminates at
`block 490. If the answer to decision block 408 is no, then
`control returns to block 404, where another Candidate
`Document is selected for evaluation.
`If the answer to decision block 406 is no, then it is
`determined that the retrieved document is new. At block 420,
`a new Document Key is generated for the retrieved document.
`At block 425, a new entry is added to the Retrieved
`Document Database 310 to associate the retrieved document’s
`Document Key with a new Document Record containing
`the retrieved URI, retrieved document, and retrieved
`document’s Document Comparator. At block 430, a new
`entry is added to the Document Comparator Index 320
`database to associate the retrieved document’s Document
`Comparator with the retrieved document’s Document Key.
`At block 435, a new entry is added to the Log File, including
`the time stamp, requested URI, and retrieved document’s
`Document Key. The process then terminates at block 490.
`Thus, each user access is associated with a Document Key
`representing a document in the Retrieved Document Database
`with a sufficiently close Document Comparator. Each
`URI is, therefore, potentially linked with multiple
`documents, each having different content. At the same time,
`the analysis ignores minor differences between documents,
`as might arise when page content is customized in minor
`ways to reflect the identity of the requesting user.
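The FIG. 4 procedure can be sketched in a few lines. The sketch below is one possible reading, not the patented implementation: it assumes a word-list Comparator in which "significant" words are simply those longer than three letters, a difference measure equal to the fraction of words not shared, a threshold of 0.5, and sequential integer keys. The names `Collector`, `compute_comparator`, `difference`, and `record_access` are hypothetical, and the log tuple also carries the user identity.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Tuple

Comparator = FrozenSet[str]


def compute_comparator(text: str) -> Comparator:
    """Block 402. Assumption: significant words are those longer than 3 letters."""
    return frozenset(w for w in text.lower().split() if len(w) > 3)


def difference(a: Comparator, b: Comparator) -> float:
    """Fraction of words not shared; 0.0 means identical word sets."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / max(len(a), len(b))


@dataclass
class Collector:
    threshold: float = 0.5
    documents: Dict[int, Tuple[str, str]] = field(default_factory=dict)  # key -> (uri, text)
    index: Dict[int, Comparator] = field(default_factory=dict)           # key -> comparator
    log: List[Tuple[float, str, str, int]] = field(default_factory=list)

    def record_access(self, uri: str, time_stamp: float, user: str, text: str) -> int:
        comparator = compute_comparator(text)                    # block 402
        for key, (candidate_uri, _) in self.documents.items():   # blocks 404 and 406
            if candidate_uri != uri:
                continue                                         # candidates share the URI
            if difference(comparator, self.index[key]) < self.threshold:  # block 408
                self.log.append((time_stamp, user, uri, key))    # blocks 410 and 415
                return key
        key = len(self.documents) + 1                            # block 420
        self.documents[key] = (uri, text)                        # block 425
        self.index[key] = comparator                             # block 430
        self.log.append((time_stamp, user, uri, key))            # block 435
        return key


collector = Collector()
k1 = collector.record_access("/home", 1.0, "alice",
                             "Welcome Alice here are today's book offers")
k2 = collector.record_access("/home", 2.0, "bob",
                             "Welcome Bob here are today's book offers")
k3 = collector.record_access("/home", 3.0, "carol",
                             "Your shopping cart is currently empty")
```

In this run the first two requests are logged under the same key because their pages differ only in the user's name, while the third request, made with the same URI, receives a new key because its content is unrelated.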
`Although the present invention has been described with
`respect to a specific preferred embodiment thereof, various
`changes and modifications may be suggested to one skilled
`in the art and it is intended that the present invention
`encompass such changes and modifications as fall within the
`scope of the appended claims.
`What we claim is:
`1. A method of collecting information about document
`retrievals over the World-Wide Web, comprising the steps
`of:
`
`receiving a requesting user identity, requested Universal
`Resource Identifier (URI), and a content of a retrieved
`document;
`selecting a Candidate Document from a Retrieved Document
`Database, said Candidate Document associated
`with a Candidate Document Key;
`comparing said retrieved document to said Candidate
`Document to determine a sufficiency of said Candidate
`Document;
`associating said retrieved document with a newly generated
`Retrieved Document Key if said Candidate Document
`is not deemed to be sufficient;
`adding said retrieved document to said Received Document
`Database; and
`adding a Log File Entry including said requesting user
`identity, said requested URI, and said Retrieved Document
`Key.
`2. The method of claim 1, wherein each of a plurality of
`documents in said Retrieved Document Database is associated
`with a Document Comparator and wherein a first
`Document Comparator may be compared to a second Docu-
`ment Comparator using a Document Comparator Function.
`3. The method of claim 2, wherein said step of comparing
`to determine a sufficiency of said Candidate Document
`further comprises the steps of:
`computing said first Document Comparator for said
`retrieved document;
`retrieving said second Document Comparator for said
`Candidate Document;
`computing with said Document Comparator Function a
`numeric measure of a difference between said first
`Document Comparator and said second Document
`Comparator; and
`comparing said numeric measure against a predefined
`Document Difference Threshold.
`4. The method of claim 2, wherein each said Document
`Comparator comprises content of said each of a plurality of
`documents associated therewith.
`
`5. The method of claim 4, wherein a URI for said
`Candidate Document is equal to a URI for said retrieved
`document.
`
`6. The method of claim 2, wherein each said Document
`Comparator is computed by associating predefined portions
`of said each of a plurality of documents to a binary token.
`7. The method of claim 2, wherein each said Document
`Comparator comprises a list of significant words or phrases
`in said each of a plurality of documents.
`8. The method of claim 2, wherein each said Document
`Comparator comprises a Comparator for each of a plurality
`of predefined sections of said each of a plurality of docu-
`ments.
`
`9. The method of claim 2, wherein said step of selecting
`a Candidate Document comprises selecting from a Docu-
`ment Comparator Database.
`
`10. A system for collecting information about document
`retrievals over the World-Wide Web, comprising:
`means for receiving a requesting user identity, requested
`Universal Resource Identifier (URI), and a content of a
`retrieved document;
`means for selecting a Candidate Document from a
`Retrieved Document Database, said Candidate Docu-
`ment associated with a Candidate Document Key;
`means for comparing said retrieved document to said
`Candidate Document to determine a sufficiency of said
`Candidate Document;
`means for associating said retrieved document with a
`newly generated Retrieved Document Key if said Candidate
`Document is not deemed to be sufficient;
`means for adding said retrieved document to said
`Received Document Database; and
`means for adding a Log File Entry including said requesting
`user identity, said requested URI, and said
`Retrieved Document Key.
`11. The system of claim 10, wherein each of a plurality of
`documents in said Retrieved Document Database is associated
`with a Document Comparator and wherein a first
`Document Comparator may be compared to a second Docu-
`ment Comparator using a Document Comparator Function.
`12. The system of claim 11, wherein said means for
`comparing to determine a sufficiency of said Candidate
`Document further comprises:
`means for computing said first Document Comparator for
`said retrieved document;
`means for retrieving said second Document Comparator
`for said Candidate Document;
`means for computing with said Document Comparator
`Function a numeric measure of a difference between
`
`said first Document Comparator and said second Docu-
`ment Comparator; and
`means for comparing said numeric measure against a
`predefined Document Difference Threshold.
`13. The system of claim 11, wherein each said Document
`Comparator comprises content of said each of a plurality of
`documents associated therewith.
`
`14. The system of claim 13, wherein a URI for said
`Candidate Document is equal to a URI for said retrieved
`document.
`
`15. The system of claim 11, wherein each said Document
`Comparator is computed by associating predefined portions
`of said each of a plurality of documents to a binary token.
`16. The system of claim 11, wherein each said Document
`Comparator comprises a list of significant words or phrases
`in said each of a plurality of documents.
`17. The system of claim 11, wherein each said Document
`Comparator comprises a Comparator for each of a plurality
`of predefined sections of said each of a plurality of docu-
`ments.
`
`18. The system of claim 11, wherein said means for
`selecting a Candidate Document comprises selecting from a
`Document Comparator Database.
`19. A computer program product recorded on computer
`readable medium for collecting information about document
`retrievals over the World-Wide Web, comprising:
`computer readable means for receiving a requesting user
`identity, requested Universal Resource Identifier (URI),
`and a content of a retrieved document;
`computer readable means for selecting a Candidate Document
`from a Retrieved Document Database, said Candidate
`Document associated with a Candidate Document
`Key;
`computer readable means for comparing said retrieved
`document to said Candidate Document to determine a
`sufficiency of said Candidate Document;
`computer readable means for associating said retrieved
`document with a newly generated Retrieved Document
`Key if said Candidate Document is not deemed to be
`sufficient;
`computer readable means for adding said retrieved document
`to said Received Document Database; and
`computer readable means for adding a Log File Entry
`including said requesting user identity, said requested
`URI, and said Retrieved Document Key.
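The sequence of steps recited in claim 19 can be sketched end to end. This is an illustration under stated assumptions, not the patented implementation: the class and method names are hypothetical, the word-overlap difference stands in for an unspecified comparison, the closest stored document is used as the Candidate Document, the candidate's key is reused when the candidate is sufficient, and the retrieved document is stored only when a new key is generated.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ProfileCollector:
    """Hypothetical collector holding a document database and a log file."""
    threshold: float
    documents: Dict[str, str] = field(default_factory=dict)  # key -> content
    log: List[Tuple[str, str, str]] = field(default_factory=list)

    @staticmethod
    def difference(a: str, b: str) -> float:
        # Assumed comparison: Jaccard distance over whitespace-split words.
        wa, wb = set(a.split()), set(b.split())
        union = wa | wb
        return 0.0 if not union else 1.0 - len(wa & wb) / len(union)

    def record(self, user: str, uri: str, content: str) -> str:
        # Select a Candidate Document: here, the closest stored document.
        best_key: Optional[str] = None
        best_diff = float("inf")
        for key, stored in self.documents.items():
            diff = self.difference(content, stored)
            if diff < best_diff:
                best_key, best_diff = key, diff

        if best_key is not None and best_diff <= self.threshold:
            # Assumption: a sufficient candidate lends its key.
            doc_key = best_key
        else:
            # Candidate not sufficient: generate a new key and store the document.
            doc_key = uuid.uuid4().hex
            self.documents[doc_key] = content

        # Log File Entry: requesting user identity, requested URI, document key.
        self.log.append((user, uri, doc_key))
        return doc_key
```

In this sketch, two retrievals of the same URI whose contents differ only slightly are logged under one key, so a later analysis of the log can treat them as the same document even though the content was dynamic.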
`20. The program product of claim 19, wherein each of a
`plurality of documents in said Retrieved Document Data-
`base is associated with a Document Comparator and wherein
`a first Document Comparator may be compared to a second
`Document Comparator according to a predefined distance
`metric.
`
`21. The program product of claim 20, wherein said
`computer readable means for comparing to determine a
`sufficiency of said Candidate Document further comprises:
`computer readable means for computing said first Docu-
`ment Comparator for said retrieved document;
`computer readable means for retrieving said second
`Document Comparator for said Candidate Document;
`computer readable means for computing with said Document
`Comparator Function a numeric measure of a
`difference between said first Document Comparator
`and said second Document
