In Search of Reliable Usage Data on the WWW

James Pitkow
Xerox Palo Alto Research Center
Palo Alto, California 94304 USA
pitkow@parc.xerox.com

Abstract
The WWW is currently the hottest testbed for future interactive digital systems. While much is understood technically about how the WWW functions, substantially less is known about how this technology is used, both collectively and on an individual basis. This disparity of knowledge exists largely as a direct consequence of the decentralized nature of the Web. Since users of the Web are not uniquely identifiable across the system, and since the system employs various levels of caching, measurement of actual usage is problematic. This paper establishes terminology to frame the problem of reliably determining usage of WWW resources and reviews current practices and their shortcomings. A review of the various metrics and analyses that can be performed to determine usage is then presented. This is followed by a discussion of the strengths and weaknesses of the hit-metering proposal [Mogul and Leach 1997] currently under consideration by the HTTP Working Group. Lastly, new proposals based upon server-side sampling are introduced and assessed against the hit-metering proposal. It is argued that server-side sampling provides more reliable and useful usage data while requiring no changes to the current HTTP protocol and enhancing user privacy.

1 Terminology
Despite several efforts and widespread agreement on the need to establish a lingua franca for usage and demographic data collection, consensus does not exist [W3C 1996]. One of the more recent and comprehensive efforts [Novak and Hoffman 1996] provides a baseline of terminology specifically designed for the advertising and measurement communities. With the intent of generating consensus, this paper builds upon their framework for classifying visitors and their terminology, clarifying and introducing new terminology as needed. Readers familiar with the notions of cookies, hits, and the problems local caches and proxy-caches pose for reliably determining usage may wish to skip ahead to Section 2, Statistical Analysis.

1.1 Visitors and Cookies
The central issues regarding the classification of visitors to a Web site are the ability to uniquely identify visitors and the ability to do so reliably, especially across multiple visits to a site. Visitors to WWW sites can be separated into the following categories: unidentified, session, tracked, and identified [Novak and Hoffman 1996]. For each class of visitor, a definition is provided followed by a discussion of the methods used to achieve each level of knowledge about visitors to a site.

An unidentified visitor is a person who visits a Web site where no information is available about the visitor. This type of visitor does not truly exist on the Web, since the Internet Protocol requires at least a machine address to which the requested information can be returned. This form of return address reveals information about the user in a manner similar to a phone number, where a one-to-one or a many-to-one correspondence may exist between the address and the number of users at the address. Unidentified visitors may indeed exist in other interactive digital systems, where a user's anonymity is explicitly preserved. It is arguable that anonymizing proxies, programs that act as intermediaries between clients and servers, enable users to experience the Web anonymously. While this form of server may serve a limited number of users, it does not scale well to the entire user population, since it effectively doubles the amount of traffic required to request a page (a request is first sent to the anonymizing server, which then sends a second request to the actual server) and requires a fair amount of centralization of resources, another potential bottleneck.

A session visitor is a visitor to a Web site for whom an identifier is created, either explicitly via cookie [1] generation or inferred through heuristics as discussed below. This is the default type of visitor on the WWW today. Several revealing pieces of information are typically available that enable the heuristic identification of users even when cookies are not used. With each request, the machine name from which the visitor made the request, the type of software the visitor is using to experience the WWW, the operating system on which that software runs, and the page viewed prior to the current request are typically known. The latter piece of information is called the referrer field. While this ancillary information may enable a user to be identified within a session, it is not guaranteed to be accurate, nor will it reliably identify the same user in future sessions. Some heuristics for identifying users without cookies include:

• The use of Internet protocols that help determine whether the user is the sole user of the machine making the request, e.g., identd, finger, etc. If a one-to-one correspondence exists between a visitor and a machine, the machine essentially becomes a unique identifier, and the visitor becomes a 'tracked visitor' as described below. These techniques fail if a visitor sits behind a proxy (as is the case in Figure 1), shares the machine with other users, or uses a machine that does not support or allow these protocols.

• To uniquely identify users suspected of sitting behind proxies (see Figure 1), session limits, the site's topology (the global hyperlink structure across pages), and browser characteristics can be used. One such algorithm, implemented in [Pirolli, Pitkow, and Rao 1996], checks that each incoming request is reachable from the set of already visited pages by consulting the site's topology. If all subsequent requests are made to pages that the visitor could have reached by selecting a hyperlink embedded in any of the already requested pages, the user is assumed to be the sole visitor behind that machine name. If requests from the same machine name arrive for pages that are not reachable from the set of hyperlinks embedded in the pages already visited, multiple visitors are suspected. Multiple visitors are also suspected when pages are requested that have already been visited. The algorithm treats these cases as separate visitors and adds subsequent pages to each visitor's path based upon the topology of the site. A least-recently-used policy is used to assign pages to visitors when there is ambiguity as to which visitor could have made the request (a sketch of this bookkeeping appears after this list). Visitors who do not request pages within a certain time limit are assumed to have left the site. Appropriate time-out periods are typically determined by inspecting the distribution of times between all page requests to a site. While the above algorithm performs reasonably well, it is heuristic in nature and has not been shown to reliably identify users with any measure of accuracy, especially across sessions.

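The bookkeeping behind this topology heuristic can be sketched compactly. The following Python fragment is a minimal illustration, not the implementation of [Pirolli, Pitkow, and Rao 1996]: it assumes the site topology is available as an adjacency map, and it omits the session time-out for brevity.

    # Sketch: split one machine name's request stream into inferred visitor
    # paths using reachability over the site topology and an LRU policy.
    from typing import Dict, List, Set

    def assign_to_visitors(requests: List[str],
                           topology: Dict[str, Set[str]]) -> List[List[str]]:
        visitors: List[List[str]] = []   # one page path per inferred visitor
        last_active: List[int] = []      # request index of each visitor's last activity
        for i, page in enumerate(requests):
            candidates = [
                v for v, path in enumerate(visitors)
                if page not in path      # a repeated page suggests a new visitor
                and any(page in topology.get(seen, set()) for seen in path)
            ]
            if candidates:
                # least-recently-used visitor wins when several could have clicked
                v = min(candidates, key=lambda v: last_active[v])
                visitors[v].append(page)
                last_active[v] = i
            else:
                visitors.append([page])  # unreachable page: assume a new visitor
                last_active.append(i)
        return visitors
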
A tracked visitor is a visitor who is uniquely and reliably identifiable across multiple visits to a site. In the earlier days of the Web, the tracking of visitors was often accomplished by inserting identifiers into the URLs issued by the server and channeling all subsequent requests through a CGI script. Not only was this method computationally expensive for the server, but it defeated intermediary caching and did not correctly handle the exchanging of URLs between people, i.e., a person using a URL of this sort mailed to them by a friend could be incorrectly tracked as the friend. These days, this form of identification is typically accomplished by setting the expiration of an issued cookie far into the future. Thus, each time a visitor returns to the site, the same identifier will be used.

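To make the mechanism concrete, here is a minimal sketch of issuing a persistent identifier with a far-future expiration using Python's standard library HTTP server. The cookie name, expiry date, and handler are illustrative assumptions, not part of any cited system.

    # Sketch: a server that assigns each new visitor a persistent identifier.
    from http.server import BaseHTTPRequestHandler, HTTPServer
    import uuid

    class TrackingHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            body = b"<html><body>hello</body></html>"
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(body)))
            if "visitor_id" not in (self.headers.get("Cookie") or ""):
                # the far-future expiry makes the identifier persist across visits
                self.send_header(
                    "Set-Cookie",
                    f"visitor_id={uuid.uuid4()}; expires=Fri, 01 Jan 2038 00:00:00 GMT")
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        HTTPServer(("localhost", 8080), TrackingHandler).serve_forever()
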
The increased use of this technique to track users has not gone unnoticed by the user community, where rather rudimentary but effective practices have emerged that periodically erase the cookies stored on the visitor's filesystem or prevent cookies from being stored between sessions by disabling write permissions on the appropriate files. These practices cause the site issuing the cookie to issue yet another identifier, resulting in potential over-inflation of the number of unique visitors to the site. Commercial software that performs cookie obfuscation is also emerging, e.g., PGPCookie.Cutter [PGP 1996].

[1] Cookies are server-generated identifiers that enable the management of state between visitors and servers. They were initially designed to implement shopping baskets for the Web but have found a killer application in the tracking of user behavior. When a visitor requests a page, a server can return an identifier, a.k.a. a cookie, with conditions on when and how the identifier is to be used.

An identified visitor is a tracked visitor about whom additional information is available. This is the most common type of visitor when persistent identifiers are employed to monitor usage. While it may appear that tracked visitors would be more common, the additional information mentioned above in the discussion of session visitors accompanies each request, making identification possible. Rough estimates of the demographics of a user can be made, since the entity that owns the domain from which the request was issued can be determined somewhat reliably from InterNIC's publicly accessible domain registration database. Once the entity has been established, this information can be matched against other databases that enable the construction of user profiles. For example, if a request comes from a visitor behind a corporate proxy named xyz.com, one can discover via InterNIC's database that the company that owns the domain is FooBar Corp., and then look up information on FooBar Corp. in other sources. Many of the commercially available log file analysis programs come with databases that match domain names to core demographics, e.g., Interse [Interse 1996].

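A domain-to-entity lookup of the sort described above can be performed over the whois protocol (a simple query-response exchange on TCP port 43). The sketch below is a minimal illustration; the server name and the free-text response format are assumptions, and commercial tools use richer, pre-matched databases.

    # Sketch: query a whois server to map a domain name to its registrant.
    import socket

    def whois(domain: str, server: str = "whois.internic.net") -> str:
        with socket.create_connection((server, 43), timeout=10) as s:
            s.sendall((domain + "\r\n").encode())   # protocol: query, CRLF
            chunks = []
            while True:
                data = s.recv(4096)
                if not data:
                    break
                chunks.append(data)
        return b"".join(chunks).decode(errors="replace")

    # e.g., whois("xyz.com") returns a registration record naming the owning entity
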
Of course, the most obvious method of collecting additional demographics is to ask the actual users of the site. This is routinely accomplished via online registration forms. However, GVU's most recent WWW User Survey data show that 33% of the over 14,500 respondents have falsified information on online registration forms at least once [Pitkow and Kehoe 1996]. Over 10% reported that they provided incorrect information over 25% of the time. Although online registration is currently common practice and will remain so for the foreseeable future,

    the information collected from online registration systems needs to be thoroughly examined on a per-site basis before reliable statements about the users of a site can be made from this information.

Other methods for collecting demographics of users at a site include Universal Registration Systems, e.g., I/PRO's I/COUNT [I/PRO 1996]. These systems require a user to register only once in exchange for an identifier, which can then be used across the set of participating sites. While the goals are to 1) make registration easier for users and 2) provide sites with valuable demographic information, scalability issues have hindered the development of most of these systems, and as such, few Universal Registration Systems exist in practice today.

The above classification of visitors and the accompanying definitions should provide the necessary framework to move forward and define the terms used to measure the items visitors request.

1.2 Accesses and Caches
When a visitor requests an item from a Web site and the server returns the item to the user, an access is said to have occurred. In the common speak of the Web, this event is fondly referred to as a 'hit.' It is widely recognized that the reporting of hits as a measure of usage is meaningless, for the following reasons. First, when a user accesses a page, the client sends a request to the server in the form of a URL. If the item being requested is an HTML page, it may include other URLs embedded in its content, often images, audio files, video files, and applets, as well as other text. When the client gets the requested HTML page back from the server, it reads the contents of the page and makes subsequent requests for the embedded URLs. Thus, a page with three embedded images results in a total of four hits as measured by the server (one hit for the HTML page, and one hit for each of the three images). Since the composition of pages differs within a site as well as across sites, the comparison of hits reported by different sites is useless. For example, in Figure 1, when Visitor 1 requests Page A, Server Z will record four hits, but only two hits when Page C is requested. What sites really want to compare is the
number of pages requested by visitors, which is often called page views. Page views can be crudely measured by excluding non-HTML requests from the tabulation process.

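As a crude illustration of this tabulation, page views can be approximated from a standard access log by counting only requests for HTML documents. The following Python sketch assumes Common Log Format lines and a hypothetical log file name; it is an approximation, not a standard tool.

    # Sketch: approximate page views by excluding non-HTML requests
    # from a Common Log Format access log.
    from collections import Counter

    def count_page_views(log_path: str) -> Counter:
        views: Counter = Counter()
        with open(log_path) as log:
            for line in log:
                try:
                    # CLF: host ident user [date] "METHOD /path HTTP/1.0" status bytes
                    request = line.split('"')[1]
                    path = request.split()[1]
                except IndexError:
                    continue                      # skip malformed lines
                if path.endswith(".html") or path.endswith("/"):
                    views[path] += 1              # count page views, not hits
        return views

    if __name__ == "__main__":
        for page, n in count_page_views("access_log").most_common(10):
            print(n, page)
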
The second major problem with hits arises from the various levels of caching that occur on the Web. Caching is a mechanism that attempts to decrease the time it takes to retrieve a resource by storing a copy of the resource at a closer location. Prior to the fall of 1994, most browsers did not include local caches. As a result, each time a page was accessed by a user, a separate page request would need to be issued to the original site and the contents returned each time to the user. To remedy this inefficient situation, browser software began to integrate caches that utilized the user's local disk space, as well as in-memory caches that stored pages in memory rather than on disk.

For example, in Figure 1, when Visitor 1 requests Page A, Page A is stored in the local cache on Visitor 1's computer. When Visitor 1 goes off to visit Page B and then returns to Page A via the 'Back' button, a request to Site Z does not need to be made. Visitor 3, on the other hand, would need to issue two requests for Page A to perform the same task. Upon tabulation of the access log for Site Z, one would find two different totals for the same navigation: Visitor 1 would record only one page view for Page A, whereas Visitor 3 would record two page views. Which one is correct? It depends on the definition of a page view. Since local caching has become standard in browsers and users can select different cache management policies, page views are typically defined to reflect only that a page was viewed at least once. Given this framework, unique page views represent the first time a user requests a page in a session, reuse page views are the total times the page is viewed minus the first page view, and total page views are the sum of the two. Without altering the configuration of the server to defeat local caching, it should be clear that no guarantees can be made about the measurement of total page views.

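These three definitions reduce to simple bookkeeping once per-visitor request sequences are available (for instance, from the disambiguation heuristic sketched earlier). The following minimal Python sketch tallies them per page; the input representation is an assumption.

    # Sketch: tally unique, reuse, and total page views per page.
    from collections import Counter
    from typing import Dict, List

    def tally_page_views(paths: Dict[str, List[str]]):
        unique: Counter = Counter()
        reuse: Counter = Counter()
        for visitor, path in paths.items():
            seen = set()
            for page in path:
                if page in seen:
                    reuse[page] += 1      # viewed again within the session
                else:
                    unique[page] += 1     # first view in this session
                    seen.add(page)
        total = unique + reuse            # total page views = unique + reuse
        return unique, reuse, total
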
[Figure 1 appears here: a timeline of requests for Pages A, B, C, and D made to WWW Server Z by Visitors 1 and 2, who share a proxy-cache and have local caches, and by Visitor 3, who has no local cache. The structure of Server Z (Pages A through D and their hyperlinks) is shown in the upper right. Legend: WWW page, embedded media, hyperlink, request.]

Figure 1. This figure represents a visitor making requests for Pages A, B, C, and D at Site Z. Another visitor is shown making a simultaneous request for Page D. The topology of the site is depicted in the upper right-hand corner. Since Visitors 1 and 2 are both behind a proxy-cache, their requests cannot be differentiated without the use of either cookies or heuristics. It is also unclear from the requests made to Server Z whether Visitor 1 requested Page D by selecting the hyperlink on Page A or the hyperlink on Page B.

The other form of caching on the Web happens at the proxy level. The scenario is very similar to that of local caches, but in this case, since the cache is shared by many users, the server may only
receive one page request even though the page was viewed by numerous users behind the proxy. In Figure 1, if Visitors 1 and 2 request the same pages, with Visitor 2 making the requests at a later time than Visitor 1, the proxy-cache would only need to issue one request per page to Server Z and use the locally stored copy to handle the requests by Visitor 2. Note that combinations of local caches and proxy-caches can exist, as well as multiple proxy-caches chained together. To help control the caching of pages by local caches as well as proxies, HTTP has evolved to include cache-specific headers.

A common, resource-intensive solution to the problem of reliably determining page views and users is to use the cache-specific headers in HTTP to effectively defeat all attempts at caching pages. Whatever the reasons for employing such an approach, this technique is commonly referred to as cache-busting. It is accomplished in several ways, including sending "Cache-control: proxy-revalidate" or "Expires: <past date>" headers. For these attempts to be effective, the caches encountered along the way must cooperate, that is, they must obey the headers. Unfortunately, there is no way to ensure that all caches (local as well as proxy-caches) on the WWW cooperate. As a result, even with attempts to defeat caching, an unknown number of users and page views can occur without the knowledge of the originating server. This causes the total number of page views to be inaccurately reported, and it therefore should not be considered a trustworthy measure.

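For concreteness, here is a minimal sketch of a server emitting the cache-busting headers named above, using Python's standard library HTTP server. The handler and values are illustrative, and, as the text notes, caches are free to ignore them.

    # Sketch: a response carrying cache-busting headers.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class CacheBustingHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            body = b"<html><body>metered page</body></html>"
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(body)))
            # ask shared caches to revalidate on every use ...
            self.send_header("Cache-Control", "proxy-revalidate")
            # ... and mark the resource as already expired
            self.send_header("Expires", "Thu, 01 Jan 1970 00:00:00 GMT")
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        HTTPServer(("localhost", 8080), CacheBustingHandler).serve_forever()
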
From this discussion it ought to be clear that reliably gathering visitor and page usage data is not an easy task. The most widely used solutions to these problems are the issuing of cookies to identify visitors and the defeating of caching to determine page views. This is only half of the picture, however. Once the data is collected, it must be analyzed before it is of any use. The next section describes the various flavors of statistics one might want to apply to the gathered usage data.

2 Statistical Analysis
Since page requests are discrete events, they afford several forms of statistical analysis, with descriptive statistics being the most rudimentary and the most widely used for analysis of usage data on the WWW today. What one gathers from descriptive statistics of a set of events are the frequency, mean, median, mode, minimum, maximum, standard deviation, variance, and range. For accesses to WWW sites, the frequency of page views is the most commonly reported statistic, though as noted in the above section, this statistic may be difficult to determine reliably. This form of event analysis can occur at the page level as well as at the site level for all of the different types of visitors mentioned above.

2.1 Temporal Analysis
While knowledge of how frequently an event has occurred is useful, it tells us nothing about the interactions between events. Temporal analysis of page request events can reveal several interesting metrics, but it assumes that visitors can be uniquely identified within sessions (session visitors, tracked visitors, or identified visitors). This requirement exists since the sequence of events is needed to construct the temporal ordering of page requests. The following discussion also assumes that the complete and correct sequence of page requests is known, though as described above, this is often not possible due to local caches.

At the page level, the time spent reading a page, or reading time, can be measured as the inter-arrival time between the request for the page and the subsequent page request. In Figure 1, the reading time for Page A by Visitor 1 is the distance along the x-axis (time) between the request for Page A and the request for Page B. This measurement is subject to a fair amount of noise, as the visitor's behavior cannot always be accurately determined, e.g., the visitor could be grabbing a cup of coffee, talking on the phone, or actually reading the page. Another problem is determining when users leave the site, since HTTP only sends "get me this page" messages and not "I'm leaving this page" messages. However, if a statistically valid sample size is determined and the data collected, reasonable statements about the reading times of pages can be made. [Catledge and Pitkow 1995] first reported a session time-out period of 25.5 minutes, which was 1.5 standard deviations from the mean of 9.3 minutes between user interface events. A time-out period of 30 minutes has become the standard used by log file analysis programs, e.g., Interse [Interse 1996] and I/PRO
[I/PRO 1996]. This value should be determined individually for each site being measured. At the site level, temporal analysis can reveal other useful metrics. The total duration of a visit, or session length, measures the amount of attention visitors spend on the site. The time between visits, or inter-visit period, helps determine the periodicity with which users revisit the site. There are, of course, other interesting analyses that can be performed with temporal data, including survival analysis, self-similar processes, auto-regression, etc.

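The reading-time and session-length computations described above reduce to a single pass over a visitor's timestamped requests. The following Python sketch uses the 30-minute time-out noted in the text; the input representation and names are assumptions.

    # Sketch: derive reading times and session lengths from one visitor's
    # time-ordered (unix_time, page) request pairs.
    from typing import List, Tuple

    TIMEOUT = 30 * 60  # seconds; the de facto standard session time-out

    def sessions_and_reading_times(requests: List[Tuple[float, str]]):
        sessions: List[List[Tuple[float, str]]] = []
        reading_times: List[Tuple[str, float]] = []
        for t, page in requests:
            if sessions and t - sessions[-1][-1][0] <= TIMEOUT:
                prev_t, prev_page = sessions[-1][-1]
                # reading time: inter-arrival gap between successive requests
                reading_times.append((prev_page, t - prev_t))
                sessions[-1].append((t, page))
            else:
                sessions.append([(t, page)])   # gap too long: a new session
        session_lengths = [s[-1][0] - s[0][0] for s in sessions]
        return sessions, reading_times, session_lengths
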
2.2 Path Analysis
The goal of path analysis is to understand the sequence of page views by visitors. Although the connectivity of pages in a Web site is mathematically best represented as a graph, it is not uncommon for designers and users to conceptualize the space hierarchically as a tree. In this representation, the establishment of parent-child relationships is usually done via a depth-first traversal of the pages. Once this has been done, it becomes possible to talk about the depth of the tree, the number of internal and leaf nodes, and the branching factor of the tree. With a tree representation and information about the paths people take through the site, the average depth to which visitors descend can be determined, as well as the average number of internal and leaf nodes accessed. If one assumes that internal nodes typically serve to facilitate navigation and leaf nodes tend to represent content, these metrics provide insight into how much time users spend navigating versus ingesting content. If link typing were ever to become widespread on the Web, this metric could be determined more accurately.

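The tree construction itself is a standard depth-first traversal. The sketch below is a minimal illustration, assuming the site's hyperlink graph is given as an adjacency map rooted at a designated home page; the first depth-first arrival at a page fixes its parent.

    # Sketch: derive a tree from a site's hyperlink graph via DFS,
    # then compute a page's depth within that tree.
    from typing import Dict, List, Set

    def dfs_tree(topology: Dict[str, List[str]], root: str) -> Dict[str, str]:
        parent: Dict[str, str] = {}
        visited: Set[str] = {root}
        stack = [root]
        while stack:
            page = stack.pop()
            for child in reversed(topology.get(page, [])):
                if child not in visited:
                    visited.add(child)
                    parent[child] = page   # first DFS arrival fixes the parent
                    stack.append(child)
        return parent

    def depth(page: str, parent: Dict[str, str]) -> int:
        d = 0
        while page in parent:              # walk up to the root
            page, d = parent[page], d + 1
        return d
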
Of equal interest to the paths people take is where, and with what frequency, they enter and leave the site. Entry points [Pirolli, Pitkow, Rao 1996] can be identified by looking for differences between the sum of all the incoming paths to a page and the total number of requests made for the page. Large differences indicate that visitors are not relying completely upon the local topology to access the page. Likewise, exit points can be identified by looking at the last element in each path sequence as well as clues in the frequency of path traversal versus page requests. Descriptive statistics can be generated for each of these metrics on a per-page basis.

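A minimal sketch of the entry-point comparison follows, in the spirit of [Pirolli, Pitkow, Rao 1996] but not their formulation: for each page, compare total requests against arrivals that followed a hyperlink from another page on the site. The input (per-visitor paths) is assumed precomputed.

    # Sketch: score pages by the fraction of arrivals not explained by
    # on-site hyperlink traversal; high scores suggest entry points.
    from collections import Counter
    from typing import Dict, List

    def entry_scores(paths: List[List[str]]) -> Dict[str, float]:
        total = Counter()
        via_link = Counter()
        for path in paths:
            for i, page in enumerate(path):
                total[page] += 1
                if i > 0:
                    via_link[page] += 1   # arrived from a page within the site
        # a large difference means visitors are not relying on the local topology
        return {p: (total[p] - via_link[p]) / total[p] for p in total}
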
We know that certain users will visit a page and not continue traversing the hyperlinks contained in that page. Others, however, will proceed to traverse the presented links, thus continuing down a path. Attrition, introduced initially with respect to the WWW by [Pitkow and Kehoe 1995], can be understood as a measure of the visitors who stop traversing versus the visitors who continue to traverse the hyperlinks from a given page. Attrition is typically calculated across a group of visitors, although it can be applied to individual visitors as well. Attrition curves are defined as the plot of attrition ratios for all pages along a certain path.

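A per-page attrition ratio can be sketched as follows. This is a plain reading of the definition above, not the formulation of [Pitkow and Kehoe 1995]: a visitor "stops" at the last page of their path and "continues" from every earlier page.

    # Sketch: attrition ratio per page across a group of visitor paths.
    from collections import Counter
    from typing import Dict, List

    def attrition(paths: List[List[str]]) -> Dict[str, float]:
        stopped, continued = Counter(), Counter()
        for path in paths:
            for page in path[:-1]:
                continued[page] += 1      # visitor traversed onward
            stopped[path[-1]] += 1        # path ended here
        return {p: stopped[p] / (stopped[p] + continued[p])
                for p in set(stopped) | set(continued)}
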
Other methods for analyzing hypertext paths include Markov Chain analysis [Guzdial 1994], Pathfinder Networks [Schvaneveldt 1990], and subsequence analysis [Catledge and Pitkow 1995][Tauscher 1996]. Path information can also be used to cluster users [Yan, et al. 1996].

While all these analyses provide insight into the behavior of visitors to a site and the usage of the site's resources, they are only as good as the data they analyze. While cookies provide a consistently implemented approach to identifying users, no such approach exists for gathering reliable page view, temporal, and path information. This inability to gather solid data is the cause of much concern in the measurement community [W3C 1996]. With the aim of rectifying this situation, several proposals have been made; the hit-metering proposal is discussed in the next section in light of the types of analyses reviewed in this section.

3 Proposed Solutions
Several solutions exist that are specifically targeted towards increasing the reliability and amount of usage information known to servers. The two most notable are [Hallam-Baker 1996], which proposes complete forwarding of access data from proxy-caches, and [Mogul and Leach 1997], which proposes a limited form
of usage reporting by proxy-caches. Unfortunately, there has been no discussion of or interest in [Hallam-Baker 1996] within the HTTP Working Group for over half a year, and as such, its RFC status has been withdrawn. This, plus space considerations, limits the discussion in this paper to [Mogul and Leach 1997], which is commonly referred to as the hit-metering proposal.

Recently, [Mogul and Leach 1997] have produced a rather elegant proposal for hit-metering. Their work is aimed at removing the practice of cache-busting by sites wishing to gain an accurate count of the usage of the site's resources. The draft calls for the implementation of a new HTTP header, called "Meter", that enables proxy-caches to report usage and referral information to originating servers. Additional extensions permit the originating server more control over the use of cached data by limiting the number of times a proxy-cache returns an item before requesting a fresh copy. One of the argued side effects of this system, if implemented, would be reduced network traffic as well as a reduction in server resources. The closest approximation of their work outside of the Web occurs in the field of advertising and the measurement of hard-copy publications. In that field, publishers try to determine the number of users who read a publication at least once; they do not try to measure the number of times users read the publication overall. Adhering to the same model, [Mogul and Leach 1997, page 5] specify a system whose "goal is a best-efforts approximation of the true number of uses and/or reuses, not a guaranteed exact count."

The proposal outlines a system where cooperating proxy-caches volunteer to keep totals of the number of uses of each page and the number of times each page is reused [2]. Cooperating proxy-caches can be arranged in a hierarchical manner rooted at the originating server, forming what the authors call a "metered subtree." Mechanisms are included in the draft that keep the counting and forwarding of page use information consistent throughout the subtree. When the originating server encounters a proxy-cache that volunteers to meter usage, it can turn cache-busting efforts off and allow the proxy-cache to cache resources, i.e., actually allow the proxy-cache to do what it was intended to do in the first place. The tabulated metering information is then periodically forwarded to the originating server, as determined by a set of rules that depend on the type of requests being issued by clients and upon deletion of the resource from the cache. The originating server is also able to specify limits on the number of times a resource is used by the proxy-cache before a fresh page must be requested from the server. It is stated that this can be used to bound the amount of inaccuracy between reports.

[2] Couched in the terminology outlined above, "uses" are unique page views, and "reuses" are reuse page views.

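The bookkeeping a cooperating proxy-cache would perform can be sketched as follows. This is our own minimal illustration, not the draft's design: the class, its interface, and the client-identity bookkeeping are assumptions, and the Meter header's wire format and reporting rules of [Mogul and Leach 1997] are not modeled.

    # Sketch: per-resource use/reuse counters kept by a cooperating
    # proxy-cache and forwarded to the origin server upon eviction.
    from collections import defaultdict

    class MeteredProxyCache:
        def __init__(self):
            self.body = {}                    # url -> cached response body
            self.seen = defaultdict(set)      # url -> clients already served
            self.uses = defaultdict(int)      # "uses": unique page views
            self.reuses = defaultdict(int)    # "reuses": reuse page views

        def get(self, url, client, fetch_from_origin):
            if url not in self.body:
                self.body[url] = fetch_from_origin(url)   # fresh copy
            if client in self.seen[url]:
                self.reuses[url] += 1         # repeat view served from cache
            else:
                self.seen[url].add(client)
                self.uses[url] += 1           # first view by this client
            return self.body[url]

        def evict(self, url, report_to_origin):
            # totals are forwarded to the origin when the resource is deleted
            report_to_origin(url, self.uses.pop(url, 0), self.reuses.pop(url, 0))
            self.seen.pop(url, None)
            self.body.pop(url, None)
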
This proposal has many strengths. First, it does not require significant changes to the existing HTTP 1.1 protocol [Fielding et al. 1997], although changes to server and proxy software are required to implement the behavior behind the new Meter header. Second, it appears to do no worse a job at reporting descriptive statistics of usage than the current practice of cache-busting, and given cooperating proxy-caches, it could arguably reduce network traffic considerably, freeing up valuable resources. Third, distinctions are made and reported between unique page views and reuse page views by users behind proxy-caches, a distinction that cache-busting techniques cannot make reliably without ancillary techniques that generate unique identifiers for each user, e.g., cookies. For these reasons, the proposal does appear to satisfy the goal of a best-effort system.

However, the proposal suffers from several potentially critical weaknesses. First, the system depends upon cooperating proxy-caches. This cooperation cannot be forced to happen and, as a result, may not happen. Since cooperation cannot be enforced, originating servers may still be tempted to implement cache-busting techniques for non-cooperative proxy-caches. Much seems to depend upon the level of trust WWW site owners have in the system, yet no mechanism exists within the proposal to determine the level of trust to place in the system. It therefore becomes difficult to determine the extent of the resources that would be saved by this proposal and what level of cooperation would be required. Given this
uncertainty, one cannot say whether it will scale well as the size and the complexity of the Internet increase.

Second, the hit-metering proposal does not enable the collection of temporal or path statistics, which are collectable when cache-busting techniques and other proposals (see the section on sampling below) are used. This poses a serious threat to sites wishing to do market research beyond merely counting page views.

Third, the authors make claims about the ability of their hit-metering solution to address the problem of obtaining reasonably accurate counts of the number of users of a resource, not just the number of uses. They argue that the separation of unique page views from reuse page views affords the unique identification of users behind proxies. While they recognize that scenarios exist in which their solution would over-report the number of users, they assert that they do not believe this to be a significant source of error. However, this assertion is not supported by any empirical data; further research is needed before its validity can be determined.

Finally, as the authors themselves acknowledge, no guarantees can be made that the reported usage will accurately reflect actual usage. While this is not a problem in and of itself, since we do after all live in a world of approximations, their system does not enable the amount of error imposed by the system to be measured. This means that the origin server cannot, at the end of the reporting cycle, determine whether the usage reported is 100% correct, 75% correct, or 50% correct. As a result, the amount of error in the system could be any of the above, or all of the above on different days under different conditions. This is undesirable from a research and marketing perspective, since claims cannot be made about accuracy. Suppose a site notes an increase of 5% in traffic from one week to another; under this scheme, there is no way to say for sure that traffic actually increased by the stated amount. The variation could have come from any number of sources, all of which are impossible to determine using the proposed hit-metering solution.

Some of the variability in the system exists because the server cannot control the time at which a resource will be deleted from the cache.
