Case 4:18-cv-07229-YGR Document 192-8 Filed 04/19/21 Page 1 of 15
`
`EXHIBIT 7
`
`Proxy Servers and Databases for Managing Web-based
`Information
`
`Judi R. Thomson
`ETHER Group
`Department of Computer Science
`University of Saskatchewan
`57 Campus Drive
Saskatoon, Sk. S7N 5A9
Canada
email: thomson@cs.usask.ca
`
`Supervisors: Jim E. Greer & John E. Cooke
`
`Abstract
`
`Although tools for managing Web-based information are primitive, the number of institutions
`using the Web as an information system continues to grow rapidly. The World Wide Web has
`become a world-wide information system in spite of the lack of features common to most
`information systems. Many of the tools used to manage Web-based information utilize either a
`database or a proxy HTTP server to organize and classify information and user-specific data.
`This paper explores the use of these tools and proposes that the combination of proxy server
`and database is a more powerful management tool.
`
`1. Information Management
`
The World Wide Web represents an information source that will change IT departments world-wide once
difficulties associated with accessing and organizing the information are addressed. Information systems are
only useful when they make storing and finding information simpler and faster than placing a paper copy in a
file cabinet. To fulfill that requirement, information systems must model the information structure, or impose a
structure to ensure that the location and storage of information is easy and repeatable. The Web is an
unfocused arena with almost no structure [Ibrahim & Franklin, 1995]. In order to benefit from the Web's
potential, information managers must have tools to assist them with the structuring of Web-based material.
Tools for IS administrators who wish to use the World Wide Web must facilitate the organization of Web-based
material around some foci. We have created an experimental set of tools using a proxy HTTP server and
database system and believe that this combination is a powerful way to solve some of the problems in using the
Web as an information system. The tools facilitate the creation and viewing of topic-specific document
collections using World Wide Web documents.
`
A manager of information must have the means to organize and manipulate the information system, but also
must manage the users of the information as well as the hardware and software resources used to process the
information. An information system is beneficial only when it enhances data security and currency, enables
shared access to the data, and increases speed and efficiency for both input and retrieval of data. If these goals
cannot be met, users are better off with some other way to gain the desired information. Currently the Web does
facilitate shared access to widely distributed data but provides little in the way of facilities for management of
users and resources.
`
`1.1 User and Resource Management
`
User management is essential to ensure appropriate use of resources and time. Sometimes users cannot be
permitted free access to an entire information system and some form of user authentication becomes necessary.
Additionally, an information system should provide a view of the information that is most appropriate for each
particular class of user. Traditional database systems have the notion of user view built in; other less formal
systems may need to provide different interfaces for different classes of users.
`
Resources within an information system context include data, people, hardware and software. The data must be
protected from accidental loss or contamination. The number of people involved with the information system
must be minimized to reduce costs and the hardware and software components must be utilized efficiently. A
Web-based information system also requires consideration of these factors, but the management of them differs
from the traditional approach. The management of people, hardware and software for a Web-based system
requires the understanding that many people will be requesting access to the information and that many of them
will be novice users. The software they use will typically be a World Wide Web browser that can be configured
in any number of fashions. Many of them will be using computers over which the information system manager
has little control. Most of the manager's user management decisions will be concerned with server support and
network access.
`
`Computing resources are always at a premium. There are never enough computers and always too many people
`trying to use the available machines, and the cost of providing time on-line to a large organization can be
`prohibitive. Once on-line, increasing network traffic will likely mean that much time is spent waiting for pages
`to download. The waiting time increases when users unknowingly request pages that are irrelevant or unusable
`and then must begin again, making an alternate choice and waiting to see if it is available and useable. Waiting,
whether for computers or web pages, is not an effective use of anyone's time, and at present many managers
choose not to use the World Wide Web, seeking more effective personal resource management.
`
`1.2 Information System Management
`
Before an information system can be used, a model or representation of the information should be created. The
model is examined and tested to ensure that it accurately reflects the information that is collected and that it does
not violate any of the rules associated with safe storage and retrieval of the information. The model is then
used by the information system to create storage space for the data. This modelling task is less simple than it
sounds. Consider the problem of creating a database for a musician's music collection. At first glance it seems
that a simple library-type representation of the music would suffice. However, what if the musician wished to
find all of the music with a specific type of chord? A traditional library system would not record that
information. Modelling the data requires that the modeller trade space and complexity considerations against the
need for the database representation to closely match the real item. A user with information to add to the
system fits the information within the structure of the model and it becomes part of the database.
`
Any information system must provide tools for users with differing needs to insert, retrieve, modify and delete
data as well as the opportunity to create new data models to use with entirely different sets of information. At
least three different classes of information system users can be identified: those with information needs, those
with information to store, and those whose task is to manage the stored information. Traditionally the
programmers who create the tools for these users are also considered users of the information system [Date,
1995]. Any user of the information system may fill more than one of these roles.
`
A user who wishes to find information must first understand the features and limitations of the information
system. Second, the user must describe the needed information in some form understandable to the system,
typically some sort of query. Only then can the information be found within the system. The user must
understand the interface in order to successfully use the system and the interface is commonly tailored to support
queries on the specific information in the database. Users are supported by the interface of the information
system. The information system is responsible for providing accurate results from the available data.
`
The management of an information system includes selecting the information that will be presented as well as
evaluating that information for suitability and authenticity. Once information is selected for use, its context is set
by the fashion in which the material is organized. A user of an information system usually wants specific
information within a certain context or focus. For instance, an employee may wish to obtain the office phone
number of a colleague. The employee has no need for information concerning the colleague's work experience,
projects, personal interests, or family history. A useful information system should create a view of the available
information that hides the extraneous information from the employee.
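
The phone-number example can be sketched with a database view. The following is a minimal illustration using Python's built-in sqlite3 module; the table, column, and view names and the sample row are hypothetical and are not taken from the system described in this paper.

```python
import sqlite3

# Hypothetical employee table; all names and data are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (name TEXT, office_phone TEXT, "
    "work_history TEXT, interests TEXT)"
)
conn.execute(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    ("A. Colleague", "555-0100", "ten years in accounting", "curling"),
)

# The view exposes only the columns a phone lookup needs.
conn.execute(
    "CREATE VIEW phone_directory AS SELECT name, office_phone FROM employees"
)


def lookup_phone(name):
    """Return the office phone for `name`, or None if no such employee."""
    row = conn.execute(
        "SELECT office_phone FROM phone_directory WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None
```

A caller that queries only `phone_directory` never sees the work history or interests columns, which is the sense of "view" used in the text.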
`
Information systems must contain data known to be accurate and authentic. Normally, all information within a
database is managed by a single organization and authentication is fairly simple. Authentication of web
documents is a different matter. Although modern browsers and servers can ensure that the information
ultimately received by the browser is the information originally requested, they can provide no assurances that
the content of the page is original or accurate. Judgment about the accuracy of the information on a page is left
to the user and there are no ways to determine original authorship of information contained in a Web page.
`
An information system is managed effectively when the data it stores is secure, protected from unauthorized
alterations and accessible to the correct persons. Although the Web has few characteristics of a traditional
information system and lacks management tools, it is presently functioning as one of the world's largest sources
of information. Web-based information management could be improved markedly with the addition of simple
information management mechanisms.
`
2. The World Wide Web as a Distributed Information System
`
Even though the Web consists of separate collections of data about every imaginable topic, users often treat it as
though it were a single database. Users of the Web expect that the same interface, a browser and search engine,
will yield useful information from every available set of data. More realistically, the Web is like a set of
dissimilar information systems for which a common interface has been created. The common interface has no
knowledge of the structure of the component systems, because there is no underlying data model for Web-based
information. The Web-based interface can provide much less support for user queries than the
implementation-specific interfaces found on more traditional information systems.
`
Most information systems are collections of data that center around some commonality. For instance, a database
might be the data collected from a single company or data collected about a profession (such as medicine).
Users who desire information about medicine would not consult the company database nor
would a manager looking for employee records look through the medical database. The segmentation of data
into (mostly) homogeneous packages allows database creators to define a model for the intended data that is
reasonably accurate. The model is designed to capture the important aspects of the information within the
database and to allow flexible, ad-hoc queries on the data. Information placed on the World Wide Web cannot
have any but the most rudimentary of models applied globally. The Web contains heterogeneous data in a
bewildering number of formats. Anyone can place anything in the World Wide Web information system. Any
global model of Web information would be nothing more than a super-class for different types of information,
similar to the paradigm for object oriented programming where all classes descend from one base object class.
Efforts have been made to model hypermedia data in order to permit efficient storage and retrieval of
information [Andrews et al., 1995] but none have yet gained wide acceptance and the models place additional
constraints on how the data can be represented.
`
Different types of users occasionally have conflicting requirements of an information system. The data-modeller
obviously requires low-level control over the information system including the ability to completely change its
structure. The information provider requires the ability to add and change information but must be restricted
from changing the underlying data model lest the system become unusable. The information seeker must be
allowed to formulate and submit queries to the system, but should be prevented from changing either the
information or the structure of the system. To further complicate matters, an individual user may be a provider
in some cases, a modeller in a second case and a seeker at other times.
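
One way to picture these overlapping roles is as sets of permissions. The sketch below is illustrative only: the role names follow the text, but the `Permission` flags and the assumption that providers and modellers may also submit queries are ours, not part of any system described here.

```python
from enum import Flag, auto


class Permission(Flag):
    QUERY = auto()       # formulate and submit queries
    EDIT_DATA = auto()   # add and change information
    EDIT_MODEL = auto()  # change the underlying data model


# Assumed mapping: each role also holds the permissions of the roles below it.
ROLE_PERMISSIONS = {
    "seeker": Permission.QUERY,
    "provider": Permission.QUERY | Permission.EDIT_DATA,
    "modeller": Permission.QUERY | Permission.EDIT_DATA | Permission.EDIT_MODEL,
}


def allowed(roles, needed):
    """True if any of the user's current roles grants the needed permission."""
    return any(needed in ROLE_PERMISSIONS[role] for role in roles)
```

Because a user is checked against a list of roles, the same person can act as a seeker for one collection and a modeller for another by presenting different role lists.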
`
Information systems typically support a variety of users by providing differing views on the information in the
database. Although the base of information does not change, individual users see only that portion of the
information that is relevant to their tasks. Hypermedia systems rarely define specific views, although user
authentication and the retention of state information by servers and browsers allows basic views to be created
[Maurer & Schneider, 1996]. A cache of World Wide Web documents is in some ways similar to a view of a
database. The base of information, the World Wide Web, is unchanged but individual users, or classes of users,
see a portion of the Web as presented through the local cached copies. If the cached versions of the documents
are enhanced with locally developed information the users are truly getting an individualized view of the
information. There are many reasons to create local webs, or intranets, within an organization ranging from
security concerns to the dissemination of privately created materials. Many intranets make use of both internally
developed documents and cached versions of external World Wide Web documents. Despite the benefits of
local webs and intranets, their creation is time consuming. Maintenance of the resulting collections of documents
is equally time consuming. As with any Web site, local webs require HTML editing tools, web servers, script
processing capabilities, a fairly powerful computer, and someone who can keep the whole thing running. The
result can be an extremely useful set of information but information systems managers have to be willing to
create many materials from scratch, or to copy material from elsewhere. As we will see in Section 3.3,
collections of Web documents can be managed by a proxy server-database combination and provide most of the
advantages of a stand-alone Web site without the administrative overhead.
`
Much energy and time have gone into enhancing World Wide Web security both in terms of providing security
from unauthorized users and in providing assurances that documents are authentic and current [e.g. Netscape,
1997]. The research has resulted in a much more secure information space and improvements are announced
regularly [Fielding et al., 1996]. Although security represents a large aspect of information systems, the issues
surrounding security on the Web are too numerous to be adequately discussed here. Consequently, the
remainder of this paper will concentrate on issues connected with the input, organization and retrieval of
information from the World Wide Web from the perspective of an information provider. Some of the techniques
discussed can also be used to enhance security of information, to manage users and to improve information
retrieval. Although many of the concepts in the paper apply equally to organizations with intranets, or some
other form of distributed information system, in the interests of space, they are not discussed.
`
`3. Management Tools for the Web
`
Management of Web resources is not a new problem. Much research has been conducted towards creating
solutions for resource management for the Web and other hypermedia systems [Maurer & Schneider, 1995].
Recent advances in browser-interpreted programs such as Java have done little to help with the management of
Web-based information systems. Many of the problems could be solved with the development of special
purpose browsers, but Web browsers have become huge and proprietary. Writing a new browser involves
re-inventing many features that exist in current browsers. As a result most researchers are concentrating on
server-side solutions to Web-based information management. The solutions that have been proposed for the server side
of the Web can be categorized into: CGI solutions (often paired with a database), special-purpose Web servers,
and proxies.
`
CGI (Common Gateway Interface) provides a protocol for Web servers to communicate with external programs
such as database servers [NSCA, 1994]. Often CGI scripts are used to provide access to database information
from a Web browser, or to create pages dynamically based on external information, either from files or from a
database. Unfortunately, CGI scripts do not provide all the required tools. CGI scripts can simplify access to
specific information and allow users to interact with a Web site. They usually do not help information
managers to organize information or to manage the needs of different users. Scripts are typically written for a
specific purpose and are often not reusable when needs change.
`
Specialized HTTP servers, such as HyperG [Andrews et al., 1995], add navigational support to Web pages,
monitor user interactions, and maintain a database of meta-information, which can be considered to be a data
model, about the documents belonging to the server. Typically, specific-purpose HTTP servers are large systems
requiring a fairly powerful computer system. They often define a special protocol for communication that
complicates interactions with the rest of the Web, possibly even requiring a special purpose browser.
Additionally, the server must be contacted directly from the client to provide its extended services. Should
users make a request to an alternate Web server they will no longer have access to the extended tools provided
by the specialized server. The developers of Jigsaw, an object oriented HTTP server designed to retain
meta-information about resources, see promise in moving some of the server functionality to a proxy server
[Baird-Smith, 1996] to provide an alternative to Web management that is entirely server-based or entirely client-based.
`
Proxy HTTP servers provide both the means to monitor user interactions and add organizational links to Web
pages. Users may access pages from any available HTTP server through the same proxy. A proxy server
combined with a database can provide excellent support for Web-based information systems. Both databases and
proxy servers have been used to address different aspects of managing the World Wide Web. The remainder of
this section will discuss some of the current uses of the two.
`
`3.1 Databases
`
A conventional (usually but not necessarily relational) database used in parallel with a Web server can store
information about Web documents or can be used to store the content of the pages. The database can be closely
coupled with the HTTP server or accessed via CGI scripts. A database-Web server merger allows quick,
seamless communication to facilitate efficient searches on document structure as well as content [Munk & De
Bra, 1996]. The database may be used to store meta-information about the Web document, such as structure or
title, possibly including additional structuring information about the Web pages in relation to one another. The
structuring information can be used to create additional hyperlinks within documents or to monitor the state of
existing ones. This information about links between documents is called a link database and is often stored
separately from the content of the document, allowing re-use of component documents in different contexts
[DeRoure et al., 1995]. If content is stored in a database, it can be used to create HTML formatted pages
on-the-fly as requests are received by the Web server.
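
A link database kept apart from document content might look like the following sketch, which uses Python's built-in sqlite3 module. The schema, URLs, and the `render` helper are assumptions made for illustration; they are not drawn from any of the cited systems.

```python
import sqlite3
from html import escape

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE documents (url TEXT PRIMARY KEY, title TEXT, body TEXT);
    CREATE TABLE links (source TEXT, target TEXT, label TEXT);
""")
conn.executemany("INSERT INTO documents VALUES (?, ?, ?)", [
    ("/intro", "Introduction", "Welcome to the collection."),
    ("/proxies", "Proxies", "How proxy servers work."),
])
# Links live in their own table, so the same document body can be reused
# with a different set of links in another context.
conn.execute("INSERT INTO links VALUES ('/intro', '/proxies', 'Next: Proxies')")


def render(url):
    """Build an HTML page from stored content plus rows from the link table."""
    row = conn.execute(
        "SELECT title, body FROM documents WHERE url = ?", (url,)
    ).fetchone()
    if row is None:
        return None
    title, body = row
    anchors = "".join(
        f'<a href="{escape(target)}">{escape(label)}</a>'
        for target, label in conn.execute(
            "SELECT target, label FROM links WHERE source = ?", (url,)
        )
    )
    return f"<h1>{escape(title)}</h1><p>{escape(body)}</p>{anchors}"
```

Calling `render("/intro")` assembles the page at request time, so changing a row in `links` changes the navigation without touching the stored body.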
`
If user information can be procured, a database can keep it on hand and organized in a useful fashion.
Automated processing provides detailed information about each user, which can be used to construct an
individualized model of each user. The model is updated with each interaction a user has with the information
system and is available to computer-based intelligent tutoring systems and adaptive hypermedia systems. The
adaptive system can then present the most appropriate view of the information system for each user. A
well-represented user model can allow an adaptive hypermedia system to tailor the display of a collection of Web
pages for individuals [Brusilovsky, 1996]. Presently it is difficult to reliably collect the user information unless
a special server or proxy server is used [see Section 3.2].
`
Once information is contained in a database and the database is set to interact with a Web server, interconnecting
the two tools is an obvious requirement. Web database gateways are CGI scripts enabling communication
between Web server and database system. A database gateway can create Web pages using information
exclusively from a database or from some combination of database information and existing HTML files.
Sometimes this information is presented dynamically as formatted by the gateway [Rasmussen, 1995; Lebing et
al., 1996]. Sometimes the information is stored as static HTML pages which are periodically refreshed from the
database. For example, lists of people, such as students in a class or a project group, can be created in this way.
`
`While Web database systems can provide tools to organize and structure World Wide Web material, in some
`areas they provide no assistance. Web-database systems do not ensure that users work with appropriate views of
`the Web, and they provide little assistance with data collection because the gateway has no knowledge about
`which data to collect. A proxy HTTP server can be used both for data collection and view management.
`
`3.2 Proxies
`
A proxy HTTP server is a computer program that operates as an intermediary between the World Wide Web
client and the HTTP server [Altis & Luotonen, 1994]. During a regular transaction between client and server, the
client contacts the server directly, requesting a particular Web page. The server retrieves the page (or an error
message if the page is unavailable) and sends it back to the client. A proxied communication is similar except
that the proxy is in the middle. Thus, the communication sequence is client-proxy-server-proxy-client [Figure 1].

Figure 1: Proxy HTTP Server (diagram: World Wide Web clients, a proxy HTTP server, and Web servers on the World Wide Web)
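
The client-proxy-server-proxy-client sequence can be traced with a toy model that uses no networking at all. Everything in this sketch (the `ORIGIN` table, the function names, the example URL) is hypothetical and serves only to make the order of hops visible.

```python
# A stand-in for origin Web servers: maps URL -> page body (hypothetical data).
ORIGIN = {"http://example.org/a.html": "<html>page A</html>"}


def origin_server(url, trace):
    """Return (status, body) for a URL, as an origin server would."""
    trace.append("server")
    if url in ORIGIN:
        return 200, ORIGIN[url]
    return 404, "<html>Not Found</html>"


def proxy(url, trace):
    """Forward the request to the origin and relay the response unchanged."""
    trace.append("proxy")  # the request passes through the proxy
    status, body = origin_server(url, trace)
    trace.append("proxy")  # the response passes back through it
    return status, body


def client_get(url):
    """Issue a request via the proxy and record every hop."""
    trace = ["client"]
    status, body = proxy(url, trace)
    trace.append("client")
    return status, body, trace
```

The two points where `proxy` appends to the trace are the places where a real proxy could log, filter, or modify the request and the response.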
`
`Conventionally, proxy servers have been used to improve network performance and to provide secure access for
`persons using the Internet from behind a firewall. Secure access is simpler with a proxy server because only the
`proxying computer need be made secure to the world. Client computers that communicate via the proxy do not
have to be secure computers. Network performance can be improved because the proxy server can act as a cache
`for pages that are requested frequently or it can act as a load balancer which redirects requests to less heavily
`used sites. A redirecting proxy will decide which of several (identical) sites is the least busy and direct the
`incoming request appropriately [Brooks et al., 1995].
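
Caching and redirection to the least busy site can be combined in a few lines. The class below is a sketch under simplifying assumptions: "busy" is approximated by the number of requests this proxy has already forwarded to each mirror, cached pages never expire, and the `fetch` callable is supplied by the caller. None of these choices comes from the cited work.

```python
class CachingBalancingProxy:
    """Toy proxy: serves repeats from a cache, else picks the least busy mirror."""

    def __init__(self, mirrors, fetch):
        self.load = {m: 0 for m in mirrors}  # requests forwarded to each mirror
        self.fetch = fetch                    # callable(mirror, path) -> body
        self.cache = {}                       # path -> body, never expired here

    def get(self, path):
        if path in self.cache:                # cache hit: no network trip
            return self.cache[path]
        mirror = min(self.load, key=self.load.get)
        self.load[mirror] += 1
        body = self.fetch(mirror, path)
        self.cache[path] = body
        return body
```

A production cache would also need expiry and validation of stored pages, which this sketch deliberately omits.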
`
`Proxy servers add a level of indirection to the HTTP communication stream and can potentially delay the
`ultimate delivery of the requested document. Proxy servers can log information about users without their
knowledge and can modify information as it passes between client and server. Whether these features are
`perceived to be services or annoyances depends on the value added by the proxy for the user. For instance, if a
`proxy server creates a user log and monitors the access of each user for the purposes of providing an annotation
`service, the loss of anonymity is probably worth the added value of a shared annotation server. Likewise, if the
`proxy is changing the communication stream to select documents from a cache instead of a distant and
`congested network server, the reduction in time the user spends waiting is worth the overhead introduced by the
`proxy. It is possible, however, to envisage scenarios in which the proxy behaves in a much less helpful fashion.
`Users must balance the added value of the proxy server with the costs associated with its use.
`
`Within an organization, proxy HTTP servers can serve two functions. They can assist the management of Web-
`based information, and they can reduce the resources required to provide World Wide Web access to a number
of users. With the implementation of HTTP 1.1 [Fielding et al., 1996], proxy servers will be able to authenticate
`users and provide services tailored to specific users. The use of a proxy server allows the use of any available
`Web client (browser) and pages from any server without losing the management features of the proxy. The
`proxy server does not need to make changes to existing HTML files to function properly.
`
Information management is often accomplished through the use of filters on the HTTP stream [Brooks et al.,
1995]. The filters change the communication slightly to produce a better result for the user. Brooks [1996]
identified three different styles of transformations that can be accomplished using filters: protocol
transformations, request transforms or content transformations. Protocol transformations and header
transformations are often used to manage resources. Content transforms are more often used to manage
information. Filters can be built in to special purpose HTTP servers, but more often are included with proxy
servers. The filter accepts fully specified HTTP requests from Web clients, retrieves the document from the
Web, and returns the page to the user. The proxy filter may apply some processing to the communication, either
to the request for a document or to the document and document headers.
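
The accept-retrieve-return cycle with optional processing on either side can be sketched as a small pipeline. The representation of a request as a dictionary, the `run_filters` function, and both example filters are assumptions made for illustration and do not reproduce any particular filter toolkit.

```python
def run_filters(request, fetch, request_filters=(), content_filters=()):
    """Apply request filters, fetch the document, then apply content filters."""
    for f in request_filters:
        request = f(request)
    document = fetch(request)
    for f in content_filters:
        document = f(document)
    return document


# Hypothetical filters, for illustration only.
def add_via_header(request):
    """Request transform: record that the request passed through a proxy."""
    headers = {**request.get("headers", {}), "Via": "1.0 sketch-proxy"}
    return {**request, "headers": headers}


def add_banner(document):
    """Content transform: prepend a notice to the returned page."""
    return "<p>Served via proxy</p>" + document
```

Keeping request and content filters in separate lists mirrors the distinction drawn in the text between transformations that manage resources and those that manage information.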
`
`Numerous projects have been developed that are based on the idea of content filtering. One research project is
`devoted to providing tools to facilitate the development of proxy based filters [Brooks, 1995]. The Zipper system
`[Brown, 1996] infers the outline of an HTML document from the heading tags before it is passed to the user.
`The user can then choose to display the document in outline form or with some sections expanded and others
`closed. Perrochon and Kennel [1995] have used the structure of HTML documents in a similar fashion to create
a proxy server that re-formats HTML pages so they are better suited for use by visually impaired persons using a
`screen reader. In some cases, less information is desired by the user. WebFilter [Boltd, 1996] is a proxy-based
`system that removes undesired advertising from Web pages before displaying them.
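
Inferring an outline from heading tags, in the spirit of the approach attributed to Zipper above, can be approximated with Python's standard html.parser module. This is our own minimal sketch and not the Zipper implementation; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class OutlineParser(HTMLParser):
    """Collect (level, text) pairs for h1-h6 headings in document order."""

    def __init__(self):
        super().__init__()
        self.outline = []
        self._level = None  # heading level currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            self.outline.append((self._level, "".join(self._text).strip()))
            self._level = None


def outline(html):
    """Return the list of (level, heading text) pairs found in `html`."""
    parser = OutlineParser()
    parser.feed(html)
    parser.close()
    return parser.outline
```

A content filter could use the resulting list to render a collapsed table of contents and expand only the sections the user selects.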
`
`Resource management is most often initiated through the use of caching proxies [Abrams et al., 1995]. A
`caching proxy can modify the protocol of a request and cause the document to be retrieved from a local file
`system rather than over the network, reducing the cost to the organization for network access. Another aspect of
`resource management is load balancing. A proxy server can modify the request header to redirect a request from
a busy web server to a less busy server containing the same information [Brooks et al., 1995]. Large
`organizations commonly run several Web servers that are balanced using a similar process.
`
One common use for proxy servers is found in education where proxies are used both to filter content and to
filter requests. A learner's Web browser is connected to a proxy HTTP server and each request the learner
makes passes through the proxy. The proxy may be constructed to take some action if the learner strays from
the educator's prepared document collection. The action taken is arbitrary and could range from denying access
to anything but the prepared set of Web pages to simply notifying the learner that the requested page is not part
of the course material. One implementation of a proxy HTTP server locks the learner's Web client to the
educator's client, thereby allowing the educator to present a series of Web pages as a 'slide show' which students
watch along with the instructor [Yeh et al., 1996]. Similarly, a proxy server can be used to add value to existing
World Wide Web pages. A request for the page is made to the proxy which then retrieves the desired page.
Before the page is displayed to the user, additional information, often in the form of navigational aids or
annotations, can be added [Hauck, 1996].
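
The two ends of that range of actions, denying access or merely notifying the learner, can be expressed as a request filter. The sketch below is hypothetical: the function names, the messages, and the decision to compare URLs by host and path only are assumptions rather than features of any cited system.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Compare URLs by lower-cased host plus path, ignoring query and fragment."""
    parts = urlsplit(url)
    return parts.netloc.lower() + (parts.path or "/")


def make_course_filter(course_urls, strict=False):
    """Return a request filter for a prepared document collection.

    strict=True denies off-collection requests; otherwise the learner is
    only notified that the page is not part of the course material.
    """
    allowed = {normalise(u) for u in course_urls}

    def check(url):
        if normalise(url) in allowed:
            return "allow", None
        if strict:
            return "deny", "This page is outside the course collection."
        return "allow", "Note: this page is not part of the course material."

    return check
```

The filter returns a decision and an optional message, leaving it to the proxy to decide how the message is shown to the learner.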
`
`A proxy HTTP server placed on an organization's network system not only provides potential information
`management but can also act as a resource manager through request and protocol filtering. If the filter is used
`to make complex decisions about the request or to add significant amounts of information to the document
`content it requires some sort of information management of its own. As previously discussed [Section 3.1]
`databases have been used successfully to organize information about Web documents. A combination of a proxy
`HTTP server and a database system proves most useful when managing and organizing World Wide Web
`information for an organization.
`
3.3 Proxy Database Combination

In some senses, each proxy server that adds value to Web documents, or that makes decisions based on some
stored set of information, is a proxy-database combination. The proxy server must maintain some data upon
which to base its decisions and that data could be called a database, regardless of the method of storage. Proxy
servers that manage a large amount of information are more likely to use some sort of database system to store
the information. For this discussion, the storage method of the database is irrelevant. Proxy-database
combinations are simply proxy servers that manage their own set of data, which is used to filter requests and
document content.
`
Proxy server-database combinations allow organizations to use a database for keeping data about both the Web
pages that make up the desired set of material and about the users who interact with the material. The proxy
HTTP server facilitates the collection of the data and can act as a 'librarian' by guiding users to appropriate
materials. It can also augment the original Web page with information from the database. The database can store
and organize meta-information about the Web documents and information about the organization of the
document collection. The database can also provide the proxy with value-added information such as annotations
to display along with the original document.
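
Augmenting a retrieved page with stored annotations might be sketched as follows, again with Python's built-in sqlite3 module. The `annotations` table, its columns, the sample row, and the `augment` function are all hypothetical and are not taken from the experimental tools described in this paper.

```python
import sqlite3
from html import escape

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE annotations (url TEXT, author TEXT, note TEXT)")
db.execute(
    "INSERT INTO annotations VALUES (?, ?, ?)",
    ("http://example.org/a.html", "reviewer", "See also section 2 <updated>"),
)


def augment(url, page):
    """Insert stored annotations for `url` just before </body>, if present."""
    rows = db.execute(
        "SELECT author, note FROM annotations WHERE url = ?", (url,)
    ).fetchall()
    if not rows:
        return page  # nothing stored: pass the page through unchanged
    items = "".join(f"<li>{escape(a)}: {escape(n)}</li>" for a, n in rows)
    block = f'<ul class="annotations">{items}</ul>'
    if "</body>" in page:
        return page.replace("</body>", block + "</body>", 1)
    return page + block
```

The original document on the origin server is never modified; only the copy relayed to the user carries the added list.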
`
The proxy server can assist administrators by collecting statistics concerned with Web usage. In order to assess
the resources necessary to support the Web-based information system, administrators need information about
how the system is used. The required statistics could include the frequency with which pages are accessed,
access patterns for specific classes of users (or specific users), interchanges with interactive pages (those
containing forms), and a record of the time spent with different portions of the Web.
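
Several of these statistics can be accumulated from the stream of proxied requests alone. The class below is a rough sketch; its name, fields, and the way time on a page is estimated (the gap until the same user's next request) are our assumptions and would overstate the time whenever a user is idle.

```python
from collections import Counter, defaultdict


class UsageLog:
    """Accumulate per-page and per-user access data from proxied requests."""

    def __init__(self):
        self.page_hits = Counter()            # URL -> number of requests
        self.user_paths = defaultdict(list)   # user -> ordered (URL, time) pairs
        self.time_on_page = Counter()         # URL -> estimated seconds

    def record(self, user, url, timestamp):
        """Log one request; timestamp is seconds since some fixed origin."""
        path = self.user_paths[user]
        if path:
            prev_url, prev_time = path[-1]
            # Time on the previous page is estimated as the gap until the
            # user's next request.
            self.time_on_page[prev_url] += timestamp - prev_time
        path.append((url, timestamp))
        self.page_hits[url] += 1
```

Because every request already passes through the proxy, this kind of log needs no changes to the browsers or to the origin servers.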
`
`Many systems exist that could be classed as a proxy-database combination. The delineation between a simple
`proxy and a proxy-database system is far from clear. Generally, any proxy server that adds a significant amount
`of information, or that makes decisions based on external data can be classified as a proxy-database system.
Proxy-database systems can perform the same kinds of transforms as other proxy servers, but the value added by
`the proxy can be greater because of the support from the database system.
`
`3.3.1 Request Filters
`
Proxy servers have already been combined successfully with a database (in the guise of a separate server) to
provide support for document annotation on the Web [Schickler et al., 1996]. Annotations allow users to share
`experiences and observations about document collections asynchronously. GrAnT (Group Annotation
`Transducer) was con