Second International WWW Conference - Chicago, Oct 17-21/94
Providing Data on the Web: From Examples to Programs
Carlos A. Varela, Caroline C. Hayes
Department of Computer Science
University of Illinois at Urbana-Champaign
The World-Wide Web provides access to a global information universe using available technology
[Berners-Lee et al. 1992]. In order to fully realize the benefits of this information system, we are
developing a system, Zelig, to provide on-the-fly access to databases and dynamic information through
effective user interfaces [Varela and Hayes 1994].
In this paper, we extend Zelig to generate code for performing conversions from fixed data
formats into hypertext. Consequently, information providers only need to give examples of their current
database reports and the desired hypertext to be generated for those particular examples. Zelig produces
the program to extract relevant data from the reports and the schemata to drive the hypertext generation
process. As an example, we include an interface to ph/qi, the CCSO nameserver software providing data
for academic institutions around the world.
1. Introduction
The World-Wide Web offers easy access to a universe of information by providing links to
documents stored on a worldwide network of machines in a very simple and understandable fashion.
Much of its success is due to the simplicity with which it allows users to provide, use and refer to
information distributed geographically around the globe. Another important feature is its compatibility
with other existing protocols, such as gopher, ftp, netnews and telnet. Furthermore, it provides users
with the ability to browse multimedia documents independently of the computer hardware being used.
The World-Wide Web is based on the HyperText Transfer Protocol, HTTP, and the HyperText Markup
Language, HTML. HTTP is a generic object-oriented stateless protocol to transmit information between
servers and clients [Berners-Lee 1992]. HTML is a simple, yet powerful platform-independent
document language [Berners-Lee and Connolly 1993].
When the documents to be published are dynamic, like those resulting from queries to databases, the
hypertext needs to be generated. For this purpose, there are scripts, which are programs that perform
conversions from different data formats into HTML on-the-fly. Even though these scripts may be simple
for fixed data formats, providers need them to be able to publish their data on the Web. Furthermore,
even basic changes to the data formats or the generated HTML imply changes to these scripts.
To overcome these problems, Zelig generates scripts that base their HTML generation on schemata
[Varela 1994]. In this research we extended Zelig to additionally produce code for performing
conversions from fixed data formats into HTML. There are two main stages in this conversion process:
extracting database, record and field information from a traditional database report; and instantiating
that categorized information along with the query information into a particular schema.
When using Zelig to provide access to dynamic information in a fixed format, providers only need to give
examples of their current text reports and the desired hypertext to be generated for those particular
examples. Zelig produces the program to extract relevant data from the reports and the
schemata to drive the hypertext generation process. Thus, it becomes easier to provide effective user
interfaces to dynamic information in the World-Wide Web.
In the next section, we elaborate on the server-client model used by the World-Wide Web, and the
functionality of scripts. In section 3, we highlight the problems faced when providing WWW access to
dynamic data. In section 4, we explain the architecture of Zelig, our system that performs schema-based
HTML generation. In section 5, we demonstrate the ideas presented with a gateway to the CCSO ph/qi
nameserver databases. Finally, in section 6, we give some conclusions and results.
2. Background
2.1. The World-Wide Web: A Server-Client Model
The World-Wide Web consists of a network of computers which can act in two roles: as servers,
providing information; or as clients, requesting information.
Fig 2.1. Server-Client Architecture [Berners-Lee and Cailliau 1992].
This communication is performed under the stateless HTTP protocol. In a stateless protocol, connections
are created, processed and closed without keeping state information. The server actions depend on
predefined methods such as GET, POST, PUT, and DELETE.
The resulting information can be served in different format types and it is the client's responsibility to
present it in a consistent and clear manner. The most common format is HTML, which contains
information and its logical structure, but leaves out those details particular to specific browsers.
It is important to note that a server can provide static documents to the clients, but it could also provide
transparent access to databases or other information sources. In other words, clients can also submit
specific queries to be processed by the server. Scripts or gateways take care of this
processing. These are programs that communicate with the WWW server software through a predefined
interface. The most commonly used interface at present is NCSA's Common Gateway Interface, CGI
[McCool 1993].
2.2. Scripts: WWW Gateways to Databases
Scripts are CGI-compliant programs that act as clients to the applications owning the data and
produce the corresponding hypertext for the requested information. They communicate with the WWW
servers through an interface, in this case CGI, which establishes how to pass the information from the
WWW client to the script and from the script back to the WWW server and subsequently to the WWW
client.
Fig 2.2. Purpose of a script and a WWW server [Berners-Lee and Cailliau 1992].
These scripts can be written in any programming language (like C, C++ or PERL) and their main functions,
together with the WWW server, are:
• To receive the information from the WWW client under the hypertext transfer protocol, HTTP.
• To perform a query on the database server, allowing the WWW server to act as a database client.
• To parse the database server results.
• To generate an HTML document and send it to the WWW client.
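The four functions above can be sketched as one small gateway routine. This is only an illustration in Python, not one of the scripts discussed in this paper: the helper names (run_database_query, parse_report, render_html, handle_request) and the "field: value" report layout are assumptions made for the example, and the database call is a stand-in that returns a canned report. Under CGI, a GET request's form data would normally reach the script through the QUERY_STRING environment variable.

```python
import html
from urllib.parse import parse_qs


def run_database_query(name):
    """Hypothetical stand-in for the database client; returns a fixed-format report."""
    return f"name: {name}\nphone: 555-0100\n"


def parse_report(report):
    """Parse 'field: value' lines (an assumed report layout) into a dictionary."""
    record = {}
    for line in report.splitlines():
        if ":" in line:
            field, value = line.split(":", 1)
            record[field.strip()] = value.strip()
    return record


def render_html(record):
    """Generate a small HTML document from the parsed record."""
    items = "".join(
        f"<li>{html.escape(k)}: {html.escape(v)}</li>" for k, v in record.items()
    )
    return f"<html><body><ul>{items}</ul></body></html>"


def handle_request(query_string):
    form = parse_qs(query_string)        # 1. receive the client's form data
    name = form.get("name", [""])[0]
    report = run_database_query(name)    # 2. act as a client of the database
    record = parse_report(report)        # 3. parse the database results
    body = render_html(record)           # 4. generate the HTML document
    # A CGI script writes a header block, a blank line, and then the body.
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    print(handle_request("name=Hardin"))
```

Note that the parsing and rendering steps are hard-coded here; any change to the report format or to the desired HTML means editing the script itself, which is the maintenance problem the next section addresses.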
3. Providing Dynamic Data on the Web
After the short introduction in the previous section to the mechanisms underlying the Web, let's see
why we want to automate the script creation process:
• To provide access to many information sources that are currently using non-hypertext formats.
  There are many fixed data formats for which we would need to create different scripts. For
  example: phone information, bibliographic databases (BibTeX), papers (LaTeX), electronic mail,
  server administration usage statistics (logs), UNIX manual pages, file directories.
• To more easily design hypertext interfaces to databases by making changes at the level of
  schemata, as opposed to modifying, recompiling and retesting scripts.
• To create evolving user interfaces, by instantiating schemata to the most commonly accessed fields for
  queries, and changing the order or level of the different user interface actions.
• To increase the functionality of the data management system. For our phone example, in section 5,
  we have included sorting records, which was not in the original ph/qi functionality.
• To reuse the schemata across different databases to provide a similar look and feel for users.
4. Zelig: From Examples to Programs
Figure 4.1 shows a general framework for Zelig. Scripts generate HTML reports by
instantiating schemata to the query info and the categorized database output. The schemata can be taken
from a library, or generated from HTML report examples. The query info is created by the HTML
Query Form, which is provided by the application designer. This information is given to the traditional
database management system, which returns a report in a fixed format. This report is parsed and relevant
information is extracted and categorized. The resulting HTML Report can contain links to more
information on particular records, or even additional HTML Query Forms for more database processing.
Fig 4.1. A General Framework for Zelig.
In section 4.1, we see how the HTML instantiation process takes place. In section 4.2, we see how to
extract Query Info from the HTML Query Form, how to extract Categorized Output from the traditional
DB Report, and how to abstract a schema from user-given HTML Examples.
4.1. Schema-Based HTML Generation
In traditional HTML generation, user interfaces are created by scripts directly. This implies that
changes to interfaces have to be performed at the level of the source code of the script. We present a
methodology based on schemata that allows designers to debug and maintain the user interfaces without
directly changing the scripts.
In this section, we explain the information that is provided by the schemata to the scripts for
document generation. Then, we describe ZHTML, the language in which these schemata are written,
which is an enhancement to HTML incorporating directives for database interface generation.
4.1.1. Instantiating schemata
The scripts base their hypertext generation not only on the current parsed database query results,
but also on existing ZHTML schemata. We can further categorize this information as:
• Current database transaction results. These are the results generated by a query to a database.
  We will see in section 4.2 how to interact with the database server, parse its results, and
  standardize the query information to the following taxonomy:
  ◦ On an application level.
    The most general database information. For example: name, number of tables, default query
  ◦ On an object level.
    The information for a specific object of our application. For example the table PEOPLE
    may have: name, number of records, number of fields, default field for queries.
  ◦ On a field level.
    The most specific information about a query for a given record. For example: field names,
    default field values, current field values, length, type, and access policy.
• Existing ZHTML schemata. These are generated either from HTML report examples given by
  the interface designer, or taken from a library. Some library examples are:
  ◦ ISINDEX-based user interface schema.
    This schema is based on the HTML <ISINDEX> tag. It is important to note that there are
    many indexed databases in the Web that use WAIS as the basis for database search.
  ◦ Forms-based user interface schema.
    This schema makes use of the forms-based WWW browsers to provide a more friendly user
    interface with menus and widgets to perform the most common database operations.
  ◦ Application-specific user interface schema.
    This schema is usually an advanced interface tailored to the specific needs of a database
    user interface. It still allows evolution, in the sense that certain constructs don't imply any
    order or level of access in the generated HTML documents.
4.1.2. ZHTML Language Description
A ZHTML schema is an HTML document which has been annotated with comments that are
used as directives for the script. These comments are parsed and executed by the script, and the resulting
text replaces the original comment. This is performed at run-time, using the current database
query results. Future work includes writing script code generators that start from these schemata.
The ZHTML comments are similar to HTML constructs. They are generally of the form:
    <!ZTAG> ZHTML body <!/ZTAG>
There are several Zelig directives with different functionality, including: print a variable value, run an
external function printing its output, conditionally include the ZHTML body, traverse all the current
database records invoking Zelig recursively on the ZHTML body and traverse all the fields in a specific
record.
The main constructs of this Document Type are listed below; a formal Document Type
Definition (like the DTD shown for HTML in [Berners-Lee & Connolly 1993]) is still in progress:
• <!ZPRINT variable>
  returns the current value of an HTML form variable, or an application-level variable. If the
  variable is object-level or field-level, then it needs to be in the scope of a ZFOR tag.
• <!ZRUN external_fn>
  runs the script function external_fn and returns its output.
• <!ZIF cond-expr> ZHTML body <!/ZIF>
  returns the output from ZHTML body if cond-expr is true. It returns the null string otherwise.
• <!ZFOR TYPE=traversal_type> ZHTML body <!/ZFOR>
  traverses all the tables, records or fields of the current query, depending on the traversal_type
  (which can be the value TABLES, RECORDS, or FIELDS) and returns the output from ZHTML
  body instantiated to each of the loop elements in the query.
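To make the four constructs concrete, here is a minimal sketch of a schema instantiator in Python. It is not Zelig's implementation: the function names (instantiate, find_close), the dictionary-based scope, and in particular the treatment of cond-expr as a plain variable name tested for a non-empty value are assumptions made for this illustration, since the directive grammar is not formally defined here.

```python
import re

# Matches <!ZPRINT x>, <!ZRUN f>, <!ZIF c>, <!ZFOR TYPE=t> and the closers <!/ZIF>, <!/ZFOR>.
TAG = re.compile(r"<!(/?)(Z[A-Z]+)\s*([^>]*)>")


def find_close(text, name, start):
    """Find the <!/NAME> tag closing a block whose body starts at `start` (nesting-aware)."""
    depth = 1
    for m in TAG.finditer(text, start):
        if m.group(2) == name:
            depth += -1 if m.group(1) else 1
            if depth == 0:
                return m
    raise ValueError(f"unclosed <!{name}>")


def instantiate(schema, scope, functions=None):
    """Replace every directive in `schema` with its output, recursing into block bodies."""
    functions = functions or {}
    out, pos = [], 0
    while True:
        m = TAG.search(schema, pos)
        if m is None:
            out.append(schema[pos:])
            return "".join(out)
        out.append(schema[pos:m.start()])
        closing, name, arg = m.group(1), m.group(2), m.group(3).strip()
        if closing:
            raise ValueError(f"unexpected <!/{name}>")
        if name == "ZPRINT":
            out.append(str(scope.get(arg, "")))
            pos = m.end()
        elif name == "ZRUN":
            out.append(str(functions[arg]()))
            pos = m.end()
        elif name in ("ZIF", "ZFOR"):
            end = find_close(schema, name, m.end())
            body = schema[m.end():end.start()]
            if name == "ZIF":
                # Assumption: cond-expr is a variable name; true when its value is non-empty.
                if scope.get(arg):
                    out.append(instantiate(body, scope, functions))
            else:
                kind = arg.split("=", 1)[1].strip().upper()  # TABLES, RECORDS or FIELDS
                for item in scope.get(kind, []):
                    # Each loop element extends the enclosing scope.
                    out.append(instantiate(body, {**scope, **item}, functions))
            pos = end.end()
        else:
            raise ValueError(f"unknown directive {name}")


SCHEMA = (
    "<h1><!ZPRINT university></h1><ul><!ZFOR TYPE=RECORDS>"
    "<li><!ZPRINT name><!ZIF phone>: <!ZPRINT phone><!/ZIF></li>"
    "<!/ZFOR></ul>"
)
# Hypothetical query results, not real directory entries.
RESULTS = {
    "university": "UIUC",
    "RECORDS": [
        {"name": "A Hardin", "phone": "555-0100"},
        {"name": "B Hardin", "phone": ""},
    ],
}

if __name__ == "__main__":
    print(instantiate(SCHEMA, RESULTS))
```

Because the loop body is instantiated recursively with the loop element merged into the enclosing scope, object-level and field-level variables are only visible inside a ZFOR, which matches the scoping rule stated for ZPRINT.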
4.2. Automating the Database Report Extraction
4.2.1. Database Output Categorization
We concentrate on database management systems that produce reports with a fixed format. These
reports usually contain tabular information, where application-level data is at the beginning and end of
the produced report: for example, the directory being listed, or the university being accessed for phone
information. In the middle, we often find repetitive information in a fixed structure. It is repetitive
because there is one entry for each record matching the original query. These entries are usually
separated by a record separator, which allows us to differentiate among records. Finally, we also have a
field separator, which allows us to divide record information into yet more specific detail.
In a file listing example, the first line has application-level information: the total space occupied by the
directory. Then, we see records (files) that in turn can be divided into fields (name, size, owner, date...).
Zelig guesses where these separators lie and confirms them with the user, prompting her for
any unknown information. It then generates the data structure needed to instantiate the
ZHTML file when new queries are requested.
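The categorization step for the file listing example might look like the following sketch. It is illustrative only: the function names, the candidate separator list, and the sample listing are assumptions, and the separator guess is a simple heuristic (a character that occurs equally often in every entry) that would still need to be confirmed with the user as described above.

```python
def guess_field_separator(entries, candidates=("\t", "|", ",", ";")):
    """Guess a field separator: the first candidate occurring equally often in every entry."""
    for sep in candidates:
        counts = {entry.count(sep) for entry in entries}
        if len(counts) == 1 and 0 not in counts:
            return sep
    return None  # None means "split on runs of whitespace"


def categorize(report, field_names, header_lines=1, record_sep="\n", field_sep=None):
    """Split a fixed-format report into application-level lines and per-record fields."""
    chunks = [c.strip() for c in report.split(record_sep) if c.strip()]
    header, entries = chunks[:header_lines], chunks[header_lines:]
    records = [
        # Limit the number of splits so the last field keeps any embedded separators.
        dict(zip(field_names, entry.split(field_sep, len(field_names) - 1)))
        for entry in entries
    ]
    return {"application": header, "RECORDS": records}


# Hypothetical directory listing used only for illustration.
LISTING = """total 24
-rw-r--r-- 1 carlos staff 1204 Sep 12 10:01 paper.html
-rw-r--r-- 1 carlos staff  880 Sep 14 09:30 zelig.c
"""
LS_FIELDS = ["mode", "links", "owner", "group", "size", "month", "day", "time", "name"]

listing_info = categorize(LISTING, LS_FIELDS)

if __name__ == "__main__":
    print(listing_info["application"])
    for record in listing_info["RECORDS"]:
        print(record["name"], record["size"])
```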
4.2.2. Generating the Query Info from the HTML Form
Each HTML form variable is a name and value pair. For example, a form may contain three variables:
directory, mask, and sort-by, which have default values and get instantiated to the user-given values
when the form is submitted.
Note that the query info described above can contain information that will not be processed by the
DBMS, but instead corresponds to functionality provided additionally by Zelig, such as sorting by a specific field.
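As a rough illustration of this step, the sketch below decodes a submitted form, applies default values, and separates the variables meant for the DBMS from those handled only by the gateway. The default values, the ZELIG_ONLY set, and the function name query_info are assumptions for this example; only the variable names directory, mask, and sort-by come from the text.

```python
from urllib.parse import parse_qs

# Assumed defaults for the three form variables mentioned in the text.
DEFAULTS = {"directory": ".", "mask": "*", "sort-by": "name"}
# Variables handled by the gateway itself rather than passed to the DBMS.
ZELIG_ONLY = {"sort-by"}


def query_info(query_string, defaults=DEFAULTS):
    """Return (dbms_bindings, gateway_bindings) for a URL-encoded form submission."""
    submitted = {name: values[0] for name, values in parse_qs(query_string).items()}
    bindings = {**defaults, **submitted}  # user-given values override the defaults
    dbms = {k: v for k, v in bindings.items() if k not in ZELIG_ONLY}
    extra = {k: v for k, v in bindings.items() if k in ZELIG_ONLY}
    return dbms, extra


if __name__ == "__main__":
    print(query_info("directory=%2Ftmp&sort-by=size"))
```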
In this subsection, we work on generating the database query from the form variable bindings and
the given query example.
In our examples, we mainly have to create:
• % ls <directory>/<mask> or
• % ph <name> return <fields>
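The two query shapes above can be filled in from the form variable bindings with a few lines of Python. This is only a sketch: the function names and the assumption that the fields variable is a space-separated string are illustrative, and the functions merely build argument lists without running the commands. Note that the wildcard in the ls argument is expanded by a shell, so it would not be expanded if the list were executed directly.

```python
def ls_command(bindings):
    """Fill the template: ls <directory>/<mask>."""
    return ["ls", f"{bindings['directory']}/{bindings['mask']}"]


def ph_command(bindings):
    """Fill the template: ph <name> return <fields>."""
    return ["ph", bindings["name"], "return", *bindings["fields"].split()]


if __name__ == "__main__":
    print(" ".join(ls_command({"directory": "/tmp", "mask": "*.html"})))
    print(" ".join(ph_command({"name": "hardin", "fields": "name phone"})))
```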
4.2.3. Generating a Schema from a Given Hypertext Example
Once we know, for a particular example, what we want our hypertext to look like (i.e. we have
HTML files for specific queries), we can abstract from them ZHTML schemata that can be instantiated to other
queries as well.
We do this by querying the user when we aren't sure whether the parsed information is relevant (and needs to be
categorized so that it can subsequently be used by the schema instantiation algorithm) or is just a separator.
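A very simplified version of this abstraction step is sketched below: given one example row of HTML and the categorized record it was written for, every occurrence of a field value is replaced by the corresponding ZPRINT directive, and the row is wrapped in a ZFOR over records. This is an assumption about how the idea could be realized, not Zelig's algorithm; the function name abstract_row is hypothetical, and the sketch skips the interactive confirmation with the user and does not handle values that happen to coincide with markup text.

```python
import html


def abstract_row(example_row, record):
    """Turn one example HTML row into a ZHTML loop body by replacing known field values."""
    schema = example_row
    # Replace longer values first so a short value cannot clobber part of a longer one.
    for name, value in sorted(record.items(), key=lambda item: -len(item[1])):
        if value:
            schema = schema.replace(html.escape(value), f"<!ZPRINT {name}>")
    return f"<!ZFOR TYPE=RECORDS>{schema}<!/ZFOR>"


# Hypothetical example row and the record it was written for.
EXAMPLE_ROW = "<li><b>C Hardin</b>: 555-0100</li>"
EXAMPLE_RECORD = {"name": "C Hardin", "phone": "555-0100"}

if __name__ == "__main__":
    print(abstract_row(EXAMPLE_ROW, EXAMPLE_RECORD))
```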
In the following section, we show an example illustrating a schema, and a couple of its possible
instantiations depending on the database query.
5. A Running Example: The CCSO Phone Nameserver Database
The CCSO nameserver software provides a server-client model for accessing phone directory
information from academic institutions [Dorner 1992]. The figures in this section were created by
browsing HTML files in NCSA Mosaic for X [NCSA 1993].
The following is an HTML Query Form to access those databases:
The following link contains a schema for instantiating the categorized database information, once a
query has been made. We will show two different instantiations of this same schema, depending on two
different user queries:
• The first query example asks for the names and phone numbers of all UIUC entries with the name
  Hardin, sorted in ascending order. Our schema instantiation for such a query results in the following
  HTML file, which in turn is displayed by the browser as the hypertext shown in the next figure.
  Each of the records is a hyperlink to more information on that particular person. Also, links
  previously visited have a different color.
• Our second query example asks for the same entries, but it also queries their address. Additionally,
  the sort order is descending, as opposed to ascending. Following the links, you can see the HTML
  file generated and its corresponding hypertext output.
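Since sorting is not part of the original ph/qi functionality, it has to be applied to the categorized records before the schema is instantiated. A minimal sketch of that step follows, assuming records are held as dictionaries keyed by field name; the function name sort_records and the sample entries are hypothetical and not taken from the actual gateway or directory.

```python
def sort_records(records, sort_by, descending=False):
    """Return the records ordered by one field, ignoring case; missing fields sort as empty."""
    return sorted(
        records,
        key=lambda record: record.get(sort_by, "").lower(),
        reverse=descending,
    )


# Hypothetical entries used only for illustration.
ENTRIES = [
    {"name": "hardin, c", "phone": "555-0102"},
    {"name": "Hardin, A", "phone": "555-0100"},
    {"name": "Hardin, B", "phone": "555-0101"},
]

ascending = sort_records(ENTRIES, "name")
descending = sort_records(ENTRIES, "name", descending=True)

if __name__ == "__main__":
    for entry in ascending:
        print(entry["name"], entry["phone"])
```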
6. Conclusions
The success of a distributed information system depends heavily on the simplicity of generating,
providing, using and referring to information. The World-Wide Web is composed of excellent
protocols, tools and languages to perform these actions for static information. We have designed an
extension to this technology to easily provide access to dynamic information, such as the result of
queries to existing databases.
The functionality of our system, Zelig, was described in this paper. Its main improvements over
previous technology include:
• providing code generation for converting fixed data formats into hypertext;
• allowing evolving user interfaces for more effective human-computer interaction;
• increasing the functionality of applications owning the data, by offering additional operations such
  as sorting; and
• reusing HTML schemata to provide similar look-and-feel interfaces across different applications.
We provided a hyperlinked example giving WWW access to the CCSO ph/qi nameserver software. This
gateway, running at NCSA, as of September 1994 provides phone directory information for about 250
academic institutions around the world, and receives more than a thousand queries per day.
Ultimately, Zelig offers the user an effective way to generate fully customized interfaces to dynamic
data, further closing the gap between information generation, provision and use.
Thanks to the NCSA Software Development Group for their helpful comments on this paper and
their excellent research and working environment. Additional thanks to Professor Dershowitz, for his
comments and motivating research [Dershowitz 1983].
Carlos A. Varela (
Received his B.S. in Computer Science (CS) at the University of Illinois at Urbana-Champaign,
where he is currently an M.S./Ph.D. student. His research interests include integrating formal methods of
artificial intelligence in software engineering, especially information systems.
Carlos has also been a research assistant at the National Center for Supercomputing Applications
(NCSA) since 1991. At NCSA he has worked on different projects including an alpha shapes visualizer
(NCSA Walvis), a World-Wide Web browser (NCSA Mosaic for X/Windows), and a World-Wide Web
server (NCSA httpd for Unix).
In the past, Carlos has been a Math and Computer Science teaching assistant for classes up to
differential equations and information systems at the University of Los Andes, Bogota, Colombia. He
has also been a consultant for Arthur Andersen & Co., and an Artificial Intelligence fellow at the
Beckman Institute for the Advancement of Science and Technology.
Caroline C. Hayes (
Received her B.S. in Math, M.S. in Knowledge-Based Systems, and Ph.D. in Robotics, all from
Carnegie Mellon University.
Currently she is an assistant professor at the Department of Computer Science and at the Beckman
Institute of the University of Illinois at Urbana-Champaign.
Her research interests include artificial intelligence, especially planning, design, abstraction, and
knowledge-based systems; as well as computer-aided manufacturing and design. Professor Hayes is
particularly interested in tools for evaluating, criticizing and optimizing designs in areas ranging from machined
parts and intersection design to roofing design and software design.
