throbber
Purdue University
`Purdue e-Pubs
`
`Computer Science Technical Reports
`
`Department of Computer Science
`
`1981
`
`The Flat File Database Generator Ffg
`
`Douglas E. Comer
`Purdue University, comer@cs.purdue.edu
`
`Report Number:
`81-379
`
`Comer, Douglas E., "The Flat File Database Generator Ffg" (1981). Computer Science Technical Reports. Paper 306.
`http://docs.lib.purdue.edu/cstech/306
`
`This document has been made available through Purdue e-Pubs, a service of the Purdue University Libraries. Please contact epubs@purdue.edu for
`additional information.
`
`000001
`
`Symantec 1030
`IPR2015-01892
`Symantec v. Finjan
`
`

`
`The Flat File Database Generator Ffg
`
`Douglas Gomer
`
`Computer Science Department
`Purdue University
`West Lafayette, IN 47907
`
`September 1981
`
`ABSTRACT
`
`It consists of a
`A fiat file is the simplest possible database.
`single. unformatted text file in which each line corresponds to a
`record. k-l occurrences of a separator character divide each
`record into k variable length fields;
`the separator character does
`not otherwise appear in the file. Unlike most database systems,
`the fiat file system is not a single. large program.
`Instead,
`it con(cid:173)
`sists of a set of small,
`independent programs. called primitives.
`that each perform one basic operation. The user composes a sub(cid:173)
`set of the primitives by directing the output of one to the input of
`the next in order to perform complex retrieval or update opera(cid:173)
`tions. Because
`they arc
`independent,
`primitives
`are
`easily
`modified or replaced, and one can add programs to the set of prim(cid:173)
`itives. Both the selection of primltives as well as their implementa(cid:173)
`tion are discussed.
`
`CSD-TR-379
`
`000002
`
`

`
`-2-
`
`1. Introduction
`
`A fiat flle is the simplest possible database.
`
`It consists of a single text file.
`
`F. containing zero or more lines, where each line is thought of as a record.
`
`Records are further divided into k fields, f l , f 2, "',(k by k·l occurrences of a dis(cid:173)
`tinguished separator character. S. Although k is fixed over all records,
`the
`
`length of indlvtdual fields is not. The fiat file generator, fig, is a database system
`
`that provides facilities to create, query. and modify a flat file database.
`
`Unlike most commercial database systems that consist of one or two large
`
`programs to process queries and modify the stored data (see DATE75. ULLMBO),
`
`ffg consists of many small programs, called primitives, that each perform one
`
`basic operation. A user composes a subset of the primitives by directing the
`
`output of one to the input of the next in order to perform a complex task.
`
`Because the primitives each perform one basic operation, selecting an appropri(cid:173)
`
`ate combination is
`
`straightforward and natural. Because primltives are
`
`independent programs. they can be modified or replaced. and one can add pro(cid:173)
`
`grams to the set. The ease of extension and modification is important in achiev(cid:173)
`
`ing fiexibility because it allows one to tailor fig for each application.
`
`Constructing systems as a set of primitives is not new. Kernighan and
`
`Plauger [KEPL76] describe primitives for program and text manipulation; Han(cid:173)
`
`son [HANS79] extends them. Borden et. al.
`
`[BOGS79] describe an electronic
`
`mail preparation and reading system implemented in primitives.
`
`lnterestingly enough, most of the experiments with primitives have their
`
`roots in the UNIX operating system [RITH7B,KEMA79]. Unlike most systems
`
`which encourage one to build large integrated programs, UNIX encourages ona
`
`to build independent programs and connect them together. It pro~ides fl simple
`
`and efficient mechanism for passing data between running programs.
`
`includes a convenient and simple notation for describing a composition.
`
`It
`
`It
`
`000003
`
`

`
`- 3-
`
`treats 110 to files, devices (like terminals), and other programs uniformly, so
`
`bne does not need to know how a program will be used when writing it. UNIX
`
`contains sets of primitives for text processing and language development.
`
`The next section or
`
`this paper describes pertinent parts of
`
`the UNIX
`
`environment in more detail, and gives the reader some appreciation of haw UNIX
`
`influenced the fiat file design. The following sections describe the fiat file primi(cid:173)
`
`tives, give an example of using fiat files, and discuss their implementation. The
`
`paper concludes by discussing the merits of systems constructed from primi(cid:173)
`
`tives.
`
`2. The UNIX Environment
`
`UNIX contains a large set of independent primitives, called commands, and
`
`a- mechanism for composing them. called a shell. The UNIX shell [BOUR78] is a
`
`simpie programming language that can be used interactively (as a command
`
`interpreter would be). or invoked to read input from a file (as a programming
`
`language interpreter would be). Shell programs are called shell scripts. or just
`
`scripts.
`
`If one tries to execute a file that contains a shell script. ,the system
`
`automatically invokes th~ shell to interpret it. Thus, a shell script functions just
`
`like a compiled program.
`
`In tact, some of the system commands are imple w
`
`mented as shell scripts.
`
`The shell has facilities to invoke commands, direct the output of one com(cid:173)
`
`mand to the input of the next, or direct the input (output) of a command to a
`
`file or I/O device. Control statements (e.g.. while, JOT, iJ, etc.) prOVide indefinite
`
`iteration, conditional execution, and definite iteration much like conventional
`
`programming languages. Unlike conventional languages, the shell only supports
`
`one data type -
`
`that of character string.
`
`It relies on commands to evaluate
`
`numeric expressions. test file status and accessibll1ty. and handle complex com(cid:173)
`
`pulations. 1 One learns qUickly that the art of constructing shell programs lies in
`
`IOther shells, like the C-9hell written at U,C. Berkeley, evaluate expressioIUI directly.
`
`r····
`
`000004
`
`

`
`- 4-
`
`composing and invoking commands, not in using the shell exactly as one would
`
`use Algol 60 or PascaL.
`
`UNIX includes a mechanism for composing primitives called a pipe. Pipes,
`denoted by "I" in shell scripts. connect the output from one program to the
`
`input of another. One writes
`
`alb
`
`to invoke programs a. and b with the output from a. connected to the input of b.
`
`The line
`
`a argl I b arg2 arg31 c
`
`specifies a pipeline connecting the output of program a to the input of program
`
`b and connecting the output of program b to the input of program c. Program a
`
`has one argument (argl), program b has two (arg2 and arg3). while program c
`
`has none.
`
`fl., b. and c, could be the names of shell scripts or compiled programs.
`
`UNIX contributed to the construction of fig in several other ways:
`
`1. All system services are avallable at the command level. One cnn create
`
`illes, change protectlon modes, trap exceptions. and perform other tasks
`
`directly from the shell in UNIX. On many systems such tasks require
`
`special programs, often in assembler language.
`
`2. UNIX provides a rich set of text manipulation primitives that fig uses
`
`extensively.
`
`3. UNIX is a late binding system. There is little distinction between data
`
`and program; one can write a file and invoke the shell to run it as a
`
`script.
`
`The UNIX environment is not perfect, but it contrLbuted nicely to the exper(cid:173)
`
`OJ
`
`j
`
`iment.
`
`000005
`
`

`
`-5-
`
`3. Evolution of Flat me- Primitives
`
`Recall that. a fiat file consists of a single. unformatted text tlle where each
`
`Une corresponds to a record. and that each record is divided into variable length
`
`fields by occurrences of a separator character.. The operations that one typi(cid:173)
`
`cally perro~ms on a fiat file include:
`
`add
`
`select
`
`delete
`
`change
`
`format
`
`sort
`
`new records to the file.
`
`record(s) from the file having certain characteristics,
`
`record(s) from the file having certain characteristics,
`
`field(s) on specified record(s),
`
`the file for human consumption,
`
`the records according to the contents of some field(s)
`
`Our local version of UNIX contains many fiat files. One of the more well
`
`lmown,
`
`/etc /passwd contains a record for each user; it assocIates a symbolic
`
`name, encrypted password, and other information with the user's login id.
`
`Another flat file contains inventory 'information for computer terminals.
`
`The- terminal inventory database is significant for two reasons. First, it pro(cid:173)
`
`vided the early motivation and testbed for the fiat file experiment; it will be used
`
`here to illustrate the process or choosing primitives. Second, it demonstrates
`
`how the flat file system can be extended to each application.
`
`In particular, the
`
`historlcal narrative that tollows shows how the flat file primitives evolved, and
`
`how they have been extended for the terminal inventory application.
`
`The first terminal inventory database consisted of two fiat files -- one for
`
`"termlnals" and the" other for "ports", Both were maintained by hand, using a
`
`text editor. The former contained a record for each computer terminal. giving
`
`its type. seriai nwnber, physical location, and connection to the machine. The
`
`latter contained a record for each machine connection (port). giving, among
`
`.~
`
`000006
`
`

`
`- 6-
`
`other things, the terminal that was connected to it. When the number of ports
`
`and terminals grew to more than a handful. keeping the information in the two
`
`files accurate and consistent became difficult.
`
`It was decided that a program
`
`could be written to help with the maintenance. Unfortunately,
`
`thinking of the
`
`data as two separate files with many cross references made a program to mani-
`
`pulate it both highly specialized and cumbersome.
`
`The first step toward a fiat file database system occurred when the two data
`
`files were combined into one, called termin/a. Records in the termin.i'o file
`
`corresponded either to a port or to a terminal: those with a null "terminal" field
`
`corresponded to 110 ports on the machine that were not connected to a term!-
`
`nal, those with a null "port" field corresponded to terminals that were not con-
`
`nected to the machine, and those with data for both "port" and "terminal" fields
`
`corresponded to a connection. The point here is that the basic operations
`
`worked on connections (terminal,port) rather then on terminals or ports;
`
`the
`
`process of identifying those primitive operations brought this out.
`
`Terminfo became the source of all information about terminals and ports
`
`throughout the system. All system files are generated from it automatically,
`
`eliminating the need to change them by hand every time a terminal is moved.
`
`For example, the system expects the llle /etc/ttys to contain a Hne for each
`
`port on the system, with a code indicating whether that port is connected to a
`
`terminal or not (Le. allows user login). A utility program was built to scan ter-
`
`minfo and extract the information for /etc/ttys. Other utility programs were
`
`built to extract information for other files. Whenever terminfo changes, the util-
`
`ity programs are run to correct other files throughout the system.
`
`The set of utility programs to manipulate terminfo grew quickly, but there
`
`was little or no attempt
`
`to maintain unlformity or to make them work well
`
`together. There was a program called lookup to retrieve the record for a termi-
`
`000007
`
`

`
`-7-
`
`nal with a given serial number, and one called format to write the database in a
`
`n~atly formatted fashion. The move toward a consistent, uniform set of pri.ml-
`
`lives began when lookup and format were modified to work in conjunction with
`
`each other. Using the modified versions, a user could type
`
`lookup -terminal 53 I format -
`
`to obtain a neatly formatted listing (including headings) of the data for terminal
`
`53. The change to the format primitive was simple: given a minus sign (".11) as an
`
`argument. it formatted its input: otherwise, it formatted the entire terminfo file .
`
`. Although a general purpose format primitive was a good idea, forcing the user to
`
`distinguish when it was used in a pipeline was not. Often, one forgot the argu-
`
`rnent as in:
`
`lookup -terminal 53 Iformat
`
`and received a listing of the entire file.
`
`In spite of the problems with the utilities, others began to copy and modify
`
`the set of programs to create their own databases. Mter some experience with
`
`termlnfo and related databases,
`
`the set of primitives were redesigned com-
`
`pletely to achieve several goals:
`
`1. The primitives should supply the ability to retrieve, sort, format, delete.
`
`replace. and edit any database stored as a fiat file.
`
`2. All fields in the flle should be named; one should not have to specify a
`
`field by its relative position as in the early version and in some of the
`
`UNIX commands.
`
`3. The database system should work from a single descriptor file that
`
`~,
`
`... j
`
`described the fieids. their names. and their format.
`
`000008
`
`

`
`- B -
`
`4. All programs should work together in a simple, uniform. and automatic
`
`way. For example,
`
`it should be possible to retrieve a subset of the
`
`records, sort them, and format them. One should not have to type spe-
`
`cial names or arguments to use the programs in a pipeline.
`
`5. The system should protect against loss of information.
`
`6. The system should be implemented as a set of primitives.
`
`The redesign resulted in a flat me generator called fig. The next section
`
`describes the fig primitives in detail and shows how they work together.
`
`4. Primitives in Ffg
`
`The set of fig primitives includes the following (see Appendix A for a detailed
`
`description of the parameters for each primitive).
`
`delete
`
`edit
`
`enter
`
`fielded
`
`format
`
`lookup
`
`Omit specified records, write out the others.
`
`Allow the user to invoke a text editor on the database directly.
`
`Edit makes a backup of the database for the undo primitive.
`
`Interactively enter records one at a time.
`
`Change the contents of specified fields on specified records.
`
`Format the data for human consumption.
`
`Retrieve records satisfying given criteria. Lookup is shorthand
`
`for simple retrieval requests.
`
`retrieve
`
`Retrieve records satisfying given criteria.
`
`showtlelds Display the fields description file (usually as an aid for users who
`
`forget field names).
`
`sortby
`
`Sort the input according to one or more fields.
`
`C'J
`
`. ,
`
`000009
`
`

`
`I.
`
`~ 1
`
`- 9 -
`
`undo
`
`update
`
`verify
`
`Restore the entire database to its previous value.
`
`Replace the database by the file given as input.
`
`Verify the internal consistency of the data.
`
`Ffg achieves most of the design goals listed above. The above primitives all
`
`expect their arguments to contain symbolic field names as specified in a fields
`
`desGription file that the user supplies when creating the database. They work
`
`together. and automatically detect whether their input is connected to the out-
`
`put of another program. reading from the database if it is not. The system is
`
`implemented as a set of primitives, and the system does have a limited form of
`
`protection. The follOWing dlscussions show, in more detail. how the flat file prim-
`
`itives achieve these goals.
`
`The fig primitives depend on a :fields description (FD) file to relate symbolic
`
`field names to relative positions. The FD :file contains k lines, one line for each of
`
`the k fields in the flat file. Each field is described by giving its relative position,
`
`its name.
`
`its sort type (e.g .. numeric.
`
`to be placed in descending order), its
`
`length on a formatted listing. and two lines of heading information to be printed
`
`'on formatted listings. The six items for each field are terminated by colons. For
`
`example. the FD :file:
`
`1:1ast::20: Last Naroe:-------------:
`
`2:flrst::20: First name:--------------·---;
`
`3:phone:n: 13:Phone Number:----------:
`
`describes the three :fields for records in a phone book. The first. field, named
`
`"last", holds a last name, the seconq field, named "first" holds a first name, and
`
`the third field, named "phone", holds a phone number. For purposes of sorting,
`
`(\' J _\
`
`the third field is considered numeric:
`
`the first
`
`two are sorted in dictionary
`
`order. When a fiat file is formatted using this description file, it will look like:
`
`000010
`
`

`
`- 10 -
`
`Last Name
`
`First Name
`
`Phone Num.
`
`lIUIl
`
`flfltl
`
`pppppp
`
`where the actual data for last names, first names and phone. numbers appears
`
`in place of nUll,
`
`tIfIff, and pppppp. Fields longer than the number of columns
`
`allocaled in the listlng are truncated, and fields shorter than the number of
`
`columns allocated are padded with blanks;
`
`the user can specify whether the
`
`padding is to the right or left.
`
`The retrieve primitive is especially interesting because it illustrates the
`
`power of the flat file system. Retrieve takes as an argument a Boolean expres~
`
`sion, B, and retrieves all records that satisfy B. The expression can contain
`
`comparative operators less than. «), greater than. (»; equal to (==). not equal
`
`to (!=), etc.. logical operators and (&&). DT (11), and not (!), arithmetic operators
`
`(+. -.•, I, etc.). and pattern matching operators matches (expr...... /pattern/), and
`
`does not match (expr! ...... /paltern/).2 One can ask questions like "find all records
`
`where the last name starts with the letter C and contains the letter r"
`
`retrieve ·last...... / ....C,·r.·I'
`
`Dr "find all records where the tax field is greater than 50 and the department
`
`field is equal to cs or where the manager is smith and the department is not cs"
`
`retrieve '(tax>50&&dept=="cs") II (manager=="smith"&&dept!="cs")'
`
`It Is lmportant to note that one can only ask for intra-record comparisons, not
`
`fDr inter-record ones. Thus, one cannot ask for records with salary field greater
`
`than the previDus one. nor can one ask for all records where the salary field is
`
`greater than the salary field Df the 2nd record.
`
`2See Appendix 8 for details o! expression synlW[.
`
`000011
`
`

`
`- 11 -
`
`The flat file primitives work well together, and automatically read from the
`
`database when their input is not. connected to another program. For example,
`
`l;>nce the field description tlle and database are in place, one merely types:
`
`format.
`
`~o obtain a formatted listing of the data with headings. Typing
`
`sortby phone Iformat
`
`instead. causes format to read and format the output of ,the sortby primitive. ]n
`
`this example, the list.ing will be sorted by phone number. Similarly, typing:
`
`retrieve last=="comer" I sortby phone I format
`
`causes the retrieve primitive to select. all records with last name equal to the
`
`string "comer" ,. pass the results to the sortby primitive which will order them by
`3
`
`phone number before passing them to format. where they will be formatted.
`
`One need not specify the origin of the data as a parameter.
`
`Ffg helps prevent the loss of information through the update primitive. To
`
`make a permanent change to the data, one must create the new file and pipe it
`
`into update. Thus,
`
`to sort the example database according to last name, one
`
`types:
`
`sortby last I update
`
`. Update saves a copy of the old file before replacing it. so one can recover the
`
`previous state of the database by typing:
`
`undo
`
`L()
`
`3The shell syntax a.ctually requires that the quotes be escaped by typing a be.cblash in front
`of them.
`
`000012
`
`

`
`- 12 -
`
`Update 1s more sophisticated than one might expect.
`
`It actually unlocks, writes,
`
`and then relocks the database so that under usual circumstances even the
`
`owner cannot write directly to the file. Keeping the data flle unwritable is espe(cid:173)
`
`cially important in UNIX where it is easy to direct the output of a program to· a
`
`file, or to accidently pass a file name as an argument to a command. Update
`
`also maintains a mutual exclusion among processes that wish to update the
`
`database. The most common way to enter records interactively is by invoking
`
`the primitive enter which prompts for each field:
`
`enter I update
`
`Due of the chief advantages of the primitives-based approach is that it
`
`allows users to intermix their own primitives with those that are supplied. For
`
`example, our fig version of the terminfo database has a command to move a ter(cid:173)
`
`minal from one port to another because terminals are moved frequently.
`
`In
`
`another application. a primiUve called "gather" has been added to gather statis(cid:173)
`
`tlcs on program use and write them into a flat file. The fig system itself does not
`
`need to know about moving terminals. gathering statistics, or any of the other
`
`special commands that users invent. Yet having the primitives from ffg do most
`
`of the work made both applications significantly easier to implement.
`
`The evolution of the fiat tile primitives took about 3 months -- much longer
`
`than expected. Most of the time went into testing. Several users built fiat file
`
`databases, but measurements showed that they spent most of their time doing
`
`simple retrieval and formatting. Gradually, they added thelr own primitives, and
`
`began exploring new ways to connect old ones. Of course, others suggested
`
`changes that were tried in later versions.
`
`(0
`
`From the experience.
`
`two observations can be made about the choice of
`
`primitives:
`
`000013
`
`

`
`jI,,,
`
`- 13 -
`
`1. Ad hac extensions to a untlled set of primitives almost always result in
`
`disaster. For example, at one point we added a "delete" primitive that
`
`actually modified the database by retrleving records lJ;l.at were not to be
`
`deleted and passing them to update (unlike other primitives that had to
`
`be composed with update explicitly). One had to remember that delete
`
`worked differently than other commands, and that it could not be corn·
`
`posed with them. Worst of all, composlng delete with update created two
`
`processes that tried to modify the database. so one of them gave the
`
`cryptic report: "database is locked while another process updates it".
`
`2. The greatest asset in the design of a clean, uniform set of primitives is a
`
`single person who has ultimate responsibility. This is akin to the chief
`
`programmer concept [BAKE72).
`
`3. Designs by a single individual are prone to gross omissions in functional-
`
`ity. This should not come as a surprise, but it did.
`
`5. The Implementation of Ffg Using UNIX Progro.ms
`
`If the primitives-based approach to computing works so welL why not use it
`
`to build the primitives themselves? This section answers that question by
`
`explaining how the fig system, includLng the primitives, are built from existing
`
`UNIX programs.
`
`It discusses the UNIX programs upon which the fiat file genera·
`
`tor are built, the generation of a database. and binding of names.
`
`The UNIX «;:ommand awk [AHKW79], forms the backbone of the fig retrieve
`
`and format primitives. Awk invokes an interpreter for a simple, but powerful
`
`string processing language. The interpreter reads an awk program, sometimes
`
`called an u'wk script, and then reads and processes. a texl
`
`file linc-by-line
`
`according to the program. Awk divides each line of the input file into fields
`
`based on occurrences of a separator character, and permits one to examine or
`
`000014
`
`

`
`'Write the contents of the Ith field.
`
`- 14-
`
`,
`(To reference the Ith field of the current
`
`record, one writes $1 in the awk program). Awk supports assignment state-
`
`menU;:, fairly powerful arithmetic,
`
`logical, and string operators. and even for-
`
`matted output.
`
`In short, an awk script suffices for fiat file retrieval or format-
`
`ling provided Doe finds a way to translate field names inlo positional references.
`
`How can an expression containing field names be processed by awk which
`
`only understands positional references? One might expect the implementation
`
`of retrieve to solve the problem as follows:
`
`1. A user invokes retrieve, passing it an expression, B. that contains field
`
`names.
`
`2. Retrieve passes the expression to a program, T. that parses the exprcs-
`
`sion,
`
`translates field names into positional
`
`references, merges
`
`the
`
`modified expression with the skeleton of an awk program, and writes the
`
`result on file F.
`
`3. Retrieve invokes awk giving it F as input. The program contains only
`
`positional references.
`
`4. Interpreting the program on F. awk reads the database. evaluates the
`
`expression for each record, and writes out those that satisfy it.
`
`This design was not used because it meant writing a program to parse and
`
`translate expressions; the objective was to use existing programs.
`
`Retrieve turns the solution around, leaving the expression alone. but giving
`
`a.wk enough informati0Il: to evaluate it. To do so, retrieve introduces k variables
`
`into the awk program and assigns them the contents of the k fields with k
`
`aSSignment statements. The essential piece of the awk script is:
`
`co
`
`000015
`
`

`
`- 15-
`
`field! =$1
`field 2=S2
`
`tleldk=Sk
`
`if ( EXPRESSION) write out the record
`
`where field
`
`denotes the Ith field name and EXPRESSION denotes the Boolean
`
`i
`expression as typed by the user. When evaluating the expression, awk binds
`
`references to field names to the variables that have been assigned. the contents
`
`of the field. Making the extra assignments introduced some extra overhead;
`
`measurements are given in a later. section. Similar constructions were used in
`
`other commands.
`
`Implementing most of the remaining primitives from UNIX commands was
`
`not difficult, but a few problems arose. Processing minimal abbreviations for
`
`field names presented the worst challenge because no simple combination of
`
`UNIX commands produced the desired result. For example. if the set of possible
`
`field names are: "salary", "dept", "division", "dependents", and "name", one need
`
`only give sortby a pretlx of the field name that uniquely identifies it (this
`
`specification was made before the implementation was considered).
`
`It means
`
`that "0." suffices for "name", but nothing shorter than "depe" may be used to
`
`designate "dependents" because it does not distinguish "dependents" from
`
`·'dept". The shell supports pattern matching, so such abbreviations can be han-
`
`dled there. To do so, one must translate a list of field names like "salary",
`
`"dept", division", "dependents", and "name" into a list of patterns like "s·",
`
`"dept", "di.", "depe·", and "0..". Ffg performs these translation with an awk
`
`script, although it is more or less a conventional program. The result is Lhat CTg
`
`c
`
`contains no compiled programs, but it does contain some programming.
`
`Fig is a more than a collection of shell scripts for the primitives; it is a fiat
`
`000016
`
`

`
`• 16 -
`
`file database generator as well. When lnvoked as a command, trg builds a Oat file
`
`database system. including a copy of the primitives. a field description file, and
`
`an access command. The user supplies information on the separator character,
`
`protection modes, fields description file, and the location of the access com,.
`
`mand: trg generates the necessary files.
`
`Each fiat file dalabase resides in a separate directory along with copies of
`
`the primitives and two subdirectories: "Specs" and ",system". The subdirectory
`
`Specs contains specifications like the fields descriptions
`
`that a user may
`
`change. Such changes are infrequent, however, so the information is kept out of
`
`the main directory. Additional files. that the user should not change, are kept in
`
`the .system subdirectory (e.g., mutual exclusion lock files).
`
`Each fiat file has an access command that one invokes to move to the data-
`
`base environment. When invoked, an access command changes the user to the
`
`database directory, records the user's presence, and invokes an interactiv:e
`
`shell that reads and processes commands. After the user finishes work and exits
`
`from the interactive shell,
`
`the access command returns to the environment
`
`from which it was invoked. Normally, only one user can gain access to a flat file
`
`at a time; the access command refuses to grant access to a database that is in
`
`use. One can obtain nonexclusive use, find the status of active users. when they
`
`began, and their system identification. One can also ask for the creation time,
`
`mode. and size of the database and backup files.
`
`li'fg oplimizes Lhe primitives by performing some bindings early. For cxo.m~
`
`ple, when trg constructs the retrieve primitive, it reads the field description tlle
`
`and binds field names into the shell script as described earlier in this secLion.
`
`This simple optimization improves performance dramatically because it elim-
`
`,.,
`
`inates the need to open the field description file, build the awk program. and
`
`have awk read the program back in.
`
`It also means that the user must inform
`
`000017
`
`

`
`- 17-
`
`the system of changes in the description file. Whenever such a change occurs,
`
`U1e primitive rebuild will correctly recreate the primitives (including itself.
`
`if
`
`necessary). One would expect such changes relatively infrequently, however,
`
`when compared to the other operations.
`
`UnUke most primitives which must be rebuilt manually, the format primi~
`
`live is capable of detecting new formats automatically. The user views format as
`
`a late binding command. one that searches a special directory for a named for-
`
`mat description file every time one invokes it. Actually, the names and format
`
`specifications. are botuld into the shell script to speed execution. The command
`
`searches for new formats only if the named file has not been bound previously.
`
`When it detects that a new file exists but has not been bound, format moves
`
`itself out of the way, uses rebuild to create a new version of itself, and then
`
`replaces the running version with the new one (i.e .. performs a UNIX exec). Sub-
`
`sequent uses of the new format run at high speed.
`
`6. Execution speed
`
`The obvious advantage of early binding is execution speed:
`
`the obVious
`
`disadvantage is user impact. As on most timesharing systems, performance is
`
`best measured by response time. Users gladly tolerate a response delay of a few
`
`seconds for retrieval from a ZOO-line database, but they will not wait 30 seconds
`
`for the same information. Without early binding. response times for a pipeline of
`
`five primitives approached 30 seconds on our moderately loaded system. On the
`
`other hand. the optimized versions of the primitives were able to handle much
`
`larger fUes. Table 1 shows response times for a database of 1900 records. In the
`
`table.
`
`lhe command "cat" is a UNIX program that copies a file to its output
`
`unchanged; one expects cat to run at the maximum possible speed. Another
`
`UNIX command. "grep", scans a file and prints those lines that match a pattern.
`
`Finally. the UNIX command "we" counts the lines, words, and characters in a file.
`
`000018
`
`

`
`- 18-
`
`command
`primitive
`
`grep
`
`wc
`
`retrieve
`
`awk (retrieve program
`called directly)
`
`awk (retrieve program
`positional references)
`
`format
`
`format (output discarded)
`
`cat
`
`cat (output discarded)
`
`response time
`in seconds
`
`4(2.2 cpu)
`
`4(1.9 cpu)
`
`22(18.1 cpu)
`
`21(17.7 cpu)
`
`17(10.9 cpu)
`
`2:34(40.4 cpu)
`
`46(20.8 cpu)
`
`1:48(5.6 cpu)
`
`6(3.2 cpu)
`
`Table 1.
`Times for various fiat me primitives and UNIX commands
`on a file of 1923 lines, 101204 characters. Timings
`reported here are the mean from several funs. A large
`variation in. real time occurred with system load.
`
`Unfortunately. all times, especially the real time, varied under system load.
`
`Sl1ll, several observations can be made. First, the highly optimized "cat" com-
`
`mand copies a file to the user's terminal at roughly 937 characters/second (real
`
`time), while the fiat file primitive "format" displays a formatted version of the
`
`same file at 900 characters/second.
`
`In both cases, the system I/O speed, not
`
`the process speed limited the display speed (the terminal used for testing ran at
`
`9600 baud). Second. the Lntroduclion of variables and assignments in thc awk
`
`<'J
`
`program during retrieval produced a measurable delay in processing. The aver-
`
`age real
`
`time required to process a 1900 line file increased from 17 to 22
`
`000019
`
`

`
`- 19-
`
`seconds (29%), but very few users notice any difference. Third,
`
`the time
`
`teqUired for the shell to parse the script, redirect the input and output. and
`
`start execution remained very small. We conclude that rewriting the primitives
`
`in a lower level language would produce little or no benefit to the user.
`
`7. Failures of Fig
`
`So far,
`
`the primitives-based implementation of fig has been described in
`
`glowing terms. But the primitives approach has its limitations as well. Prob(cid:173)
`
`lems can be separated into three main categories: error detection problems,
`
`optimization problems. and error propagation problems.
`
`Some of the error detection problems in fig are inherent in the approach,
`
`others arise from the implementation. Because a primltive cannot know what
`
`other primitives precede it or succeed it in a pipeline, detection of some errors
`
`is impossible. For example, typing:
`
`format I update
`
`will replace the entire database with a formatted versio

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket