- 62 -
`
`Design and Implementation of the WordNet Lexical Database
`and Searching Software†
`
`Richard Beckwith, George A. Miller, and Randee Tengi
`
`Lexicographers must be concerned with the presentation as well as the content of
`their work, and this concern is heightened when presentation moves from the printed
`page to the computer monitor. Printed dictionaries have become relatively standardized
`through many years of publishing (Vizetelly, 1915); expectations for electronic lexicons
`are still up for grabs. Indeed, computer technology itself is evolving rapidly; an
`indefinite variety of ways to present lexical information is possible with this new
`technology, and the advantages and disadvantages of many possible alternatives are still
`matters for experimentation and debate. Given this degree of uncertainty, manner of
`presentation must be a central concern for the electronic lexicographer.
`WordNet is a pioneering excursion into this new medium. Considerable attention
`has been devoted to making it useful and convenient, but the solutions described here are
`unlikely to be the final word on these matters. It is hoped that readers will not merely
`note the shortcomings of this work, but will also be inspired to make improvements on it.
`One’s first impression of WordNet is likely to be that it is an on-line thesaurus. It is
`true that sets of synonyms are basic building blocks, and with nothing more than these
`synonym sets the system would have all the power of a thesaurus. When short glosses
`are added to the synonym sets, it resembles an on-line dictionary that has been
`supplemented with synonyms for cross referencing (Calzolari, 1988). But WordNet
`includes much more information than that. In an attempt to model the lexical knowledge
`of a native speaker of English, WordNet has been given detailed information about
`relations between word forms and synonym sets. How this relational structure should be
`presented to a user raises questions that outrun the experience of conventional
`lexicography.
`In developing this on-line lexical database, it has been convenient to divide the
`work into two interdependent tasks which bear a vague similarity to the traditional tasks
`of writing and printing a dictionary. One task was to write the source files that contain
`the basic lexical data — the contents of those files are the lexical substance of WordNet.
`The second task was to create a set of computer programs that would accept the source
`files and do all the work leading ultimately to the generation of a display for the user.
`
`† This is a revised version of "Implementing a Lexical Network" in CSL Report #43, prepared
`by Randee Tengi. UNIX is a registered trademark of UNIX System Laboratories, Inc. Sun, Sun 3
`and Sun 4 are trademarks of Sun Microsystems, Inc. Macintosh is a trademark of Macintosh
`Laboratory, Inc. licensed to Apple Computer, Inc. NeXT is a trademark of NeXT. Microsoft
`Windows is a trademark of Microsoft Corporation. IBM is a registered trademark of International
`Business Machines Corporation. X Windows is a trademark of the Massachusetts Institute of
`Technology. DECstation is a trademark of Digital Equipment Corporation.
`
`Page 1 of 25
`
`GOOGLE EXHIBIT 1030
`
`The WordNet system falls naturally into four parts: the WordNet lexicographers’
`source files; the software to convert these files into the WordNet lexical database; the
`WordNet lexical database; and the suite of software tools used to access the database.
`The WordNet system is developed on a network of Sun-4 workstations. The software
`programs and tools are written using the C programming language, Unix utilities, and
`shell scripts. To date, WordNet has been ported to the following computer systems:
`Sun-3; DECstation; NeXT; IBM PC and PC clones; Macintosh.
`The remainder of this paper discusses general features of the design and
`implementation of WordNet. The ‘‘WordNet Reference Manual’’ is a set of manual
`pages that describe aspects of the WordNet system in detail, particularly the user
`interfaces and file formats. Together the two provide a fairly comprehensive view of the
`WordNet system.
`
`Index of Familiarity
`One of the best known and most important psycholinguistic facts about the mental
`lexicon is that some words are much more familiar than others. The familiarity of a word
`is known to influence a wide range of performance variables: speed of reading, speed of
`comprehension, ease of recall, probability of use. The effects are so ubiquitous that
`experimenters who hope to study anything else must take great pains to equate the words
`they use for familiarity. To ignore this variable in a lexical database that is supposed to
`reflect psycholinguistic principles would be unthinkable.
`In order to incorporate differences in familiarity into WordNet, a syntactically
`tagged index of familiarity is associated with each word form. This index does not
`reflect all of the consequences of differences of familiarity — some theorists would ask
`for strength indices associated with each relation — but accurate information on all of
`the consequences is not easily obtained. The present index is a first step.
`Frequency of use is usually assumed to be the best indicator of familiarity. The
`closed class words that play an important syntactic role are the most frequently used, of
`course, but even within the open classes of words there are large differences in frequency
`of occurrence that are assumed to correlate with — or to explain — the large differences
`in familiarity. The frequency data that are readily available in the technical literature,
`however, are inadequate for a database as extensive as WordNet. Thorndike and Lorge
`(1944) published data based on a count of some 5,000,000 running words of text, but
`they reported their results only for the 30,000 most frequent words. Moreover, they
`defined a ‘‘word’’ as any string of letters between successive spaces, so their counts for
`homographs are untrustworthy; there is no way to tell, for example, how often lead
`occurred as a noun and how often as a verb. Francis and Kučera (1982) tag words for
`their syntactic category, but they report results for only 1,014,000 running words of text
`— or 50,400 word types, including many proper names — which is not a large enough
`sample to yield reliable counts for infrequently used words. (A comfortable rate of
`speaking is about 120 words/minute, so that 1,000,000 words corresponds to 140 hours,
`or about two weeks of normal exposure to language.)
`Fortunately, an alternative indicator of familiarity is available. It has been known at
`least since Zipf (1945) that frequency of occurrence and polysemy are correlated. That is
`to say, on the average, the more frequently a word is used the more different meanings it
`will have in a dictionary. An intriguing finding in psycholinguistics (Jastrzembski,
`1981) is that polysemy seems to predict lexical access times as well as frequency does.
`Indeed, if the effect of frequency is controlled by choosing words of equivalent
`frequencies, polysemy is still a significant predictor of lexical decision times.
`Instead of using frequency of occurrence as an index of familiarity, therefore,
`WordNet uses polysemy. This measure can be determined from an on-line dictionary. If
`an index value of 0 is assigned to words that do not appear in the dictionary, and if values
`of 1 or more are assigned according to the number of senses the word has, then an index
`value can be made available for every word in every syntactic category. Associated with
`every word form in WordNet, therefore, there is an integer that represents a count (taken
`from the Collins Dictionary of the English Language) of the number of senses that word
`form has when it is used as a noun, verb, adjective, or adverb.
`A simple example of how the familiarity index might be used is shown in Table 1.
`If, say, the superordinates of bronco are requested, WordNet can respond with the
`sequence of hypernyms shown in Table 1. Now, if all the terms with a familiarity index
`(polysemy count) of 0 or 1, which are primarily technical terms, are omitted, the
`hypernyms of bronco include simply: bronco @→ pony @→ horse @→ animal @→ organism @→
`entity. This shortened chain is much closer to what a layman would
`expect. The index of familiarity should be useful, therefore, when making suggestions
`for changes in wording. A user can search for a more familiar word by inspecting the
`polysemy in the WordNet hierarchy.
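The filtering step in the bronco example can be sketched in a few lines. This is only an illustration under stated assumptions, not WordNet's own code: the function name `familiar_chain` and its `min_index` parameter are hypothetical, and the chain and polysemy counts are copied from Table 1.

```python
# Hypernym chain of "bronco" with polysemy counts, copied from Table 1.
BRONCO_CHAIN = [
    ("bronco", 1), ("mustang", 1), ("pony", 5), ("horse", 14),
    ("equine", 0), ("odd-toed ungulate", 0), ("placental mammal", 0),
    ("mammal", 1), ("vertebrate", 1), ("chordate", 1),
    ("animal", 4), ("organism", 2), ("entity", 3),
]


def familiar_chain(chain, min_index=2):
    """Keep the query word, then only hypernyms whose index is >= min_index.

    `familiar_chain` and `min_index` are hypothetical names for this sketch.
    """
    query, *hypernyms = chain
    kept = [query[0]]  # the word the user asked about is always shown
    kept += [word for word, index in hypernyms if index >= min_index]
    return kept


print(" -> ".join(familiar_chain(BRONCO_CHAIN)))
```

With the default threshold, terms whose index is 0 or 1 are dropped, which leaves the shortened chain given in the text.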
`WordNet would be a better simulation of human semantic memory if a familiarity
`index could be assigned to word-meaning pairs rather than to word forms. The noun tie,
`for example, is used far more often with the meaning {tie, necktie} than with the
`meaning {tie, tie beam}, yet both are presently assigned the same index, 13.
`
`Lexicographers’ Source Files
`WordNet’s source files are written by lexicographers. They are the product of a
`detailed relational analysis of lexical semantics: a variety of lexical and semantic
`relations are used to represent the organization of lexical knowledge. Two kinds of
`building blocks are distinguished in the source files: word forms and word meanings.
`Word forms are represented in their familiar orthography; word meanings are represented
`by synonym sets — lists of synonymous word forms that are interchangeable in some
`syntax. Two kinds of relations are recognized: lexical and semantic. Lexical relations
`hold between word forms; semantic relations hold between word meanings.
`WordNet organizes nouns, verbs, adjectives and adverbs into synonym sets
`(synsets), which are further arranged into a set of lexicographers’ source files by syntactic
`category and other organizational criteria. Adverbs are maintained in one file, while
`nouns and verbs are grouped according to semantic fields. Adjectives are divided
`between two files: one for descriptive adjectives and one for relational adjectives.
`
`Hypernyms of bronco and their index values
`
`Word                      Polysemy
`----------------------------------
`bronco                        1
`@→ mustang                    1
`@→ pony                       5
`@→ horse                     14
`@→ equine                     0
`@→ odd-toed ungulate          0
`@→ placental mammal           0
`@→ mammal                     1
`@→ vertebrate                 1
`@→ chordate                   1
`@→ animal                     4
`@→ organism                   2
`@→ entity                     3
`----------------------------------
`
`Table 1
`
`Appendix A lists the names of the lexicographers’ source files.
`Each source file contains a list of synsets for one part of speech. Each synset
`consists of synonymous word forms, relational pointers, and other information. The
`relations represented by these pointers include (but are not limited to):
`hypernymy/hyponymy, antonymy, entailment, and meronymy/holonymy. Polysemous
`word forms are those that appear in more than one synset, therefore representing more
`than one concept. A lexicographer often enters a textual gloss in a synset, usually to
`provide some insight into the semantics intended by the synonymous word forms and
`their usage. If present, the textual gloss is included in the database and can be displayed
`by retrieval software. Comments can be entered, outside of a synset, by enclosing the
`text of the comment in parentheses, and are not included in the database.
`Descriptive adjectives are organized into clusters that represent the values, from one
`extreme to the other, of some attribute. Thus each adjective cluster has two (occasionally
`three) parts, each part headed by an antonymous pair of word forms called a head synset.
`Most head synsets are followed by one or more satellite synsets, each representing a
`concept that is similar in meaning to the concept represented by the head synset. One
`way to think of the cluster organization is to visualize a wheel, with each head synset as a
`hub and its satellite synsets as the spokes. Two or more wheels are logically connected
`via antonymy, which can be thought of as an axle between wheels.
`The Grinder utility compiles the lexicographers’ files. It verifies the syntax of the
`files, resolves the relational pointers, then generates the WordNet database that is used
`with the retrieval software and other research tools.
`
`Word Forms
`In WordNet, a word form is represented as the orthographic representation of an
`individual word or a string of individual words joined with underscore characters. A
`string of words so joined is referred to as a collocation and represents a single concept,
`such as the noun collocation fountain_pen.
`In the lexicographers’ files a word form may be augmented with additional
`information, necessary for the correct processing and interpretation of the data. An
`integer sense number is added for sense disambiguation if the same word form appears
`more than once in a lexicographer file. A syntactic marker, enclosed in parentheses, is
`added to any adjectival word form whose use is limited to a specific syntactic position in
`relation to the noun that it modifies. Each word form in WordNet is known by its
`orthographic representation, syntactic category, semantic field, and sense number.
`Together, these data make a ‘‘key’’ which uniquely identifies each word form in the
`database.
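One way to picture the four-part key is as a small immutable record. The sketch below is an assumption made for illustration: the names `WordFormKey` and `make_key`, the default sense number of 0, and the semantic field label "artifact" are hypothetical and do not come from the WordNet sources.

```python
from typing import NamedTuple


class WordFormKey(NamedTuple):
    """Hypothetical record holding the four fields that identify a word form."""
    form: str      # orthographic representation, e.g. "fountain_pen"
    category: str  # syntactic category: noun, verb, adj or adv
    field: str     # semantic field (label assumed for this example)
    sense: int     # sense number used for disambiguation (0 assumed if unset)


def make_key(words, category, field, sense=0):
    """Join the words of a collocation with underscores and build its key."""
    return WordFormKey("_".join(words), category, field, sense)


pen = make_key(["fountain", "pen"], "noun", "artifact")
lookup_table = {pen: "synset containing fountain_pen"}
print(pen)
```

Because the record is hashable, it can serve directly as a dictionary key, and two keys that differ only in sense number remain distinct.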
`
`Relational Pointers
`Relational pointers represent the relations between the word forms in a synset and
`other synsets, and are either lexical or semantic. Lexical relations exist between
`relational adjectives and the nouns that they relate to, and between adverbs and the
`adjectives from which they are derived. The semantic relations between adjectives and
`the nouns for which they express values are encoded as attributes. The semantic relations
`between noun attributes and the adjectives expressing their values are also encoded.
`Presently these are the only pointers that cross from one syntactic category to another.
`Antonyms are also lexically related. Synonymy of word forms is implicit by inclusion in
`the same synset. Table 2 summarizes the relational pointers by syntactic category.
`Meronymy is further specified by appending one of the following characters to the
`meronymy pointer: p to indicate a part of something; s to indicate the substance of
`something; m to indicate a member of some group. Holonymy is specified in the same
`manner, each pointer representing the semantic relation opposite to the corresponding
`meronymy relation.
`Many pointers are reflexive, meaning that if a synset contains a pointer to another
`synset, the other synset should contain a corresponding reflexive pointer back to the
`original synset. The Grinder automatically generates the relations for missing reflexive
`pointers of the types listed in Table 3.
`A relational pointer can be entered by the lexicographer in one of two ways. If a
`pointer is to represent a relation between synsets — a semantic relation — it is entered
`following the list of word forms in the synset. Hypernymy always relates one synset to
`another, and is an example of a semantic relation. The lexicographer can also enclose a
`word form and a list of pointers within square brackets ([...]) to define a lexical relation
`between word forms. Relational adjectives are entered in this manner, showing the
`lexical relation between the adjective and the noun that it pertains to.
`
`WordNet Relational Pointers
`
`Noun           Verb            Adjective            Adverb
`------------------------------------------------------------------
`Antonym   !    Antonym    !    Antonym         !    Antonym      !
`Hyponym   ~    Troponym   ~    Similar         &    Derived from \
`Hypernym  @    Hypernym   @    Relational Adj. \
`Meronym   #    Entailment *    Also See        ^
`Holonym   %    Cause      >    Attribute       =
`Attribute =    Also See   ^
`------------------------------------------------------------------
`
`Table 2
`
`Reflexive Pointers
`
`Pointer       Reflect
`------------------------
`Antonym       Antonym
`Hyponym       Hypernym
`Hypernym      Hyponym
`Holonym       Meronym
`Meronym       Holonym
`Similar to    Similar to
`Attribute     Attribute
`------------------------
`
`Table 3
`
`Verb Sentence Frames
`Each verb synset contains a list of verb frames illustrating the types of simple
`sentences in which the verbs in the synset can be used. A list of verb frames can be
`restricted to a word form by using the square bracket syntax described above. See
`Appendix B for a list of the verb sentence frames.
`
`Synset Syntax
`Strings in the source files that conform to the following syntactic rules are treated as
`synsets. Note that this is a brief description of the general synset syntax and is not a
`formal description of the source file format. A formal specification is found in the
`manual page wninput(5) of the ‘‘WordNet Reference Manual’’.
`
`[1] Each synset begins with a left curly bracket ({).
`[2] Each synset is terminated with a right curly bracket (}).
`[3] Each synset contains a list of one or more word forms, each followed by a
`comma.
`[4] To code semantic relations, the list of word forms is followed by a list of
`relational pointers using the following syntax: a word form (optionally preceded
`by "filename:" to indicate a word form in a different lexicographer file) followed
`by a comma, followed by a relational pointer symbol.
`[5] For verb synsets, "frames:" is followed by a comma separated list of applicable
`verb frames. The verb frames follow all relational pointers.
`[6] To code lexical relations, a word form is followed by a list of elements from [4]
`and/or [5] inside square brackets ([...]).
`[7] To code adjective clusters, each part of a cluster (a head synset, optionally
`followed by satellite synsets) is separated from other parts of a cluster by a line
`containing only hyphens. Each entire cluster is enclosed in square brackets.
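Rules [1] through [4] are simple enough to sketch as a toy parser. The code below is an assumption-laden illustration, not the Grinder's yacc and lex grammar: `parse_synset` is a hypothetical name, glosses, verb frames, square brackets and adjective clusters (rules [5] to [7]) are not handled, and the sample strings and the file name `noun.animal` are made up for the example.

```python
def parse_synset(text):
    """Parse a simple synset (rules [1]-[4] only) into word forms and pointers.

    Returns (word_forms, pointers); each pointer is a tuple of
    (symbol, filename or None, target word form).
    """
    text = text.strip()
    if not (text.startswith("{") and text.endswith("}")):
        raise ValueError("a synset must be enclosed in curly brackets")
    word_forms, pointers = [], []
    for token in text[1:-1].split():
        head, comma, symbol = token.partition(",")
        if not comma or not head:
            raise ValueError(f"expected 'word,' but found {token!r}")
        if symbol:
            # "target,symbol" is a pointer; an optional "filename:" prefix
            # names the lexicographer file that holds the target.
            filename, _, target = head.rpartition(":")
            pointers.append((symbol, filename or None, target))
        elif pointers:
            raise ValueError("word forms must precede relational pointers")
        else:
            word_forms.append(head)
    if not word_forms:
        raise ValueError("a synset needs at least one word form")
    return word_forms, pointers


# Illustrative input only; not taken from a real lexicographer file.
print(parse_synset("{ bronco, mustang,@ }"))
print(parse_synset("{ pony, noun.animal:horse,@ }"))
```

A token with an empty symbol after its comma is treated as a member of the synset; a token with a symbol is treated as a pointer to another synset.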
`
`Archive System
`The lexicographers’ source files are maintained in an archive system based on the
`Unix Revision Control System (RCS) for managing multiple revisions of text files. The
`archive system has been established for several reasons — to allow the reconstruction of
`any version of the WordNet database, to keep a history of all the changes to
`lexicographers’ files, to prevent people from making conflicting changes to the same file,
`and to ensure that it is always possible to produce an up-to-date version of the WordNet
`database. The programs in the archive system are Unix shell scripts which envelop RCS
`commands in a manner that maintains the desired control over the lexicographers’ source
`files and provides a user-friendly interface for the lexicographers.
`The reserve command extracts from the archive the most recent revision of a given
`file or files and locks the file for as long as a user is working on it. The review command
`extracts from the archive the most recent revision of a given file or files for the purpose
`of examination only; the file is therefore not locked. To discourage making changes,
`review files do not have write permission since any such changes could not be
`incorporated into the archive. The restore command verifies the integrity of a reserved
`file and returns it to the archive system. The release command is used to break a lock
`placed on a file with the reserve command. This is generally used if the lexicographer
`decides that changes should not be returned to the archive. The whose command is used
`to find out whether files are currently reserved, and if so, by whom.
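The locking behaviour of the five commands can be modelled in memory. This is a sketch under stated assumptions: the real programs are shell scripts around RCS, whereas the `Archive` class, its method signatures, and the file and user names below are hypothetical, and the integrity check performed by restore is omitted.

```python
class Archive:
    """Toy in-memory stand-in for the archive (the real scripts wrap RCS)."""

    def __init__(self, files):
        self.revisions = {name: [text] for name, text in files.items()}
        self.locks = {}  # file name -> user holding the lock

    def reserve(self, name, user):
        """Check out the latest revision and lock the file for `user`."""
        holder = self.locks.get(name)
        if holder is not None:
            raise RuntimeError(f"{name} is already reserved by {holder}")
        self.locks[name] = user
        return self.revisions[name][-1]

    def review(self, name):
        """Return the latest revision for reading only; no lock is taken."""
        return self.revisions[name][-1]

    def restore(self, name, user, text):
        """Record an edited file as a new revision and drop the lock."""
        self._check_holder(name, user)
        self.revisions[name].append(text)  # syntax verification omitted here
        del self.locks[name]

    def release(self, name, user):
        """Break the lock without recording a new revision."""
        self._check_holder(name, user)
        del self.locks[name]

    def whose(self):
        """Report which files are reserved, and by whom."""
        return dict(self.locks)

    def _check_holder(self, name, user):
        if self.locks.get(name) != user:
            raise RuntimeError(f"{name} is not reserved by {user}")


archive = Archive({"noun.animal": "{ bronco, mustang,@ }"})
draft = archive.reserve("noun.animal", "alice")
print(archive.whose())
archive.restore("noun.animal", "alice", draft + "\n{ pony, horse,@ }")
print(archive.whose())
```

The point of the model is the invariant it enforces: at most one user holds a file at a time, so conflicting changes to the same file cannot both reach the archive.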
`
`Grinder Utility
`The Grinder is a versatile utility with the primary purpose of compiling the
`lexicographers’ files into a database format that facilitates machine retrieval of the
`information in WordNet. The Grinder has several options that control its operation on a
`set of input files. To build a complete WordNet database, all of the lexicographers’ files
`must be processed at the same time. The Grinder is also used as a verification tool to
`ensure the syntactic integrity of the lexicographers’ files when they are returned to the
`archive system with the restore command.
`
`Implementation
`The Grinder is a multi-pass compiler that is coded in C. The first pass uses a parser,
`written in yacc and lex, to verify that the syntax of the input files conforms to the
`specification of the input grammar and lexical items, and builds an internal representation
`of the parsed synsets. Additional passes refer only to this internal representation of the
`lexicographic data. Pass one attempts to find as many syntactic and structural errors as
`possible. Syntactic errors are those in which the input file fails to conform to the input
`grammar’s specification, and structural errors refer to relational pointers that cannot be
`resolved for some reason. Usually these errors occur because the lexicographer has made
`a typographical error, such as constructing a pointer to a non-existent file, or has failed
`to specify a sense number when referring to an ambiguous word form. Pass one cannot
`determine structural errors in pointers to files that are not processed together. When used
`as a verification tool, as from the restore command, only pass one is run.
`In its second pass, the Grinder resolves all of the semantic and lexical pointers. To
`do this, the pointers that were specified in each synset are examined in turn, and the
`target of each pointer (either a synset or a word form in a synset) is found. The source
`pointer is then resolved by adding an entry to the internal data structure which notes the
`‘‘location’’ of the target. In the case of reflexive pointers, the target pointer’s synset is
`then searched for a corresponding reflexive pointer. If found, the data structure
`representing the reflexive pointer is modified to note the ‘‘location’’ of its target, the
`original source pointer. If a reflexive pointer is not found, the Grinder automatically
`creates one with all the pertinent information.
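The second pass can be sketched as follows. This is a simplified illustration rather than the Grinder's implementation: `resolve_pointers` and the dictionary layout are hypothetical, only semantic (synset to synset) pointers are handled, sense numbers are ignored, and the reflexive pairs are those listed in Table 3.

```python
# Reflexive pointer pairs, as listed in Table 3.
REFLECT = {
    "antonym": "antonym",
    "hyponym": "hypernym",
    "hypernym": "hyponym",
    "holonym": "meronym",
    "meronym": "holonym",
    "similar to": "similar to",
    "attribute": "attribute",
}


def resolve_pointers(synsets):
    """Resolve pointer targets and add any missing reflexive pointers.

    `synsets` maps a synset id to {"words": [...], "pointers": [(kind, word)]}.
    Returns a dict mapping each synset id to a set of (kind, target id).
    """
    # Record which synset contains each word form (senses are ignored here).
    location = {}
    for sid, synset in synsets.items():
        for word in synset["words"]:
            location[word] = sid

    resolved = {sid: set() for sid in synsets}
    for sid, synset in synsets.items():
        for kind, target_word in synset["pointers"]:
            if target_word not in location:
                raise KeyError(f"unresolved pointer to {target_word!r}")
            target = location[target_word]
            resolved[sid].add((kind, target))
            if kind in REFLECT:
                # Adding to a set does nothing if the lexicographer already
                # entered the reflexive pointer by hand.
                resolved[target].add((REFLECT[kind], sid))
    return resolved


toy = {
    1: {"words": ["bronco"], "pointers": [("hypernym", "mustang")]},
    2: {"words": ["mustang"], "pointers": []},
}
print(resolve_pointers(toy))
```

In this toy input only the hypernym pointer was entered by hand; the matching hyponym pointer on the mustang synset is generated automatically.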
`A subsequent pass through the list of word forms assigns a polysemy index value, or
`sense count, to each word form found in the on-line dictionary. There is a separate sense
`count for each syntactic category that the word form is found in. The Grinder’s final pass
`generates the WordNet database.
`
`Internal Representation
`The internal representation of the lexicographic data is a network of interrelated
`linked lists. A hash table of word forms is created as the lexicographers’ files are parsed.
`Lower-case strings are used as keys; the original orthographic word form, if not in
`lower-case, is retained as part of the data structure for inclusion in the database files. As
`the parser processes an input file, it calls functions which create data structures for the
`word forms, pointers, and verb frames in a synset. Once an entire synset has been
`parsed, a data structure is created for it which includes pointers to the various structures
`representing the word forms, pointers, and verb frames. All of the synsets from the input
`files are maintained as a single linked list. The Grinder’s different passes access the
`structures either through the linked list of synsets or the hash table of word forms. A list
`of synsets that specify each word form is maintained for the purposes of resolving
`pointers and generating the database’s index files.
`
`WordNet Database
`For each syntactic category, two files represent the WordNet database — index.pos
`and data.pos, where pos is either noun, verb, adj or adv (the actual file names may be
`different on platforms other than Sun-4). The database is in an ASCII format that is
`human- and machine-readable, and is easily accessible to those who wish to use it with
`their own applications. Each index file is an alphabetized list of all of the word forms in
`WordNet for the corresponding syntactic category. Each data file contains all of the
`lexicographic data gathered from the lexicographers’ files for the corresponding syntactic
`category, with relational pointers resolved to addresses in data files.
`The index and data files are interrelated. Part of each entry in an index file is a list
`of one or more byte offsets, each indicating the starting address of a synset in a data file.
`The first step to the retrieval of synsets or other information is typically a search for a
`word form in one or more index files to obtain all data file addresses of the synsets
`containing the word form. Each address is the byte offset (in the data file corresponding
`to the syntactic category of the index file) at which the synset’s information begins. The
`information pertaining to a single synset is encoded as described in the Data Files
`section below.
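The relationship between index entries and byte offsets can be shown with a toy database. The sketch makes several assumptions: `build_database` and `lookup` are hypothetical names, each data line holds nothing but the word forms of one synset, and an in-memory buffer stands in for the data file. The real line layout is the one documented in wndb(5).

```python
import io


def build_database(synsets):
    """Lay synsets out as lines and record where each one starts.

    Returns (data_bytes, index), where index maps a lower-case word form to
    the byte offsets of every synset line that contains it.
    """
    data = bytearray()
    index = {}
    for words in synsets:
        offset = len(data)  # address of the line about to be written
        data += (" ".join(words) + "\n").encode("ascii")
        for word in words:
            index.setdefault(word.lower(), []).append(offset)
    return bytes(data), index


def lookup(word, data, index):
    """Return the synset lines for `word` by seeking to each stored offset."""
    handle = io.BytesIO(data)  # stands in for an open data file
    lines = []
    for offset in index.get(word.lower(), []):
        handle.seek(offset)
        lines.append(handle.readline().decode("ascii").rstrip("\n"))
    return lines


data, index = build_database([["tie", "necktie"], ["tie", "tie_beam"], ["horse"]])
print(index["tie"])
print(lookup("tie", data, index))
```

Since every address depends on the length of all the lines before it, inserting even one character into an earlier line would leave the stored offsets pointing into the middle of the wrong line.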
`One shortcoming of the database’s structure is that although all the files are in
`ASCII, and are therefore editable, and in theory extensible, in practice this is almost
`impossible. One of the Grinder’s primary functions is the calculation of addresses for the
`synsets in the data files. Editing any of the database files would (most likely) create
`incorrect byte offsets, and would thus derail many searching strategies. At the present
`time, building a WordNet database requires the use of the Grinder and the processing of
`all lexicographers’ source files at the same time.
`The descriptions of the Index and Data files that follow are brief and are intended to
`provide only a glimpse into the structure, syntax, and organization of the database. More
`detailed descriptions can be found in the manual page wndb(5) included in the
`‘‘WordNet Reference Manual’’.
`
`Index Files
`Word forms in an index file are in lower case regardless of how they were entered in
`the lexicographers’ files. The files are sorted according to the ASCII character set
`collating sequence and can be searched quickly with a binary search.
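A binary search over sorted index lines might look like the sketch below. The line layout is deliberately simplified to a word form followed by its sense count, which is an assumption for the example (real index lines carry more fields); `find_entry` is a hypothetical name, and the counts are taken from Table 1.

```python
import bisect

# Simplified index lines: "word_form sense_count", sorted in ASCII order.
INDEX_LINES = sorted([
    "animal 4", "bronco 1", "entity 3", "horse 14", "organism 2", "pony 5",
])


def find_entry(word, lines=INDEX_LINES):
    """Binary-search the sorted lines for `word`; return its line or None."""
    # Index word forms are lower case, with collocations joined by underscores.
    key = word.lower().replace(" ", "_")
    # Each line starts with the word form followed by a space, so the first
    # line that is >= "key " is the matching entry if one exists.
    probe = key + " "
    pos = bisect.bisect_left(lines, probe)
    if pos < len(lines) and lines[pos].startswith(probe):
        return lines[pos]
    return None


print(find_entry("Horse"))
print(find_entry("unicorn"))
```

Python compares ASCII strings by character code, which matches the ASCII collating sequence used to sort the index files.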
`Each index file begins with several lines containing a copyright notice, version
`number and license agreement, followed by the data lines. Each line of data contains the
`following information: the sense count from the on-line dictionary; a list of the relational
`pointer types used in all synsets containing the word (this is used by the retrieval
`software to indicate to a user which searches are applicable); a list of indices which are
`byte offsets into the corresponding data file, one for each occurrence of the word form in
`a synset. Each data line is terminated with an end-of-line character.
`
`Data Files
`A data file contains information corresponding to the synsets that were defined in
`the lexicographers’ files with pointers resolved to byte offsets in data.pos files.
`Each data file begins with several lines containing a copyright notice, version
`number and license agreement. This is followed by a list of the names of all the input
`files that were specified to the Grinder, in the order that they were given on the command
`line, followed by the data lines. Each line of data contains an encoding of the
`information entered by the lexicographer for a synset, as well as additional information
`provided by the Grinder which is useful to the retrieval software and other programs.
`Each data line is terminated with an end-of-line character. In the data files, word forms
`in a synset match the orthographic representation entered in the lexicographers’ files.
`The first piece of information on each line is the byte offset, or address, of the
`synset. This is slightly redundant, since almost any computer program that reads a synset
`from a data file knows the byte offset that it read it from; however, this piece of
`information is useful when using UNIX utilities like grep to trace synsets and pointers
`without the use of sophisticated software. It also provides a unique ‘‘key’’ for a synset,
`if a user’s application requires one. An integer, corresponding to the location in the list
`of file names of the file from which the synset originated, follows. This can be used by
`retrieval software to annotate the display of a synset with the name of the originating file,
`and can be helpful for distinguishing senses. A list of word forms, relational pointers,
`and verb frames follows. An optional textual gloss is the final component of a data line.
`Relational pointers are represented by several pieces of information. The symbol
`for the pointer comes first, followed by the address of the target synset and its syntactic
`category (necessary for pointers that cross over into a different syntactic category),
`followed by a field which differentiates lexical and semantic pointers. If a lexical pointer
`is being represented, this field indicates which word forms in the source and target
`synsets the pointer pertains to. For a semantic pointer, this field is 0.
`
`Retrieving Lexical Information
`In order to give a user access to information in the database, an interface is required.
`Interfaces enable end users to retrieve the lexical data and display it via a window-based
`tool or the command line. When considering the role of the interface, it is important to
`recognize the difference between a printed dictionary and a lexical database. WordNet’s
`interface software creates its responses to a user’s requests on the fly. Unlike an on-line
`version of a printed dictionary, where information is stored in a fixed format and
`displayed on demand, WordNet’s information is stored in a format that would be
`meaningless to an ordinary reader. The interface provides a user with a variety of ways
`to retrieve and display lexical information. Different interfaces can be created to serve
`the purposes of different users, but all of them will draw on the same underlying lexical
`database, and may use the same software functions that interface to the database files.
`User interfaces to WordNet can take on many forms. The standard interface is an X
`Windows application, which has been ported to several computer platforms. Microsoft
`Windows and Macintosh interfaces have also been written. An alternative command line
`interface allows the user to retrieve the same data, with exactly the same output as the
`window-based interfaces, although the specification of the retrieval criteria is more
`cumbersome, and the whole effect is less impressive. Nevertheless, the command line
`interface is useful because some users do not have access to windowing environments.
`Shell scripts and other programs can also be written around the command line interface.
`The search process is the same regardless of the type of search requested. The first
`step is to retrieve the index entry located in the appropriate index file. This will contain a
`list of addresses of the synsets in the data file in which the word appears. Then each of
`the synsets in the data file is searched for the requested information, which is retrieved
`and formatted for output. Searching is complicated by the fact that each synset
`containing the search word also contains pointers to other synsets in the data file that may
`need to be retrieved and displayed, depending on the search type. For example, each
`synset in the hypernymic pathway points to the next synset in the hierarchy. If a user
`requests a recursive search on hypernyms, a recursive retrieval process is repeated until a
`synset is encountered that contains no further pointers.
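The recursive hypernym search can be sketched with a toy table of synsets. Everything here is assumed for illustration: `trace_hypernyms` is a hypothetical function, not part of the WordNet library, the byte offsets are invented, and the chain is just the first four entries of Table 1.

```python
# Toy database: byte offset -> (word forms, offsets of hypernym synsets).
# The offsets are made up for this example.
SYNSETS = {
    100: (["bronco"], [200]),
    200: (["mustang"], [300]),
    300: (["pony"], [400]),
    400: (["horse"], []),  # no further pointers: the recursion stops here
}


def trace_hypernyms(offset, synsets=SYNSETS, depth=0):
    """Recursively follow hypernym pointers, returning indented output lines."""
    words, hypernyms = synsets[offset]
    lines = ["  " * depth + ", ".join(words)]
    for target in hypernyms:
        lines.extend(trace_hypernyms(target, synsets, depth + 1))
    return lines


print("\n".join(trace_hypernyms(100)))
```

Each call formats one synset and then recurses into the synsets it points to, so the search ends naturally at a synset with an empty pointer list.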
`The user interfaces to WordNet and other software tools rely upon a library of
`functions that interface to the database files. A fairly comprehensive set of functions is
`provided: they perform searches and retrievals, morphology, and various other utility
`functions. Appendix C contains a brief description of these functions. The structured,
`flexible design of the library provides a simple programming interface to the WordNet
`database. Low-level, complex, and utility functions are included. The user interface
`software depends upon the more complex functions to perform the actual data retrieval
`and formatting of the search results for display to the user. Low-level functions provide
`basic access to the lexical data in the index and data files, while shielding the
`programmer from the details of opening files, reading files, and parsing a line of data.
`These functions return the requested information in a data structure that can be
`interpreted and used as required by the application. Utility functions allow simple
`manipulations of the search strings.
`The basic searching function, findtheinfo(), receives as its input arguments a word
`form, syntactic category, and search type; findtheinfo() calls a low-level function to find
`the corresponding entry in the index file, and for each sense calls the appropriate function
`to trace the pointer corresponding to the search type. Most traces are done with the
`function traceptrs(), but specialized functions exist for search types which do not
`conform to the standard hierarchical search. As a synset is retrieved from the database, it
`is formatted as required by the search type into a large output buffer. The resulting
`buffer, containing all of the formatted synsets for all of the senses of the search word, is
`returned to the caller. The calling function simply has to print the buffer returned from
`findtheinfo().
`This general search and retrieval algorithm is used in s
