Primary Examiner: Diane D. Mizrahi
(74) Attorney, Agent, or Firm: Lumen Intellectual
Property Services, Inc.
A frame-based knowledge representation system is built on
a relational database that is completely transparent to the
user. A user at a client machine sends standard knowledge
base queries across a distributed computer system and the
system translates the queries into a language suitable for
querying the database, such as Structured Query Language
(SQL). The system stores a hierarchical data model that
includes classes, particular instances of the classes, and
relations among the classes and instances. Primitive objects,
such as classes and instances, are organized with their
associated attributes into frames. The system consists of
three main tables and auxiliary tables. The frames table
stores frames with associated slots and values, along with
associated ownerships, access permissions, and other facets.
The superclass-set table stores the frames and associated
superclasses or ancestor classes. The third table, the classes
table, stores class frames, slots, and values, and a slot type
designating a slot as own or template. The database also
includes tables for security definitions, logging, and other
features. To query the knowledge base, the user submits a
query, preferably according to the Open Knowledge Base
Connectivity protocol, and the system translates the query
into SQL. The result is formatted and processed to check
user permissions before being returned to the user over the
computer network. The system is accessed through a variety
of interfaces, including a Web browser and various
application programming interfaces.
US 6,442,566 B1
This application is based on provisional application No.
60/112,423 filed Dec. 15, 1998, which is herein incorporated
by reference.
This invention was supported in part by grant numbers
DBI-9600637 from the National Science Foundation (NSF),
and LM-05652 and LM-06422 from the National Institutes
of Health (NIH). The U.S. Government has certain rights in
the invention.
This invention relates generally to information management
systems. More particularly, it relates to a frame-based
knowledge representation system built using a relational
database.
One of the growing problems facing scientific researchers
is how to integrate and process the enormous amount of data
being produced daily. While a great deal of data is available
on the World Wide Web, simply having access to the data is
useless without robust methods for searching, organizing,
and analyzing the data. Various data models have been
developed for storing information that can be categorized
using ontologies. An ontology is a system that specifies the
classes and relations among classes within a domain of
discourse. As ontologies become more complex, existing
tools for representing data are no longer able to sufficiently
represent the data.
Relational database management systems (RDBMS) are
by far the most dependable and widely used architectures for
building large databases. They contain a few tables of data
in which one or a few dependent values are associated with
a set of useful independent features that can be searched
quickly. An example of an RDBMS table is shown in FIG. 1.
By searching on names, the address and telephone number
of each person can be retrieved quickly. In general,
relational databases are most effective to use and easy to
maintain when there are a limited number of tables of
information, linked together logically, and with a very large
number of records in each table. They are also an ideal
solution when the structure of the data model is very well
understood, not subject to change, and in routine use.
Changes to queries and data fields are difficult to implement
without completely taking the system out of use and
restructuring the model. For example, adding an email address
for each person in the database containing the table of FIG. 1
requires redesign of the table structure (either a new column
or table) and existing queries. Furthermore, for many data
sets, the structure is too complex to be represented
effectively by a relational database. Straightforward relational
representations can leave out important dependencies of
interest, and effectively fit the data to the capabilities of the
database structure, instead of fitting the structure to the data.
When the data model becomes a large network of interacting
tables, queries are also much more difficult to write.
More flexible data structures, known as knowledge bases,
have been developed to more closely model the entities in
the system of interest and the interactions among them. The
key distinction between a knowledge base and a relational
database is the manner of organization of the data. In a
relational database, data is organized into tables that are
accessed by specifying rows and columns of the table; the
tables do not reflect conceptual knowledge of the data. In
contrast, design and organization of knowledge bases
requires conceptual knowledge and representation of the
data. In a knowledge representation system, all of the
concepts in the domain of discourse are organized into a
hierarchical tree of classes, with instances of classes located
at the leaves of each branch. Further, the attributes
associated with instances are stored with the instance, and not
distributed throughout all of the tables of a relational
database.
The two primary innovations in knowledge bases have
been object oriented databases and frame-based
representation systems. Object oriented approaches allow more
modular modeling than relational databases. Each piece of data in
the system is considered an object, and the properties of an
object are stored locally with the object, along with pointers
to related objects. Complex data models are easier to
implement, and hierarchies of objects can be created to help
organize the large amounts of information. These systems
typically provide benefits over relational database systems in
the richness of available queries over more complex data
types. However, object oriented databases have significant
drawbacks that have prevented their acquiring a broad base
of established users. They require not only that a researcher
specify the properties of entities, but also that they map them
onto programming language and database structures. Users
interested in the stored information often have neither time
for nor interest in learning about the underlying database
structure. In addition, object oriented databases suffer from
the lack of a universally agreed upon query language.
Frame-based representation systems can be considered
object oriented architectures that provide built-in support for
dynamic and hierarchical data modeling, for distinguishing
between general concepts and particular instances of these
concepts, for associating particular attributes with each
concept, for inheriting attribute values from parent concepts,
and for linking concepts with named relationships. They
allow modification of the data model without the need to
rebuild the structure, and have a common communication
protocol for reading from and writing to the knowledge
bases. Developers have created several frame-based
knowledge representation tools, including Ontolingua (A.
Farquhar, R. Fikes, and J. Rice, “The Ontolingua Server: A
Tool for Collaborative Ontology Construction,” Tech.
Report KSL-96-26, Knowledge Systems Laboratory,
Stanford University, Stanford, Calif., 1996); Protege (M. A.
Musen et al., “Protege-II: An Environment for Reusable
Problem-Solving Methods and Domain Ontologies,” Proc.
IJCAI ’93: 1993 Int’l Joint Conf. Artificial Intelligence,
Morgan Kaufmann, San Francisco, 1993); and Theo (T. Mitchell
et al., “THEO: A Framework for Self-Improving Systems,”
Architectures for Intelligence, K. Van Lehn, ed., Lawrence
Erlbaum, Hillsdale, N.J., 1989). Such tools have an array of
features, default reasoning strategies, and knowledge
representation constraints. However, some require users to
install special software, and others lack important features,
such as a persistent back-end storage system for scalability,
facilities for controlling access based on user permissions,
an API for prototype development, or easy compatibility
with Web protocols.
Several existing frame-based systems map a knowledge
model into a relational database, thereby solving the
above-described problem of requiring specialized software to
implement knowledge representation systems. These
systems instead allow developers to use well-known and widely
available databases along with their existing tools, providing
systems that are easy to use and access through a variety
of interfaces. For example, the PERK database back-end to
the GKB-Editor (P. D. Karp, K. L. Myers, and T. Gruber,
“The Generic Frame Protocol,” Proc. IJCAI-95: 1995 Int’l
Joint Conf. Artificial Intelligence, Morgan Kaufmann, San
Francisco, 1995, pp. 768-774) and EcoCyc frame-based
knowledge-base tool (P. D. Karp et al., “EcoCyc: Electronic
Encyclopedia of Escherichia Coli Genes and Metabolism,”
Nucleic Acids Research, 27(1), pp. 55-58, 1999) both use a
relational database for storage. The PERK storage system is
discussed in detail in P. D. Karp, V. K. Chaudhri, and S. M.
Paley, “A Collaborative Environment for Authoring Large
Knowledge Bases,” 1997. In the PERK system, individual
frames (objects and associated attributes) are stored in a
RDBMS as compressed ASCII text and are unpacked into
memory on demand. Frames must be loaded from the flat file
in order to be queried, leading to a start-up delay and limits
on scalability. More importantly, the client machine
accessing the information stored on PERK must have special
software to unwrap the objects and put them in temporary
memory.
A knowledge representation model built on a relational
database is disclosed in P. M. Nadkarni, “QAV: querying
entity-attribute-value metadata in a biomedical database,”
Computer Methods and Programs in Biomedicine, 53, pp.
93-103, 1997. However, the database structure cannot fully
support a frame-based knowledge representation system,
which contains a hierarchy of classes and particular
instances of classes, because it does not necessarily include
the key relation of “instance of.” Furthermore, users
querying the database must have explicit knowledge of the
underlying structure, and applications that interact with the
data must be designed specifically for the particular database
system used.
There is still a need for a knowledge base data storage
system that can serve large amounts of data to a variety of
client interfaces and that uses standard relational database
tools and standard knowledge-base protocols.
Accordingly, it is a primary object of the present invention
to provide a frame-based knowledge representation system
built on a commercial relational database management
system that is transparent to a user. Benefits associated with
relational databases, including good performance, data
coherency, concurrent users, and automatic backup, are
therefore also provided.
It is a further object of the invention to provide a
knowledge base that requires only conventional relational database
management software, and does not require specialized
software. The knowledge base is therefore easy to use and
develop without requiring specialized skills.
It is another object of the present invention to provide a
knowledge base that is compatible with the current standard
knowledge base query protocol, Open Knowledge Base
Connectivity, thereby making the underlying database
structure transparent to users and developers.
It is an additional object of the invention to provide a
knowledge base system that is highly scalable to large data
sets.
It is a further object of the invention to provide a
knowledge base that is both flexible and highly structured,
allowing for easy modification of the structure as the data model
develops, and also for complicated hierarchies of data.
It is another object of the present invention to provide a
frame-based representation system that allows for data
ownership and access privileges to be specified for each piece of
data.
It is an additional object of the invention to provide a
knowledge base that may be accessed through a
Web-accessible browser or through Application Programming
Interfaces, allowing for universal accessibility of data and
facilitating creation of new interfaces, while maintaining
transparency of the underlying data structure.
These objects and advantages are attained by a
frame-based representation system built on a relational database
that is hidden from the user. While the data model is
consistent with other frame-based systems, it uses a simple
and novel data structure to organize and store the ontology
and instances. Standard frame-based queries for retrieving
specific portions of the stored data are implemented in a
novel manner that is consistent with the underlying data
structure.
The present invention provides a computer-readable
medium encoded with a relational database for storing a
frame knowledge system that has classes, relations, and
instances of the classes. The database includes a frames table
that has columns for storing frames, including class frames
representing classes and instance frames representing
instances, at least one slot associated with each of the
frames, and a value associated with each of the slots. The
slots represent relations, and include the relation known as
instance-of, which describes the relation between an
instance of a class and the class. Preferably, the frames table
also has columns for storing an access permission, an
ownership, and at least one facet associated with each slot.
Preferably, the database also includes a classes table that
has columns for storing the class frames with associated
class slots and class values, all of which are also stored in the
frames table. However, the classes table also includes a slot
type (own or template) associated with each slot, and
distinguishes between own slots, which characterize the
class, and template slots, which characterize instances of the
class. Each template slot associated with a particular class
frame in the classes table is also stored in the frames table,
where it is associated with a corresponding instance frame of
the particular class. The database may also include a class
hierarchy table that has columns for storing class frames and
at least one superclass associated with each class frame.
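
The three tables described above can be sketched as a small schema. This is only an illustration using Python's built-in sqlite3 module: the text names the tables and the kinds of data they hold, but every column name below (frame, slot, value, facet, owner, permission, slot_type, superclass) is an assumption chosen for readability, not the schema of the invention.

```python
import sqlite3

# Hypothetical column names; the text specifies only what each table stores.
SCHEMA = """
CREATE TABLE frames (
    frame      TEXT NOT NULL,  -- class frame or instance frame
    slot       TEXT NOT NULL,  -- relation, e.g. 'instance-of'
    value      TEXT,           -- value of the slot
    facet      TEXT,           -- optional secondary information
    owner      TEXT,           -- ownership
    permission TEXT            -- access permission
);
CREATE TABLE classes (
    frame     TEXT NOT NULL,   -- class frame
    slot      TEXT NOT NULL,
    value     TEXT,
    slot_type TEXT NOT NULL CHECK (slot_type IN ('own', 'template'))
);
CREATE TABLE superclass_set (
    frame      TEXT NOT NULL,  -- class frame
    superclass TEXT NOT NULL   -- one row per superclass of the frame
);
"""


def build_database() -> sqlite3.Connection:
    """Create an in-memory database holding the three sketched tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


conn = build_database()
conn.execute(
    "INSERT INTO frames (frame, slot, value) VALUES (?, ?, ?)",
    ("Biochemistry-Boileau-22-3162", "instance-of", "Journal-Article"),
)
conn.execute(
    "INSERT INTO classes VALUES (?, ?, ?, ?)",
    ("Journal-Article", "Reports", None, "template"),
)
conn.execute(
    "INSERT INTO superclass_set VALUES (?, ?)",
    ("Journal-Article", "Peer-Reviewed-Publication"),
)
tables = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
```

Storing one (frame, slot, value) triple per row is what lets new slots be added without altering the table structure, at the cost of one row per attribute.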
The present invention also provides a method for
querying a frame-based representation system that contains a set
of relational database tables for storing frames and
associated attributes. The method occurs in a server in a distributed
computer system, and includes the following steps:
receiving a query in a first format for a subset of frames from a
client computer; translating the query into a second format
for querying the tables; applying the query in the second
format to the set of tables to select the subset; and
transmitting output including the subset to the client computer.
The first format is preferably a knowledge base format, most
preferably OKBC, and the second format is preferably a
relational database format, most preferably SQL. The query
in the second format includes a predetermined attribute
related to the query in the first format, and each frame in the
retrieved subset is associated with the predetermined
attribute. The query in the second format may be applied to
one, some, or all of the tables. Preferably, the set of
relational database tables includes the tables described
above. In that case, the predetermined attribute includes a
predetermined slot and predetermined value. The method
preferably includes a step of processing the subset to
generate formatted output. The processing step may include
comparing an access permission of each retrieved frame,
stored in the frames table, with a client identifier for the
client computer. Based on the comparison, it is determined
whether the client may access the frames in the subset. The
database may be stored on the server or on a database
computer that is distinct from the server.
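
The query steps above (receive, translate, apply, check permissions, return) might look roughly like the following sketch. It is a simplification under stated assumptions: the request is modeled as a plain class name rather than a real OKBC call, the frames table uses hypothetical column names, and the permission column is assumed to hold either the string 'public' or a single client identifier. Only direct instance-of rows are matched; superclass membership is ignored here.

```python
import sqlite3


def translate_get_instances(class_name: str) -> tuple:
    """Translate a hypothetical 'instances of class' request into SQL.

    The predetermined slot is 'instance-of' and the predetermined value is
    the requested class name; both are passed as bound parameters.
    """
    sql = "SELECT frame, permission FROM frames WHERE slot = ? AND value = ?"
    return sql, ("instance-of", class_name)


def query_instances(conn: sqlite3.Connection, class_name: str, client_id: str) -> list:
    """Apply the translated query, then keep only frames the client may read."""
    sql, params = translate_get_instances(class_name)
    rows = conn.execute(sql, params).fetchall()
    # Assumed permission model: 'public' or an exact match on the client id.
    return sorted(frame for frame, permission in rows if permission in ("public", client_id))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE frames (frame TEXT, slot TEXT, value TEXT, permission TEXT)")
conn.executemany(
    "INSERT INTO frames VALUES (?, ?, ?, ?)",
    [
        ("Article-A", "instance-of", "Journal-Article", "public"),
        ("Article-B", "instance-of", "Journal-Article", "alice"),
        ("Proceeding-C", "instance-of", "Conference-Proceeding", "public"),
    ],
)
visible_to_bob = query_instances(conn, "Journal-Article", "bob")
visible_to_alice = query_instances(conn, "Journal-Article", "alice")
```

The client never sees the SQL text or the table layout; it supplies only the class name, which is the sense in which the database stays transparent to the user.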
The present invention also provides a method for adding
data to a frame-based representation system that contains a
set of relational database tables for storing frames and
associated attributes. The method occurs in a server in a
distributed computer system, and includes the following
steps: receiving a query in a first format to create a new
frame from a client computer; translating the query into a
second format for querying the tables; and applying the
query in the second format to the set of tables to create a new
record. The first format is preferably a knowledge base
format, most preferably OKBC, and the second format is
preferably a relational database format, most preferably
SQL. The second format includes parameters related to the
query in the first format, and the new record represents the
new frame and contains the parameters. Preferably, the
tables are the tables described above, and the parameters
include a predetermined slot and value. Preferably, the new
record also contains a client identifier associated with the
client computer.
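
A minimal sketch of the add-data path follows, again with assumed names: the function translate_create_frame, the owner column, and the client identifier 'client-42' are all hypothetical. The create request is turned into a parameterized INSERT whose parameters are the slot, the value, and the requesting client's identifier.

```python
import sqlite3


def translate_create_frame(frame: str, slot: str, value: str, client_id: str) -> tuple:
    """Translate a hypothetical create-frame request into a parameterized INSERT."""
    sql = "INSERT INTO frames (frame, slot, value, owner) VALUES (?, ?, ?, ?)"
    return sql, (frame, slot, value, client_id)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE frames (frame TEXT, slot TEXT, value TEXT, owner TEXT)")

sql, params = translate_create_frame(
    "Biochemistry-Boileau-22-3162", "instance-of", "Journal-Article", "client-42"
)
with conn:  # commits on success, rolls back if the INSERT raises
    conn.execute(sql, params)

stored = conn.execute("SELECT frame, slot, value, owner FROM frames").fetchall()
```

Recording the client identifier in the new row is what later allows an ownership or permission comparison when the frame is retrieved.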
FIG. 1 illustrates a table of a prior art relational database
management system.
FIG. 2 is an example of a prior art class hierarchy showing
classes and subclasses.
FIG. 3 is a schematic diagram of a fragment of a prior art
frame-based data model showing class and instance frames.
FIG. 4 illustrates a frames table of the present invention.
FIG. 5 illustrates a superclass-set (or class hierarchy) table
of the present invention.
FIG. 6 illustrates a classes table of the present invention.
FIG. 7A is a block diagram of a preferred architecture for
implementing the present invention.
FIG. 7B is a block diagram of an alternate architecture for
implementing the present invention.
FIG. 8 illustrates a Web browser interface used with the
present invention, in browse mode.
FIG. 9 illustrates a Web browser interface used with the
present invention, in edit mode.
Although the following detailed description contains
many specifics for the purposes of illustration, anyone of
ordinary skill in the art will appreciate that many variations
and alterations to the following details are within the scope
of the invention. Accordingly, the following preferred
embodiment of the invention is set forth without any loss of
generality to, and without imposing limitations upon, the
claimed invention.
The present invention, known as Sophia, provides a
knowledge base storage system that uses a relational
database back-end that is completely transparent to a user or
application program. To access the stored data, a user
submits standard frame-based queries to the system, which
translates the queries into a query format suitable for the
relational database. The database query, preferably in
Structured Query Language (SQL), is parameterized to use
variables supplied by the initial query, but combines these
variables with explicit knowledge of the database’s table
structure to retrieve the correct result.
Although Sophia may be used for storage and access of
knowledge bases in any domain of interest, it will be
described below with reference to RiboWeb, a particular
computer-based environment for cooperative storage and
computation of data relating to the structure of the ribosome.
RiboWeb has been described in R. B. Altman, M. Bada, X.
J. Cahi, M. W. Carillo, R. O. Chen, and N. F. Abernethy,
“RiboWeb: An Ontology-Based System for Collaborative
Molecular Biology,” IEEE Intelligent Systems, 14(5), pp.
68-76, 1999. It is to be understood that RiboWeb is an
example meant to illustrate, but not limit, the
implementation of Sophia.
Sophia uses a standard frame-based knowledge model for
organizing the data in the domain of discourse. This
knowledge model has been developed in tandem with Open
Knowledge Base Connectivity (OKBC), discussed further
below, a standard protocol for accessing knowledge bases.
The knowledge model contains the following elements:
Classes (also called concepts) are organized in a
    taxonomy or “is-a” hierarchy that begins with the most
    general classes and continues to specialize to narrowly
    defined classes.
Instances occur at the leaves of the hierarchy and are used
    to store specific data.
Relations can be unitary or binary and are used to
    organize the classes and instances within the hierarchy.
The set of classes and relations constitutes an ontology; the
combination of ontology and instances constitutes a
knowledge base. FIG. 2 illustrates a class hierarchy used in
RiboWeb, with only one branch (Biochemical-Data) fully
expanded. Under the primary class Thing are six subclasses:
Data, Methods, Organism, Physical-Thing,
Reference-Information, and RiboWeb-Output. Each subclass is then
further divided into subclasses. A parent class is the class
directly above a particular class; for example,
Footprinting-Data is a parent class of Chemical-Footprinting-Data.
Superclasses or ancestor classes include all classes above a
particular class; Chemical-Footprinting-Data has as its
ancestors Footprinting-Data, Biochemical-Data,
Measured-Data, Data, and Thing. Instances (i.e. particular pieces of
data) occur only at the ends of each branch; for example, any
instance of Biochemical-Data must be further classified into
a subclass. However, an instance of a class is also considered
to be an instance of each of its superclasses. Any knowledge
base used to store data organized into such a hierarchy as
shown in FIG. 2 must include basic relations of subclass-of
and instance-of. While the organizational structure of data in
knowledge bases can be altered as more data is gathered and
more relations understood, a conceptual hierarchy similar to
the one in FIG. 2 must be determined during design of a
particular knowledge base.
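
The parent and ancestor relations described above can be illustrated with a few lines of Python. The class names come from the Biochemical-Data branch named in the text; the dictionary layout and the assumption that each class has exactly one parent are simplifications made for this sketch, and the instance-membership helper is hypothetical.

```python
# subclass-of links for the branch named in the text (child -> parent).
# Assumes a single parent per class, which is a simplification.
PARENT = {
    "Data": "Thing",
    "Measured-Data": "Data",
    "Biochemical-Data": "Measured-Data",
    "Footprinting-Data": "Biochemical-Data",
    "Chemical-Footprinting-Data": "Footprinting-Data",
}


def ancestors(cls: str) -> list:
    """Return every superclass of cls, nearest first, by walking parent links."""
    chain = []
    while cls in PARENT:
        cls = PARENT[cls]
        chain.append(cls)
    return chain


def is_instance_of(direct_class: str, cls: str) -> bool:
    """An instance of a class also counts as an instance of each superclass."""
    return cls == direct_class or cls in ancestors(direct_class)


footprint_ancestors = ancestors("Chemical-Footprinting-Data")
```

Here ancestors("Chemical-Footprinting-Data") walks up through Footprinting-Data, Biochemical-Data, Measured-Data, and Data to Thing, matching the list given in the text. Precomputing such chains for every class is one plausible reading of what a superclass-set table stores.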
A frame-based system contains a particular representation
of a knowledge base. It uses frames as primitive objects that
represent entities in the domain of discourse. Each object,
including classes and instances, in the system is represented
as a frame, so called because a frame encloses the entire
description of an object. An example of a small fragment of
a frame-based knowledge system is illustrated in FIG. 3,
with frames shown as rectangles and circles. Rectangles
represent class frames, and circles represent instance frames.
In FIG. 3, three classes are shown:
Peer-Reviewed-Publication; and two of its subclasses, Journal-Article and
Conference-Proceeding. Also illustrated is a particular
instance of Journal-Article, Biochemistry-Boileau-22-3162.
Frames in the system are linked by “is-a” relationships:
Journal-Article is a subclass of Peer-Reviewed-Publication,
and Biochemistry-Boileau-22-3162 is an instance of
Journal-Article. Associated with each frame are slots and
values providing the definitions of the frame. A slot can have
a default value that is inherited by the instances underneath
the class. Classes have two types of slots: own slots and
template slots. Template slots describe slots and values
considered to hold for each instance of the class, while own
slots hold for the class itself. The slots illustrated in FIG. 3
for each class are template slots. The frame model supports
inheritance: a subclass inherits slots and values from a class,
and an instance inherits template slots and values from its
class. Frames may also contain facets that modify slots and
provide secondary information, such as annotations of data.
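
The difference between own slots and template slots, and the way an instance picks up template values from its class and that class's superclasses, can be sketched as below. The class and instance names are those of FIG. 3 as described in the text; the slot names (Peer-Reviewed, Journal, Documentation) and their values are invented for illustration, and the model covers template-slot inheritance only.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ClassFrame:
    name: str
    parent: Optional["ClassFrame"] = None
    own_slots: dict = field(default_factory=dict)       # describe the class itself
    template_slots: dict = field(default_factory=dict)  # defaults for its instances

    def inherited_template_slots(self) -> dict:
        """Merge template slots from the root down; nearer classes override."""
        merged = self.parent.inherited_template_slots() if self.parent else {}
        merged.update(self.template_slots)
        return merged


@dataclass
class InstanceFrame:
    name: str
    cls: ClassFrame
    slots: dict = field(default_factory=dict)

    def get(self, slot: str):
        """Use the instance's own value if present, else the inherited default."""
        if slot in self.slots:
            return self.slots[slot]
        return self.cls.inherited_template_slots().get(slot)


# Hypothetical slots and values, for illustration only.
publication = ClassFrame(
    "Peer-Reviewed-Publication",
    own_slots={"Documentation": "Refereed published work"},
    template_slots={"Peer-Reviewed": True},
)
article = ClassFrame("Journal-Article", parent=publication, template_slots={"Journal": None})
paper = InstanceFrame("Biochemistry-Boileau-22-3162", article, {"Journal": "Biochemistry"})
```

In this sketch paper.get("Peer-Reviewed") returns the default inherited from Peer-Reviewed-Publication, while paper.get("Documentation") returns None because own slots describe the class and are not passed to instances.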
Note that values of slots associated with instances may be
strings, numbers, or references (i.e. hyperlinks) to other
instances. For example, in FIG. 3, one value of the slot
Reports is Biochemistry-Boileau-22-3162-Crosslink-1. This
value is an instance of the class Cross-Linking-Data. The
instance frame for
Biochemistry-Boileau-22-3162-Crosslink-1 (not shown) contains specific crosslinking data
that were obtained experimentally and reported in this
particular journal article. This linking of instances creates a
highly interwoven network of data, which cannot in general
be adequately represented using a standard relational
database and traditional methods. This example also implies that
the slot Reports, a binary relation, has particular
characteristics; it must have a domain in the class Journal-Articles and
a range in the class Data (and its subclasses). Thus, there
must be a frame for storing the characteristics of Reports as
slots and values. Values of slots may also be links to external
online databases or to image files.
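
One way to picture a frame that stores the characteristics of Reports is as a record holding the slot's domain and range, checked against the class hierarchy. The sketch below is an assumption-laden illustration: the direct subclass link from Cross-Linking-Data to Data is hypothetical (the intermediate classes are not listed in this passage), the class name Journal-Article is used for the domain, and the function names are invented.

```python
# Hypothetical subclass-of links (child -> parent) for illustration.
PARENT = {
    "Journal-Article": "Peer-Reviewed-Publication",
    "Conference-Proceeding": "Peer-Reviewed-Publication",
    "Cross-Linking-Data": "Data",
}

# A slot frame: the slot's own characteristics stored as slots and values.
SLOT_FRAMES = {
    "Reports": {"domain": "Journal-Article", "range": "Data"},
}


def is_subclass(cls, ancestor: str) -> bool:
    """True if cls equals ancestor or lies below it in the hierarchy."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = PARENT.get(cls)
    return False


def slot_accepts(slot: str, frame_class: str, value_class: str) -> bool:
    """Check a (frame, value) pair against the slot's domain and range."""
    spec = SLOT_FRAMES[slot]
    return is_subclass(frame_class, spec["domain"]) and is_subclass(value_class, spec["range"])


ok = slot_accepts("Reports", "Journal-Article", "Cross-Linking-Data")
wrong_domain = slot_accepts("Reports", "Conference-Proceeding", "Cross-Linking-Data")
wrong_range = slot_accepts("Reports", "Journal-Article", "Journal-Article")
```

Because the range test walks up the hierarchy, any subclass of Data is accepted as a value, which matches the statement that the range covers Data and its subclasses.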
Sophia implements a frame knowledge system using a
relational database backend and Structured Query Language
for querying the database. Any relational database
management system, and any server computer, may be used for
storing information. Selection of an appropriate RDBMS
and server is influenced primarily by the amount of data,
number of

