`
`
`
`
`
`Merrill Communications LLC
`d/b/a Merrill Corporation
`Exhibit 1006 pt. 8
`
`
`
WIDL and XML RPC

■ Application interoperability
■ Web Interface Definition Language (WIDL)
■ XML Remote Procedure Call (RPC)
■ WIDL specification on CD-ROM

©1998 THE XML HANDBOOK™
`
`
`
`
`
`
Chapter 38
`
Engineers numbered 12-345-68 through 23-457-89 at Oops E-Commerce
Corporation say "XML is the solution to interoperability." These engineers
gang up on the managers until the corporate gears succumb and reverse
direction. Soon the sales reps are saying the words "universal data format"
more often than the words "object-oriented." Oops XML-enables its popular
Loops product, renames the product to Xoops, and then ships Xoops out the
door.

Over the following weeks we eavesdrop on the support engineers: "Well,
if you have Company Q's product you can use our XML feature with it...
Well, to get it to talk to your purchasing system, you'll have to XML-enable
the purchasing system... Well, their program uses a different DTD from
ours, so Xoops won't interoperate with it."

Woops, Oops goofed with Xoops: XML alone is not quite enough.
`
38.1 | XML alone is not quite enough
`
A client that hands a server data must tell the server what to do with the
data. The client does this by naming a service. A client must also
understand the data that the service returns. Two applications may
communicate only if they agree on the names of the services and on the types
of the input and the output data.

Furthermore, applications must agree on how to represent this data in
the messages that transfer between them. XML provides a way to represent
the data, but it does not associate input data and output data with service
names, and it does not provide a way to map between message types.
Something is missing.
`
38.1.1 The missing piece
`
The obvious solution to the problem is to associate input DTDs with output
DTDs and to give these associations service names. This does provide
enough information for two applications to communicate, but it requires
both applications to be XML-enabled and it requires the applications to
conform to the same DTDs. While there may not be many XML-enabled
applications right now, eventually there will be, but it is unlikely that all
will agree on the same DTDs.

A better solution to the interoperability problem is to define application
interfaces in an abstract way. CORBA, DCOM, and DCE have all taken
this approach, and in these systems the abstractions are known as interface
specifications.

Interface specifications allow developers to create different but compatible
implementations of interfaces. In CORBA, DCOM, and DCE, interface
specifications allow applications written in different programming
languages to communicate. We need to take this a step further. We must
also bridge between applications whose XML messages conform to different
DTDs.

The missing piece is an IDL, an Interface Definition Language. An IDL is
a language in which interface specifications are written.

webMethods, Inc. has specified an IDL for this purpose, an IDL called
WIDL. WIDL interface specifications enable middleware to map
transparently between application interfaces and XML message DTDs. By
delegating XML intelligence and accessibility issues to IDL-aware
middleware, we also simplify the application. An IDL such as WIDL allows us
to maximize an application's accessibility.
`
`
38.1.2 The role of WIDL
`
WIDL is an acronym for Web Interface Definition Language. It is an IDL
that is expressed in XML. OMG IDL and Microsoft IDL are other examples
of IDLs, but there are important differences between WIDL and
conventional IDLs.

WIDL differs from other IDLs primarily because it satisfies the 80/20
rule. It provides 80% of the capability of a conventional IDL with only
20% of the complexity. WIDL is consequently easy to learn, easy to read,
and relatively easy to implement.

This fact provides WIDL with a potentially large user base, but still
leaves room for more sophisticated IDLs, including new ones based on
XML. WIDL also goes a step further than conventional IDLs by requiring
all data items to have names, which simplifies the process of translating
documents into interfaces.

webMethods originally developed WIDL to wrap Web sites within APIs,
thereby giving applications programmatic access to the Web. Consequently,
the WIDL 1.x and 2.x specifications defined a single language that both
specified interfaces and defined how interface specifications map onto a
Web site.

WIDL 3.0 places the interface specification and the document-mapping
implementation in separate XML documents. WIDL 3.0 therefore defines
two components: an IDL component and a document-mapping component.
Together these components allow applications to communicate over a
network regardless of the programming languages in which the applications
are written, regardless of whether the applications speak XML, and
regardless of the DTDs to which XML-speaking applications conform.
`
38.2 | WIDL the IDL
`
Let's take a look at the IDL component of WIDL 3.0. Example 38-1 shows
a short but complete example of a WIDL 3.0 interface specification.
`
`
Example 38-1. A WIDL 3.0 interface specification.
<WIDL NAME="com.Fortunes-R-Us.Purchasing" VERSION="3.0">
  <RECORD NAME="FortuneOrder">
    <VALUE NAME="accountID" TYPE="i4"/>
    <VALUE NAME="zodiacSign"/>
  </RECORD>
  <RECORD NAME="FortuneReceipt">
    <VALUE NAME="orderNumber" TYPE="i4"/>
    <VALUE NAME="fortune"/>
    <VALUE NAME="accountBalance" TYPE="r4"/>
  </RECORD>
  <METHOD NAME="orderFortune" INPUT="FortuneOrder"
          OUTPUT="FortuneReceipt" RETURN="orderNumber"/>
</WIDL>
`
`A WIDL document specifies a single interface. Example 38-2 is a DTD
`that defines WIDL documents sufficiently for our purposes.
`
Interfaces should have names that are unique within their scope of use.
Naming an interface relative to the reverse order of a domain name provides
one way to accomplish this. A client may then identify interfaces by name.

A WIDL element contains one or more RECORD or METHOD elements.
`
38.2.1 Methods
`
The METHOD element identifies a service that the client may invoke.

Method names must be unique within the document. Methods may
optionally have input and output parameters, as indicated by the optional
INPUT and OUTPUT attributes.

The INPUT attribute provides a link to a RECORD element that
enumerates the method's input parameters. The OUTPUT attribute provides a
link to a RECORD element that enumerates the method's output
parameters. The tag may optionally indicate that one of the output parameters
is the return value of the method when the interface is implemented in a
programming language. Methods may also identify the exceptions that they
raise in order to report method invocation failures.
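
To make the INPUT and OUTPUT links concrete, here is a minimal Python sketch that reads the interface of Example 38-1 and resolves each method's links to the records they name. The function name `describe_methods` and the shape of the returned dictionary are our own assumptions for illustration; they are not part of WIDL or of any webMethods tool.

```python
import xml.etree.ElementTree as ET

# Interface specification from Example 38-1, embedded so the sketch is self-contained.
WIDL_SPEC = """
<WIDL NAME="com.Fortunes-R-Us.Purchasing" VERSION="3.0">
  <RECORD NAME="FortuneOrder">
    <VALUE NAME="accountID" TYPE="i4"/>
    <VALUE NAME="zodiacSign"/>
  </RECORD>
  <RECORD NAME="FortuneReceipt">
    <VALUE NAME="orderNumber" TYPE="i4"/>
    <VALUE NAME="fortune"/>
    <VALUE NAME="accountBalance" TYPE="r4"/>
  </RECORD>
  <METHOD NAME="orderFortune" INPUT="FortuneOrder"
          OUTPUT="FortuneReceipt" RETURN="orderNumber"/>
</WIDL>
"""


def describe_methods(widl_text):
    """Hypothetical helper: map each method name to its resolved parameters."""
    root = ET.fromstring(widl_text)
    # Index records by name; each parameter becomes a (name, type) pair.
    # The DTD of Example 38-2 gives TYPE a default of "String".
    records = {
        rec.get("NAME"): [(p.get("NAME"), p.get("TYPE", "String")) for p in rec]
        for rec in root.findall("RECORD")
    }
    methods = {}
    for method in root.findall("METHOD"):
        methods[method.get("NAME")] = {
            # INPUT and OUTPUT are optional, so a missing link yields no parameters.
            "input": records.get(method.get("INPUT"), []),
            "output": records.get(method.get("OUTPUT"), []),
            "return": method.get("RETURN"),
        }
    return methods


methods = describe_methods(WIDL_SPEC)
print(methods["orderFortune"])
```

The sketch ignores exceptions, LIST and RECORDREF parameters, and record inheritance; it only shows how a method's attributes lead to its parameter lists.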
`
`
Example 38-2. WIDL interface DTD.
<!ELEMENT WIDL      (RECORD | METHOD)+>
<!ATTLIST WIDL
    NAME      CDATA    #REQUIRED
    VERSION   CDATA    #FIXED "3.0"
>
<!ELEMENT METHOD    EMPTY>
<!ATTLIST METHOD
    NAME      CDATA    #REQUIRED
    INPUT     CDATA    #IMPLIED
    OUTPUT    CDATA    #IMPLIED
    RETURN    CDATA    #IMPLIED
>
<!ELEMENT RECORD    (VALUE | LIST | RECORDREF)*>
<!ATTLIST RECORD
    NAME      CDATA    #REQUIRED
    BASE      CDATA    #IMPLIED
>

<!-- Parameters -->
<!ELEMENT VALUE     EMPTY>
<!ATTLIST VALUE
    NAME      CDATA    #REQUIRED
    TYPE      CDATA    "String"
    DIM       NMTOKEN  "0"
>
<!ELEMENT LIST      EMPTY>
<!ATTLIST LIST
    NAME      CDATA    #REQUIRED
    DIM       NMTOKEN  "0"
>
<!ELEMENT RECORDREF EMPTY>
<!ATTLIST RECORDREF
    NAME      CDATA    #REQUIRED
    RECORD    CDATA    #IMPLIED
    DIM       NMTOKEN  "0"
>
`
38.2.2 Records
`
A RECORD element represents a record and conforms to the DTD shown
in Example 38-2. Record names must be unique within a document. A
record consists of a collection of zero or more parameter elements, each of
which must have a unique name within the scope of the record. If the
record provides a BASE attribute, the record inherits all of the named
parameter elements found within the RECORD element to which the
attribute points.
`The parameter element types are VALUE, LIST, and RECORDREF.
`
`VALUE
`An element that represents lexical data and has an optional TYPE
`attribute that identifies the datatype. Datatypes include strings
`("string"), integers ("i4"), and floats ("r4").
`
`LIST
`A LIST element represents a vector of arbitrary size consisting of
`an arbitrary set of types.
`
RECORDREF
`The RECORDREF element identifies a RECORD element that
`nests within the RECORDREF's parent record.
`
Parameters have an optional DIM attribute. When DIM has a value of
"1" or "2" the parameter represents a single- or two-dimensional array.
When the attribute is absent, the value defaults to "0" to indicate that the
parameter is a single data item and not an array.

WIDL provides only a small number of simple data types. These data
types are sufficient to represent most of the types available to programming
languages. WIDL is compatible with other data definition languages such
as XML-Data and Resource Description Framework (RDF), so WIDL
may accommodate the sophisticated schema languages that are emerging.
This allows WIDL to support complex data types without itself becoming
complex.
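
The BASE and DIM rules above can be sketched in a few lines of Python. The `PriorityOrder` record and its `deliverBy` and `luckyNumbers` parameters are hypothetical, invented only to exercise inheritance and arrays; `FortuneOrder` comes from Example 38-1. Listing inherited parameters first is also our assumption, since the text does not say how inherited and local parameters are ordered.

```python
import xml.etree.ElementTree as ET

# FortuneOrder is from Example 38-1; PriorityOrder is a hypothetical extension.
SPEC = """
<WIDL NAME="com.Fortunes-R-Us.Purchasing" VERSION="3.0">
  <RECORD NAME="FortuneOrder">
    <VALUE NAME="accountID" TYPE="i4"/>
    <VALUE NAME="zodiacSign"/>
  </RECORD>
  <RECORD NAME="PriorityOrder" BASE="FortuneOrder">
    <VALUE NAME="deliverBy"/>
    <VALUE NAME="luckyNumbers" TYPE="i4" DIM="1"/>
  </RECORD>
</WIDL>
"""


def expand_record(root, name):
    """Return a record's parameters, inherited ones first (assumed order).

    Cyclic BASE chains are not detected in this sketch.
    """
    record = root.find(f"RECORD[@NAME='{name}']")
    if record is None:
        raise KeyError(name)
    params = []
    base = record.get("BASE")
    if base is not None:
        # A record with BASE inherits every named parameter of the base record.
        params.extend(expand_record(root, base))
    for param in record:
        params.append({
            "name": param.get("NAME"),
            "kind": param.tag,  # VALUE, LIST, or RECORDREF
            # Only VALUE carries TYPE; the DTD default is "String".
            "type": param.get("TYPE", "String") if param.tag == "VALUE" else None,
            # DIM defaults to 0: a single data item rather than an array.
            "dim": int(param.get("DIM", "0")),
        })
    return params


params = expand_record(ET.fromstring(SPEC), "PriorityOrder")
for p in params:
    print(p)
```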
`
38.3 | Remote procedure calls
`
WIDL provides the information that applications need to communicate,
but it does not perform the actual communication. An application that
requests a service of another application must issue a Remote Procedure
Call, or RPC, to the other application. An application issues an RPC by
packaging a message, sending the message to the other application, and
then waiting for the reply message.
`
`
`The RPC mechanism requires the applications to agree on the form of
`the messages and on the transfer protocol by which the messages travel.
`HTTP provides a POST method that allows a client to submit a document
`to a server and to receive a document in response, so HTTP is a candidate
`protocol. Since HTTP is nearly ubiquitous and since it tunnels through
`firewalls, it's obvious that we should use HTTP. The question is, should
`XML be the message form?
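
Whatever the message form turns out to be, the transfer step itself is small. The following Python sketch prepares an HTTP POST that would carry a document to a server; it builds the request but does not send it. The endpoint URL, the `build_post` helper, and the `text/xml` content type are assumptions made for illustration, since the chapter names none of them.

```python
import urllib.request

# Hypothetical endpoint; the chapter does not name a URL.
ENDPOINT = "http://integration.example.com/rpc"


def build_post(document, url=ENDPOINT):
    """Wrap a message document in an HTTP POST request (not yet sent)."""
    body = document.encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        # The content type is an assumption, not something the chapter specifies.
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


request = build_post("<placeholder/>")
print(request.get_method(), request.full_url, len(request.data))
# Sending would be one more call: urllib.request.urlopen(request).read()
```

The reply document comes back as the body of the HTTP response, which is what makes POST a natural fit for the package, send, and wait cycle of an RPC.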
IIOP and DCE are both industry standards for RPC messages. Either of
these would work, as it is possible to send them over HTTP. We might
notice that these message representations are inflexible: senders and
receivers must agree on how a message decomposes data into arguments,
including the positions of the individual arguments and the structures of
these arguments.

Yet if the message representation were XML, the applications would still
have to agree on the DTDs to which the messages conformed. Just as
applications that use different IIOP or DCE message types cannot
communicate, applications that use different DTDs cannot communicate. Without
looking more closely, we might be inclined to conclude that XML is all
hype after all.

However, we are going to look more closely. These problems do afflict
XML, IIOP, and DCE alike. No reneging here. When we take that closer
look we find that, unlike IIOP and DCE, XML provides a way to solve the
problem.

That is, XML provides a way to ensure that so long as two applications
agree to conform to the same abstract interface specification, then those
two applications may communicate, even if the applications are hard-coded
to use different DTDs.
`
38.3.1 Representing RPC messages in XML
`
XML is an ideal notation for RPC messages because it allows us to label the
individual data constituents of a message semantically. These labels are
XML's tags.

The only semantic labels available in IIOP and DCE are the numeric
positions of the constituents. IIOP and DCE do not allow data to move to
new positions and they do not allow data to grow or shrink in unforeseen
ways. They also do not allow applications to discover the absence of data
from a message or to introduce new data items into a message
independently.

But the greatest benefit that XML brings to RPC is that XML moves a
significant amount of information about a message into the message itself.
It is a benefit because it moves an equal amount of information out of the
programs that process the messages. This simplifies the programs that
integrate applications.

In all probability, industries will never completely agree on standard
interfaces or standard DTDs, so it will always be necessary to translate
between interfaces. XML provides interoperability by enabling a new class
of middleware to serve as generic application integrators.
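
A short Python sketch shows what labeling buys us. It reads two versions of a reply by label: one in the generic form that Example 38-3 uses, and a revised one in which the items have moved, one item is missing, and a new item has appeared. The `currency` item and the `read_by_label` helper are hypothetical, introduced only for this illustration.

```python
import xml.etree.ElementTree as ET

ORIGINAL = """
<RPC TYPE="REPLY">
  <VALUE NAME="orderNumber" TYPE="i4">438553</VALUE>
  <VALUE NAME="fortune">You will use XML for RPC</VALUE>
  <VALUE NAME="accountBalance" TYPE="r4">65.00</VALUE>
</RPC>
"""

# Same reply with items reordered, accountBalance dropped,
# and a hypothetical new item ("currency") introduced.
REVISED = """
<RPC TYPE="REPLY">
  <VALUE NAME="currency">USD</VALUE>
  <VALUE NAME="fortune">You will use XML for RPC</VALUE>
  <VALUE NAME="orderNumber" TYPE="i4">438553</VALUE>
</RPC>
"""


def read_by_label(message, wanted):
    """Map each wanted label to its text, or None when the item is absent."""
    root = ET.fromstring(message)
    found = {value.get("NAME"): value.text for value in root.iter("VALUE")}
    return {name: found.get(name) for name in wanted}


wanted = ["orderNumber", "fortune", "accountBalance"]
print(read_by_label(ORIGINAL, wanted))
print(read_by_label(REVISED, wanted))
```

A reader keyed on numeric position would have mistaken `currency` for the order number; the label-based reader finds the same two items in both messages and can report that the balance is absent.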
`
38.3.2 Generic and custom message DTDs
`
There are two ways to represent RPC messages in XML. A generic
document type is capable of representing any message. The interface
specification determines the form that a message takes in a generic document
type. More specifically, the definition of a method uniquely determines the
DTDs of the request and reply messages that correspond to the method.

On the other hand, a custom document type is designed only to contain
the inputs or the outputs of a particular kind of service. There are many
possible custom document type definitions for a given interface method.

Let's look at a few examples that are based on the Fortunes-R-Us
purchasing interface shown in Example 38-1. Example 38-3 contains three
RPC messages.

The first portrays what an instance of a generic document type might
look like for a message that invokes the "orderFortune" method. The same
document type scheme might be used for the reply message, which is the
second message of Example 38-3. The third message shown is an instance
of a custom-DTD reply.

Example 38-3. Generic- and custom-DTD RPC messages.
<RPC TYPE="REQUEST">
  <VALUE NAME="accountID" TYPE="i4">2001</VALUE>
  <VALUE NAME="zodiacSign">Aquarius</VALUE>
</RPC>

<RPC TYPE="REPLY">
  <VALUE NAME="orderNumber" TYPE="i4">438553</VALUE>
  <VALUE NAME="fortune">You will use XML for RPC</VALUE>
  <VALUE NAME="accountBalance" TYPE="r4">65.00</VALUE>
</RPC>

<FORTUNE-RECEIPT>
  <orderNumber>438553</orderNumber>
  <fortune>You will use XML for RPC</fortune>
  <accountBalance>65.00</accountBalance>
</FORTUNE-RECEIPT>

There are many possible generic XML document types, and we can
expect to see industries creating them and using them. There are also many
possible custom document types for any given method. We can also expect
to see applications using custom document types to message other
applications.

The trick is to ensure that we can integrate applications that use different
document types to represent the same information. Without this we do not
have interoperability. XML makes it feasible to provide large-scale
interoperability, but only if we design our messages so that integration
middleware may robustly identify data constituents by label.
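
As a rough illustration of that last point, the Python sketch below reads the generic reply and the custom-DTD reply of Example 38-3 into the same typed result, guided by the output record of Example 38-1. The `read_reply` function and the mapping of "i4", "r4", and "String" onto Python's `int`, `float`, and `str` are our assumptions; real integration middleware would be driven by the interface specification and bindings rather than by a hand-written table.

```python
import xml.etree.ElementTree as ET

GENERIC_REPLY = """
<RPC TYPE="REPLY">
  <VALUE NAME="orderNumber" TYPE="i4">438553</VALUE>
  <VALUE NAME="fortune">You will use XML for RPC</VALUE>
  <VALUE NAME="accountBalance" TYPE="r4">65.00</VALUE>
</RPC>
"""

CUSTOM_REPLY = """
<FORTUNE-RECEIPT>
  <orderNumber>438553</orderNumber>
  <fortune>You will use XML for RPC</fortune>
  <accountBalance>65.00</accountBalance>
</FORTUNE-RECEIPT>
"""

# Output record of orderFortune (Example 38-1): parameter name -> WIDL type.
FORTUNE_RECEIPT = {"orderNumber": "i4", "fortune": "String", "accountBalance": "r4"}

# Assumed conversions from WIDL type names to Python types.
CONVERT = {"i4": int, "r4": float, "String": str}


def read_reply(message, record=FORTUNE_RECEIPT):
    """Return the reply's data as typed values, whichever document type carried it."""
    root = ET.fromstring(message)
    if root.tag == "RPC":
        # Generic document type: the label is the NAME attribute.
        raw = {value.get("NAME"): value.text for value in root.findall("VALUE")}
    else:
        # Custom document type: the label is the element type name.
        raw = {child.tag: child.text for child in root}
    return {name: CONVERT[wtype](raw[name].strip()) for name, wtype in record.items()}


print(read_reply(GENERIC_REPLY))
print(read_reply(CUSTOM_REPLY))
```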
`
38.4 | Integrating applications
`
WIDL and XML RPC together enable middleware to integrate
applications. We'll use the term integration server to refer to middleware
that assumes this kind of responsibility.

A WIDL interface specification supplies an integration server with the
information the server needs to map between XML RPC messages and
native application interfaces. Interface specifications do not themselves
define the mappings, but they provide a common language in which to
express them.

Figure 38-1 shows how integration servers connect applications.

Integration servers need to integrate a wide variety of application
interfaces. One application may implement an interface as a set of Java or
C++ methods. Another may implement an interface as a set of functions in C.
Another application may input and output XML documents conforming
to custom DTDs. Still another may input and output XML documents in
the form of generic RPC messages. Integrating applications requires
bridging between programming languages and document representations.
`
`
[Figure: The B2B Integration Server connects applications to applications
and applications to Web sites, over the Internet or an extranet.]

Figure 38-1 Connecting applications with XML RPC and integration servers.
`
38.4.1 Stubs
`
Conventional RPC bridges programming languages through code snippets
known as stubs. A stub translates between the details of an interface and a
common data representation. One side of a stub speaks the language that is
native to an application and the other side speaks a common data
representation.

By connecting the data representation ends of two stubs, one may bridge
between any two programming languages. In a client stub, the
language-specific side consists of a set of APIs (functions) that the client
may call. In a server stub, the language-specific side calls APIs that the
server itself exposes.

Figure 38-2 illustrates this property of stubs by portraying four stub
pairings. Here, XML is the common data representation, but in the usual case
intervening middleware will hide knowledge of XML from the stubs.

In diagram (a) an application written in Java is communicating with
another application written in Java. Diagrams (b) and (c) show that the
same application may also communicate with applications written in C++
or C. Diagram (d) depicts the Java application communicating with an
application that speaks XML. In this last scenario the XML-speaking
application has no stub, since the XML messages pass directly to the
application.
`
[Figure: four stub pairings, (a) through (d), joined through XML.]

Figure 38-2 Using stubs to make applications interoperable.
`
Figure 38-3 portrays how a developer uses stubs to integrate applications.
A developer generates an interface specification in WIDL and then runs the
specification through a WIDL compiler.

The WIDL compiler generates two source files in a programming
language of the developer's choice. Both files are stubs, but one file is a
client stub and the other is a server stub. The developer then links the
appropriate stub into the client or server application. The stubs free the
application from knowledge of XML and allow middleware to map transparently
between interfaces and different XML document types.
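
To give a feel for what a generated client stub might look like, here is a hand-written Python sketch for the orderFortune method of Example 38-1. It is not output from any WIDL compiler. For brevity this stub does the XML marshalling itself, using the generic message form of Example 38-3, whereas in the usual case middleware would hide XML even from the stub. The names `order_fortune`, `transport`, and `fake_transport` are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def order_fortune(account_id, zodiac_sign, transport):
    """Hypothetical client stub: the application calls this like a local function.

    `transport` is any callable that takes a request document (str) and
    returns the reply document (str), for example a wrapper around HTTP POST.
    """
    request = (
        '<RPC TYPE="REQUEST">'
        f'<VALUE NAME="accountID" TYPE="i4">{int(account_id)}</VALUE>'
        f'<VALUE NAME="zodiacSign">{escape(zodiac_sign)}</VALUE>'
        '</RPC>'
    )
    reply = ET.fromstring(transport(request))
    values = {value.get("NAME"): value.text for value in reply.findall("VALUE")}
    # RETURN="orderNumber" marks the order number as the method's return value;
    # this sketch hands back the other output parameters alongside it.
    return (
        int(values["orderNumber"]),
        values["fortune"],
        float(values["accountBalance"]),
    )


def fake_transport(request_document):
    """Stand-in for middleware plus server: returns a canned reply."""
    assert "zodiacSign" in request_document
    return (
        '<RPC TYPE="REPLY">'
        '<VALUE NAME="orderNumber" TYPE="i4">438553</VALUE>'
        '<VALUE NAME="fortune">You will use XML for RPC</VALUE>'
        '<VALUE NAME="accountBalance" TYPE="r4">65.00</VALUE>'
        '</RPC>'
    )


order_number, fortune, balance = order_fortune(2001, "Aquarius", fake_transport)
print(order_number, fortune, balance)
```

The application sees only a function with typed arguments and results; swapping `fake_transport` for a real one changes nothing in the calling code.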
`
[Figure: Integration Server]

Figure 38-3 Using WIDL for RPC over the Web.

38.4.2 Document mapping

The document-mapping component of WIDL defines mappings between
interfaces and XML or HTML documents. This is the portion that
provides the bridge between XML RPC messages and application APIs; that is,
the portion that makes the different XML document types
indistinguishable to the application. webMethods originally developed this
facility to encapsulate HTML-based Web sites within APIs, but because XML
does a better job of labeling data than HTML does, the technology reaps more
benefits from XML.

WIDL document-mapping does its job through bindings. A binding
specifies how to map raw data into an RPC message or vice versa, where
"raw data" means "data represented in a way that is natural to a
programming language". The best way to make sense of this is to look at an
example, so consider Example 38-4.
`
Example 38-4. A WIDL binding.
<OUTPUT-BINDING NAME="OrderReplyBinding">
  <VALUE NAME="orderNumber" TYPE="i4">doc.orderNumber[0].text</VALUE>
  <VALUE NAME="fortune">doc.fortune[0].text</VALUE>
</OUTPUT-BINDING>
`
This binding applies to the custom-DTD reply message of Example
38-3. Each VALUE element corresponds to a data item that the binding
extracts from the message. In this case the binding extracts two strings, but
bindings may extract other data types, including records and even XML
documents.

Upon receiving the reply message, middleware applies this binding and
passes the two strings to the application. Since the application ordered the
fortune by issuing a function call on a client stub, the stub returns the
strings to the application as output parameters of the function. Middleware
completely shields the application from knowledge of XML and from
dependence on a specific XML document type.
`
In this example, the binding only retrieves the order number and the
fortune from the reply message, indicating that the application cannot
utilize the account balance. The content of each VALUE element is a query,
expressed in a document query language, that specifies where to find these
items within the message. In this particular case, the query uses the
webMethods Object Model, but WIDL is compatible with other query
languages as well.
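
The following Python sketch applies the binding of Example 38-4 to the custom-DTD reply of Example 38-3. It does not implement the webMethods Object Model. The `run_query` function is a deliberately tiny stand-in that understands only queries of the shape `doc.<child>[<index>].text`, which is all this one binding needs; every function name here is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

CUSTOM_REPLY = """
<FORTUNE-RECEIPT>
  <orderNumber>438553</orderNumber>
  <fortune>You will use XML for RPC</fortune>
  <accountBalance>65.00</accountBalance>
</FORTUNE-RECEIPT>
"""

BINDING = """
<OUTPUT-BINDING NAME="OrderReplyBinding">
  <VALUE NAME="orderNumber" TYPE="i4">doc.orderNumber[0].text</VALUE>
  <VALUE NAME="fortune">doc.fortune[0].text</VALUE>
</OUTPUT-BINDING>
"""

# Simplified stand-in for the query language: "doc.<child>[<index>].text" only.
QUERY = re.compile(r"^doc\.([\w-]+)\[(\d+)\]\.text$")


def run_query(doc_root, query):
    """Evaluate one simplified query against the message's root element."""
    match = QUERY.match(query.strip())
    if match is None:
        raise ValueError(f"unsupported query: {query!r}")
    tag, index = match.group(1), int(match.group(2))
    return doc_root.findall(tag)[index].text


def apply_binding(binding_text, message_text):
    """Extract each item the binding names; items it does not name are ignored."""
    doc_root = ET.fromstring(message_text)
    binding = ET.fromstring(binding_text)
    return {
        value.get("NAME"): run_query(doc_root, value.text)
        for value in binding.findall("VALUE")
    }


extracted = apply_binding(BINDING, CUSTOM_REPLY)
print(extracted)
```

The account balance never reaches the application because the binding has no VALUE element for it, which matches the behavior the text describes.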
`
A binding may also define how to translate data into an RPC message.
WIDL supports several forms of messages. For request messages, the
binding may have the data submitted via the HTTP GET or POST methods,
thus providing the data as CGI query parameters. The binding may also
have the data submitted as an XML or an HTML message, constructing the
message from a particular template. Templates are a straightforward way to
generate XML.
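
Template-driven generation can be sketched with Python's standard `string.Template`. The `FORTUNE-ORDER` document type and the `render_request` helper below are hypothetical; the chapter does not show WIDL's own template syntax, so this only illustrates the general idea of filling named slots with escaped data.

```python
from string import Template
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

# Hypothetical custom request document type; the slot names follow the
# input record of Example 38-1.
REQUEST_TEMPLATE = Template(
    "<FORTUNE-ORDER>"
    "<accountID>$accountID</accountID>"
    "<zodiacSign>$zodiacSign</zodiacSign>"
    "</FORTUNE-ORDER>"
)


def render_request(account_id, zodiac_sign):
    """Fill the template, escaping data so the result stays well-formed."""
    return REQUEST_TEMPLATE.substitute(
        accountID=escape(str(account_id)),
        zodiacSign=escape(zodiac_sign),
    )


document = render_request(2001, "Aquarius & friends")
print(document)

# Parsing the result back confirms that it is well-formed XML.
parsed = ET.fromstring(document)
print(parsed.findtext("zodiacSign"))
```

Escaping matters: without it, a stray "&" or "<" in the data would produce a document that the receiving parser rejects.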
`
Bindings provide a simple way to make applications compatible with a
variety of XML message DTDs. Bindings are most useful with custom
document types, since it is possible to hard-code document-mapping for
generic document types. Generic document types do not require the
flexibility that bindings provide, and by hard-coding them middleware can
provide more efficient document-mapping.

An integration server puts bindings to work by using them to mask
differences in XML document types. By connecting the variable names of
bindings to parameter names in interface specifications, an integration
server may map any XML document type into any programming language.
`
To get a feel for the benefits of this capability, take a look at Figure 38-4.
Here industries and businesses have defined a variety of DTDs to which
different RPC document types conform. The interface defined with WIDL
captures a superset of the services and data available through the DTDs.
Although different client applications use different XML document types,
the integration server is able to bridge these differences to make the
application universally accessible.
`
`
[Figure: an Integration Server in front of a back-end application (e.g. ERP/MAP,
e-commerce system, or database application).]

Figure 38-4 Using WIDL to make different XML messages interoperable.
`
38.5 | Interoperability attained
`
`WIDL, XML RPC, and integration servers are the pieces that provide
`application interoperability. With them one can make any application
`accessible over a network via XML and HTTP.
One can also make a single application available to client applications
that use different XML message formats. Or one can upgrade an
application, or substitute one application for another, and still allow all
previous clients to communicate with the new application.
`These capabilities should give us second thoughts about hard-coding
`servers to use specific XML document types. Servers should leave document
`type decisions to middleware, empowering middleware to make the server
`widely accessible.
`
`
`
`
XML-Data

XML-Data is the name of a proposal for a DTD schema language, a new way
to create and augment document type definitions. This chapter is sponsored
by Microsoft Corporation.

Chapter 39

The Internet holds within it the potential for integrating all
information into a global network (with many private but integrated
domains), promising access to information any time and anywhere. However,
this potential has yet to be realized. At present, the Internet is merely an
access medium.

To realize the Internet's potential, we need to add intelligent search, data
exchange, adaptive presentation, and personalization. The Internet must go
beyond setting an information access standard and must set an information
understanding standard, which means a standard way of representing data so
that software can better search, move, display, and otherwise manipulate
information currently hidden in contextual obscurity.

XML is an important step in this direction. XML is a standardized
notation for representing structured information. It is well-founded
theoretically and is based on extensive industry experience. Although XML
documents are simple, readily transmitted character strings, the notation
easily depicts a tree structure. A tree is a natural structure that is richer
than a simple flat list, yet also respectful of cognitive and data processing
requirements for economy and simplicity.
`
Valid XML documents belong to classes (document types) that
determine the tree structure and other properties of their member documents.
The properties of the classes themselves comprise their document type
definitions, or DTDs, which serve the same role for documents that schemas
do for databases.
`
`And that is where the potential for enhancing the Web lies.
Today, the only standardized method of creating document type
definitions is through the use of markup declarations, a specialized syntax
used only for this purpose. What is needed is a method of augmenting the
existing set of DTD properties with additional properties that will enable
the goal of true information understanding.

Fortunately, there are ways to accomplish this goal by using XML itself.
The W3C XML Working Group has agreed to work on a DTD schema
language for XML. The DTD schema language will provide a means of using
XML instances to define augmented DTDs.

As a contribution to this effort, ArborText, DataChannel, Inso, and
Microsoft have co-authored the XML-Data submission to the W3C.
XML-Data is a notation, in the form of an XML document, that is both
an alternative to markup declarations for writing DTDs and a means of
augmenting DTDs with additional capabilities. For example:
`
■ XML-Data supports rich data types, allowing for tighter
  validation of data and reduced application effort. Developers
  can use a list of standard data types, such as numbers or ISO
  8601 dates, or define their own.
■ Through the namespaces facility, XML-Data improves
  expressiveness, ensuring the existence of uniquely qualified
  names.
■ XML-Data provides for greater and more efficient semantic
  facilities by incorporating the concept of inheritance, enabling
  one schema to be based on another. For instance, a bookstore
  purchase order schema could be based on a general purpose
  electronic-commerce purchase order schema.
`
Since XML-Data uses XML instance syntax, there are a number of other
benefits:

■ The same tools that are used to parse XML can be used to
  parse the XML-Data notation.
■ As the syntax is very similar to HTML, it should be easy for
  HTML authors to learn and read.
■ It is easily extensible.
`
The text of the XML-Data proposal follows, as contained in W3C Note
05 Jan 1998. A browseable version can be found on the CD-ROM and at
http://www.w3.org/TR/1998/NOTE-XML-data. That version identifies the
individual authors and others whose help and contributions to the proposal
the authors acknowledged.
`
39.1 | Introduction
`
Schemas define the characteristics of classes of objects. This paper describes
an XML vocabulary for schemas, that is, for defining and documenting
object classes. It can be used for classes which are strictly syntactic (for
example, XML) or those which indicate concepts and relations among concepts
(as used in relational databases, KR graphs and RDF). The former are called
"syntactic schemas;" the latter "conceptual schemas."

For example, an XML document might contain a "book" element which
lexically contains an "author" element and a "title" element. An XML-Data
schema can describe such syntax. However, in another context, we may
simply want to represent more abstractly that books have titles and authors,
irrespective of any syntax. XML-Data schemas can describe such
conceptual relationships. Further, the information about books, titles and
authors might be stored in a relational database, in which XML-Data schemas
describe row types and key relationships.

One immediate implication of the ideas in this paper is that XML
document types can now be described using XML itself, rather than DTD
syntax. Another is that XML-Data schemas provide a common vocabulary for
ideas which overlap between syntactic, database and conceptual schemas.
All features can be used together as appropriate.
`
Schemas are composed principally of declarations for:

Concepts

Classes of objects

■ Class hierarchies
■ Properties

Relationships

■ Indicated by primary key to foreign key matching
■ Indicated by URI

XML DTD Grammars and Compatibility

■ grammatical rules governing the valid nesting of the elements
  and attributes
■ attributes of elements
■ internal and external entities, represented by intEntityDecl
  and extEntityDecl
■ notations, represented by notationDcl

Datatypes giving parsing rules and implementation formats.

Mapping rules allowing abbreviated grammars to map to a
conceptual data model.
`
39.2 | The Schema Element Type
`
`All schema declarations are contained within a schema element, like this:
<?XML version='1.0' ?>
<?xml:namespace
    name="urn:uuid:BDC6E3F0-6DA3-11d1-A2A3-00AA00C14882/"
    as="s"/?>
<s:schema id='ExampleSchema'>
    <!-- schema goes here. -->
</s:schema>

The namespace of the vocabulary described in this document is named
"urn:uuid:BDC6E3F0-6DA3-11d1-A2A3-00AA00C14882/".
`
`
`39.3 | The ElementType Declaration
`
The heart of an XML-Data schema is the elementType declaration, which
defines a class of objects (or "type of element" in XML terminology). The id
attribute serves a dual role of identifying the definition, and also naming
the specific class.
<elementType id="author"/>
Within an elementType, the description subelement may be used to
provide a human-readable description of the element's purpose.
<elementType id="author">
    <description>The person, natural or otherwise, who wrote
    the book.</description>
</elementType>
`
39.4 | Properties and Content Models
`
Subelements within elementType define characteristics of the class's
members. An XML "content model" is a description of the contents that may
validly appear within a particular element type in a document instance.
<elementType id="author">
    <string/>
</elementType>

<elementType id="Book">
    <element type="#author" occurs="ONEORMORE"/>
</elementType>
`The example above defines two elements, author and book, and says that
`a book has one or more authors. The author element may contain a string
`of character data (but no other elements). For example, the following is
`valid:
`<Book>
`<author>Henry Ford</author>
`<author>Samuel Crowther</author>
`</Book>
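
To see what this content model asks of an instance, here is a small Python sketch that checks a Book element by hand. It is not an XML-Data processor: the rules (one or more author children, each holding character data only) are written directly into the hypothetical `check_book` function instead of being read from the schema.

```python
import xml.etree.ElementTree as ET

BOOK = """
<Book>
  <author>Henry Ford</author>
  <author>Samuel Crowther</author>
</Book>
"""


def check_book(root):
    """Return a list of problems; an empty list means the instance fits the model."""
    if root.tag != "Book":
        return [f"expected Book, found {root.tag}"]
    problems = []
    children = list(root)
    if not children:
        # occurs="ONEORMORE" requires at least one author.
        problems.append("Book needs at least one author")
    for child in children:
        if child.tag != "author":
            problems.append(f"unexpected element {child.tag} in Book")
        elif len(child) > 0:
            # <string/> allows character data but no subelements.
            problems.append("author may contain character data only")
    return problems


print(check_book(ET.fromstring(BOOK)))
print(check_book(ET.fromstring("<Book/>")))
```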
`
Within an elementType, various specialized subelements (element,
group, any, empty, string etc.) indicate which subelements (properties) are
allowed/required. Ordinarily, these imply not only the cardinality of the
subelements but also their sequence. (We discuss a means to relax sequence
later.)
`