`Extensible Markup Language (XML) 1.0
`W3C Recommendation 10-Feb-98
`This version
`Latest version
`Previous version
`Tim Bray, Textuality and Netscape (tbray@
`Jean Paoli, Microsoft (jeanpa@
`C. M. Sperberg-McQueen, University of Illinois at Chicago (cmsmcq@
`The Extensible Markup Language (XML) is a subset of SGML that is completely described
`in this document. Its goal is to enable generic SGML to be served, received, and processed
`on the Web in the way that is now possible with HTML. XML has been designed for ease of
`implementation and for interoperability with both SGML and HTML.
`Status of this document
`This document has been reviewed by W3C Members and other interested parties and has
`been endorsed by the Director as a W3C Recommendation. It is a stable document and may
`be used as reference material or cited as a normative reference from another document.
`W3C's role in making the Recommendation is to draw attention to the specification and to
`promote its widespread deployment. This enhances the functionality and interoperability of
`the Web.
`This document specifies a syntax created by subsetting an existing, widely used international
`text processing standard (Standard Generalized Markup Language, ISO 8879:1986(E) as
`amended and corrected) for use on the World Wide Web. It is a product of the W3C XML
`Activity, details of which can be found at A list of current W3C
`Recommendations and other technical documents can be found at
`This specification uses the term URI, which is defined by [Berners-Lee], a work in progress
`expected to update [RFC1738] and [RFC1808].
`The list of known errors in this specification is available at
`Please report errors in this document to

`Table of Contents
`1. Introduction
`1.1 Origin and Goals
`1.2 Terminology
`2. Documents
`2.1 Well-Formed XML Documents
`2.2 Characters
`2.3 Common Syntactic Constructs
`2.4 Character Data and Markup
`2.5 Comments
`2.6 Processing Instructions
`2.7 CDATA Sections
`2.8 Prolog and Document Type Declaration
`2.9 Standalone Document Declaration
`2.10 White Space Handling
`2.11 End-of-Line Handling
`2.12 Language Identification
`3. Logical Structures
`3.1 Start-Tags, End-Tags, and Empty-Element Tags
`3.2 Element Type Declarations
`3.3 Attribute-List Declarations
`3.4 Conditional Sections
`4. Physical Structures
`4.1 Character and Entity References
`4.2 Entity Declarations
`4.3 Parsed Entities
`4.4 XML Processor Treatment of Entities and References
`4.5 Construction of Internal Entity Replacement Text
`4.6 Predefined Entities
`4.7 Notation Declarations
`4.8 Document Entity
`5. Conformance
`5.1 Validating and Non-Validating Processors
`5.2 Using XML Processors
`6. Notation
`A. References
`A.1 Normative References
`A.2 Other References
`B. Character Classes
`C. XML and SGML (Non-Normative)
`D. Expansion of Entity and Character References (Non-Normative)
`E. Deterministic Content Models (Non-Normative)
`F. Autodetection of Character Encodings (Non-Normative)
`G. W3C XML Working Group (Non-Normative)

`Extensible Markup Language (XML)1.0
`1. Introduction
`Extensible Markup Language, abbreviated XML, describes a class of data objects called XML documents
`and partially describes the behavior of computer programs which process them. XML is an application
`profile or restricted form of SGML, the Standard Generalized Markup Language [ISO8879]. By
`construction, XML documents are conforming SGML documents.
`XML documents are made up of storage units called entities, which contain either parsed or unparsed data.
`Parsed data is made up of characters, some of which form character data, and some of which form markup.
`Markup encodes a description of the document's storage layout and logical structure. XML provides a
`mechanism to impose constraints on the storage layout and logical structure.
`A software module called an XML processor is used to read XML documents and provide access to their
`content and structure. It is assumed that an XML processor is doing its work on behalf of another module,
`called the application. This specification describes the required behavior of an XML processor in terms of
`how it must read XML data and the information it must provide to the application.
`1.1 Origin and Goals
`XML was developed by an XML Working Group (originally known as the SGML Editorial Review Board)
`formed under the auspices of the World Wide Web Consortium (W3C) in 1996. It was chaired by Jon
`Bosak of Sun Microsystems with the active participation of an XML Special Interest Group (previously
`known as the SGML Working Group) also organized by the W3C. The membership of the XML Working
`Group is given in an appendix. Dan Connolly served as the WG's contact with the W3C.
`The design goals for XML are:
`1. XML shall be straightforwardly usable over the Internet.
`2. XML shall support a wide variety of applications.
`3. XML shall be compatible with SGML.
`4. It shall be easy to write programs which process XML documents.
`5. The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
`6. XML documents should be human-legible and reasonably clear.
`7. The XML design should be prepared quickly.
`8. The design of XML shall be formal and concise.
`9. XML documents shall be easy to create.
`10. Terseness in XML markup is of minimal importance.
`This specification, together with associated standards (Unicode and ISO/IEC 10646 for characters, Internet
`RFC 1766 for language identification tags, ISO 639 for language name codes, and ISO 3166 for country
`name codes), provides all the information necessary to understand XML Version 1.0 and construct
`computer programs to process it.
`This version of the XML specification may be distributed freely, as long as all text and legal notices remain
`intact.
`1.2 Terminology
`The terminology used to describe XML documents is defined in the body of this specification. The terms
`defined in the following list are used in building those definitions and in describing the actions of an XML
`processor:
`may
`Conforming documents and XML processors are permitted to but need not behave as described.
`must
`Conforming documents and XML processors are required to behave as described; otherwise they are
`in error.
`error
`A violation of the rules of this specification; results are undefined. Conforming software may detect
`and report an error and may recover from it.
`fatal error
`An error which a conforming XML processor must detect and report to the application. After
`encountering a fatal error, the processor may continue processing the data to search for further errors
`and may report such errors to the application. In order to support correction of errors, the processor
`may make unprocessed data from the document (with intermingled character data and markup)
`available to the application. Once a fatal error is detected, however, the processor must not continue
`normal processing (i.e., it must not continue to pass character data and information about the
`document's logical structure to the application in the normal way).
`at user option
`Conforming software may or must (depending on the modal verb in the sentence) behave as
`described; if it does, it must provide users a means to enable or disable the behavior described.
`validity constraint
`A rule which applies to all valid XML documents. Violations of validity constraints are errors; they
`must, at user option, be reported by validating XML processors.
`well-formedness constraint
`A rule which applies to all well-formed XML documents. Violations of well-formedness constraints
`are fatal errors.
`match
`(Of strings or names:) Two strings or names being compared must be identical. Characters with
`multiple possible representations in ISO/IEC 10646 (e.g. characters with both precomposed and
`base+diacritic forms) match only if they have the same representation in both strings. At user option,
`processors may normalize such characters to some canonical form. No case folding is performed. (Of
`strings and rules in the grammar:) A string matches a grammatical production if it belongs to the
`language generated by that production. (Of content and content models:) An element matches its
`declaration when it conforms in the fashion described in the constraint Section 3: Element Valid.
`for compatibility
`A feature of XML included solely to ensure that XML remains compatible with SGML.
`for interoperability
`A non-binding recommendation included to increase the chances that XML documents can be
`processed by the existing installed base of SGML processors which predate the WebSGML
`Adaptations Annex to ISO 8879.
`2. Documents
`A data object is an XML document if it is well-formed, as defined in this specification. A well-formed XML
`document may in addition be valid if it meets certain further constraints.
`Each XML document has both a logical and a physical structure. Physically, the document is composed of
`units called entities. An entity may refer to other entities to cause their inclusion in the document. A
`document begins in a "root" or document entity. Logically, the document is composed of declarations,
`elements, comments, character references, and processing instructions, all of which are indicated in the
`document by explicit markup. The logical and physical structures must nest properly, as described in
`Section 4.3.2: Well-Formed Parsed Entities.
`2.1 Well-Formed XML Documents
`A textual object is a well-formed XML document if:
`1. Taken as a whole, it matches the production labeled document.
`2. It meets all the well-formedness constraints given in this specification.
`3. Each of the parsed entities which is referenced directly or indirectly within the document is well-formed.
`Document
`[1] document ::= prolog element Misc*
`Matching the document production implies that:
`1. It contains one or more elements.
`2. There is exactly one element, called the root, or document element, no part of which appears in the
`content of any other element. For all other elements, if the start-tag is in the content of another
`element, the end-tag is in the content of the same element. More simply stated, the elements,
`delimited by start- and end-tags, nest properly within each other.
`As a consequence of this, for each non-root element C in the document, there is one other element P in the
`document such that C is in the content of P, but is not in the content of any other element that is in the
`content of P. P is referred to as the parent of C, and C as a child of P.
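The following non-normative sketch illustrates the definitions above. It assumes Python and its standard xml.etree.ElementTree module, neither of which is part of this specification; the helper names is_well_formed and parent_map are hypothetical. A violation of well-formedness surfaces as a ParseError, and the parent P of each non-root element C is recovered by walking the tree.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text, False on a fatal error."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


def parent_map(root):
    """Map every non-root element C to the element P whose content holds it."""
    return {child: parent for parent in root.iter() for child in parent}


doc = "<book><title>XML</title><chapter><para>Text</para></chapter></book>"
root = ET.fromstring(doc)
parents = parent_map(root)
para = root.find("chapter/para")

print(is_well_formed(doc))               # properly nested, single root
print(is_well_formed("<a><b></a></b>"))  # start- and end-tags do not nest
print(is_well_formed("<a/><b/>"))        # more than one root element
print(parents[para].tag)                 # parent of <para>
```

The root element has no entry in the map, which mirrors the statement that no part of it appears in the content of any other element.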
`2.2 Characters
`A parsed entity contains text, a sequence of characters, which may represent markup or character data. A
`character is an atomic unit of text as specified by ISO/IEC 10646 [ISO10646]. Legal characters are tab,
`carriage return, line feed, and the legal graphic characters of Unicode and ISO/IEC 10646. The use of
`"compatibility characters", as defined in section 6.8 of [Unicode], is discouraged.
`Character Range
`[2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
`    /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
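As a non-normative illustration, the Char production can be transcribed directly into a range test. The sketch below assumes Python; the function names is_xml_char and first_illegal_char are hypothetical and not defined by this specification.

```python
def is_xml_char(code_point):
    """Return True if the code point falls in one of the ranges of Char."""
    return (
        code_point in (0x9, 0xA, 0xD)
        or 0x20 <= code_point <= 0xD7FF
        or 0xE000 <= code_point <= 0xFFFD
        or 0x10000 <= code_point <= 0x10FFFF
    )


def first_illegal_char(text):
    """Return the index of the first character outside Char, or None."""
    for index, ch in enumerate(text):
        if not is_xml_char(ord(ch)):
            return index
    return None
```

A caller might use first_illegal_char to locate a stray control character (for example #x0) before handing text to an XML processor.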
`The mechanism for encoding character code points into bit patterns may vary from entity to entity. All
`XML processors must accept the UTF-8 and UTF-16 encodings of 10646; the mechanisms for signaling
`which of the two is in use, or for bringing other encodings into play, are discussed later, in Section 4.3.3:
`Character Encoding in Entities.
`2.3 Common Syntactic Constructs
`This section defines some symbols used widely in the grammar.
`S (white space) consists of one or more space (#x20) characters, carriage returns, line feeds, or tabs.
`White Space
`[3] S ::= (#x20 | #x9 | #xD | #xA)+
`Characters are classified for convenience as letters, digits, or other characters. Letters consist of an
`alphabetic or syllabic base character possibly followed by one or more combining characters, or of an
`ideographic character. Full definitions of the specific characters in each class are given in Appendix B:
`Character Classes.
`A Name is a token beginning with a letter or one of a few punctuation characters, and continuing with
`letters, digits, hyphens, underscores, colons, or full stops, together known as name characters. Names
`beginning with the string "xml", or any string which would match (('X'|'x') ('M'|'m')
`('L'|'l')), are reserved for standardization in this or future versions of this specification.
`NOTE: The colon character within XML names is reserved for experimentation with name spaces. Its
`meaning is expected to be standardized at some future point, at which point those documents using the
`colon for experimental purposes may need to be updated. (There is no guarantee that any name-space
`mechanism adopted for XML will in fact use the colon as a name-space delimiter.) In practice, this means
`that authors should not use the colon in XML names except as part of name-space experiments, but that
`XML processors should accept the colon as a name character.
`An Nmtoken (name token) is any mixture of name characters.
`Names and Tokens
`[4] NameChar ::= Letter | Digit | '.' | '-' | '_' | ':' | CombiningChar | Extender
`[5] Name ::= (Letter | '_' | ':') (NameChar)*
`[6] Names ::= Name (S Name)*
`[7] Nmtoken ::= (NameChar)+
`[8] Nmtokens ::= Nmtoken (S Nmtoken)*
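The productions for Name and Nmtoken can be approximated with regular expressions. The sketch below is non-normative and deliberately narrowed: it assumes Python, treats Letter as the ASCII letters and Digit as the ASCII digits, and omits CombiningChar and Extender altogether, so it rejects many names that Appendix B allows. All identifiers in it are hypothetical.

```python
import re

# ASSUMPTION: ASCII-only stand-ins for Letter and Digit; CombiningChar and
# Extender from Appendix B are not modelled.
NAME_CHAR = r"[A-Za-z0-9._:\-]"
NAME_RE = re.compile(r"[A-Za-z_:]" + NAME_CHAR + r"*")
NMTOKEN_RE = re.compile(NAME_CHAR + r"+")
RESERVED_PREFIX_RE = re.compile(r"[Xx][Mm][Ll]")


def is_name(s):
    """True if s matches the (ASCII-restricted) Name production."""
    return NAME_RE.fullmatch(s) is not None


def is_nmtoken(s):
    """True if s matches the (ASCII-restricted) Nmtoken production."""
    return NMTOKEN_RE.fullmatch(s) is not None


def is_reserved_name(s):
    """True if s is a Name beginning with 'xml' in any mix of case."""
    return is_name(s) and RESERVED_PREFIX_RE.match(s) is not None
```

Under these assumptions "1st" is an Nmtoken but not a Name, because a Name may not begin with a digit.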
`Literal data is any quoted string not containing the quotation mark used as a delimiter for that string.
`Literals are used for specifying the content of internal entities (EntityValue), the values of attributes
`(AttValue), and external identifiers (SystemLiteral). Note that a SystemLiteral can be parsed without
`scanning for markup.
`Literals
`[9] EntityValue ::= '"' ([^%&"] | PEReference | Reference)* '"'
`    | "'" ([^%&'] | PEReference | Reference)* "'"
`[10] AttValue ::= '"' ([^<&"] | Reference)* '"'
`    | "'" ([^<&'] | Reference)* "'"
`[11] SystemLiteral ::= ('"' [^"]* '"') | ("'" [^']* "'")
`[12] PubidLiteral ::= '"' PubidChar* '"' | "'" (PubidChar - "'")* "'"
`[13] PubidChar ::= #x20 | #xD | #xA | [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%]
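Because SystemLiteral and PubidLiteral contain no references, they can be recognized without scanning for markup. The non-normative sketch below assumes Python and transcribes the two productions into regular expressions; the identifiers and the sample public identifier are hypothetical.

```python
import re

# SystemLiteral: anything between matching quotes except the delimiter itself.
SYSTEM_LITERAL_RE = re.compile(r'"[^"]*"|\'[^\']*\'')

# PubidChar, written as a character class; the hyphen is escaped.
PUBID_CHAR = r"[ \r\na-zA-Z0-9\-'()+,./:=?;!*#@$_%]"
# Inside single quotes the apostrophe is excluded (PubidChar - "'").
PUBID_CHAR_NO_APOS = PUBID_CHAR.replace("'", "")
PUBID_LITERAL_RE = re.compile(
    '"' + PUBID_CHAR + '*"|' + "'" + PUBID_CHAR_NO_APOS + "*'"
)


def is_system_literal(s):
    """True if s, including its quotes, matches SystemLiteral."""
    return SYSTEM_LITERAL_RE.fullmatch(s) is not None


def is_pubid_literal(s):
    """True if s, including its quotes, matches PubidLiteral."""
    return PUBID_LITERAL_RE.fullmatch(s) is not None
```

For instance, is_system_literal('"hello.dtd"') holds, while a public identifier containing "<" is rejected because "<" is not a PubidChar.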
`2.4 Character Data and Markup
`Text consists of intermingled character data and markup. Markup takes the form of start-tags, end-tags,
`empty-element tags, entity references, character references, comments, CDATA section delimiters,
`document type declarations, and processing instructions.
`All text that is not markup constitutes the character data of the document.
`The ampersand character (&) and the left angle bracket (<) may appear in their literal form only when used
`as markup delimiters, or within a comment, a processing instruction, or a CDATA section. They are also
`legal within the literal entity value of an internal entity declaration; see Section 4.3.2: Well-Formed
`Parsed Entities. If they are needed elsewhere, they must be escaped using either numeric character
`references or the strings "&amp;" and "&lt;" respectively. The right angle bracket (>) may be represented
`using the string "&gt;", and must, for compatibility, be escaped using "&gt;" or a character reference
`when it appears in the string "]]>" in content, when that string is not marking the end of a CDATA section.
`In the content of elements, character data is any string of characters which does not contain the start-
`delimiter of any markup. In a CDATA section, character data is any string of characters not including the
`CDATA-section-close delimiter, "]]>".
`To allow attribute values to contain both single and double quotes, the apostrophe or single-quote character
`(') may be represented as "&apos;", and the double-quote character (") as "&quot;".
`Character Data
`[14] CharData ::= [^<&]* - ([^<&]* ']]>' [^<&]*)
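The escaping rules of this section can be sketched as two small functions. This is a non-normative illustration assuming Python; the names escape_text and escape_attribute are hypothetical. For simplicity the sketch escapes every ">" rather than only those that follow "]]", which is more than the rule above requires but always safe.

```python
def escape_text(data):
    """Escape character data for use in element content."""
    # '&' is replaced first so that the ampersands introduced by the
    # later replacements are not escaped a second time.
    return (
        data.replace("&", "&amp;")
        .replace("<", "&lt;")
        .replace(">", "&gt;")
    )


def escape_attribute(value):
    """Escape a value so it can sit inside either kind of quote."""
    return (
        escape_text(value)
        .replace('"', "&quot;")
        .replace("'", "&apos;")
    )


print(escape_text("a < b & c"))
print(escape_attribute("say \"hi\" & 'bye'"))
```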
`2.5 Comments
`Comments may appear anywhere in a document outside other markup; in addition, they may appear within
`the document type declaration at places allowed by the grammar. They are not part of the document's
`character data; an XML processor may, but need not, make it possible for an application to retrieve the text
`of comments. For compatibility, the string "--" (double-hyphen) must not occur within comments.
`Comments
`[15] Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'
`An example of a comment:
`<!-- declarations for <head> & <body> -->
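The Comment production excludes any double-hyphen inside the comment body, which also rules out a body that ends in a hyphen. A non-normative sketch, assuming Python and a hypothetical helper is_comment, follows; it does not check that each character is a legal Char.

```python
import re

# Body: any character other than '-', or a '-' followed by a non-'-'.
# ASSUMPTION: membership in Char is not verified here.
COMMENT_RE = re.compile(r"<!--(?:[^-]|-[^-])*-->")


def is_comment(text):
    """True if text as a whole matches the Comment production."""
    return COMMENT_RE.fullmatch(text) is not None
```

The example comment above is accepted, whereas "<!-- a -- b -->" and "<!-- a --->" are not.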
`2.6 Processing Instructions
`Processing instructions (PIs) allow documents to contain instructions for applications.
`Processing Instructions
`[16] PI ::= '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'
`[17] PITarget ::= Name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))
`PIs are not part of the document's character data, but must be passed through to the application. The PI
`begins with a target (PITarget) used to identify the application to which the instruction is directed. The
`target names "XML", "xml", and so on are reserved for standardization in this or future versions of this
`specification. The XML Notation mechanism may be used for formal declaration of PI targets.
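As a non-normative illustration, a processing instruction can be split into its target and data as sketched below. The sketch assumes Python; parse_pi and the targets "render" and "flush" are hypothetical, and the target is only loosely checked (it is not verified against the full Name production).

```python
import re

# ASSUMPTION: the target is taken to be a run of characters that are neither
# white space nor '?'; a full implementation would test it against Name.
PI_RE = re.compile(r"<\?([^\s?]+)(?:[ \t\r\n]+(.*?))?\?>", re.DOTALL)
RESERVED_TARGET_RE = re.compile(r"[Xx][Mm][Ll]")


def parse_pi(text):
    """Return (target, data) for a processing instruction, or raise ValueError."""
    match = PI_RE.fullmatch(text)
    if match is None:
        raise ValueError("not a processing instruction")
    target, data = match.group(1), match.group(2) or ""
    if "?>" in data:
        raise ValueError("data must not contain '?>'")
    if RESERVED_TARGET_RE.fullmatch(target):
        raise ValueError("target 'xml' (in any case) is reserved")
    return target, data


print(parse_pi("<?render mode='draft'?>"))
```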
`2.7 CDATA Sections
`CDATA sections may occur anywhere character data may occur; they are used to escape blocks of text
`containing characters which would otherwise be recognized as markup. CDATA sections begin with the
`string "<![CDATA[" and end with the string "]]>":
`CDATA Sections
`[18] CDSect ::= CDStart CData CDEnd
`[19] CDStart ::= '<![CDATA['
`[20] CData ::= (Char* - (Char* ']]>' Char*))
`[21] CDEnd ::= ']]>'
`Within a CDATA section, only the CDEnd string is recognized as markup, so that left angle brackets and
`ampersands may occur in their literal form; they need not (and cannot) be escaped using "&lt;" and
`"&amp;". CDATA sections cannot nest.
`An example of a CDATA section, in which "<greeting>" and "</greeting>" are recognized as
`character data, not markup:
`<![CDATA[<greeting>Hello, world!</greeting>]]>
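Since CDATA sections cannot nest and cannot contain "]]>", text that includes that string has to be split across two adjacent sections. The non-normative sketch below assumes Python and its standard xml.etree.ElementTree module; the helper name to_cdata is hypothetical.

```python
import xml.etree.ElementTree as ET


def to_cdata(text):
    """Wrap text in CDATA sections, splitting at every ']]>'."""
    # ']]>' becomes ']]' + CDEnd + CDStart + '>', so the first section ends
    # after ']]' and a new section resumes with '>'.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


raw = "<greeting>Hello, world!</greeting> & ]]> done"
root = ET.fromstring("<code>" + to_cdata(raw) + "</code>")
print(root.text == raw)
```

The parser reports the adjacent sections as one run of character data, so the application sees the original text unchanged.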
`2.8 Prolog and Document Type Declaration
`XML documents may, and should, begin with an XML declaration which specifies the version of XML
`being used. For example, the following is a complete XML document, well-formed but not valid:
`<?xml version="1.0"?>
`<greeting>Hello, world!</greeting>
`and so is this:
`<greeting>Hello, world!</greeting>
`The version number "1.0" should be used to indicate conformance to this version of this specification; it is
`an error for a document to use the value "1.0" if it does not conform to this version of this specification. It
`is the intent of the XML working group to give later versions of this specification numbers other than
`"1.0", but this intent does not indicate a commitment to produce any future versions of XML, nor if any
`are produced, to use any particular numbering scheme. Since future versions are not ruled out, this
`construct is provided as a means to allow the possibility of automatic version recognition, should it become
`necessary. Processors may signal an error if they receive documents labeled with versions they do not support.
`The function of the markup in an XML document is to describe its storage and logical structure and to
`associate attribute-value pairs with its logical structures. XML provides a mechanism, the document type
`declaration, to define constraints on the logical structure and to support the use of predefined storage units.
`An XML document is valid if it has an associated document type declaration and if the document complies
`with the constraints expressed in it.
`The document type declaration must appear before the first element in the document.
`Prolog
`[22] prolog ::= XMLDecl? Misc* (doctypedecl Misc*)?
`[23] XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
`[24] VersionInfo ::= S 'version' Eq (' VersionNum ' | " VersionNum ")
`[25] Eq ::= S? '=' S?
`[26] VersionNum ::= ([a-zA-Z0-9_.:] | '-')+
`[27] Misc ::= Comment | PI | S
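Automatic version recognition, as contemplated above, needs only the VersionInfo, Eq, and VersionNum productions. The following non-normative sketch assumes Python; declared_version and the other identifiers are hypothetical, and the rest of the XML declaration is deliberately not examined.

```python
import re

S = r"[ \t\r\n]"                    # one white space character
VERSION_NUM = r"[a-zA-Z0-9_.:\-]+"  # production [26]
VERSION_INFO_RE = re.compile(
    r"<\?xml" + S + r"+version" + S + r"*=" + S + r"*"
    + "(?:'(" + VERSION_NUM + ")'" + '|"(' + VERSION_NUM + ')")'
)


def declared_version(document):
    """Return the version named in the XML declaration, or None if absent."""
    match = VERSION_INFO_RE.match(document)
    if match is None:
        return None
    return match.group(1) or match.group(2)


print(declared_version('<?xml version="1.0"?><greeting>Hello, world!</greeting>'))
print(declared_version("<greeting>Hello, world!</greeting>"))
```

A processor built along these lines could compare the returned string with the versions it supports and signal an error otherwise.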
`The XML document type declaration contains or points to markup declarations that provide a grammar for
`a class of documents. This grammar is known as a document type definition, or DTD. The document type
`declaration can point to an external subset (a special kind of external entity) containing markup
`declarations, or can contain the markup declarations directly in an internal subset, or can do both. The DTD
`for a document consists of both subsets taken together.
`A markup declaration is an element type declaration, an attribute-list declaration, an entity declaration, or a
`notation declaration. These declarations may be contained in whole or in part within parameter entities, as
`described in the well-formedness and validity constraints below. For fuller information, see Section 4:
`Physical Structures.
`Document Type Definition
`[28] doctypedecl ::= '<!DOCTYPE' S Name (S ExternalID)? S?
`    ('[' (markupdecl | PEReference | S)* ']' S?)? '>'    [VC: Root Element Type]
`[29] markupdecl ::= elementdecl | AttlistDecl | EntityDecl
`    | NotationDecl | PI | Comment    [VC: Proper Declaration/PE Nesting]
`    [WFC: PEs in Internal Subset]
`The markup declarations may be made up in whole or in part of the replacement text of parameter entities.
`The productions later in this specification for individual nonterminals (elementdecl, AttlistDecl, and so on)
`describe the declarations after all the parameter entities have been included.
`VALIDITY CONSTRAINT: Root Element Type. The Name in the document type declaration must match
`the element type of the root element.
`VALIDITY CONSTRAINT: Proper Declaration/PE Nesting. Parameter-entity replacement text must be
`properly nested with markup declarations. That is to say, if either the first character or the last character of a
`markup declaration (markupdecl above) is contained in the replacement text for a parameter-entity
`reference, both must be contained in the same replacement text.
`WELL-FORMEDNESS CONSTRAINT: PEs in Internal Subset. In the internal DTD subset, parameter-
`entity references can occur only where markup declarations can occur, not within markup declarations. (This
`does not apply to references that occur in external parameter entities or to the external subset.)
`Like the internal subset, the external subset and any external parameter entities referred to in the DTD must
`consist of a series of complete markup declarations of the types allowed by the non-terminal symbol
`markupdecl, interspersed with white space or parameter-entity references. However, portions of the
`contents of the external subset or of external parameter entities may conditionally be ignored by using the
`conditional section construct; this is not allowed in the internal subset.
`External Subset
`[30] extSubset ::= TextDecl? extSubsetDecl
`[31] extSubsetDecl ::= ( markupdecl | conditionalSect | PEReference | S )*
`The external subset and external parameter entities also differ from the internal subset in that in them,
`parameter-entity references are permitted within markup declarations, not only between markup declarations.
`An example of an XML document with a document type declaration:
`<?xml version="1.0"?>
`<!DOCTYPE greeting SYSTEM "hello.dtd">
`<greeting>Hello, world!</greeting>
`The system identifier "hello.dtd" gives the URI of a DTD for the document.
`The declarations can also be given locally, as in this example:
`<?xml version="1.0" encoding="UTF-8" ?>
`<!DOCTYPE greeting [
`<!ELEMENT greeting (#PCDATA)>
`]>
`<greeting>Hello, world!</greeting>
`If both the external and internal subsets are used, the internal subset is considered to occur before the
`external subset. This has the effect that entity and attribute-list declarations in the internal subset take
`precedence over those in the external subset.
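The precedence described above rests on a rule given later in this specification's treatment of entity declarations: when an entity is declared more than once, the first declaration encountered is binding. The non-normative sketch below assumes Python and its standard xml.etree.ElementTree module. Because that parser is not assumed to fetch an external subset, both declarations are placed in the internal subset; the entity name "who" is hypothetical.

```python
import xml.etree.ElementTree as ET

# Two declarations of the same entity; only the first one should take effect.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE greeting [
  <!ELEMENT greeting (#PCDATA)>
  <!ENTITY who "world">
  <!ENTITY who "moon">
]>
<greeting>Hello, &who;!</greeting>"""

root = ET.fromstring(DOCUMENT)
print(root.text)
```

Since the internal subset is considered to occur first, a declaration placed there wins over a declaration of the same entity in the external subset for the same reason.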
`2.9 Standalone Document Declaration
`Markup declarations can affect the content of the document, as passed from an XML processor to an
`application; examples are attribute defaults and entity declarations. The standalone document declaration,
`which may appear as a component of the XML declaration, signals whether or not there are such
`declarations which appear external to the document entity.
`Standalone Document Declaration
`[32] SDDecl ::= S 'standalone' Eq (("'" ('yes' | 'no') "'")
`    | ('"' ('yes' | 'no') '"'))    [VC: Standalone Document Declaration]
`In a standalone document declaration, the value "yes" indicates that there are no markup declarations
`external to the document entity (either in the DTD external subset, or in an external parameter entity
`referenced from the internal subset) which affect the information passed from the XML processor to the
`application. The value "no" indicates that there are or may be such external markup declarations. Note that
`the standalone document declaration only denotes the presence of external declarations; the presence, in a
`document, of references to external entities, when those entities are internally declared, does not change its
`standalone status.
`If there are no external markup declarations, the standalone document declaration has no meaning. If there
`are external markup declarations but there is no standalone document declaration, the value "no" is assumed.
`Any XML document for which standalone="no" holds can be converted algorithmically to a
`standalone document, which may be desirable for some network delivery applications.
`VALIDITY CONSTRAINT: Standalone Document Declaration. The standalone document declaration
`must have the value "no" if any external markup declarations contain declarations of:
`attributes with default values, if elements to which these attributes apply appear in the document without
`specifications of values for these attributes, or
`entities (other than amp, lt, gt, apos, quot), if references to those entities appear in the document, or
`attributes with values subject to normalization, where the attribute appears in the document with a value
`which will change as a result of normalization, or
`element types with element content, if white space occurs directly within any instance of those types.
`An example XML declaration with a standalone document declaration:
`<?xml version="1.0" standalone='yes'?>
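An application can ask a processor what the XML declaration said. The non-normative sketch below assumes Python's standard xml.parsers.expat module, whose XmlDeclHandler callback reports the standalone value as 1 for "yes", 0 for "no", and -1 when the declaration omits it. The function name read_xml_declaration is hypothetical.

```python
import xml.parsers.expat


def read_xml_declaration(document):
    """Return the fields of the XML declaration, or {} if there is none."""
    seen = {}

    def on_xml_decl(version, encoding, standalone):
        # Called once, only when the document starts with an XML declaration.
        seen.update(version=version, encoding=encoding, standalone=standalone)

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_xml_decl
    parser.Parse(document, True)
    return seen


info = read_xml_declaration(
    "<?xml version=\"1.0\" standalone='yes'?><greeting>Hello, world!</greeting>"
)
print(info["version"], info["standalone"])
```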
`2.10 White Space Handling
`In editing XML documents, it is often convenient to use "white space" (spaces, tabs, and blank lines,
`denoted by the nonterminal S in this specification) to set apart the markup for greater readability. Such
`white space is typically not intended for inclusion in the delivered version of the docume

