`W3C Recommendation 16 August 2006, edited in place 29 September 2006
`
`This version:
`http://www.w3.org/TR/2006/REC-xml11-20060816
`Latest version:
`http://www.w3.org/TR/xml11
`Previous version:
`http://www.w3.org/TR/2006/PER-xml11-20060614
`Editors:
`Tim Bray, Textuality and Netscape <tbray@textuality.com>
`Jean Paoli, Microsoft <jeanpa@microsoft.com>
`C. M. Sperberg-McQueen, W3C <cmsmcq@w3.org>
`Eve Maler, Sun Microsystems, Inc. <eve.maler@east.sun.com>
`François Yergeau
`John Cowan <cowan@ccil.org>
`Please refer to the errata for this document, which may include some normative corrections.
`
The previous errata for this document are also available.
`See also translations.
`
`This document is also available in these non-normative formats: XML and XHTML with color-coded revision
`indicators.
`
`Copyright © 2006 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
`
`Abstract
`
`The Extensible Markup Language (XML) is a subset of SGML that is completely described in this document.
`Its goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now
`possible with HTML. XML has been designed for ease of implementation and for interoperability with both
`SGML and HTML.
`Status of this Document
`
`This section describes the status of this document at the time of its publication. Other documents may
`supersede this document. A list of current W3C publications and the latest revision of this technical report can
`be found in the W3C technical reports index at http://www.w3.org/TR/.
`
`This document specifies a syntax created by subsetting an existing, widely used international text processing
`standard (Standard Generalized Markup Language, ISO 8879:1986(E) as amended and corrected) for use on
`the World Wide Web. It is a product of the XML Core Working Group as part of the XML Activity.
`
`On 29 September 2006 this document was edited in place to remove a number of spurious and potentially
`misleading spaces.
`
`The English version of this specification is the only normative version. However, for translations of this
`document, see http://www.w3.org/2003/03/Translations/byTechnology?technology=xml11.
`
`This document is a W3C Recommendation. This second edition is not a new version of XML. As a
`convenience to readers, it incorporates the changes dictated by the accumulated errata (available at
`http://www.w3.org/XML/xml-V11-1e-errata) to the First Edition of XML 1.1, dated 4 February 2004. In addition,
`the markup introduced to clarify when prescriptive keywords are used in the formal sense defined in [IETF
`RFC 2119], has been modified to better match the intent of [IETF RFC 2119]. This edition supersedes the
`previous W3C Recommendation of 4 February 2004.
`
`Please report errors in this document to the public xml-editor@w3.org mailing list; archives are available. For
`the convenience of readers, an XHTML version with color-coded revision indicators is also provided; this
`version highlights each change due to an erratum published in the errata list, together with a link to the
`particular erratum in that list. Most of the errata in the list provide a rationale for the change. The errata list for
`this second edition is available at http://www.w3.org/XML/xml-V11-2e-errata.
`
An implementation report is available at http://www.w3.org/XML/2006/06/xml11-2e-implementation.html. A
Test Suite is maintained to help assess conformance to this specification.
`
`This document has been reviewed by W3C Members, by software developers, and by other W3C groups and
`interested parties, and is endorsed by the Director as a W3C Recommendation. It is a stable document and
`may be used as reference material or cited from another document. W3C's role in making the
`Recommendation is to draw attention to the specification and to promote its widespread deployment. This
`enhances the functionality and interoperability of the Web.
`
`This document is governed by the 24 January 2002 CPP as amended by the W3C Patent Policy Transition
`Procedure. W3C maintains a public list of any patent disclosures made in connection with the deliverables of
`the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge
`of a patent which the individual believes contains Essential Claim(s) must disclose the information in
`accordance with section 6 of the W3C Patent Policy.
`Table of Contents
`
`1 Introduction
` 1.1 Origin and Goals
` 1.2 Terminology
` 1.3 Rationale and list of changes for XML 1.1
`2 Documents
` 2.1 Well-Formed XML Documents
` 2.2 Characters
` 2.3 Common Syntactic Constructs
` 2.4 Character Data and Markup
` 2.5 Comments
` 2.6 Processing Instructions
` 2.7 CDATA Sections
` 2.8 Prolog and Document Type Declaration
` 2.9 Standalone Document Declaration
` 2.10 White Space Handling
` 2.11 End-of-Line Handling
` 2.12 Language Identification
` 2.13 Normalization Checking
`3 Logical Structures
` 3.1 Start-Tags, End-Tags, and Empty-Element Tags
` 3.2 Element Type Declarations
` 3.2.1 Element Content
` 3.2.2 Mixed Content
` 3.3 Attribute-List Declarations
` 3.3.1 Attribute Types
` 3.3.2 Attribute Defaults
` 3.3.3 Attribute-Value Normalization
` 3.4 Conditional Sections
`4 Physical Structures
` 4.1 Character and Entity References
` 4.2 Entity Declarations
` 4.2.1 Internal Entities
` 4.2.2 External Entities
` 4.3 Parsed Entities
` 4.3.1 The Text Declaration
` 4.3.2 Well-Formed Parsed Entities
` 4.3.3 Character Encoding in Entities
` 4.3.4 Version Information in Entities
` 4.4 XML Processor Treatment of Entities and References
` 4.4.1 Not Recognized
` 4.4.2 Included
` 4.4.3 Included If Validating
` 4.4.4 Forbidden
` 4.4.5 Included in Literal
` 4.4.6 Notify
` 4.4.7 Bypassed
` 4.4.8 Included as PE
` 4.4.9 Error
` 4.5 Construction of Entity Replacement Text
` 4.6 Predefined Entities
` 4.7 Notation Declarations
` 4.8 Document Entity
`5 Conformance
` 5.1 Validating and Non-Validating Processors
` 5.2 Using XML Processors
`6 Notation
`
`Appendices
`
`A References
` A.1 Normative References
` A.2 Other References
`B Definitions for Character Normalization
`C Expansion of Entity and Character References (Non-Normative)
`D Deterministic Content Models (Non-Normative)
`E Autodetection of Character Encodings (Non-Normative)
` E.1 Detection Without External Encoding Information
` E.2 Priorities in the Presence of External Encoding Information
`F W3C XML Working Group (Non-Normative)
`G W3C XML Core Working Group (Non-Normative)
`H Production Notes (Non-Normative)
`I Suggestions for XML Names (Non-Normative)
`
`1 Introduction
`
`Extensible Markup Language, abbreviated XML, describes a class of data objects called XML documents and
`partially describes the behavior of computer programs which process them. XML is an application profile or
`restricted form of SGML, the Standard Generalized Markup Language [ISO 8879]. By construction, XML
`documents are conforming SGML documents.
`
`XML documents are made up of storage units called entities, which contain either parsed or unparsed data.
`Parsed data is made up of characters, some of which form character data, and some of which form markup.
`Markup encodes a description of the document's storage layout and logical structure. XML provides a
`mechanism to impose constraints on the storage layout and logical structure.
`[Definition: A software module called an XML processor is used to read XML documents and provide access
`to their content and structure.] [Definition: It is assumed that an XML processor is doing its work on behalf of
`another module, called the application.] This specification describes the required behavior of an XML
`processor in terms of how it must read XML data and the information it must provide to the application.
`
`1.1 Origin and Goals
`
`XML was developed by an XML Working Group (originally known as the SGML Editorial Review Board)
`formed under the auspices of the World Wide Web Consortium (W3C) in 1996. It was chaired by Jon Bosak
`of Sun Microsystems with the active participation of an XML Special Interest Group (previously known as the
`SGML Working Group) also organized by the W3C. The membership of the XML Working Group is given in
`an appendix. Dan Connolly served as the Working Group's contact with the W3C.
`
`The design goals for XML are:
`
`1. XML shall be straightforwardly usable over the Internet.
`
`2. XML shall support a wide variety of applications.
`
`3. XML shall be compatible with SGML.
`
`4. It shall be easy to write programs which process XML documents.
`
`5. The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
`
`6. XML documents should be human-legible and reasonably clear.
`
`7. The XML design should be prepared quickly.
`
`8. The design of XML shall be formal and concise.
`
`9. XML documents shall be easy to create.
`
`10. Terseness in XML markup is of minimal importance.
`
`This specification, together with associated standards (Unicode [Unicode] and ISO/IEC 10646 [ISO/IEC
`10646] for characters, Internet RFC 3066 [IETF RFC 3066] for language identification tags, ISO 639 [ISO
`639] for language name codes, and ISO 3166 [ISO 3166] for country name codes), provides all the
`information necessary to understand XML Version 1.1 and construct computer programs to process it.
`
`This version of the XML specification may be distributed freely, as long as all text and legal notices remain
`intact.
`
`1.2 Terminology
`
The terminology used to describe XML documents is defined in the body of this specification. The key words
MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL,
when EMPHASIZED, are to be interpreted as described in [IETF RFC 2119]. In addition, the terms defined in
`the following list are used in building those definitions and in describing the actions of an XML processor:
`error
`
[Definition: A violation of the rules of this specification; results are undefined. Unless otherwise specified,
failure to observe a prescription of this specification indicated by one of the keywords MUST, REQUIRED,
MUST NOT, SHALL and SHALL NOT is an error. Conforming software MAY detect and report an error and
MAY recover from it.]
`fatal error
`
[Definition: An error which a conforming XML processor MUST detect and report to the application. After
encountering a fatal error, the processor MAY continue processing the data to search for further errors
and MAY report such errors to the application. In order to support correction of errors, the processor
MAY make unprocessed data from the document (with intermingled character data and markup) available to
the application. Once a fatal error is detected, however, the processor MUST NOT continue normal
processing (i.e., it MUST NOT continue to pass character data and information about the document's
logical structure to the application in the normal way).]
`at user option
`
[Definition: Conforming software MAY or MUST (depending on the modal verb in the sentence) behave as
described; if it does, it MUST provide users a means to enable or disable the behavior described.]
`validity constraint
`
[Definition: A rule which applies to all valid XML documents. Violations of validity constraints are errors;
they MUST, at user option, be reported by validating XML processors.]
`well-formedness constraint
`
`[Definition: A rule which applies to all well-formed XML documents. Violations of well-formedness
`constraints are fatal errors.]
`match
`
`[Definition: (Of strings or names:) Two strings or names being compared are identical. Characters with
`multiple possible representations in Unicode (e.g. characters with both precomposed and base+diacritic
`forms) match only if they have the same representation in both strings. No case folding is performed.
`(Of strings and rules in the grammar:) A string matches a grammatical production if it belongs to the
`language generated by that production. (Of content and content models:) An element matches its
`declaration when it conforms in the fashion described in the constraint [VC: Element Valid].]
`for compatibility
`
`[Definition: Marks a sentence describing a feature of XML included solely to ensure that XML remains
`compatible with SGML.]
`for interoperability
`
`[Definition: Marks a sentence describing a non-binding recommendation included to increase the
`chances that XML documents can be processed by the existing installed base of SGML processors
`which predate the WebSGML Adaptations Annex to ISO 8879.]
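
As a non-normative illustration of the definition of match given in the list above, the following Python sketch compares strings code point by code point, with no normalization and no case folding. The function name xml_match is hypothetical and is not defined by this specification; unicodedata is part of the Python standard library.

```python
import unicodedata


def xml_match(a: str, b: str) -> bool:
    """Two strings match only if they are identical, code point for code point."""
    return a == b


precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE as one code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

# Different representations of the same visible character do not match.
print(xml_match(precomposed, decomposed))
# No case folding is performed.
print(xml_match("Name", "name"))
# Only after an explicit normalization step do the two forms become identical.
print(xml_match(unicodedata.normalize("NFC", decomposed), precomposed))
```

The sketch suggests why fully normalized input matters: a processor that compares names this way gives the expected answer only when both strings already use the same representation.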
`
`1.3 Rationale and list of changes for XML 1.1
`
`The W3C's XML 1.0 Recommendation was first issued in 1998, and despite the issuance of many errata
`culminating in a Third Edition of 2004, has remained (by intention) unchanged with respect to what is well-
`formed XML and what is not. This stability has been extremely useful for interoperability. However, the
`Unicode Standard on which XML 1.0 relies for character specifications has not remained static, evolving from
`version 2.0 to version 4.0 and beyond. Characters not present in Unicode 2.0 may already be used in XML
`1.0 character data. However, they are not allowed in XML names such as element type names, attribute
`names, enumerated attribute values, processing instruction targets, and so on. In addition, some characters
`that should have been permitted in XML names were not, due to oversights and inconsistencies in Unicode
`2.0.
`
`The overall philosophy of names has changed since XML 1.0. Whereas XML 1.0 provided a rigid definition of
`names, wherein everything that was not permitted was forbidden, XML 1.1 names are designed so that
`everything that is not forbidden (for a specific reason) is permitted. Since Unicode will continue to grow past
`version 4.0, further changes to XML can be avoided by allowing almost any character, including those not yet
`assigned, in names.
`
`In addition, XML 1.0 attempts to adapt to the line-end conventions of various modern operating systems, but
`discriminates against the conventions used on IBM and IBM-compatible mainframes. As a result, XML
`documents on mainframes are not plain text files according to the local conventions. XML 1.0 documents
`generated on mainframes must either violate the local line-end conventions, or employ otherwise
`unnecessary translation phases before parsing and after generation. Allowing straightforward interoperability
`is particularly important when data stores are shared between mainframe and non-mainframe systems (as
`opposed to being copied from one to the other). Therefore XML 1.1 adds NEL (#x85) to the list of line-end
`characters. For completeness, the Unicode line separator character, #x2028, is also supported.
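
A non-normative sketch of the extended line-end handling follows. It assumes the translation rules given in 2.11 End-of-Line Handling, which is outside this part of the document: the sequences #xD #xA and #xD #x85, and the single characters #x85, #x2028 and #xD, are each translated to a single #xA. The function name normalize_line_ends is hypothetical.

```python
import re

# Two-character sequences are listed first so that they are consumed as a unit
# before the single-character alternatives are tried.
_LINE_END = re.compile("\r\n|\r\x85|\x85|\u2028|\r")


def normalize_line_ends(text: str) -> str:
    """Translate every XML 1.1 line end in text to a single line feed (#xA)."""
    return _LINE_END.sub("\n", text)


sample = "a\r\nb\x85c\u2028d\re\r\x85f"
print(repr(normalize_line_ends(sample)))
```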
`
`Finally, there is considerable demand to define a standard representation of arbitrary Unicode characters in
`XML documents. Therefore, XML 1.1 allows the use of character references to the control characters #x1
`through #x1F, most of which are forbidden in XML 1.0. For reasons of robustness, however, these characters
`still cannot be used directly in documents. In order to improve the robustness of character encoding detection,
`the additional control characters #x7F through #x9F, which were freely allowed in XML 1.0 documents, now
`must also appear only as character references. (Whitespace characters are of course exempt.) The minor
`sacrifice of backward compatibility is considered not significant. Due to potential problems with APIs, #x0 is
`still forbidden both directly and as a character reference.
`
Finally, XML 1.1 defines a set of constraints called "full normalization" on XML documents, which document
creators SHOULD adhere to, and document processors SHOULD verify. Using fully normalized documents
`ensures that identity comparisons of names, attribute values, and character content can be made correctly by
`simple binary comparison of Unicode strings.
`
`A new XML version, rather than a set of errata to XML 1.0, is being created because the changes affect the
`definition of well-formed documents. XML 1.0 processors must continue to reject documents that contain new
`characters in XML names, new line-end conventions, and references to control characters. The distinction
`between XML 1.0 and XML 1.1 documents is indicated by the version number information in the XML
`declaration at the start of each document.
`2 Documents
`
`[Definition: A data object is an XML document if it is well-formed, as defined in this specification. In addition,
`the XML document is valid if it meets certain further constraints.]
`
`Each XML document has both a logical and a physical structure. Physically, the document is composed of
`units called entities. An entity may refer to other entities to cause their inclusion in the document. A document
`begins in a "root" or document entity. Logically, the document is composed of declarations, elements,
`comments, character references, and processing instructions, all of which are indicated in the document by
`explicit markup. The logical and physical structures nest properly, as described in 4.3.2 Well-Formed
`Parsed Entities.
`
`2.1 Well-Formed XML Documents
`
`[Definition: A textual object is a well-formed XML document if:]
`
`1. Taken as a whole, it matches the production labeled document.
`
`2. It meets all the well-formedness constraints given in this specification.
`
`3. Each of the parsed entities which is referenced directly or indirectly within the document is well-formed.
`
`Document
`
`[1] document ::= ( prolog element Misc* ) - ( Char* RestrictedChar Char* )
`
`Matching the document production implies that:
`
`1. It contains one or more elements.
`2. [Definition: There is exactly one element, called the root, or document element, no part of which
`appears in the content of any other element.] For all other elements, if the start-tag is in the content of
`another element, the end-tag is in the content of the same element. More simply stated, the elements,
`delimited by start- and end-tags, nest properly within each other.
`
`[Definition: As a consequence of this, for each non-root element C in the document, there is one other element
`P in the document such that C is in the content of P, but is not in the content of any other element that is in the
`content of P. P is referred to as the parent of C, and C as a child of P.]
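
The parent and child relationship defined above can be illustrated, non-normatively, with the Python standard library. This is only a sketch: xml.etree.ElementTree is an XML 1.0 processor, so the sample document deliberately carries no XML declaration, and the element names and the variable parent_of are invented for the example.

```python
import xml.etree.ElementTree as ET

doc = "<book><chapter><para>Hello</para></chapter><index/></book>"
root = ET.fromstring(doc)

# Map every non-root element C to the single element P whose content
# directly contains it (iterating an element yields its direct children).
parent_of = {child: parent for parent in root.iter() for child in parent}

for child, parent in parent_of.items():
    print(f"{child.tag} is a child of {parent.tag}")

# The root element has no parent.
print(root.tag, "is the root:", root not in parent_of)
```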
`2.2 Characters
`
`[Definition: A parsed entity contains text, a sequence of characters, which may represent markup or character
`data.] [Definition: A character is an atomic unit of text as specified by ISO/IEC 10646 [ISO/IEC 10646]. Legal
`characters are tab, carriage return, line feed, and the legal characters of Unicode and ISO/IEC 10646. The
`versions of these standards cited in A.1 Normative References were current at the time this document was
`prepared. New characters may be added to these standards by amendments or new editions. Consequently,
XML processors MUST accept any character in the range specified for Char.]
`
`Character Range
`
[2]  Char           ::= [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
                        /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
[2a] RestrictedChar ::= [#x1-#x8] | [#xB-#xC] | [#xE-#x1F] | [#x7F-#x84] | [#x86-#x9F]
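
The two productions above translate directly into range checks. The following Python sketch is non-normative; the function names are hypothetical, and first_illegal_char models only the rule that a RestrictedChar may not appear literally (it may still be written as a character reference).

```python
CHAR_RANGES = [(0x1, 0xD7FF), (0xE000, 0xFFFD), (0x10000, 0x10FFFF)]
RESTRICTED_RANGES = [(0x1, 0x8), (0xB, 0xC), (0xE, 0x1F), (0x7F, 0x84), (0x86, 0x9F)]


def _in_ranges(code_point: int, ranges) -> bool:
    return any(low <= code_point <= high for low, high in ranges)


def is_char(ch: str) -> bool:
    """True if ch matches production [2] Char."""
    return _in_ranges(ord(ch), CHAR_RANGES)


def is_restricted_char(ch: str) -> bool:
    """True if ch matches production [2a] RestrictedChar."""
    return _in_ranges(ord(ch), RESTRICTED_RANGES)


def first_illegal_char(text: str) -> int:
    """Index of the first character that may not appear literally, or -1."""
    for index, ch in enumerate(text):
        if not is_char(ch) or is_restricted_char(ch):
            return index
    return -1


print(first_illegal_char("tab\tand newline\n are fine"))
print(first_illegal_char("bell\x07 is restricted"))
```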
`
`The mechanism for encoding character code points into bit patterns may vary from entity to entity. All XML
processors MUST accept the UTF-8 and UTF-16 encodings of Unicode [Unicode]; the mechanisms for
`signaling which of the two is in use, or for bringing other encodings into play, are discussed later, in 4.3.3
`Character Encoding in Entities.
`Note:
`
`Document authors are encouraged to avoid "compatibility characters", as defined in Unicode [Unicode].
`The characters defined in the following ranges are also discouraged. They are either control characters
`or permanently undefined Unicode characters:
`
`[#x1-#x8], [#xB-#xC], [#xE-#x1F], [#x7F-#x84], [#x86-#x9F], [#xFDD0-#xFDDF],
`[#x1FFFE-#x1FFFF], [#x2FFFE-#x2FFFF], [#x3FFFE-#x3FFFF],
`[#x4FFFE-#x4FFFF], [#x5FFFE-#x5FFFF], [#x6FFFE-#x6FFFF],
`[#x7FFFE-#x7FFFF], [#x8FFFE-#x8FFFF], [#x9FFFE-#x9FFFF],
`[#xAFFFE-#xAFFFF], [#xBFFFE-#xBFFFF], [#xCFFFE-#xCFFFF],
`[#xDFFFE-#xDFFFF], [#xEFFFE-#xEFFFF], [#xFFFFE-#xFFFFF],
`[#x10FFFE-#x10FFFF].
`
`2.3 Common Syntactic Constructs
`
`This section defines some symbols used widely in the grammar.
`
`S (white space) consists of one or more space (#x20) characters, carriage returns, line feeds, or tabs.
`
`White Space
`
`[3] S ::= (#x20 | #x9 | #xD | #xA)+
`
`Note:
`
`The presence of #xD in the above production is maintained purely for backward compatibility with the
`First Edition. As explained in 2.11 End-of-Line Handling, all #xD characters literally present in an XML
`document are either removed or replaced by #xA characters before any other processing is done. The
`only way to get a #xD character to match this production is to use a character reference in an entity value
`literal.
`[Definition: A Name is a token beginning with a letter or one of a few punctuation characters, and continuing
`with letters, digits, hyphens, underscores, colons, or full stops, together known as name characters.] Names
`beginning with the string "xml", or with any string which would match (('X'|'x') ('M'|'m') ('L'|'l')), are
`reserved for standardization in this or future versions of this specification.
`Note:
`
`The Namespaces in XML Recommendation [XML Names] assigns a meaning to names containing colon
`characters. Therefore, authors should not use the colon in XML names except for namespace purposes,
`but XML processors must accept the colon as a name character.
`
`An Nmtoken (name token) is any mixture of name characters.
`
The first character of a Name MUST be a NameStartChar, and any other characters MUST be NameChars; this
`mechanism is used to prevent names from beginning with European (ASCII) digits or with basic combining
`characters. Almost all characters are permitted in names, except those which either are or reasonably could
`be used as delimiters. The intention is to be inclusive rather than exclusive, so that writing systems not yet
`encoded in Unicode can be used in XML names. See I Suggestions for XML Names for suggestions on the
`creation of names.
`
`Document authors are encouraged to use names which are meaningful words or combinations of words in
`natural languages, and to avoid symbolic or white space characters in names. Note that COLON, HYPHEN-
`MINUS, FULL STOP (period), LOW LINE (underscore), and MIDDLE DOT are explicitly permitted.
`
`The ASCII symbols and punctuation marks, along with a fairly large group of Unicode symbol characters, are
`excluded from names because they are more useful as delimiters in contexts where XML names are used
`outside XML documents; providing this group gives those contexts hard guarantees about what cannot be
`part of an XML name. The character #x037E, GREEK QUESTION MARK, is excluded because when
`normalized it becomes a semicolon, which could change the meaning of entity references.
`
`Names and Tokens
`
[4]  NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] |
                       [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] |
                       [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] |
                       [#x10000-#xEFFFF]
[4a] NameChar      ::= NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]
[5]  Name          ::= NameStartChar (NameChar)*
[6]  Names         ::= Name (#x20 Name)*
[7]  Nmtoken       ::= (NameChar)+
[8]  Nmtokens      ::= Nmtoken (#x20 Nmtoken)*
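
As a non-normative aid, the productions for Name, Names and Nmtoken can be turned into regular expressions by listing the code point ranges. The Python sketch below does this with the standard re module; the helper names (_char_class, is_name, is_names, is_nmtoken) are hypothetical.

```python
import re

NAME_START_RANGES = [
    (0x3A, 0x3A), (0x41, 0x5A), (0x5F, 0x5F), (0x61, 0x7A),
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF), (0x370, 0x37D),
    (0x37F, 0x1FFF), (0x200C, 0x200D), (0x2070, 0x218F), (0x2C00, 0x2FEF),
    (0x3001, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]
# Characters allowed in NameChar in addition to NameStartChar.
NAME_EXTRA_RANGES = [
    (0x2D, 0x2D), (0x2E, 0x2E), (0x30, 0x39), (0xB7, 0xB7),
    (0x300, 0x36F), (0x203F, 0x2040),
]


def _char_class(ranges) -> str:
    """Render code point ranges as the inside of a regex character class."""
    return "".join(f"\\U{low:08X}-\\U{high:08X}" for low, high in ranges)


_START = _char_class(NAME_START_RANGES)
_REST = _START + _char_class(NAME_EXTRA_RANGES)
NAME_RE = re.compile(f"[{_START}][{_REST}]*")
NMTOKEN_RE = re.compile(f"[{_REST}]+")


def is_name(s: str) -> bool:
    return NAME_RE.fullmatch(s) is not None


def is_nmtoken(s: str) -> bool:
    return NMTOKEN_RE.fullmatch(s) is not None


def is_names(s: str) -> bool:
    """Names: one or more Name tokens separated by exactly one #x20."""
    return all(is_name(part) for part in s.split(" "))


for candidate in ["greeting", "_a-b.c", "1abc", "-x", "\u037e"]:
    print(repr(candidate), is_name(candidate), is_nmtoken(candidate))
```

Note that "1abc" is an Nmtoken but not a Name, because a digit is a NameChar but not a NameStartChar.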
`
`Note:
`
`The Names and Nmtokens productions are used to define the validity of tokenized attribute values after
`normalization (see 3.3.1 Attribute Types).
`
`Literal data is any quoted string not containing the quotation mark used as a delimiter for that string. Literals
`are used for specifying the content of internal entities (EntityValue), the values of attributes (AttValue), and
`external identifiers (SystemLiteral). Note that a SystemLiteral can be parsed without scanning for markup.
`
`Literals
`
[9]  EntityValue   ::= '"' ([^%&"] | PEReference | Reference)* '"'
                     | "'" ([^%&'] | PEReference | Reference)* "'"
[10] AttValue      ::= '"' ([^<&"] | Reference)* '"'
                     | "'" ([^<&'] | Reference)* "'"
[11] SystemLiteral ::= ('"' [^"]* '"') | ("'" [^']* "'")
[12] PubidLiteral  ::= '"' PubidChar* '"' | "'" (PubidChar - "'")* "'"
[13] PubidChar     ::= #x20 | #xD | #xA | [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%]
`
`Note:
`
`Although the EntityValue production allows the definition of a general entity consisting of a single explicit
`< in the literal (e.g., <!ENTITY mylt "<">), it is strongly advised to avoid this practice since any reference to
`that entity will cause a well-formedness error.
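
The two literal forms that contain no references, SystemLiteral and PubidLiteral, can be checked with plain regular expressions. The following Python sketch is non-normative, its names are hypothetical, and it does not attempt EntityValue or AttValue, which depend on the Reference and PEReference productions defined elsewhere.

```python
import re

# The inside of a character class for production [13] PubidChar.
_PUBID_CHARS = r"\x20\x0D\x0Aa-zA-Z0-9\-'()+,./:=?;!*#@$_%"

SYSTEM_LITERAL_RE = re.compile("\"[^\"]*\"|'[^']*'")
PUBID_LITERAL_RE = re.compile(
    '"[' + _PUBID_CHARS + ']*"'
    # In the single-quoted form the apostrophe itself is excluded.
    + "|'[" + _PUBID_CHARS.replace("'", "") + "]*'"
)


def is_system_literal(s: str) -> bool:
    return SYSTEM_LITERAL_RE.fullmatch(s) is not None


def is_pubid_literal(s: str) -> bool:
    return PUBID_LITERAL_RE.fullmatch(s) is not None


print(is_pubid_literal('"-//W3C//DTD XHTML 1.0 Strict//EN"'))
print(is_pubid_literal('"a<b"'))
print(is_system_literal("'He said \"hi\"'"))
```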
`
`2.4 Character Data and Markup
`
`Text consists of intermingled character data and markup. [Definition: Markup takes the form of start-tags,
`end-tags, empty-element tags, entity references, character references, comments, CDATA section delimiters,
`document type declarations, processing instructions, XML declarations, text declarations, and any white
`space that is at the top level of the document entity (that is, outside the document element and not inside any
`other markup).]
`[Definition: All text that is not markup constitutes the character data of the document.]
`
The ampersand character (&) and the left angle bracket (<) MUST NOT appear in their literal form, except
when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section. If they
are needed elsewhere, they MUST be escaped using either numeric character references or the strings "&amp;"
and "&lt;" respectively. The right angle bracket (>) may be represented using the string "&gt;", and MUST, for
compatibility, be escaped using either "&gt;" or a character reference when it appears in the string "]]>" in
content, when that string is not marking the end of a CDATA section.
`
`In the content of elements, character data is any string of characters which does not contain the start-
`delimiter of any markup or the CDATA-section-close delimiter, "]]>". In a CDATA section, character data is any
`string of characters not including the CDATA-section-close delimiter.
`
To allow attribute values to contain both single and double quotes, the apostrophe or single-quote character
(') may be represented as "&apos;", and the double-quote character (") as "&quot;".
`
`Character Data
`
`[14] CharData ::= [^<&]* - ([^<&]* ']]>' [^<&]*)
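
A serializer can apply the escaping rules of this section with a few string replacements. The Python sketch below is non-normative and its function names are hypothetical. It escapes the ampersand first so that the entity references it produces are not escaped a second time, and it does not deal with control characters or attribute-value normalization.

```python
def escape_text(text: str) -> str:
    """Escape character data for use in element content."""
    return (
        text.replace("&", "&amp;")     # must come first
            .replace("<", "&lt;")
            .replace("]]>", "]]&gt;")  # ">" only needs escaping after "]]"
    )


def escape_attribute(value: str, quote: str = '"') -> str:
    """Escape an attribute value and wrap it in the chosen quote character."""
    if quote not in ('"', "'"):
        raise ValueError("quote must be a single or double quote character")
    escaped = value.replace("&", "&amp;").replace("<", "&lt;")
    if quote == '"':
        escaped = escaped.replace('"', "&quot;")
    else:
        escaped = escaped.replace("'", "&apos;")
    return quote + escaped + quote


print(escape_text("a < b && c ]]> d"))
print(escape_attribute('say "hi" & go'))
print(escape_attribute("it's", quote="'"))
```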
`
`2.5 Comments
`
`[Definition: Comments may appear anywhere in a document outside other markup; in addition, they may
`appear within the document type declaration at places allowed by the grammar. They are not part of the
document's character data; an XML processor MAY, but need not, make it possible for an application to
retrieve the text of comments. For compatibility, the string "--" (double-hyphen) MUST NOT occur within
comments.] Parameter entity references MUST NOT be recognized within comments.
`
`Comments
`
`[15] Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'
`
`An example of a comment:
`
`<!-- declarations for <head> & <body> -->
`
`Note that the grammar does not allow a comment ending in --->. The following example is not well-formed.
`
`<!-- B+, B, or B--->
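
The Comment production maps closely onto a regular expression, which makes the two examples above easy to check. This Python sketch is non-normative; is_comment is a hypothetical name, and the pattern assumes that every character has already been checked against the Char production.

```python
import re

# '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'
COMMENT_RE = re.compile(r"<!--(?:[^-]|-[^-])*-->")


def is_comment(s: str) -> bool:
    """True if s, taken as a whole, matches the Comment production."""
    return COMMENT_RE.fullmatch(s) is not None


print(is_comment("<!-- declarations for <head> & <body> -->"))
print(is_comment("<!-- B+, B, or B--->"))
print(is_comment("<!-- a -- b -->"))
```

A hyphen is accepted only when the next character is not a hyphen, so both a double-hyphen inside the comment and a comment ending in "--->" are rejected.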
`
`2.6 Processing Instructions
`
`[Definition: Processing instructions (PIs) allow documents to contain instructions for applications.]
`
`Processing Instructions
`
[16] PI       ::= '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'
[17] PITarget ::= Name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))
`
PIs are not part of the document's character data, but MUST be passed through to the application. The PI
begins with a target (PITarget) used to identify the application to which the instruction is directed. The target
names "XML", "xml", and so on are reserved for standardization in this or future versions of this specification.
The XML Notation mechanism may be used for formal declaration of PI targets. Parameter entity references
MUST NOT be recognized within processing instructions.
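
A non-normative Python sketch of how a serializer might build a processing instruction while respecting the PI and PITarget productions. The function name make_pi is hypothetical, and the sketch assumes that the target has already been checked against the Name production.

```python
import re

# The reserved target: "xml" in any mixture of upper and lower case.
_RESERVED_TARGET = re.compile(r"[Xx][Mm][Ll]")


def make_pi(target: str, data: str = "") -> str:
    """Serialize a processing instruction, rejecting what the grammar excludes."""
    if _RESERVED_TARGET.fullmatch(target):
        raise ValueError("the target 'xml' (in any case) is reserved")
    if "?>" in data:
        raise ValueError("PI data cannot contain '?>'")
    return f"<?{target} {data}?>" if data else f"<?{target}?>"


print(make_pi("xml-stylesheet", 'href="style.css" type="text/css"'))
print(make_pi("page-break"))
```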
`
`2.7 CDATA Sections
`
`[Definition: CDATA sections may occur anywhere character data may occur; they are used to escape blocks
`of text containing characters which would otherwise be recognized as markup. CDATA sections begin with the
`string "<![CDATA[" and end with the string "]]>":]
`
`CDATA Sections
`
[18] CDSect  ::= CDStart CData CDEnd
[19] CDStart ::= '<![CDATA['
[20] CData   ::= (Char* - (Char* ']]>' Char*))
[21] CDEnd   ::= ']]>'
`
`Within a CDATA section, only the CDEnd string is recognized as markup, so that left angle brackets and
ampersands may occur in their literal form; they need not (and cannot) be escaped using "&lt;" and "&amp;".
`CDATA sections cannot nest.
`
`An example of a CDATA section, in which "<greeting>" and "</greeting>" are recognized as character data, not
`markup:
`
`<![CDATA[<greeting>Hello, world!</greeting>]]>
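
Because CDATA sections cannot nest and cannot contain "]]>", text that includes that string has to be split across two adjacent sections. The Python sketch below is non-normative and the function name to_cdata is hypothetical. It uses a common splitting technique that is not prescribed by this specification: the first section ends after "]]" and the second begins with ">".

```python
def to_cdata(text: str) -> str:
    """Wrap text in CDATA sections, splitting around any ']]>' it contains."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


print(to_cdata("<greeting>Hello, world!</greeting>"))
print(to_cdata("a]]>b"))
```

A processor reading the second result sees two adjacent CDATA sections whose character data, taken together, is again "a]]>b".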
`
`2.8 Prolog and Document Type Declaration
`
[Definition: XML 1.1 documents MUST begin with an XML declaration which specifies the version of XML
`being used.] For example, the following is a complete XML 1.1 document, well-formed but not valid:
`
`<?xml version="1.1"?>
`<greeting>Hello, world!</greeting>
`
`but the following is an XML 1.0 document because it does not have an XML declaration:
`
`<greeting>Hello, world!</greeting>
`
`The function of the markup in an XML document is to describe its storage and logical structure and to
`associate attribute name-value pairs with its logical structures. XML provides a mechanism, the document
`type declaration, to define constraints on the logical structure and to support the use of predefined storage
`units. [Definition: An XML document is valid if it has an associated document type declaration and if the
`document complies with the constraints expressed in it.]
`
The document type declaration MUST appear before the first element in the document.
`
`Prolog
`
[22] prolog      ::= XMLDecl Misc* (doctypedecl Misc*)?
[23] XMLDecl     ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
[24] VersionInfo ::= S 'version' Eq ("'" VersionNum "'" | '"' VersionNum '"')
[25] Eq          ::= S? '=' S?
[26] VersionNum  ::= '1.1'
[27] Misc        ::= Comment | PI | S
`
`[Definition: The XML document type declaration contains or points to markup declarations that provide a
`grammar for a class of documents. This grammar is known as a document type definition, or DTD. The
`document type declaration can point to an external subset (a special kind of external entity) containing
`markup declarations, or can contain the markup declarations directly in an internal subset, or can do both.
`The DTD for a document consists of both subsets taken together.]
`[Definition: A markup declaration is an element type declaration, an attribute-list declaration, an entity
`declaration, or a notation declaration.] These declarations may be contained in whole or in part within
`parameter entities, as described in the well-formedness and validity constraints below. For further information,
`see 4 Physical Structures.
`
`Document Type Definition
`
[28]  doctypedecl ::= '<!DOCTYPE' S Name (S ExternalID)? S? ('[' intSubset ']' S?)? '>'
                      [VC: Root Element Type]
                      [WFC: External Subset]
[28a] DeclSep     ::= PEReference | S
                      [WFC: PE Between Declarations]
[28b] intSubset   ::= (markupdecl | DeclSep)*
[29]  markupdecl  ::= elementdecl | AttlistDecl | EntityDecl | NotationDecl | PI | Comment
                      [VC: Proper Declaration/PE Nesting]
                      [WFC: PEs in Internal Subset]
`
`Note that it is possible to construct a well-formed document containing a doctypedecl that neither points to an
`external subset nor contains an internal subset.
`
`The markup declarations may be made up in whole or in part of the replacement text of parameter entities.
`The productions later in this specification for individual nonterminals (elementdecl, AttlistDecl, and so on)
`describe the declarations after all the parameter entities have been included.
`
`Parameter entity references are recognized anywhere in the DTD (internal and external subsets and external
`parameter entities), except in literals, processing instructions, comments, and the contents of ignored
`conditional sections (see 3.4 Conditional Sections). They are also recognized in entity value literals. The
`use of parameter entities in the internal subset is restricted as described below.
`Validity constraint: Root Element Type
`
The Name in the document type declaration MUST match the element type of the root element.
`Validity constraint: Proper Declaration/PE Nesting
`
Parameter-entity replacement text MUST be properly nested with markup declarations.