--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/spec/microxml.html Wed Sep 19 16:59:21 2012 +0700
@@ -0,0 +1,589 @@
+ <p>Copyright © 2012 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and
+ document use rules apply.</p>
+
+ <h2>Abstract</h2>
+
+ <p>MicroXML is a subset of XML intended for use in contexts where full XML is, or is perceived
+ to be, too large and complex. It has been designed to complement rather than replace XML, JSON
+ and HTML. Like XML, it is a general format for making use of markup vocabularies rather than
+ a specific markup vocabulary like HTML. This document provides a complete description
+ of MicroXML.</p>
+
+ <h2>Status of this document</h2>
+
+ <p>This document is being developed by the MicroXML Community Group, which is open to public
+ participation. Comments should be sent to the Community Group mailing list.</p>
+
+ <p>This version of the document is an editor's draft. It is believed to reflect the current
+ consensus in the Community Group, but has not yet been approved by it.</p>
+
+ <p>This document may be distributed freely as long as all text and legal notices remain
+ intact.</p>
+
+ <h2>1 Introduction</h2>
+
+ <p>MicroXML is a Unicode-based textual format for general-purpose structured information
+ interchange. A sequence of characters or bytes in this format is called a <i>MicroXML
+ document</i>. MicroXML is designed to be syntactically compatible with XML; more precisely,
+ any MicroXML document is a well-formed XML document according to XML 1.0 Fifth Edition [XML].</p>
+
+ <p>MicroXML also specifies an abstract data model for MicroXML documents. This is substantially
+ compatible with a subset of the information items and properties of the XML Information Set
+ [INFOSET]. See Appendix B for details.</p>
+
+ <p>A <i>MicroXML parser</i> is a software module that accepts a sequence of bytes or characters
+ as input, determines whether that sequence is a MicroXML document, and, if it is, makes a
+ representation of its abstract data model available to other modules.</p>
+
+ <p>MicroXML is designed to be dramatically simpler than XML, not only in its syntax but also in
+ its data model. Experience with XML has shown that for many applications much of the complexity
+ of XML is unnecessary. Indeed, many specifications that use XML have invented their own ad-hoc
+ subsets of XML (XMPP, SOAP, E4X, Scala). The complexity of XML does not affect just the
+ developers of XML parsers and other tools, but has an ongoing cost to users of XML and
+ developers of XML applications.</p>
+
+ <p>Although JSON has replaced XML in many applications where greater simplicity is desired, JSON
+ is awkward for representing structured documents that include mixed content (content that mixes
+ data characters and elements). HTML is very widely used for representing structured documents.
+ However, MicroXML is a fundamentally different kind of format from HTML: MicroXML does
+ not define the semantics of any element or attribute names, whereas HTML does. MicroXML
+ with appropriately chosen element and attribute names can be trivially transformed into
+ valid HTML. Like HTML and XML, MicroXML is designed to support the use of plain text editors
+ for authoring; it therefore preserves some of the conveniences provided by XML for such usage.</p>
+
+ <p>MicroXML has a number of advantages over full XML as a format for network protocols. First, MicroXML
+ does not constrain how parsers recover from errors; in particular, MicroXML does not adopt XML's Draconian
+ error handling requirements. This allows protocols using MicroXML to follow the traditional policy of being
+ liberal in what they accept. Second, the features of XML that are most problematic from a security perspective
+ have been eliminated from MicroXML: most importantly, MicroXML completely eliminates document type
+ declarations, including entity declarations; MicroXML documents are self-contained, in the sense that the parsing
+ of a MicroXML document never requires access to any external resource.</p>
+
+ <p>This document, together with [RFC 2119] for requirement keywords and [Unicode] for
+ characters, provides all the information necessary to understand MicroXML and construct
+ computer programs to process it.</p>
+
+ <p>The keywords "<small>MUST</small>", "<small>MUST NOT</small>", "<small>REQUIRED</small>",
+ "<small>SHALL</small>", "<small>SHALL NOT</small>", "<small>SHOULD</small>", "<small>SHOULD
+ NOT</small>", "<small>RECOMMENDED</small>", "<small>MAY</small>", and
+ "<small>OPTIONAL</small>" in this document are to be interpreted as described in RFC
+ 2119.</p>
+
+ <h2>2 Data Model</h2>
+
+ <p>The MicroXML abstract data model uses three primitive types:</p>
+
+ <ul>
+ <li>character: this is an atomic type corresponding to an integer in the range 0 to
+ 0x10FFFF, representing a single Unicode code point;</li>
+
+ <li>list: this is a structured type; it is an ordered list containing zero or more members,
+ which can be of any type;</li>
+
+ <li>map: this is a structured type that associates each of zero or more keys with a value;
+ all the keys in a map are distinct; both the keys and the values of a map can be of any
+ type.</li>
+ </ul>
+
+ <p>A string is not a primitive type: it is just a list of zero or more characters.</p>
+
+ <p>The top-level construct in the MicroXML data model is an <i>element item</i>. An element item
+ is a list with exactly three members:</p>
+
+ <ol>
+ <li>a name item;</li>
+
+ <li>an attributes map: this is a possibly empty map, whose keys are name items and whose
+ values are strings;</li>
+
+ <li>a content list: this is a list with zero or more members; each member is either a character
+ or an element item.</li>
+ </ol>
+
+ <p>A <i>name item</i> is a non-empty string. The first character in the
+ string <small>MUST</small> match the production <i>nameStartChar</i>, and any subsequent
+ characters <small>MUST</small> match the production <i>nameChar</i>. In addition, a name item
+ occurring as a key in an attributes map <small>MUST NOT</small> be <code>xmlns</code>.</p>
+
+ <p>Any character occurring in the value of an attributes map or as a member of a content list
+ <small>MUST</small> match the production <i>char</i>.</p>
+
+ <h3>2.1 JSON syntax (informative)</h3>
+
+ <p>There are many possible ways of representing the data model in [JSON]. The following
+ is one possible way:</p>
+
+ <ul>
+ <li>an element item is represented as a JSON array;</li>
+ <li>a name item is represented as a JSON string;</li>
+ <li>an attributes map is represented as a JSON object;</li>
+ <li>values in an attributes map are represented as JSON strings;</li>
+ <li>a content list is represented as a JSON array;</li>
+ <li>a sequence of consecutive characters occurring in a content list is combined into a single JSON string.</li>
+ </ul>
+
+ <p>This document will use this syntax to represent the data model in examples.</p>
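+
+ <p>As an informal illustration only, the following Python sketch checks whether a value decoded from
+ JSON has the shape described above. The function name <code>is_element_item</code> is hypothetical
+ and not part of MicroXML, and the sketch checks structure only: it does not test names against
+ the <i>nameStartChar</i> and <i>nameChar</i> productions, nor characters against <i>char</i>.</p>

```python
def is_element_item(value):
    """Return True if value has the shape of an element item in the JSON syntax.

    Structural check only (an assumption of this sketch): names and characters
    are not validated against the grammar productions.
    """
    # An element item is a list with exactly three members.
    if not (isinstance(value, list) and len(value) == 3):
        return False
    name, attributes, content = value
    # The name item is a non-empty string.
    if not (isinstance(name, str) and name):
        return False
    # The attributes map has name items as keys (never "xmlns") and strings as values.
    if not isinstance(attributes, dict):
        return False
    for key, val in attributes.items():
        if not (isinstance(key, str) and key and key != "xmlns"):
            return False
        if not isinstance(val, str):
            return False
    # The content list holds strings (runs of characters) and nested element items.
    if not isinstance(content, list):
        return False
    return all(isinstance(m, str) or is_element_item(m) for m in content)
```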
+
+ <h2>3 Syntax</h2>
+
+ <p>This section specifies the syntax of MicroXML. It also specifies how the syntax is parsed
+ into the abstract data model: for each syntactic form that contributes to the data model, it
+ specifies how the parse result for that form is constructed from the parse results of syntactic
+ subforms.</p>
+
+ <p>The abstract data model for a sequence of characters is constructed in two logical phases:</p>
+
+ <ol>
+ <li>line breaks in the sequence of characters are normalized by translating both the
+ two-character sequence #xD followed by #xA, and any #xD that is not followed by #xA, to a
+ single #xA character;</li>
+
+ <li>the sequence of characters is then parsed as a <i>document</i>, yielding an element
+ item as the parse result.</li>
+ </ol>
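+
+ <p>The first phase can be sketched in Python as follows. This is an informal illustration; the
+ function name <code>normalize_line_breaks</code> is hypothetical. Replacing the two-character
+ sequence first ensures that a #xD followed by #xA yields one #xA rather than two.</p>

```python
def normalize_line_breaks(text):
    """Translate #xD #xA pairs, and any remaining lone #xD, to a single #xA."""
    # Handle the two-character sequence first so it collapses to one #xA,
    # then translate every #xD that was not followed by #xA.
    return text.replace("\r\n", "\n").replace("\r", "\n")
```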
+
+ <h3>3.1 Documents</h3>
+
+ <pre>[1] document ::= byteOrderMark? (comment | s)* element (comment | s)*
+[2] byteOrderMark ::= #xFEFF</pre>
+
+ <p>The top-level syntactic form in MicroXML is a <i>document</i>. The parse result of
+ a <i>document</i> is the parse result of its single <i>element</i>.</p>
+
+ <p>This is an example of a small but complete MicroXML document exhibiting all
+ syntactic features:</p>
+
+ <pre><comment lang="en" date="2012-09-11">
+I <em>love</em> &#xB5;<!-- MICRO SIGN -->XML!<br/>
+It's so clean &amp; simple.</comment></pre>
+
+ <p>The abstract data model of this document in the JSON syntax described in Section 2.1 is:</p>
+
+<pre>[ "comment",
+ { "date": "2012-09-11", "lang": "en" },
+ [ "\nI ",
+ ["em", {}, ["love"]],
+ " \u00B5XML!",
+ ["br", {}, []],
+ "\nIt's so clean & simple."
+ ]
+]</pre>
+
+ <h3>3.2 Elements</h3>
+
+ <pre>[3] element ::= startTag content endTag
+ | emptyElementTag
+[4] startTag ::= '<' name attributeList s* '>'
+[5] endTag ::= '</' name s* '>'
+[6] content ::= (element | comment | dataChar | charRef)*
+[7] dataChar ::= char - ('<'|'&'|'>')
+[8] emptyElementTag ::= '<' name attributeList s* '/>'</pre>
+
+ <p>The <i>startTag</i> and <i>endTag</i> of an <i>element</i> <small>MUST</small> have the
+ same <i>name</i>. Note that the syntax prohibits overlapping elements.</p>
+
+ <p>The parse result of an <i>element</i> is an element item. There are two alternative syntaxes
+ for an <i>element</i>. For the general syntax, which uses a <i>startTag</i>
+ and an <i>endTag</i>, the parse result is constructed as follows:</p>
+
+ <ul>
+ <li>the name item comes from the <i>name</i> in the <i>startTag</i>;</li>
+
+ <li>the attributes map comes from the <i>attributeList</i> in the <i>startTag</i>;</li>
+
+ <li>the content list comes from the <i>content</i> between the <i>startTag</i> and the <i>endTag</i>.</li>
+ </ul>
+
+ <p>The parse result of <i>content</i> is a content list, which is constructed by combining in
+ order the parse results of each <i>element</i>, <i>dataChar</i>, and <i>charRef</i> in
+ the <i>content</i>. The parse result of a <i>dataChar</i> is the character itself.</p>
+
+ <p>For example, this element has content that consists of two elements, each of whose content consists of characters:</p>
+
+ <pre><location><city>New York</city><country>US</country></location></pre>
+
+ <p>Its abstract data model in JSON syntax is:</p>
+
+ <pre>["location", {}, [["city", {}, ["New York"]], ["country", {}, ["US"]]]]</pre>
+
+ <p>The other syntax, which consists of just an <i>emptyElementTag</i>, is equivalent
+ to a <i>startTag</i> immediately followed by an <i>endTag</i>. The parse result is
+ constructed as follows:</p>
+
+ <ul>
+ <li>the name item comes from the <i>name</i> in the <i>emptyElementTag</i>;</li>
+
+ <li>the attributes map comes from the <i>attributeList</i> in the <i>emptyElementTag</i>;</li>
+
+ <li>the content list is empty.</li>
+ </ul>
+
+ <p>This is a simple example of an <i>emptyElementTag</i>:</p>
+
+ <pre><page-break/></pre>
+
+ <p>Its abstract data model in JSON syntax is:</p>
+
+ <pre>["page-break", {}, []]</pre>
+
+ <h3>3.3 Attributes</h3>
+
+ <pre>[9] attributeList ::= (s+ attribute)*
+[10] attribute ::= attributeName s* '=' s* attributeValue
+[11] attributeValue ::= '"' ((attributeValueChar - '"') | charRef)* '"'
+ | "'" ((attributeValueChar - "'") | charRef)* "'"
+[12] attributeValueChar ::= char - ('<'|'>'|'&')
+[13] attributeName ::= name - 'xmlns'</pre>
+
+ <p>The parse result of an <i>attributeList</i> is an attributes map. For each <i>attribute</i>
+ in the <i>attributeList</i>, there is a key and associated value in the attributes map: the
+ key comes from the <i>attributeName</i> and the value comes from
+ the <i>attributeValue</i>.</p>
+
+ <p>All the <i>attributeName</i>s in an <i>attributeList</i> <small>MUST</small> be distinct.</p>
+
+ <p>The parse results of each <i>attributeValueChar</i> and <i>charRef</i> in
+ the <i>attributeValue</i> are combined in order to construct the parse result of
+ the <i>attributeValue</i>. The parse result of an <i>attributeValueChar</i> is the character
+ itself.</p>
+
+ <p>For example, this is an element with two attributes:</p>
+
+ <pre><location city="New York" country="US"/></pre>
+
+ <p>Its abstract data model in JSON syntax is:</p>
+
+ <pre>["location", { "city": "New York", "country": "US" }, []]</pre>
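+
+ <p>Going in the other direction, the following Python sketch writes an element item, given in the
+ JSON syntax of Section 2.1, back out as MicroXML text. It is an informal illustration, not a
+ definitive serializer: the names <code>escape</code> and <code>to_microxml</code> are hypothetical,
+ the input is assumed to be a valid element item already, and the sketch always uses double quotes
+ around attribute values and the named character references of Section 3.5 for escaping.</p>

```python
def escape(text, quote=False):
    """Escape the characters that may not occur literally in content or attribute values."""
    # '&' must be replaced first so the references added below are not re-escaped.
    text = text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    if quote:
        # Attribute values are written in double quotes, so '"' must be escaped too.
        text = text.replace('"', "&quot;")
    return text


def to_microxml(element):
    """Serialize an element item [name, attributes, content] as MicroXML text."""
    name, attributes, content = element
    attrs = "".join(f' {k}="{escape(v, quote=True)}"' for k, v in attributes.items())
    if not content:
        # An empty content list can be written with the emptyElementTag syntax.
        return f"<{name}{attrs}/>"
    body = "".join(
        escape(member) if isinstance(member, str) else to_microxml(member)
        for member in content
    )
    return f"<{name}{attrs}>{body}</{name}>"
```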
+
+ <h3>3.4 Comments</h3>
+
+ <pre>[14] comment ::= '<!--' ((char - '-') | ('-' (char - '-')))* '-->'</pre>
+
+ <p>Comments are not part of the MicroXML data model and so have no parse result.</p>
+
+ <p>The syntax prohibits the occurrence of <code>--</code> except as part of the opening or
+ closing delimiter of the <i>comment</i>.</p>
+
+ <p>For example, this is a comment:</p>
+
+ <pre><!-- declarations for <head> & <body> --></pre>
+
+ <p>Note that <code><head></code> and <code><body></code>
+ are not recognized as start-tags.</p>
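+
+ <p>The shape of the <i>comment</i> production can be approximated with a regular expression, as in
+ this informal Python sketch. The names <code>COMMENT</code> and <code>is_comment</code> are
+ hypothetical, and the sketch deliberately ignores the restriction of comment characters to
+ the <i>char</i> production.</p>

```python
import re

# '<!--', then any run of: a non-hyphen, or a hyphen followed by a non-hyphen,
# then '-->'. Assumption of this sketch: the char production is not checked.
COMMENT = re.compile(r"<!--(?:[^-]|-[^-])*-->")


def is_comment(text):
    """Return True if the whole of text has the shape of a comment."""
    return COMMENT.fullmatch(text) is not None
```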
+
+ <h3>3.5 Character references</h3>
+
+ <pre>[15] charRef ::= numericCharRef | namedCharRef
+[16] numericCharRef ::= '&#x' charNumber ';'
+[17] charNumber ::= [0-9a-fA-F]+
+[18] namedCharRef ::= '&' charName ';'
+[19] charName ::= 'amp' | 'lt' | 'gt' | 'quot' | 'apos'</pre>
+
+ <p>MicroXML provides two kinds of character references: named character references provide an easy way
+ to escape characters that MicroXML does not allow to be used literally as data characters; numeric character
+ references provide a way to include arbitrary Unicode characters in MicroXML documents without needing a
+ Unicode-aware text editor.</p>
+
+ <p>The parse result of a <i>charRef</i> is a single character. The code point of the parse
+ result of a <i>numericCharRef</i> is equal to <i>charNumber</i> interpreted as a hexadecimal
+ number; this <small>MUST</small> be the code point of a character that matches the <i>char</i>
+ production. The parse result of a <i>namedCharRef</i> depends on the <i>charName</i> as
+ follows:</p>
+
+ <ul>
+ <li>for <code>amp</code>, it is & (#x26);</li>
+ <li>for <code>lt</code>, it is < (#x3C);</li>
+ <li>for <code>gt</code>, it is > (#x3E);</li>
+ <li>for <code>quot</code>, it is " (#x22);</li>
+ <li>for <code>apos</code>, it is ' (#x27).</li>
+ </ul>
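+
+ <p>The rules above can be sketched in Python as follows. This is an informal illustration; the
+ names <code>NAMED</code>, <code>CHAR_REF</code> and <code>resolve_char_ref</code> are hypothetical.
+ The sketch rejects anything that does not match the <i>charRef</i> production (including decimal
+ references) and any number above #x10FFFF, but it does not otherwise check the resulting
+ character against the <i>char</i> production.</p>

```python
import re

# Parse results of the five named character references.
NAMED = {"amp": "&", "lt": "<", "gt": ">", "quot": '"', "apos": "'"}

# Either a hexadecimal numeric reference or one of the five names.
CHAR_REF = re.compile(r"&(?:#x([0-9a-fA-F]+)|(amp|lt|gt|quot|apos));")


def resolve_char_ref(ref):
    """Return the single character that a charRef stands for."""
    match = CHAR_REF.fullmatch(ref)
    if match is None:
        raise ValueError(f"not a character reference: {ref!r}")
    hex_digits, name = match.groups()
    if name is not None:
        return NAMED[name]
    code_point = int(hex_digits, 16)
    if code_point > 0x10FFFF:
        raise ValueError(f"code point out of range: {ref!r}")
    # Assumption of this sketch: the char production is not checked here.
    return chr(code_point)
```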
+
+ <p>For example, this is an element that contains two numeric character references:</p>
+
+ <pre><p>&#x3C;&#x3bb;</p></pre>
+
+ <p>It has the same data model as this element, which uses one named character reference:</p>
+
+ <pre><p>&lt;λ</p></pre>
+
+ <p>The data model in JSON syntax is:</p>
+
+ <pre>["p", {}, ["<\u03BB"]]</pre>
+
+ <h3>3.6 Names</h3>
+
+ <pre>[20] name ::= nameStartChar nameChar*
+[21] nameStartChar ::= [A-Z] | [a-z] | "_" | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D]
+ | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF]
+ | [#x3001-#xD7FF] | ([#xF900-#xEFFFF] - nonCharacterCodePoint)
+[22] nameChar ::= nameStartChar | [0-9] | "-" | "." | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]</pre>
+
+ <p>The parse result of a <i>name</i> is a string whose members are the characters in the <i>name</i>.</p>
+
+ <p>Names beginning with a match to <code>(('X'|'x')('M'|'m')('L'|'l'))</code> are reserved for
+ standardization by the W3C.</p>
+
+ <p>The ASCII symbols and punctuation marks, along with a fairly large group of Unicode symbol
+ characters, are excluded from names because they are useful as delimiters in contexts where
+ MicroXML names are used outside MicroXML documents. The character #x037E is excluded because
+ Unicode normalization turns it into a semicolon.</p>
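+
+ <p>As an informal illustration, the <i>name</i> production can be checked with a few range tables, as
+ in this Python sketch. All identifiers in it are hypothetical. Noncharacters are tested with a
+ bit mask: a code point whose low 16 bits are #xFFFE or #xFFFF is a noncharacter, as is any code
+ point in the range #xFDD0 to #xFDEF.</p>

```python
# Inclusive code point ranges copied from the nameStartChar production.
NAME_START_RANGES = [
    (0x41, 0x5A), (0x61, 0x7A), (0x5F, 0x5F), (0xC0, 0xD6), (0xD8, 0xF6),
    (0xF8, 0x2FF), (0x370, 0x37D), (0x37F, 0x1FFF), (0x200C, 0x200D),
    (0x2070, 0x218F), (0x2C00, 0x2FEF), (0x3001, 0xD7FF), (0xF900, 0xEFFFF),
]
# Additional ranges that nameChar allows after the first character.
NAME_EXTRA_RANGES = [
    (0x30, 0x39), (0x2D, 0x2D), (0x2E, 0x2E), (0xB7, 0xB7),
    (0x300, 0x36F), (0x203F, 0x2040),
]


def in_ranges(cp, ranges):
    return any(lo <= cp <= hi for lo, hi in ranges)


def is_noncharacter(cp):
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_name_start_char(ch):
    cp = ord(ch)
    # Only the last range contains noncharacters, so excluding them
    # everywhere is equivalent to the production as written.
    return in_ranges(cp, NAME_START_RANGES) and not is_noncharacter(cp)


def is_name_char(ch):
    return is_name_start_char(ch) or in_ranges(ord(ch), NAME_EXTRA_RANGES)


def is_name(text):
    """Return True if text matches the name production."""
    return (
        bool(text)
        and is_name_start_char(text[0])
        and all(is_name_char(ch) for ch in text[1:])
    )
```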
+
+
+ <h3>3.7 Whitespace</h3>
+
+ <pre>[24] s ::= #x9 | #xA | #x20</pre>
+
+ <p>Whitespace is permitted in various places to increase readability. It does not affect the
+ data model. Note that #xD is not included here, because #xD characters are translated to #xA
+ characters during line-break normalization.</p>
+
+ <h3>3.8 Characters</h3>
+
+ <pre>[23] char ::= s | ([#x0-#x10FFFF] - forbiddenCodePoint)
+
+[25] forbiddenCodePoint ::= controlCodePoint | surrogateCodePoint | nonCharacterCodePoint
+[26] controlCodePoint ::= [#x0-#x1F] | [#x7F-#x9F]
+[27] surrogateCodePoint ::= [#xD800-#xDFFF]
+[28] nonCharacterCodePoint ::= [#xFDD0-#xFDEF] | [#xFFFE-#xFFFF] | [#x1FFFE-#x1FFFF]
+ | [#x2FFFE-#x2FFFF] | [#x3FFFE-#x3FFFF] | [#x4FFFE-#x4FFFF]
+ | [#x5FFFE-#x5FFFF] | [#x6FFFE-#x6FFFF] | [#x7FFFE-#x7FFFF]
+ | [#x8FFFE-#x8FFFF] | [#x9FFFE-#x9FFFF] | [#xAFFFE-#xAFFFF]
+ | [#xBFFFE-#xBFFFF] | [#xCFFFE-#xCFFFF] | [#xDFFFE-#xDFFFF]
+ | [#xEFFFE-#xEFFFF] | [#xFFFFE-#xFFFFF] | [#x10FFFE-#x10FFFF]</pre>
+
+
+ <p>MicroXML prohibits three kinds of code points from occurring literally in MicroXML documents:</p>
+
+ <ul>
+ <li>control characters, other than those that MicroXML allows as whitespace;</li>
+ <li>noncharacters (see Section 16.7 of [Unicode]);</li>
+ <li>surrogates (see Section 16.6 of [Unicode]).</li>
+ </ul>
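+
+ <p>The <i>char</i> production can be sketched in Python as follows, with all range bounds in the
+ grammar read as hexadecimal code points. This is an informal illustration and the identifiers
+ are hypothetical.</p>

```python
# The three code points matched by the s production.
WHITESPACE = {0x9, 0xA, 0x20}


def is_forbidden_code_point(cp):
    """Return True for control, surrogate and noncharacter code points."""
    control = cp <= 0x1F or 0x7F <= cp <= 0x9F
    surrogate = 0xD800 <= cp <= 0xDFFF
    # The last two code points of every plane, plus #xFDD0 to #xFDEF.
    noncharacter = 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE
    return control or surrogate or noncharacter


def is_char(ch):
    """Return True if the single character ch matches the char production."""
    cp = ord(ch)
    # Whitespace is allowed even though #x9 and #xA are control code points.
    return cp in WHITESPACE or not is_forbidden_code_point(cp)
```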
+
+ <h2>4 Conformance</h2>
+
+ <h3>4.1 Document conformance</h3>
+
+ <p>A sequence of characters is a conforming MicroXML document if, after line-break normalization, it
+ matches the production <i>document</i>, and meets the further constraints found in the text of
+ this document marked with the keywords <small>MUST</small>
+ or <small>REQUIRED</small>.</p>
+
+ <p>A sequence of bytes is a conforming MicroXML document if it is the UTF-8 [Unicode] encoding
+ of a sequence of characters that is a conforming MicroXML document.</p>
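+
+ <p>The byte-level step can be sketched in Python as a strict UTF-8 decode, shown here as an informal
+ illustration with a hypothetical function name. The plain <code>utf-8</code> codec is used
+ rather than <code>utf-8-sig</code>, so a leading byte order mark is kept as #xFEFF for
+ the <i>document</i> production to match.</p>

```python
def decode_utf8(data):
    """Return the characters encoded by data, or None if data is not valid UTF-8."""
    try:
        # Strict decoding rejects overlong forms and encoded surrogates.
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None
```

+ <p>The result still has to go through line-break normalization and matching against
+ the <i>document</i> production before the input can be called conforming.</p>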
+
+ <p>[Unicode] says that canonically equivalent sequences of characters ought to be treated as
+ identical. However, documents that are canonically equivalent according to Unicode but that
+ use distinct code point sequences are considered distinct by MicroXML parsers. This gives
+ rise to the possibility that the user might unintentionally create sequences of characters
+ that are canonically equivalent but are treated as distinct by MicroXML parsers. To avoid
+ this possibility, all documents <small>SHOULD</small> be in Normalization Form C as described by
+ [Unicode].</p>
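+
+ <p>A tool that produces MicroXML could test for Normalization Form C along these lines. This is an
+ informal Python sketch using the standard <code>unicodedata</code> module; the function
+ name <code>is_nfc</code> is hypothetical.</p>

```python
import unicodedata


def is_nfc(text):
    """Return True if text is already in Normalization Form C."""
    return unicodedata.normalize("NFC", text) == text
```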
+
+ <h3>4.2 Parser conformance</h3>
+
+ <p>For any sequence of bytes, a conforming MicroXML parser <small>MUST</small> be able to report correctly
+ whether it is a conforming MicroXML document. If it is a conforming MicroXML document, then a
+ conforming MicroXML parser <small>MUST</small> be able to report the correct abstract data model for the document.</p>
+
+ <p>In some contexts, it may be appropriate for a conforming MicroXML parser to operate on
+ sequences of characters rather than sequences of bytes. In this case, the conformance
+ requirement of the preceding paragraph applies to sequences of characters rather than bytes.</p>
+
+ <p>A conforming MicroXML parser is free to use any data structure to represent the abstract data
+ model, provided that the data structure provides the same information as the abstract data
+ model.</p>
+
+ <p>A MicroXML parser <small>MAY</small> perform error correction, by providing an abstract data
+ model even for sequences of bytes that are not conforming MicroXML
+ documents. It <small>MUST</small>, however, still comply with the requirement of the first
+ paragraph to report that the sequence of bytes is not a conforming MicroXML document.</p>
+
+ <h2>5 Security considerations</h2>
+
+ <p>MicroXML does not provide any built-in service for integrity. Integrity services have been
+ defined for XML using XML Canonicalization [RFC3076] and XML Signatures [RFC3275]. These
+ can be applied to MicroXML, with a number of caveats. First, the XPath data model that is input
+ into XML Canonicalization <small>MUST</small> be constructed using a conforming MicroXML
+ parser rather than an XML parser. This is because an XML parser will normalize some attribute values
+ in a way that is incompatible with MicroXML, as detailed in Appendix B.2. Second, canonicalization
+ <small>MUST NOT</small> use the option to include comments, since the MicroXML data model
+ does not preserve comments. Third, although the canonicalization of a MicroXML document will be
+ a well-formed XML document, it will not always be a conforming MicroXML document, because the
+ definition of XML canonicalization does not escape <code>></code> characters in attribute values; this will not
+ usually be a problem because the output of canonicalization is typically used only as input to a
+ digest algorithm.</p>
+
+ <p>MicroXML documents are encoded in UTF-8; the security considerations described in
+ [RFC3629] are therefore applicable to MicroXML.</p>
+
+ <h2>6 Notation</h2>
+
+ <p>The formal grammar of MicroXML is given in this document using a simple Extended
+ Backus-Naur Form (EBNF) notation. Each rule in the grammar defines one symbol, in the form
+ <code>symbol ::= expression</code>.</p>
+
+ <p>Within the expression on the right-hand side of a rule, the following expressions are used to
+ match strings of one or more characters:</p>
+
+ <dl>
+ <dt><code>#xN</code></dt>
+
+ <dd>where <code>N</code> is a hexadecimal integer, the expression matches the character in
+ Unicode whose code point has the value indicated.</dd>
+
+ <dt><code>[a-zA-Z], [#xN-#xN]</code></dt>
+
+ <dd>matches any character with a value in the range(s) indicated (inclusive).</dd>
+
+ <dt><code>[abc], [#xN#xN#xN]</code></dt>
+
+ <dd>matches any character with a value among the characters enumerated. Enumerations and
+ ranges can be mixed in one set of brackets.</dd>
+
+ <dt><code>"string"</code></dt>
+
+ <dd>matches a literal string matching the one given inside the double quotes.</dd>
+
+ <dt><code>'string'</code></dt>
+
+ <dd>matches a literal string matching the one given inside the single quotes.</dd>
+ </dl>
+
+ <p>These symbols can be combined to match more complex patterns as follows, where <code>A</code>
+ and <code>B</code> represent expressions:</p>
+
+ <dl>
+ <dt><code>(A)</code></dt>
+
+ <dd>expression is treated as a unit and can be combined as described in this list.</dd>
+
+ <dt><code>A?</code></dt>
+
+ <dd>matches <code>A</code> or nothing; <small>OPTIONAL</small>
+ <code>A</code>.</dd>
+
+ <dt><code>A B</code></dt>
+
+ <dd>matches <code>A</code> followed by <code>B</code>. This operator has higher precedence
+ than alternation; thus <code>A B | C D</code> is identical to <code>(A B) | (C
+ D)</code>.</dd>
+
+ <dt><code>A | B</code></dt>
+
+ <dd>matches <code>A</code> or <code>B</code>.</dd>
+
+ <dt><code>A - B</code></dt>
+
+ <dd>matches any string that matches <code>A</code> but does not match <code>B</code>.</dd>
+
+ <dt><code>A+</code></dt>
+
+ <dd>matches one or more occurrences of <code>A</code>. This operation has higher precedence
+ than alternation; thus <code>A+ | B+</code> is identical to <code>(A+) | (B+)</code>.</dd>
+
+ <dt><code>A*</code></dt>
+
+ <dd>matches zero or more occurrences of <code>A</code>. This operation has higher precedence than
+ alternation; thus <code>A* | B*</code> is identical to <code>(A*) | (B*)</code>.</dd>
+ </dl>
+
+ <h2>Appendix A: References</h2>
+
+ <p>While these references cite a particular edition of a specification, conforming
+ implementations of MicroXML <small>MAY</small> support later editions either in addition or as
+ replacements, thus allowing MicroXML users to benefit from corrections and extensions to the
+ other specifications on which it depends.</p>
+
+ <h3>A.1 Normative References</h3>
+
+ <dl>
+ <dt>RFC 2119</dt>
+ <dd>IETF (Internet Engineering Task Force). RFC 2119: Key words for use in RFCs to Indicate
+ Requirement Levels. Scott Bradner, 1997. (See http://www.ietf.org/rfc/rfc2119.txt.)</dd>
+
+ <dt>RFC3629</dt>
+ <dd>IETF (Internet Engineering Task Force). RFC3629: UTF-8, a transformation format of ISO 10646.
+ F. Yergeau, 2003. (See http://www.ietf.org/rfc/rfc3629.txt.)</dd>
+
+ <dt>Unicode</dt>
+ <dd>The Unicode Consortium. The Unicode Standard, Version 6.0.0, (Mountain View, CA: The
+ Unicode Consortium, 2011. ISBN 978-1-936213-01-6)</dd>
+ </dl>
+
+ <h3>A.2 Informative References</h3>
+
+ <dl>
+ <dt>JSON</dt>
+ <dd>IETF (Internet Engineering Task Force). RFC 4627: The application/json Media Type for
+ JavaScript Object Notation (JSON). D. Crockford, 2006. (See
+ http://www.ietf.org/rfc/rfc4627.txt.)</dd>
+
+ <dt>INFOSET</dt>
+ <dd>W3C (World Wide Web Consortium). XML Information Set. John Cowan and Richard Tobin. (See
+ http://www.w3.org/TR/xml-infoset.)</dd>
+
+ <dt>RFC3076</dt>
+ <dd>IETF (Internet Engineering Task Force). RFC 3076: Canonical XML Version 1.0.
+ J. Boyer, 2001. (See http://www.ietf.org/rfc/rfc3076.txt.)</dd>
+
+ <dt>RFC3275</dt>
+ <dd>IETF (Internet Engineering Task Force). RFC 3275: (Extensible Markup
+ Language) XML-Signature Syntax and Processing. D. Eastlake, J. Reagle, and D. Solo, 2002.
+ (See http://www.ietf.org/rfc/rfc3275.txt.)</dd>
+
+ <dt>XML</dt>
+ <dd>W3C (World Wide Web Consortium). Extensible Markup Language (XML) 1.0 (Fifth Edition). Tim
+ Bray et al., 2008. (See http://www.w3.org/TR/xml/.)</dd>
+ </dl>
+
+ <h2>Appendix B: Relationship to XML (informative)</h2>
+
+ <h3>B.1 Syntax</h3>
+
+ <p>Relative to XML 1.0 Fifth Edition, MicroXML prohibits:</p>
+
+ <ul>
+ <li>the XML declaration;</li>
+ <li>the document type declaration, and so entities other than the five built-in entities;</li>
+ <li>processing instructions;</li>
+ <li>CDATA sections;</li>
+ <li>colons in element and attribute names;</li>
+ <li>attributes named <code>xmlns</code>;</li>
+ <li>literal > characters in content or attribute values (XML requires > to be quoted only
+ in content and only when preceded by <code>]]</code>);</li>
+ <li>Unicode noncharacters (XML only prohibits #xFFFE and #xFFFF);</li>
+ <li>Unicode C1 control characters;</li>
+ <li>numeric character references to #xD;</li>
+ <li>decimal character references;</li>
+ <li>encodings other than UTF-8.</li>
+ </ul>
+
+ <p>MicroXML parsers are not required to use draconian error handling.</p>
+
+ <h3>B.2 Data model</h3>
+
+ <p>The MicroXML data model corresponds to the following information items and properties from the
+ XML information set:</p>
+
+ <ul>
+ <li>the element information item with the local name, attributes and children properties;</li>
+ <li>the attribute information item with the local name and normalized value properties;</li>
+ <li>the character information item with the character code property.</li>
+ </ul>
+
+ <p>MicroXML's data model is incompatible with XML in one respect: XML requires that literal
+ newlines and tabs in attribute values are normalized into spaces, but MicroXML leaves them
+ unchanged. For example, in XML</p>
+
+<pre><doc att="hello
+world"/></pre>
+
+ <p>and</p>
+
+<pre><doc att="hello world"/></pre>
+
+ <p>have the same information set, but in MicroXML they do not. Note that this incompatibility cannot in
+ general be fixed by postprocessing, since XML does not normalize newlines and tabs in attribute values
+ that were entered as numeric character references, and the MicroXML data model does not provide
+ information about which characters were entered as numeric character references.</p>
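+
+ <p>The XML side of this difference can be observed with an ordinary XML parser. The following informal
+ sketch uses Python's standard <code>xml.etree.ElementTree</code> module, which is an XML parser
+ and not a MicroXML parser; the variable names are illustrative only.</p>

```python
import xml.etree.ElementTree as ET

# A literal newline in an attribute value is normalized to a space by XML.
literal = ET.fromstring('<doc att="hello\nworld"/>').get("att")

# A newline entered as a numeric character reference is left unchanged.
referenced = ET.fromstring('<doc att="hello&#xA;world"/>').get("att")

print(repr(literal))
print(repr(referenced))
```

+ <p>A conforming MicroXML parser would report a newline character in the attribute value in both cases.</p>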