microxml: changeset 0:85c5e402c30d

--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/spec/microxml.html	Wed Sep 19 16:59:21 2012 +0700
@@ -0,0 +1,589 @@
+    <p>Copyright © 2012 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and
+      document use rules apply.</p>
+
+    <h2>Abstract</h2>
+
+    <p>MicroXML is a subset of XML intended for use in contexts where full XML is, or is perceived
+      to be, too large and complex. It has been designed to complement rather than replace XML, JSON
+      and HTML.  Like XML, it is a general format for making use of markup vocabularies rather than
+      a specific markup vocabulary like HTML.  This document provides a complete description
+      of MicroXML.</p>
+
+    <h2>Status of this document</h2>
+
+    <p>This document is being developed by the MicroXML Community Group, which is open to public
+     participation.  Comments should be sent to the Community Group mailing list.</p>
+
+    <p>This version of the document is an editor's draft.  It is believed to reflect the current
+    consensus in the Community Group, but has not yet been approved by it.</p>
+
+    <p>This document may be distributed freely as long as all text and legal notices remain
+    intact.</p>
+
+    <h2>1 Introduction</h2>
+
+    <p>MicroXML is a Unicode-based textual format for general-purpose structured information
+    interchange. A sequence of characters or bytes in this format is called a <i>MicroXML
+    document</i>. MicroXML is designed to be syntactically compatible with XML; more precisely,
+    any MicroXML document is a well-formed XML document according to XML 1.0 Fifth Edition [XML].</p>
+
+    <p>MicroXML also specifies an abstract data model for MicroXML documents. This is substantially
+    compatible with a subset of the information items and properties of the XML Information Set
+    [INFOSET]. See Appendix B for details.</p>
+
+    <p>A <i>MicroXML parser</i> is a software module that accepts a sequence of bytes or characters
+      as input, determines whether that sequence is a MicroXML document, and, if it is, makes a
+      representation of its abstract data model available to other modules.</p>
+
+    <p>MicroXML is designed to be dramatically simpler than XML, not only in its syntax but also in
+    its data model. Experience with XML has shown that for many applications much of the complexity
+    of XML is unnecessary. Indeed, many specifications that use XML have invented their own ad-hoc
+    subsets of XML (XMPP, SOAP, E4X, Scala). The complexity of XML does not affect just the
+    developers of XML parsers and other tools, but has an ongoing cost to users of XML and
+    developers of XML applications.</p>
+
+    <p>Although JSON has replaced XML in many applications where greater simplicity is desired, JSON
+    is awkward for representing structured documents that include mixed content (content that mixes
+    data characters and elements). HTML is very widely used for representing structured documents.
+    However, MicroXML is a fundamentally different kind of format from HTML: MicroXML does
+    not define the semantics of any element or attribute names, whereas HTML does.  MicroXML
+    with appropriately chosen element and attribute names can be trivially transformed into
+    valid HTML. Like HTML and XML, MicroXML is designed to support the use of plain text editors
+    for authoring; it therefore preserves some of the conveniences provided by XML for such usage.</p>
+
+    <p>MicroXML has a number of advantages over full XML as a format for network protocols.  First, MicroXML
+    does not constrain how parsers recover from errors; in particular, MicroXML does not adopt XML's Draconian
+    error handling requirements.  This allows protocols using MicroXML to follow the traditional policy of being
+    liberal in what they accept.  Second, the features of XML that are most problematic from a security perspective
+    have been eliminated from MicroXML: most importantly, MicroXML completely eliminates document type
+    declarations, including entity declarations; MicroXML documents are self-contained, in the sense that the parsing
+    of a MicroXML document never requires access to any external resource.</p>
+
+    <p>This document, together with [RFC 2119] for requirement keywords and [Unicode] for
+      characters, provides all the information necessary to understand MicroXML and construct
+      computer programs to process it.</p>
+
+    <p>The keywords "<small>MUST</small>", "<small>MUST NOT</small>", "<small>REQUIRED</small>",
+        "<small>SHALL</small>", "<small>SHALL NOT</small>", "<small>SHOULD</small>", "<small>SHOULD
+        NOT</small>", "<small>RECOMMENDED</small>", "<small>MAY</small>", and
+        "<small>OPTIONAL</small>" in this document are to be interpreted as described in RFC
+      2119.</p>
+
+    <h2>2 Data Model</h2>
+
+    <p>The MicroXML abstract data model uses three primitive types:</p>
+    
+    <ul>
+      <li>character: this is an atomic type corresponding to an integer in the range 0 to
+      0x10FFFF, representing a single Unicode code point;</li>
+
+      <li>list: this is a structured type; it is an ordered list containing zero or more members,
+      which can be of any type;</li>
+
+      <li>map: this is a structured type, which associates each of zero or more keys with a value;
+      all the keys in a map are distinct; both the keys and the values of a map can be of any
+      type.</li>
+    </ul>
+
+    <p>A string is not a primitive type: it is just a list of zero or more characters.</p>
+
+    <p>The top-level construct in the MicroXML data model is an <i>element item</i>. An element item
+    is a list with exactly three members:</p>
+
+    <ol>
+      <li>a name item;</li>
+
+      <li>an attributes map: this is a possibly empty map, whose keys are name items and whose
+      values are strings;</li>
+
+      <li>a content list: this is a list with zero or more members; each member is either a character
+      or an element item.</li>
+    </ol>
+
+    <p>A <i>name item</i> is a non-empty string. The first character in the
+    string <small>MUST</small> match the production <i>nameStartChar</i>, and any subsequent
+    characters <small>MUST</small> match the production <i>nameChar</i>. In addition, a name item
+    occurring as a key in an attributes map <small>MUST NOT</small> be <code>xmlns</code>.</p>
+
+    <p>Any character occurring in the value of an attributes map or as a member of a content list
+    <small>MUST</small> match the production <i>char</i>.</p>
+
+    <h3>2.1 JSON syntax (informative)</h3>
+
+    <p>There are many possible ways of representing the data model in [JSON]. The following
+      is one possible way:</p>
+
+    <ul>
+      <li>an element item is represented as a JSON array;</li>
+      <li>a name item is represented as a JSON string;</li>
+      <li>an attributes map is represented as a JSON object;</li>
+      <li>values in an attributes map are represented as JSON strings;</li>
+      <li>a content list is represented as a JSON array;</li>
+      <li>a sequence of consecutive characters occurring in a content list is combined into a single JSON string.</li>
+    </ul>
+
+    <p>This document will use this syntax to represent the data model in examples.</p>
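+
+    <p>As an informal sketch of the mapping above, the following Python function converts an element
+    item into a value that can be passed to a JSON serializer. The function name
+    <code>to_json_value</code> and the in-memory representation (a 3-tuple whose content list holds
+    one-character strings and nested tuples) are assumptions made for this illustration; they are
+    not defined by MicroXML.</p>

```python
import json


def to_json_value(element):
    """Convert an element item (name, attributes, content) to the JSON form of Section 2.1.

    Assumed input shape (hypothetical, for illustration only):
    a 3-tuple of (str, dict of str to str, list of 1-character str or nested 3-tuples).
    """
    name, attributes, content = element
    json_content = []
    run = []  # consecutive characters waiting to be combined into one string
    for member in content:
        if isinstance(member, str):
            run.append(member)
        else:
            if run:
                json_content.append("".join(run))
                run = []
            json_content.append(to_json_value(member))
    if run:
        json_content.append("".join(run))
    return [name, dict(attributes), json_content]


example = ("p", {"lang": "en"}, ["H", "i", " ", ("em", {}, ["!"])])
print(json.dumps(to_json_value(example)))
```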
+
+    <h2>3 Syntax</h2>
+
+    <p>This section specifies the syntax of MicroXML. It also specifies how the syntax is parsed
+    into the abstract data model: for each syntactic form that contributes to the data model, it
+    specifies how the parse result for that form is constructed from the parse results of syntactic
+    subforms.</p>
+
+    <p>The abstract data model for a sequence of characters is constructed in two logical phases:</p>
+
+    <ol>
+      <li>line breaks in the sequence of characters are normalized by translating both the
+      two-character sequence #xD followed by #xA, and any #xD that is not followed by #xA, to a
+      single #xA character;</li>
+
+      <li>the sequence of characters is then parsed as a <i>document</i>, yielding an element
+      item as the parse result.</li>
+    </ol>
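+
+    <p>The first phase can be sketched in a few lines of Python. The function name
+    <code>normalize_line_breaks</code> is an assumption made for this illustration. Replacing the
+    two-character sequence first ensures that a #xD followed by #xA yields a single #xA rather
+    than two.</p>

```python
def normalize_line_breaks(text: str) -> str:
    """Translate #xD #xA pairs and lone #xD characters to a single #xA."""
    # Order matters: handle the two-character sequence before any remaining lone #xD.
    return text.replace("\r\n", "\n").replace("\r", "\n")
```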
+
+    <h3>3.1 Documents</h3>
+
+    <pre>[1] document ::= byteOrderMark? (comment | s)* element (comment | s)*
+[2] byteOrderMark ::= #xFEFF</pre>
+
+    <p>The top-level syntactic form in MicroXML is a <i>document</i>. The parse result of
+    a <i>document</i> is the parse result of its single <i>element</i>.</p>
+
+    <p>This is an example of a small but complete MicroXML document exhibiting all
+      syntactic features:</p>
+
+    <pre>&lt;comment lang="en" date="2012-09-11"&gt;
+I &lt;em&gt;love&lt;/em&gt; &amp;#xB5;&lt;!-- MICRO SIGN --&gt;XML!&lt;br/&gt;
+It's so clean &amp;amp; simple.&lt;/comment&gt;</pre>
+
+    <p>The abstract data model of this document in the JSON syntax described in Section 2.1 is:</p>
+
+<pre>[ "comment",
+  {  "date": "2012-09-11", "lang": "en" },
+  [ "\nI ",
+    ["em", {}, ["love"]],
+    " \u00B5XML!",
+    ["br", {}, []],
+    "\nIt's so clean &amp; simple."
+  ]
+]</pre>
+
+    <h3>3.2 Elements</h3>
+
+    <pre>[3] element ::= startTag content endTag
+              | emptyElementTag
+[4] startTag ::= '&lt;' name attributeList s* '&gt;'
+[5] endTag ::= '&lt;/' name s* '&gt;'
+[6] content ::= (element | comment | dataChar | charRef)*
+[7] dataChar ::= char - ('&lt;'|'&amp;'|'&gt;')
+[8] emptyElementTag ::= '&lt;' name attributeList s* '/&gt;'</pre>
+
+    <p>The <i>startTag</i> and <i>endTag</i> of an <i>element</i> <small>MUST</small> have the
+    same <i>name</i>. Note that the syntax prohibits overlapping elements.</p>
+
+    <p>The parse result of an <i>element</i> is an element item. There are two alternative syntaxes
+    for an <i>element</i>.  For the general syntax, which uses a <i>startTag</i>
+    and an <i>endTag</i>, the parse result is constructed as follows:</p>
+    
+    <ul>
+      <li>the name item comes from the <i>name</i> in the <i>startTag</i>;</li>
+
+      <li>the attributes map comes from the <i>attributeList</i> in the <i>startTag</i>;</li>
+
+      <li>the content list comes from the <i>content</i> between the <i>startTag</i> and the <i>endTag</i>.</li>
+    </ul>
+
+    <p>The parse result of <i>content</i> is a content list, which is constructed by combining in
+    order the parse results of each <i>element</i>, <i>dataChar</i>, and <i>charRef</i> in
+    the <i>content</i>. The parse result of a <i>dataChar</i> is the character itself.</p>
+
+   <p>For example, this element has content that consists of two elements, each of whose content consists of characters:</p>
+
+    <pre>&lt;location&gt;&lt;city&gt;New York&lt;/city&gt;&lt;country&gt;US&lt;/country&gt;&lt;/location&gt;</pre>
+
+    <p>Its abstract data model in JSON syntax is:</p>
+
+    <pre>["location", {}, [["city", {}, ["New York"]], ["country", {}, ["US"]]]]</pre>
+
+    <p>The other syntax, which consists of just an <i>emptyElementTag</i>, is equivalent
+       to a <i>startTag</i> immediately followed by an <i>endTag</i>.  The parse result is
+       constructed as follows:</p>
+
+    <ul>
+      <li>the name item comes from the <i>name</i> in the <i>emptyElementTag</i>;</li>
+
+      <li>the attributes map comes from the <i>attributeList</i> in the <i>emptyElementTag</i>;</li>
+
+      <li>the content list is empty.</li>
+    </ul>
+
+    <p>This is a simple example of an <i>emptyElementTag</i>:</p>
+
+    <pre>&lt;page-break/&gt;</pre>
+
+    <p>Its abstract data model in JSON syntax is:</p>
+
+    <pre>["page-break", {}, []]</pre>
+
+    <h3>3.3 Attributes</h3>
+
+    <pre>[9] attributeList ::= (s+ attribute)*
+[10] attribute ::= attributeName s* '=' s* attributeValue
+[11] attributeValue ::= '"' ((attributeValueChar - '&quot;') | charRef)* '"'
+                      | "'" ((attributeValueChar - "'") | charRef)* "'"
+[12] attributeValueChar ::= char - ('&lt;'|'&gt;'|'&amp;')
+[13] attributeName ::= name - 'xmlns'</pre>
+
+    <p>The parse result of an <i>attributeList</i> is an attributes map. For each <i>attribute</i>
+      in the <i>attributeList</i>, there is a key and associated value in the attributes map: the
+      key comes from the <i>attributeName</i> and the value comes from
+      the <i>attributeValue</i>.</p>
+    
+    <p>All the <i>attributeName</i>s in an <i>attributeList</i> <small>MUST</small> be distinct.</p>
+
+    <p>The parse results of each <i>attributeValueChar</i> and <i>charRef</i> in
+    the <i>attributeValue</i> are combined in order to construct the parse result of
+    the <i>attributeValue</i>. The parse result of an <i>attributeValueChar</i> is the character
+    itself.</p>
+
+    <p>For example, this is an element with two attributes:</p>
+
+    <pre>&lt;location city="New York" country="US"/&gt;</pre>
+
+    <p>Its abstract data model in JSON syntax is:</p>
+
+     <pre>["location", { "city": "New York", "country": "US" }, []]</pre>
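+
+    <p>Going in the other direction, the following Python sketch writes a data model in the JSON
+    syntax of Section 2.1 back out as MicroXML text. The names <code>escape</code> and
+    <code>serialize</code> are assumptions made for this illustration, and the sketch does not
+    check that names and characters match the grammar. It always uses double-quoted attribute
+    values and always escapes the three characters excluded from <i>dataChar</i> and
+    <i>attributeValueChar</i>.</p>

```python
def escape(text: str, in_attribute: bool = False) -> str:
    """Replace characters that cannot appear literally with named character references."""
    # '&' must be handled first so that the references added below are not escaped again.
    text = text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    if in_attribute:
        text = text.replace('"', "&quot;")
    return text


def serialize(element) -> str:
    """Serialize [name, attributes, content] (JSON syntax of Section 2.1) as MicroXML text."""
    name, attributes, content = element
    attribute_text = "".join(
        f' {key}="{escape(value, in_attribute=True)}"' for key, value in attributes.items()
    )
    if not content:
        return f"<{name}{attribute_text}/>"
    body = "".join(
        escape(member) if isinstance(member, str) else serialize(member)
        for member in content
    )
    return f"<{name}{attribute_text}>{body}</{name}>"
```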
+
+    <h3>3.4 Comments</h3>
+
+    <pre>[14] comment ::= '&lt;!--' ((char - '-') | ('-' (char - '-')))* '--&gt;'</pre>
+
+    <p>Comments are not part of the MicroXML data model and so have no parse result.</p>
+
+    <p>The syntax prohibits the occurrence of <code>--</code> except as part of the opening or
+    closing delimiter of the <i>comment</i>.</p>
+
+    <p>For example, this is a comment:</p>
+
+    <pre>&lt;!-- declarations for &lt;head&gt; &amp; &lt;body&gt; --&gt;</pre>
+
+    <p>Note that <code>&lt;head&gt;</code> and <code>&lt;body&gt;</code>
+     are not recognized as start-tags.</p>
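+
+    <p>The <i>comment</i> production translates almost directly into a regular expression. The
+    Python sketch below is a simplification: the negated class <code>[^-]</code> stands in for
+    <code>char - '-'</code>, so the restrictions of the <i>char</i> production are not checked.
+    The name <code>is_comment</code> is an assumption made for this illustration.</p>

```python
import re

# Simplification: [^-] approximates (char - '-'); forbidden code points are not rejected here.
COMMENT = re.compile(r"<!--(?:[^-]|-[^-])*-->")


def is_comment(text: str) -> bool:
    """Report whether the whole of text matches the comment production (approximately)."""
    return COMMENT.fullmatch(text) is not None
```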
+
+    <h3>3.5 Character references</h3>
+
+    <pre>[15] charRef ::= numericCharRef | namedCharRef
+[16] numericCharRef ::= '&amp;#x' charNumber ';'
+[17] charNumber ::= [0-9a-fA-F]+ 
+[18] namedCharRef ::= '&amp;' charName ';'
+[19] charName ::= 'amp' | 'lt' | 'gt' | 'quot' | 'apos'</pre>
+ 
+   <p>MicroXML provides two kinds of character references: named character references provide an easy way
+     to escape characters that MicroXML does not allow to be used literally as data characters; numeric character
+     references provide a way to include arbitrary Unicode characters in MicroXML documents without needing a
+     Unicode-aware text editor.</p>
+    
+ <p>The parse result of a <i>charRef</i> is a single character. The code point of the parse
+      result of a <i>numericCharRef</i> is equal to <i>charNumber</i> interpreted as a hexadecimal
+      number; this <small>MUST</small> be the code point of a character that matches the <i>char</i>
+      production. The parse result of a <i>namedCharRef</i> depends on the <i>charName</i> as
+      follows:</p>
+
+    <ul>
+      <li>for <code>amp</code>, it is &amp; (#x26);</li>
+      <li>for <code>lt</code>, it is &lt; (#x3C);</li>
+      <li>for <code>gt</code>, it is &gt; (#x3E);</li>
+      <li>for <code>quot</code>, it is &quot; (#x22);</li>
+      <li>for <code>apos</code>, it is &#x27; (#x27).</li>
+    </ul>
+
+    <p>For example, this is an element that contains two numeric character references:</p>
+
+    <pre>&lt;p&gt;&amp;#x3C;&amp;#x3bb;&lt;/p&gt;</pre>
+
+    <p>It has the same data model as this element, which uses one named character reference:</p>
+
+    <pre>&lt;p&gt;&amp;lt;&#x3BB;&lt;/p&gt;</pre>
+
+    <p>The data model in JSON syntax is:</p>
+
+    <pre>["p", {}, ["&lt;\u03BB"]]</pre>
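+
+    <p>Resolving a character reference can be sketched in Python as follows. The name
+    <code>resolve_char_ref</code> is an assumption made for this illustration. The sketch rejects
+    anything that is not a <i>charRef</i>, including decimal references, but it omits the
+    required check that the resulting character matches the <i>char</i> production.</p>

```python
import re

NAMED = {"amp": "&", "lt": "<", "gt": ">", "quot": '"', "apos": "'"}
CHAR_REF = re.compile(r"&#x([0-9a-fA-F]+);|&(amp|lt|gt|quot|apos);")


def resolve_char_ref(ref: str) -> str:
    """Return the single character that a charRef stands for."""
    match = CHAR_REF.fullmatch(ref)
    if match is None:
        raise ValueError(f"not a character reference: {ref!r}")
    if match.group(1) is not None:
        code_point = int(match.group(1), 16)
        if code_point > 0x10FFFF:
            raise ValueError(f"code point out of range: {ref!r}")
        # Not shown: the check that the character matches the char production.
        return chr(code_point)
    return NAMED[match.group(2)]
```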
+
+    <h3>3.6 Names</h3>
+
+    <pre>[20] name ::= nameStartChar nameChar*
+[21] nameStartChar ::= [A-Z] | [a-z] | "_" | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D]
+                     | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF]
+                     | [#x3001-#xD7FF] | ([#xF900-#xEFFFF] - nonCharacterCodePoint)
+[22] nameChar ::= nameStartChar | [0-9] | "-" | "." | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]</pre>
+
+    <p>The parse result of a <i>name</i> is a string whose members are the characters in the <i>name</i>.</p>
+
+    <p>Names beginning with a match to <code>(('X'|'x')('M'|'m')('L'|'l'))</code> are reserved for
+      standardization by the W3C.</p>
+
+    <p>The ASCII symbols and punctuation marks, along with a fairly large group of Unicode symbol
+      characters, are excluded from names because they are useful as delimiters in contexts where
+      MicroXML names are used outside MicroXML documents. The character #x037E is excluded because
+      Unicode normalization turns it into a semicolon.</p>
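+
+    <p>The <i>name</i> production can be checked with a table of code point ranges, as in the
+    following Python sketch. The function names are assumptions made for this illustration. The
+    noncharacter test relies on the fact that, apart from #xFDD0 to #xFDEF, the noncharacters are
+    the last two code points of each plane.</p>

```python
NAME_START_RANGES = [
    (0x41, 0x5A), (0x61, 0x7A), (0x5F, 0x5F), (0xC0, 0xD6), (0xD8, 0xF6),
    (0xF8, 0x2FF), (0x370, 0x37D), (0x37F, 0x1FFF), (0x200C, 0x200D),
    (0x2070, 0x218F), (0x2C00, 0x2FEF), (0x3001, 0xD7FF), (0xF900, 0xEFFFF),
]
NAME_EXTRA_RANGES = [
    (0x30, 0x39), (0x2D, 0x2D), (0x2E, 0x2E), (0xB7, 0xB7),
    (0x300, 0x36F), (0x203F, 0x2040),
]


def is_noncharacter(code_point: int) -> bool:
    return 0xFDD0 <= code_point <= 0xFDEF or (code_point & 0xFFFE) == 0xFFFE


def is_name_start_char(ch: str) -> bool:
    cp = ord(ch)
    in_range = any(low <= cp <= high for low, high in NAME_START_RANGES)
    return in_range and not is_noncharacter(cp)


def is_name_char(ch: str) -> bool:
    cp = ord(ch)
    return is_name_start_char(ch) or any(low <= cp <= high for low, high in NAME_EXTRA_RANGES)


def is_name(text: str) -> bool:
    """Report whether text matches: name ::= nameStartChar nameChar*"""
    return (
        len(text) > 0
        and is_name_start_char(text[0])
        and all(is_name_char(ch) for ch in text[1:])
    )
```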
+
+
+    <h3>3.7 Whitespace</h3>
+
+    <pre>[23] s ::= #x9 | #xA | #x20</pre>
+
+    <p>Whitespace is permitted in various places to increase readability. It does not affect the
+    data model. Note that #xD is not included here, because #xD characters are translated to #xA
+    characters during line-break normalization.</p>
+
+    <h3>3.8 Characters</h3>
+
+    <pre>[24] char ::= s | ([#x0-#x10FFFF] - forbiddenCodePoint)
+
+[25] forbiddenCodePoint ::= controlCodePoint | surrogateCodePoint | nonCharacterCodePoint
+[26] controlCodePoint ::= [#x0-#x1F] | [#x7F-#x9F]
+[27] surrogateCodePoint ::= [#xD800-#xDFFF]
+[28] nonCharacterCodePoint ::= [#xFDD0-#xFDEF] | [#xFFFE-#xFFFF] | [#x1FFFE-#x1FFFF]
+                             | [#x2FFFE-#x2FFFF] | [#x3FFFE-#x3FFFF] | [#x4FFFE-#x4FFFF]
+                             | [#x5FFFE-#x5FFFF] | [#x6FFFE-#x6FFFF] | [#x7FFFE-#x7FFFF]
+                             | [#x8FFFE-#x8FFFF] | [#x9FFFE-#x9FFFF] | [#xAFFFE-#xAFFFF]
+                             | [#xBFFFE-#xBFFFF] | [#xCFFFE-#xCFFFF] | [#xDFFFE-#xDFFFF]
+                             | [#xEFFFE-#xEFFFF] | [#xFFFFE-#xFFFFF] | [#x10FFFE-#x10FFFF]</pre>
+
+
+    <p>MicroXML prohibits three kinds of code points from occurring literally in MicroXML documents:</p>
+
+    <ul>
+      <li>control characters, other than those that MicroXML allows as whitespace;</li>
+      <li>noncharacters (see Section 16.7 of [Unicode]);</li>
+      <li>surrogates (see Section 16.6 of [Unicode]).</li>
+    </ul>
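+
+    <p>A direct Python transcription of the <i>char</i> production might look like the following
+    sketch. The name <code>is_char</code> is an assumption made for this illustration.</p>

```python
def is_char(ch: str) -> bool:
    """Report whether a single code point matches the char production."""
    cp = ord(ch)
    if cp in (0x9, 0xA, 0x20):  # s: the whitespace characters are always allowed
        return True
    if cp <= 0x1F or 0x7F <= cp <= 0x9F:  # controlCodePoint
        return False
    if 0xD800 <= cp <= 0xDFFF:  # surrogateCodePoint
        return False
    if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:  # nonCharacterCodePoint
        return False
    return True
```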
+
+    <h2>4 Conformance</h2>
+
+    <h3>4.1 Document conformance</h3>
+
+    <p>A sequence of characters is a conforming MicroXML document if, after line-break normalization, it
+      matches the production <i>document</i>, and meets the further constraints found in the text of
+      this document marked with the keywords <small>MUST</small>
+      or <small>REQUIRED</small>.</p>
+
+    <p>A sequence of bytes is a conforming MicroXML document if it is the UTF-8 [Unicode] encoding
+      of a sequence of characters that is a conforming MicroXML document.</p>
+
+    <p>[Unicode] says that canonically equivalent sequences of characters ought to be treated as
+      identical. However, documents that are canonically equivalent according to Unicode but that
+      use distinct code point sequences are considered distinct by MicroXML parsers. This gives
+      rise to the possibility that the user might unintentionally create sequences of characters
+      that are canonically equivalent but are treated as distinct by MicroXML parsers. To avoid
+      this possibility, all documents <small>SHOULD</small> be in Normalization Form C as described by
+      [Unicode].</p>
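+
+    <p>A tool that produces or checks documents could test both the encoding and the
+    normalization recommendation before parsing. The following Python sketch uses the standard
+    <code>unicodedata</code> module; the function name <code>decode_and_check_nfc</code> is an
+    assumption made for this illustration. Because Normalization Form C is a
+    <small>SHOULD</small>, the second result is advisory and does not affect conformance.</p>

```python
import unicodedata


def decode_and_check_nfc(data: bytes):
    """Decode UTF-8 bytes and report whether the text is in Normalization Form C."""
    text = data.decode("utf-8")  # raises UnicodeDecodeError for ill-formed UTF-8
    return text, unicodedata.normalize("NFC", text) == text
```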
+
+    <h3>4.2 Parser conformance</h3>
+
+    <p>For any sequence of bytes, a conforming MicroXML parser <small>MUST</small> be able to report correctly
+      whether it is a conforming MicroXML document.  If it is a conforming MicroXML document, then a
+      conforming MicroXML parser <small>MUST</small> be able to report the correct abstract data model for the document.</p>
+
+    <p>In some contexts, it may be appropriate for a conforming MicroXML parser to operate on
+    sequences of characters rather than sequences of bytes. In this case, the conformance
+    requirement of the preceding paragraph applies to sequences of characters rather than bytes.</p>
+
+    <p>A conforming MicroXML parser is free to use any data structure to represent the abstract data
+    model, provided that the data structure provides the same information as the abstract data
+    model.</p>
+
+    <p>A MicroXML parser <small>MAY</small> perform error correction, by providing an abstract data
+    model even for sequences of bytes that are not conforming MicroXML
+    documents. It <small>MUST</small>, however, still comply with the requirement of the first
+    paragraph to report that the sequence of bytes is not a conforming MicroXML document.</p>
+
+    <h2>5 Security considerations</h2>
+
+    <p>MicroXML does not provide any built-in service for integrity.  Integrity services have been
+    defined for XML using XML Canonicalization [RFC3076] and XML Signatures [RFC3275]. These
+    can be applied to MicroXML, with a number of caveats.  First, the XPath data model that is input
+    into XML Canonicalization <small>MUST</small> be constructed using a conforming MicroXML
+    parser rather than an XML parser. This is because an XML parser will normalize some attribute values
+    in a way that is incompatible with MicroXML, as detailed in Appendix B.2. Second, canonicalization
+    <small>MUST NOT</small> use the option to include comments, since the MicroXML data model
+    does not preserve comments.  Third, although the canonicalization of a MicroXML document will be
+    a well-formed XML document, it will not always be a conforming MicroXML document, because the
+    definition of XML canonicalization does not escape <code>&gt;</code> characters in attribute values; this will not
+    usually be a problem because the output of canonicalization is typically used only as input to a
+    digest algorithm.</p>
+
+    <p>MicroXML documents are encoded in UTF-8; the security considerations described in
+    [RFC3629] are therefore applicable to MicroXML.</p>
+
+    <h2>6 Notation</h2>
+
+    <p>The formal grammar of MicroXML is given in this document using a simple Extended
+      Backus-Naur Form (EBNF) notation. Each rule in the grammar defines one symbol, in the form
+        <code>symbol ::= expression</code>.</p>
+
+    <p>Within the expression on the right-hand side of a rule, the following expressions are used to
+      match strings of one or more characters:</p>
+
+    <dl>
+      <dt><code>#xN</code></dt>
+
+      <dd>where <code>N</code> is a hexadecimal integer, the expression matches the character in
+        Unicode whose code point has the value indicated.</dd>
+
+      <dt><code>[a-zA-Z], [#xN-#xN]</code></dt>
+
+      <dd>matches any character with a value in the range(s) indicated (inclusive).</dd>
+
+      <dt><code>[abc], [#xN#xN#xN]</code></dt>
+
+      <dd>matches any character with a value among the characters enumerated. Enumerations and
+        ranges can be mixed in one set of brackets.</dd>
+
+      <dt><code>"string"</code></dt>
+
+      <dd>matches a literal string matching the one given inside the double quotes.</dd>
+
+      <dt><code>'string'</code></dt>
+
+      <dd>matches a literal string matching the one given inside the single quotes.</dd>
+    </dl>
+
+    <p>These symbols can be combined to match more complex patterns as follows, where <code>A</code>
+      and <code>B</code> represent expressions:</p>
+
+    <dl>
+      <dt><code>(A)</code></dt>
+
+      <dd>expression is treated as a unit and can be combined as described in this list.</dd>
+
+      <dt><code>A?</code></dt>
+
+      <dd>matches <code>A</code> or nothing; <small>OPTIONAL</small>
+        <code>A</code>.</dd>
+
+      <dt><code>A B</code></dt>
+
+      <dd>matches <code>A</code> followed by <code>B</code>. This operator has higher precedence
+        than alternation; thus <code>A B | C D</code> is identical to <code>(A B) | (C
+        D)</code>.</dd>
+
+      <dt><code>A | B</code></dt>
+
+      <dd>matches <code>A</code> or <code>B</code>.</dd>
+
+      <dt><code>A - B</code></dt>
+
+      <dd>matches any string that matches <code>A</code> but does not match <code>B</code>.</dd>
+
+      <dt><code>A+</code></dt>
+
+      <dd>matches one or more occurrences of <code>A</code>. This operation has higher precedence
+        than alternation; thus <code>A+ | B+</code> is identical to <code>(A+) | (B+)</code>.</dd>
+
+      <dt><code>A*</code></dt>
+
+      <dd>matches zero or more occurrences of <code>A</code>. This operation has higher precedence than
+        alternation; thus <code>A* | B*</code> is identical to <code>(A*) | (B*)</code>.</dd>
+    </dl>
+
+    <h2>Appendix A: References</h2>
+
+    <p>While these references cite a particular edition of a specification, conforming
+      implementations of MicroXML <small>MAY</small> support later editions either in addition or as
+      replacements, thus allowing MicroXML users to benefit from corrections and extensions to the
+      other specifications on which it depends.</p>
+
+    <h3>A.1 Normative References</h3>
+
+    <dl>   
+      <dt>RFC 2119</dt>
+      <dd>IETF (Internet Engineering Task Force). RFC 2119: Key words for use in RFCs to Indicate
+        Requirement Levels. Scott Bradner, 1997. (See http://www.ietf.org/rfc/rfc2119.txt.)</dd>
+
+      <dt>RFC3629</dt>
+      <dd>IETF (Internet Engineering Task Force). RFC3629: UTF-8, a transformation format of ISO 10646.
+      F. Yergeau, 2003. (See http://www.ietf.org/rfc/rfc3629.txt.)</dd>
+
+      <dt>Unicode</dt>
+      <dd>The Unicode Consortium. The Unicode Standard, Version 6.0.0, (Mountain View, CA: The
+        Unicode Consortium, 2011. ISBN 978-1-936213-01-6)</dd> 
+    </dl>
+
+   <h3>A.2 Informative References</h3>
+
+   <dl>
+     <dt>JSON</dt>
+     <dd>IETF (Internet Engineering Task Force). RFC 4627: The application/json Media Type for
+     JavaScript Object Notation (JSON). D. Crockford, 2006. (See
+     http://www.ietf.org/rfc/rfc4627.txt.)</dd>
+
+     <dt>INFOSET</dt>
+     <dd>W3C (World Wide Web Consortium). XML Information Set. John Cowan and Richard Tobin. (See
+     http://www.w3.org/TR/xml-infoset.)</dd>
+
+     <dt>RFC3076</dt>
+     <dd>IETF (Internet Engineering Task Force). RFC 3076: Canonical XML Version 1.0.
+       J. Boyer, 2001.  (See http://www.ietf.org/rfc/rfc3076.txt.)</dd>
+
+     <dt>RFC3275</dt>
+     <dd>IETF (Internet Engineering Task Force). RFC 3275: (Extensible Markup
+         Language) XML-Signature Syntax and Processing. D. Eastlake, J. Reagle, and D. Solo, 2002.
+        (See http://www.ietf.org/rfc/rfc3275.txt.)</dd>
+
+     <dt>XML</dt>
+     <dd>W3C (World Wide Web Consortium). Extensible Markup Language (XML) 1.0 (Fifth Edition). Tim
+     Bray et al., 2008. (See http://www.w3.org/TR/xml/.)</dd>
+   </dl>
+
+   <h2>Appendix B: Relationship to XML (informative)</h2>
+
+   <h3>B.1 Syntax</h3>
+   
+   <p>Relative to XML 1.0 Fifth Edition, MicroXML prohibits:</p>
+
+   <ul>
+     <li>the XML declaration;</li>
+     <li>the document type declaration, and so entities other than the five built-in entities;</li>
+     <li>processing instructions;</li>
+     <li>CDATA sections;</li>
+     <li>colons in element and attribute names;</li>
+     <li>attributes named <code>xmlns</code>;</li>
+     <li>literal &gt; characters in content or attribute values (XML requires &gt; to be quoted only
+     in content and only when preceded by <code>]]</code>);</li>
+     <li>Unicode noncharacters (XML only prohibits #xFFFE and #xFFFF);</li>
+     <li>Unicode C1 control characters;</li>
+     <li>numeric character references to #xD;</li>
+     <li>decimal character references;</li>
+     <li>encodings other than UTF-8.</li>
+   </ul>
+
+   <p>MicroXML parsers are not required to use draconian error handling.</p>
+   
+   <h3>B.2 Data model</h3>
+
+   <p>The MicroXML data model corresponds to the following information items and properties from the
+   XML information set:</p>
+
+   <ul>
+     <li>the element information item with the local name, attributes and children properties;</li>
+     <li>the attribute information item with the local name and normalized value properties;</li>
+     <li>the character information item with the character code property.</li>
+   </ul>
+
+   <p>MicroXML's data model is incompatible with XML in one respect: XML requires that literal
+   newlines and tabs in attribute values are normalized into spaces, but MicroXML leaves them
+   unchanged. For example, in XML</p>
+
+<pre>&lt;doc att="hello
+world"/&gt;</pre>
+
+   <p>and</p>
+
+<pre>&lt;doc att="hello world"/&gt;</pre>
+
+   <p>have the same information set, but in MicroXML they do not.  Note that this incompatibility cannot in
+   general be fixed by postprocessing, since XML does not normalize newlines and tabs in attribute values
+   that were entered as numeric character references, and the MicroXML data model does not provide
+   information about which characters were entered as numeric character references.</p>
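+
+   <p>The XML side of this difference can be observed with any conforming XML parser. The sketch
+   below uses Python's standard <code>xml.etree.ElementTree</code> module, which is an XML parser
+   and not a MicroXML parser; the variable names are chosen for this illustration. A literal
+   newline in the attribute value is reported as a space, whereas a newline entered as a numeric
+   character reference is preserved.</p>

```python
import xml.etree.ElementTree as ET

# Literal newline inside the attribute value: XML normalizes it to a space.
literal = ET.fromstring('<doc att="hello\nworld"/>').get("att")

# Newline entered as a numeric character reference: XML leaves it unchanged.
referenced = ET.fromstring('<doc att="hello&#xA;world"/>').get("att")

print(repr(literal), repr(referenced))
```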
author	James Clark <jjc@jclark.com>
	Wed, 19 Sep 2012 16:59:21 +0700
changeset 0	85c5e402c30d
child 1	4e3c73c01e75
child 2	34e23f299f9a