This was at http://code.google.com/p/xml5/ before.
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/Makefile Mon Feb 20 15:54:48 2012 +0100
@@ -0,0 +1,11 @@
+ANOLIS = anolis
+
+all: Overview.html data/xrefs/dom/xml-er.json
+
+Overview.html: Overview.src.html data Makefile
+ $(ANOLIS) --output-encoding=ascii --omit-optional-tags --quote-attr-values \
+ --w3c-compat --enable=xspecxref --enable=refs --w3c-shortname="xml-er" \
+ --filter=".publish" $< $@
+
+data/xrefs/dom/xml-er.json: Overview.src.html Makefile
+ $(ANOLIS) --dump-xrefs=$@ $< /tmp/spec
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/Overview.html Mon Feb 20 15:54:48 2012 +0100
@@ -0,0 +1,1928 @@
+<!DOCTYPE html><html lang="en"><meta charset="utf-8">
+<title>XML-ER</title>
+<style>
+ pre.idl { border:solid thin; background:#eee; color:#000; padding:0.5em }
+ pre.idl :link, pre.idl :visited { color:inherit; background:transparent }
+ pre code { color:inherit; background:transparent }
+ div.example { margin-left:1em; padding-left:1em; border-left:double; color:#222; background:#fcfcfc }
+ .note { margin-left:2em; font-weight:bold; font-style:italic; color:#008000 }
+ p.note::before { content:"Note: " }
+ .XXX { padding:.5em; border:solid #f00 }
+ dfn { font-weight:bold; font-style:normal }
+ code { color:orangered }
+ code :link, code :visited { color:inherit }
+ dl.switch { padding-left: 2em; }
+ dl.switch dt { text-indent: -1.5em; }
+ dl.switch dt:before { content: '\21AA'; padding: 0 0.5em 0 0; display: inline-block; width: 1em; text-align: right; line-height: 0.5em; }
+</style>
+<link href="http://www.w3.org/StyleSheets/TR/base" rel="stylesheet">
+
+<div class="head">
+
+ <h1>XML-ER</h1>
+ <h2 class="no-num no-toc" id="20-february-2012">20 February 2012</h2>
+
+ <dl>
+ <dt>This Version:
+ <dd><a href="http://dvcs.w3.org/hg/xml-er/raw-file/tip/Overview.html">http://dvcs.w3.org/hg/xml-er/raw-file/tip/Overview.html</a>
+
+ <dt>Participate:
+ <dd><a href="mailto:public-xml-er@w3.org">public-xml-er</a> (<a href="http://lists.w3.org/Archives/Public/public-xml-er/">archives</a>)
+ <!-- XXX
+ <dd><a href="https://www.w3.org/Bugs/Public/enter_bug.cgi?product=WebAppsWG&component=DOM">File a bug</a>
+ -->
+ <dd class="dontpublish"><a href="http://wiki.whatwg.org/wiki/IRC">IRC: #whatwg on Freenode</a>
+
+ <dt>Editor:
+ <dd><a href="http://annevankesteren.nl/">Anne van Kesteren</a>
+ (<a href="http://www.opera.com/">Opera Software ASA</a>)
+ &lt;<a href="mailto:annevk@opera.com">annevk@opera.com</a>&gt;
+ </dl>
+
+ <p class="dontpublish copyright"><a href="http://creativecommons.org/publicdomain/zero/1.0/" rel="license"><img alt="CC0" src="http://i.creativecommons.org/p/zero/1.0/80x15.png"></a>
+ To the extent possible under law, the editors have waived all copyright and
+ related or neighboring rights to this work. In addition, as of
+ 20 February 2012, the editors have made this specification available
+ under the
+ <a href="http://www.openwebfoundation.org/legal/the-owf-1-0-agreements/owfa-1-0" rel="license">Open Web Foundation Agreement Version 1.0</a>,
+ which is available at
+ http://www.openwebfoundation.org/legal/the-owf-1-0-agreements/owfa-1-0.
+</div>
+
+<h2 class="no-num no-toc" id="see-also">See also</h2>
+
+<ul>
+ <li><a href="http://www.w3.org/community/xml-er/wiki/Charter">Charter</a>
+ <li><a href="http://www.w3.org/community/xml-er/wiki/Requirements">Requirements</a>
+ <li><a href="http://www.w3.org/community/xml-er/wiki/Issues">Issues</a>
+</ul>
+
+
+<h2 class="no-num no-toc" id="table-of-contents">Table of contents</h2>
+
+
+<!--begin-toc-->
+<ol class="toc">
+ <li><a href="#conformance"><span class="secno">1 </span>Conformance</a></li>
+ <li><a href="#writing-xml-documents"><span class="secno">2 </span>Writing XML documents</a></li>
+ <li><a href="#parsing-xml-documents"><span class="secno">3 </span>Parsing XML documents</a>
+ <ol class="toc">
+ <li><a href="#overview"><span class="secno">3.1 </span>Overview</a></li>
+ <li><a href="#input-stream"><span class="secno">3.2 </span>Input stream</a></li>
+ <li><a href="#tokenization"><span class="secno">3.3 </span>Tokenization</a></li>
+ <li><a href="#tree-construction"><span class="secno">3.4 </span>Tree construction</a></ol></li>
+ <li><a class="no-num" href="#references">References</a></ol>
+<!--end-toc-->
+
+
+<h2 id="conformance"><span class="secno">1 </span>Conformance</h2>
+<p>All diagrams, examples, and notes in this specification are
+non-normative, as are all sections explicitly marked non-normative.
+Everything else in this specification is normative.
+
+<p>The key words "MUST", "MUST NOT", "REQUIRED", <!--"SHALL", "SHALL
+NOT",--> "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
+"OPTIONAL" in the normative parts of this document are to be
+interpreted as described in RFC2119. For readability, these words do
+not appear in all uppercase letters in this specification.
+<a href="#refsRFC2119">[RFC2119]</a>
+
+
+<h2 id="writing-xml-documents"><span class="secno">2 </span>Writing XML documents</h2>
+
+<p class="XXX">...
+
+
+<h2 id="parsing-xml-documents"><span class="secno">3 </span>Parsing XML documents</h2>
+
+<p>This section and its subsections define the <dfn id="xml-parser">XML parser</dfn>.
+
+<p>This specification defines the parsing rules for XML documents, whether
+they are syntactically correct or not. Certain points in the parsing
+algorithm are said to be <dfn id="parse-error" title="parse error">parse errors</dfn>. The
+handling for parse errors is well-defined: user agents must either act as
+described below when encountering such problems, or must terminate
+processing at the first error that they encounter for which they do not wish
+to apply the rules described below.
+<!-- XXX -->
+
+
+<h3 id="overview"><span class="secno">3.1 </span>Overview</h3>
+
+<p>The input to the XML parsing process consists of a stream of octets which
+is converted to a stream of code points, which in turn are tokenized, and
+finally those tokens are used to construct a tree.
+
+
+<h3 id="input-stream"><span class="secno">3.2 </span>Input stream</h3>
+
+<p>The stream of Unicode characters that constitutes the input to the
+tokenization stage will be initially seen by the user agent as a stream of
+octets (typically coming over the network or from the local file system).
+The octets encode Unicode code points according to a particular encoding,
+which the user agent must use to decode the octets into code points.
+
+<p class="XXX">Define how to find the encoding...
+
+
+<h3 id="tokenization"><span class="secno">3.3 </span>Tokenization</h3>
+
+<p>Implementations must act as if they used the following
+ state machine to tokenize XML. The state machine must
+ start in the <a href="#data-state">data state</a>. Most states consume a single character,
+ which can have various side effects, and then either switch the state machine
+ to a new state to reconsume the same character, switch it to a new state
+ (to consume the next character), or stay in the same state (to consume the
+ next character). Some states have more complicated behaviour and can consume
+ several characters before switching to another state.
+
+ <p>The output of the tokenization stage is a series of zero or more of the
+ following tokens: start tag, empty tag, end tag, short end tag, comment,
+ character, processing instruction and end-of-file. Start and empty tag tokens
+ have a tag name and a list of attributes, each of which has a name and a
+ value. End tags have a tag name. Comment and character tokens have data.
+ Processing instruction tokens have a target and data.
+
+ <p>The tokenization stage also uses a <dfn id="list-of-entities">list of entities</dfn> and a
+ <dfn id="list-of-parameter-entities">list of parameter entities</dfn>. Both lists are populated with tokens
+ consisting of a name and value during the tokenization stage and are also used
+ within this stage.
+
+ <p>Whenever the steps below indicate that the user agent has to
+ <dfn id="append-entity">append an entity</dfn>, the entity has to be appended to
+ the <a href="#list-of-entities">list of entities</a>, unless the entity flag has been set to
+ "parameter", in which case it has to be appended to the <a href="#list-of-parameter-entities">list of parameter
+ entities</a>. The <dfn id="entity-flag">entity flag</dfn> has two values: "normal" and
+ "parameter". Its default value is "normal". It is reset to "normal" after an
+ entity has been appended.
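The bookkeeping described above can be sketched in Python as follows. This is only an illustration: the class name `EntityStore` and its attribute names are hypothetical and do not appear in this specification.

```python
from dataclasses import dataclass, field


@dataclass
class EntityStore:
    """Hypothetical holder for both entity lists and the entity flag."""
    entities: list = field(default_factory=list)            # list of entities
    parameter_entities: list = field(default_factory=list)  # list of parameter entities
    flag: str = "normal"                                     # entity flag, defaults to "normal"

    def append_entity(self, name, value):
        # The flag selects the target list.
        if self.flag == "parameter":
            self.parameter_entities.append((name, value))
        else:
            self.entities.append((name, value))
        # The flag goes back to "normal" once the entity has been appended.
        self.flag = "normal"
```

A tokenizer built on this sketch would set `flag` to "parameter" before appending a parameter entity and could rely on the flag being back at "normal" afterwards.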
+
+ <p>The tokenization stage also has a <dfn id="list-of-attribute-declarations">list of attribute declarations</dfn>,
+ each consisting of a tag name and a list of attributes, each of which consists of an
+ attribute name, type and default value.
+
+ <dl>
+ <dt><dfn id="data-state">Data state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+0026 (<code>&</code>)
+
+ <dd class="XXX">...
+
+ <dt>U+003C (<code><</code>)
+ <dd>Switch to the <a href="#tag-state">tag state</a>.
+
+ <dt>EOF
+
+ <dd>Emit an end-of-file token.
+
+ <dt>Anything else
+
+ <dd>Emit the input character as a character token. Stay in this state.
+ </dl>
+
+
+ <dt><dfn id="tag-state">Tag state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+002F (<code>/</code>)
+
+ <dd>Switch to the <a href="#end-tag-state">end tag state</a>.
+
+ <dt>U+003F (<code>?</code>)
+
+ <dd>Switch to the <a href="#pi-state">pi state</a>.
+
+ <dt>U+0021 (<code>!</code>)
+ <dd>Switch to the <a href="#markup-declaration-state">markup declaration state</a>.
+
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dt>U+003A (<code>:</code>)
+ <dt>U+003C (<code><</code>)
+ <dt>U+003E (<code>></code>)
+ <dt>EOF
+
+ <dd><a href="#parse-error">Parse error</a>. Emit a U+003C (<code><</code>) character
+ token. Reconsume the current input character in the <a href="#data-state">data state</a>.
+
+ <dt>Anything else
+
+ <dd>Create a new tag token and set its name to the input character, then
+ switch to the <a href="#tag-name-state">tag name state</a>.
+ </dl>
+
+
+ <dt><dfn id="end-tag-state">End tag state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+003E (<code>></code>)
+
+ <dd>Emit a short end tag token and then switch to the <a href="#data-state">data
+ state</a>.
+
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dt>U+003C (<code><</code>)
+ <dt>U+003A (<code>:</code>)
+ <dt>EOF
+
+ <dd><a href="#parse-error">Parse error</a>. Emit a U+003C (<code><</code>) character
+ token and a U+002F (<code>/</code>) character token. Reconsume the current
+ input character in the <a href="#data-state">data state</a>.
+
+ <dt>Anything else
+
+ <dd>Create an end tag token and set its name to the input character, then
+ switch to the <a href="#end-tag-name-state">end tag name state</a>.
+ </dl>
+
+
+ <dt><dfn id="end-tag-name-state">End tag name state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+
+ <dd>Switch to the <a href="#end-tag-name-after-state">end tag name after state</a>.
+
+ <dt>EOF
+
+ <dd><a href="#parse-error">Parse error</a>. Emit the current token and then reprocess the
+ current input character in the <a href="#data-state">data state</a>.
+
+ <dt>U+003E (<code>></code>)
+ <dd>Emit the current token and then switch to the <a href="#data-state">data
+ state</a>.
+
+ <dt>Anything else
+
+ <dd>Append the current input character to the tag name and stay in the
+ current state.
+ </dl>
+
+
+ <dt><dfn id="end-tag-name-after-state">End tag name after state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+003E (<code>></code>)
+ <dd>Emit the current token and then switch to the <a href="#data-state">data state</a>.
+
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Stay in the current state.
+
+ <dt>EOF
+ <dd><a href="#parse-error">Parse error</a>. Emit the current token and then reprocess the
+ current input character in the <a href="#data-state">data state</a>.
+
+ <dt>Anything else
+ <dd><a href="#parse-error">Parse error</a>. Stay in the current state.
+ </dl>
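As a non-normative illustration, the three end tag states above might be sketched in Python like this. The function name `tokenize_end_tag`, the tuple shapes used for tokens, and the convention of starting just after the `</` characters are all assumptions of the sketch, not part of this specification. At EOF the sketch simply returns and leaves the end-of-file token to the data state.

```python
EOF = None
WHITESPACE = {"\t", "\n", " "}


def tokenize_end_tag(text, pos):
    """Run the end tag states, starting just after '</'.

    Returns (tokens, parse_error_count, pos), where pos is the position at
    which the data state resumes.
    """
    state, name, errors = "end tag", "", 0
    while True:
        ch = text[pos] if pos < len(text) else EOF
        if state == "end tag":
            if ch == ">":
                return [("short-end-tag",)], errors, pos + 1
            if ch is EOF or ch in WHITESPACE or ch in "<:":
                # Parse error: emit "<" and "/" and reconsume ch in the data state.
                return [("character", "<"), ("character", "/")], errors + 1, pos
            name, state = ch, "end tag name"
        elif ch is EOF:
            # Both remaining states: parse error, emit the current token.
            return [("end-tag", name)], errors + 1, pos
        elif ch == ">":
            return [("end-tag", name)], errors, pos + 1
        elif state == "end tag name":
            if ch in WHITESPACE:
                state = "end tag name after"
            else:
                name += ch
        elif ch not in WHITESPACE:
            # End tag name after state: stray character is a parse error.
            errors += 1
        pos += 1
```

For instance, calling `tokenize_end_tag("a >x", 0)` walks through all three states and hands control back to the data state at the `x`.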
+
+
+ <dt><dfn id="pi-state">Pi state</dfn>
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dt>EOF
+ <dd><a href="#parse-error">Parse error</a>. Reprocess the current input character in the
+ <a href="#bogus-comment-state">bogus comment state</a>.
+
+ <dt>Anything else
+ <dd>Create a new processing instruction token. Set target to the current
+ input character and data to the empty string. Then switch to the <a href="#pi-target-state">pi
+ target state</a>.
+ </dl>
+
+
+ <dt><dfn id="pi-target-state">Pi target state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Switch to the <a href="#pi-target-after-state">pi target after state</a>.
+
+ <dt>EOF
+ <dd><a href="#parse-error">Parse error</a>. Emit the current token and then reprocess the
+ current input character in the <a href="#data-state">data state</a>.
+
+ <dt>U+003F (<code>?</code>)
+ <dd>Switch to the <a href="#pi-after-state">pi after state</a>.
+
+ <dt>Anything else
+ <dd>Append the current input character to the pi's target and stay in the
+ current state.
+ </dl>
+
+
+ <dt><dfn id="pi-target-after-state">Pi target after state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Stay in the current state.
+
+ <dt>Anything else
+ <dd>Reprocess the current input character in the <a href="#pi-data-state">pi data
+ state</a>.
+ </dl>
+
+
+ <dt><dfn id="pi-data-state">Pi data state</dfn>
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+003F (<code>?</code>)
+ <dd>Switch to the <a href="#pi-after-state">pi after state</a>.
+
+ <dt>EOF
+ <dd><a href="#parse-error">Parse error</a>. Emit the current token and then reprocess the
+ current input character in the <a href="#data-state">data state</a>.
+
+ <dt>Anything else
+ <dd>Append the current input character to the pi's data and stay in the
+ current state.
+ </dl>
+
+
+ <dt><dfn id="pi-after-state">Pi after state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+003E (<code>></code>)
+ <dd>Emit the current token and then switch to the <a href="#data-state">data state</a>.
+
+ <dt>U+003F (<code>?</code>)
+ <dd>Append the current input character to the pi's data and stay in the
+ current state.
+
+ <dt>Anything else
+ <dd>Reprocess the current input character in the <a href="#pi-data-state">pi data
+ state</a>.
+ </dl>
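The five pi states above can be illustrated with the following Python sketch. It follows the text literally, so a lone `?` that is not followed by `?` or `>` is not added to the data. The function name `tokenize_pi`, the token tuple shape, and the choice to return `None` when the bogus comment state takes over are assumptions made for the example.

```python
EOF = None
WHITESPACE = {"\t", "\n", " "}


def tokenize_pi(text, pos):
    """Run the pi states, starting just after '<?'.

    Returns (token, parse_error_count, pos). token is None when the input
    has to be handed to the bogus comment state, which is not covered here.
    """
    state, target, data, errors = "pi", "", "", 0
    while True:
        ch = text[pos] if pos < len(text) else EOF
        if state == "pi":
            if ch is EOF or ch in WHITESPACE:
                return None, errors + 1, pos
            target, state = ch, "pi target"
        elif state == "pi target":
            if ch is EOF:
                return ("pi", target, data), errors + 1, pos
            if ch in WHITESPACE:
                state = "pi target after"
            elif ch == "?":
                state = "pi after"
            else:
                target += ch
        elif state == "pi target after":
            if ch is EOF or ch not in WHITESPACE:
                state = "pi data"
                continue  # reprocess ch in the pi data state
        elif state == "pi data":
            if ch is EOF:
                return ("pi", target, data), errors + 1, pos
            if ch == "?":
                state = "pi after"
            else:
                data += ch
        else:  # pi after
            if ch == ">":
                return ("pi", target, data), errors, pos + 1
            if ch == "?":
                data += ch
            else:
                state = "pi data"
                continue  # reprocess ch in the pi data state
        pos += 1
```

The `continue` statements model "reprocess the current input character": the state changes while the position stays where it is.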
+
+
+ <dt><dfn id="markup-declaration-state">Markup declaration state</dfn>
+ <dd>
+ <p>If the next two characters are both U+002D (<code>-</code>)
+ characters, consume those two characters, create a comment token whose
+ data is the empty string and then switch to the
+ <a href="#comment-state">comment state</a>.
+
+ <p>Otherwise, if the next seven characters are an exact match for
+ "<code title="">[CDATA[</code>", then consume those characters and switch
+ to the <a href="#cdata-state">CDATA state</a>.
+
+ <p>Otherwise, if the next seven characters are an exact match for
+ "<code title="">DOCTYPE</code>", then this is a <a href="#parse-error">parse error</a>.
+ Consume those characters and switch to the
+ <a href="#doctype-state">DOCTYPE state</a>.
+ <!-- XXX make them legal? -->
+
+ <p>Otherwise, this is a <a href="#parse-error">parse error</a>. Switch to the
+ <a href="#bogus-comment-state">bogus comment state</a>.
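The lookahead performed by the markup declaration state could be sketched as below. The function name `markup_declaration` and its return shape are hypothetical; the sketch only decides which state comes next and how many characters are consumed, starting just after `<!`.

```python
def markup_declaration(text, pos):
    """Pick the state that follows '<!'.

    Returns (next_state, parse_error_count, pos). The comparisons are exact,
    so they are case-sensitive.
    """
    if text.startswith("--", pos):
        return "comment", 0, pos + 2
    if text.startswith("[CDATA[", pos):
        return "CDATA", 0, pos + 7
    if text.startswith("DOCTYPE", pos):
        # The text above treats a DOCTYPE as a parse error.
        return "DOCTYPE", 1, pos + 7
    # Anything else: parse error, nothing is consumed.
    return "bogus comment", 1, pos
```

Because the checks are tried in order and nothing is consumed on failure, the caller can pass the unchanged position straight to the bogus comment state.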
+
+ <dt><dfn id="comment-state">Comment state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+002D (<code>-</code>)
+ <dd>Switch to the <a href="#comment-dash-state">comment dash state</a>.
+
+ <dt>EOF
+ <dd><a href="#parse-error">Parse error</a>. Emit the comment token and then reprocess the
+ current input character in the <a href="#data-state">data state</a>.
+
+ <dt>Anything else
+ <dd>Append the current input character to the comment token's data. Stay in the current state.
+ </dl>
+
+
+ <dt><dfn id="comment-dash-state">Comment dash state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+002D (<code>-</code>)
+ <dd>Switch to the <a href="#comment-end-state">comment end state</a>.
+
+ <dt>EOF
+ <dd><a href="#parse-error">Parse error</a>. Emit the comment token and then reprocess the
+ current input character in the <a href="#data-state">data state</a>.
+
+ <dt>Anything else
+ <dd>Append a U+002D (<code>-</code>) and the current input character to the
+ comment token's data. Switch to the <a href="#comment-state">comment state</a>.
+ </dl>
+
+
+ <dt><dfn id="comment-end-state">Comment end state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+003E (<code>></code>)
+ <dd>Emit the comment token. Switch to the <a href="#data-state">data state</a>.
+
+ <dt>U+002D (<code>-</code>)
+ <dd>Append the current input character to the comment token's data. Stay in
+ the current state.
+
+ <dt>EOF
+ <dd><a href="#parse-error">Parse error</a>. Emit the comment token and then reprocess the
+ current input character in the <a href="#data-state">data state</a>.
+
+ <dt>Anything else
+ <dd>Append two U+002D (<code>-</code>) characters and the current input
+ character to the comment token's data. Switch to the <a href="#comment-state">comment
+ state</a>.
+ </dl>
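A rough Python sketch of the three comment states follows. The function name `tokenize_comment` and the token tuple are made up for the example, and the sketch starts just after `<!--`. After a single dash followed by some other character it returns to the comment state, which is assumed here to be the intended behaviour.

```python
EOF = None


def tokenize_comment(text, pos):
    """Run the comment states, starting just after '<!--'.

    Returns (token, parse_error_count, pos).
    """
    state, data = "comment", ""
    while True:
        ch = text[pos] if pos < len(text) else EOF
        if ch is EOF:
            # All three states: parse error, emit the comment token,
            # then let the data state deal with EOF.
            return ("comment", data), 1, pos
        if state == "comment":
            if ch == "-":
                state = "comment dash"
            else:
                data += ch
        elif state == "comment dash":
            if ch == "-":
                state = "comment end"
            else:
                data += "-" + ch
                state = "comment"  # assumption, see the lead-in
        else:  # comment end
            if ch == ">":
                return ("comment", data), 0, pos + 1
            if ch == "-":
                data += ch
            else:
                data += "--" + ch
                state = "comment"
        pos += 1
```

Note that dashes still pending at EOF are dropped, since the EOF entries above emit the token without appending them.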
+
+
+ <dt><dfn id="cdata-state">CDATA state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+005D (<code>]</code>)
+ <dd>Switch to the <a href="#cdata-bracket-state">CDATA bracket state</a>.
+
+ <dt>EOF
+ <dd><a href="#parse-error">Parse error</a>. Reprocess the current input character in the
+ <a href="#data-state">data state</a>.
+
+ <dt>Anything else
+ <dd>Emit the current input character as a character token. Stay in the
+ current state.
+ </dl>
+
+
+ <dt><dfn id="cdata-bracket-state">CDATA bracket state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+005D (<code>]</code>)
+ <dd>Switch to the <a href="#cdata-end-state">CDATA end state</a>.
+
+ <dt>EOF
+ <dd><a href="#parse-error">Parse error</a>. Reprocess the current input character in the
+ <a href="#data-state">data state</a>.
+
+ <dt>Anything else
+ <dd>Emit a U+005D (<code>]</code>) character as a character token and also
+ emit the current input character as a character token. Switch to the
+ <a href="#cdata-state">CDATA state</a>.
+ </dl>
+
+
+ <dt><dfn id="cdata-end-state">CDATA end state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+003E (<code>></code>)
+ <dd>Switch to the <a href="#data-state">data state</a>.
+
+ <dt>U+005D (<code>]</code>)
+ <dd>Emit the current input character as a character token. Stay in the
+ current state.
+
+ <dt>EOF
+ <dd><a href="#parse-error">Parse error</a>. Reconsume the current input character in the
+ <a href="#data-state">data state</a>.
+
+ <dt>Anything else
+ <dd>Emit two U+005D (<code>]</code>) characters as character tokens and
+ also emit the current input character as a character token. Switch to the
+ <a href="#cdata-state">CDATA state</a>.
+ </dl>
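The CDATA states can be sketched in the same style. The function name `tokenize_cdata` is hypothetical, the emitted character tokens are collected into one string for brevity, and the sketch starts just after `<![CDATA[`. After a single `]` followed by some other character it returns to the CDATA state, which is assumed here to be the intended behaviour.

```python
EOF = None


def tokenize_cdata(text, pos):
    """Run the CDATA states, starting just after '<![CDATA['.

    Returns (emitted_characters, parse_error_count, pos).
    """
    state, out = "CDATA", []
    while True:
        ch = text[pos] if pos < len(text) else EOF
        if ch is EOF:
            # All three states: parse error, EOF is handled by the data state.
            return "".join(out), 1, pos
        if state == "CDATA":
            if ch == "]":
                state = "CDATA bracket"
            else:
                out.append(ch)
        elif state == "CDATA bracket":
            if ch == "]":
                state = "CDATA end"
            else:
                out.append("]" + ch)
                state = "CDATA"  # assumption, see the lead-in
        else:  # CDATA end
            if ch == ">":
                return "".join(out), 0, pos + 1
            if ch == "]":
                out.append(ch)
            else:
                out.append("]]" + ch)
                state = "CDATA"
        pos += 1
```

Markup characters such as `<` pass through untouched, because none of the CDATA states hands them to the tag state.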
+
+
+ <dt><dfn id="doctype-state">DOCTYPE state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Switch to the <a href="#doctype-root-name-before-state">DOCTYPE root name before state</a>.
+
+ <dt>EOF
+ <dd><a href="#parse-error">Parse error</a>. Reprocess the current input character in the
+ <a href="#data-state">data state</a>.
+
+ <dt>Anything else
+ <dd>Reprocess the current input character in the <a href="#bogus-comment-state">bogus comment
+ state</a>.
+ </dl>
+
+
+ <dt><dfn id="doctype-root-name-before-state">DOCTYPE root name before state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Stay in the current state.
+
+ <dt>U+003E (<code>></code>)
+ <dd>Switch to the <a href="#data-state">data state</a>.
+
+ <dt>EOF
+ <dd><a href="#parse-error">Parse error</a>. Switch to the <a href="#data-state">data state</a>.
+
+ <dt>Anything else
+ <dd>Switch to the <a href="#doctype-root-name-state">DOCTYPE root name state</a>.
+ </dl>
+
+
+ <dt><dfn id="doctype-root-name-state">DOCTYPE root name state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Switch to the <a href="#doctype-root-name-after-state">DOCTYPE root name after state</a>.
+
+ <dt>U+003E (<code>></code>)
+ <dd>Switch to the <a href="#data-state">data state</a>.
+
+ <dt>U+005B (<code>[</code>)
+ <dd>Switch to the <a href="#doctype-internal-subset-state">DOCTYPE internal subset state</a>.
+
+ <dt>EOF
+ <dd><a href="#parse-error">Parse error</a>. Reprocess the current input character in the
+ <a href="#data-state">data state</a>.
+
+ <dt>Anything else
+ <dd>Stay in the current state.
+ </dl>
+
+
+ <dt><dfn id="doctype-root-name-after-state">DOCTYPE root name after state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+003E (<code>></code>)
+ <dd>Switch to the <a href="#data-state">data state</a>.
+
+ <dt>U+0022 (<code>"</code>)
+ <dd>Switch to the <a href="#doctype-identifier-double-quoted-state">DOCTYPE identifier double quoted state</a>.
+
+ <dt>U+0027 (<code>'</code>)
+ <dd>Switch to the <a href="#doctype-identifier-single-quoted-state">DOCTYPE identifier single quoted state</a>.
+
+ <dt>U+005B (<code>[</code>)
+ <dd>Switch to the <a href="#doctype-internal-subset-state">DOCTYPE internal subset state</a>.
+
+ <dt>EOF
+ <dd><a href="#parse-error">Parse error</a>. Reprocess the current input character in the
+ <a href="#data-state">data state</a>.
+
+ <dt>Anything else
+ <dd>Stay in the current state.
+ </dl>
+
+
+ <dt><dfn id="doctype-identifier-double-quoted-state">DOCTYPE identifier double quoted state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+0022 (<code>"</code>)
+ <dd>Switch to the <a href="#doctype-root-name-after-state">DOCTYPE root name after state</a>.
+
+ <dt>EOF
+ <dd><a href="#parse-error">Parse error</a>. Reprocess the current input character in the
+ <a href="#data-state">data state</a>.
+
+ <dt>Anything else
+ <dd>Stay in the current state.
+ </dl>
+
+
+ <dt><dfn id="doctype-identifier-single-quoted-state">DOCTYPE identifier single quoted state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+0027 (<code>'</code>)
+ <dd>Switch to the <a href="#doctype-root-name-after-state">DOCTYPE root name after state</a>.
+
+ <dt>EOF
+ <dd><a href="#parse-error">Parse error</a>. Reprocess the current input character in the
+ <a href="#data-state">data state</a>.
+
+ <dt>Anything else
+ <dd>Stay in the current state.
+ </dl>
+
+
+ <dt><dfn id="doctype-internal-subset-state">DOCTYPE internal subset state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+003C (<code><</code>)
+ <dd>Switch to the <a href="#doctype-tag-state">DOCTYPE tag state</a>.
+
+ <dt>EOF
+ <dd><a href="#parse-error">Parse error</a>. Reprocess the current input character in the
+ <a href="#data-state">data state</a>.
+
+ <dt>U+0025 (<code>%</code>)
+ <dd class="XXX"> consume parameter entity
+
+ <dt>U+005D (<code>]</code>)
+ <dd>Switch to the <a href="#doctype-internal-subset-after-state">DOCTYPE internal subset after state</a>.
+
+ <dt>Anything else
+ <dd>Stay in the current state.
+ </dl>
+
+
+ <dt><dfn id="doctype-internal-subset-after-state">DOCTYPE internal subset after state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+003E (<code>></code>)
+ <dd>Switch to the <a href="#data-state">data state</a>.
+
+ <dt>EOF
+ <dd><a href="#parse-error">Parse error</a>. Reprocess the current input character in the
+ <a href="#data-state">data state</a>.
+
+ <dt>Anything else
+ <dd>Stay in the current state.
+ </dl>
+
+
+ <dt><dfn id="doctype-tag-state">DOCTYPE tag state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+0021 (<code>!</code>)
+ <dd>Switch to the <a href="#doctype-markup-declaration-state">DOCTYPE markup declaration state</a>.
+
+ <dt>U+003F (<code>?</code>)
+ <dd>Switch to the <a href="#doctype-pi-state">DOCTYPE pi state</a>.
+
+ <dt>EOF
+ <dd><a href="#parse-error">Parse error</a>. Reprocess the current input character in the
+ <a href="#data-state">data state</a>.
+
+ <dt>Anything else
+ <dd>Switch to the <a href="#doctype-bogus-comment-state">DOCTYPE bogus comment state</a>.
+ </dl>
+
+
+ <dt><dfn id="doctype-markup-declaration-state">DOCTYPE markup declaration state</dfn>
+ <dd>
+ <p>If the next two characters are both U+002D (<code>-</code>) characters,
+ then consume those characters and switch to the <a href="#doctype-comment-state">DOCTYPE comment
+ state</a>.
+
+ <p>Otherwise, if the next six characters are an exact match for "ENTITY",
+ then consume those characters and switch to the <a href="#doctype-entity-state">DOCTYPE ENTITY
+ state</a>.
+
+ <p>Otherwise, if the next seven characters are an exact match for "ATTLIST",
+ then consume those characters and switch to the <a href="#doctype-attlist-state">DOCTYPE ATTLIST
+ state</a>.
+
+ <p>Otherwise, if the next eight characters are an exact match for
+ "NOTATION", then consume those characters and switch to the <a href="#doctype-notation-state">DOCTYPE
+ NOTATION state</a>.
+
+ <p>Otherwise, switch to the <a href="#doctype-bogus-comment-state">DOCTYPE bogus comment state</a>.
+ <!-- xxx parse error somewhere here? -->
+
+
+ <dt><dfn id="doctype-comment-state">DOCTYPE comment state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+002D (<code>-</code>)
+ <dd>Switch to the <a href="#doctype-comment-dash-state">DOCTYPE comment dash state</a>.
+
+ <dt>EOF
+ <dd><a href="#parse-error">Parse error</a>. Reprocess the current input character in the
+ <a href="#data-state">data state</a>.
+
+ <dt>Anything else
+ <dd>Stay in the current state.
+ </dl>
+
+
+ <dt><dfn id="doctype-comment-dash-state">DOCTYPE comment dash state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+002D (<code>-</code>)
+ <dd>Switch to the <a href="#doctype-comment-end-state">DOCTYPE comment end state</a>.
+
+ <dt>EOF
+ <dd><a href="#parse-error">Parse error</a>. Reprocess the current input character in the
+ <a href="#data-state">data state</a>.
+
+ <dt>Anything else
+ <dd>Switch to the <a href="#doctype-comment-state">DOCTYPE comment state</a>.
+ </dl>
+
+
+ <dt><dfn id="doctype-comment-end-state">DOCTYPE comment end state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+003E (<code>></code>)
+ <dd>Switch to the <a href="#doctype-internal-subset-state">DOCTYPE internal subset state</a>.
+
+ <dt>U+002D (<code>-</code>)
+ <dd>Switch to the <a href="#doctype-comment-dash-state">DOCTYPE comment dash state</a>.
+
+ <dt>EOF
+ <dd><a href="#parse-error">Parse error</a>. Reprocess the current input character in the
+ <a href="#data-state">data state</a>.
+
+ <dt>Anything else
+
+ <dd>Switch to the <a href="#doctype-comment-state">DOCTYPE comment state</a>.
+ </dl>
+
+
+ <dt><dfn id="doctype-entity-state">DOCTYPE ENTITY state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Switch to the <a href="#doctype-entity-type-before-state">DOCTYPE ENTITY type before state</a>.
+
+ <dt>EOF
+ <dd><a href="#parse-error">Parse error</a>. Reprocess the current input character in the
+ <a href="#data-state">data state</a>.
+
+ <dt>Anything else
+ <dd>Switch to the <a href="#doctype-bogus-comment-state">DOCTYPE bogus comment state</a>.
+ </dl>
+
+
+ <dt><dfn id="doctype-entity-type-before-state">DOCTYPE ENTITY type before state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Stay in the current state.
+
+ <dt>U+0025 (<code>%</code>)
+ <dd>Switch to the <a href="#doctype-entity-parameter-before-state">DOCTYPE ENTITY parameter before state</a>.
+
+ <dt>EOF
+ <dd><a href="#parse-error">Parse error</a>. Reprocess the current input character in the
+ <a href="#data-state">data state</a>.
+
+ <dt>Anything else
+ <dd>Create an entity token with the name set to the current input character
+ and the value set to the empty string. Then switch to the <a href="#doctype-entity-name-state">DOCTYPE
+ ENTITY name state</a>.
+ </dl>
+
+
+ <dt><dfn id="doctype-entity-parameter-before-state">DOCTYPE ENTITY parameter before state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Switch to the <a href="#doctype-entity-parameter-state">DOCTYPE ENTITY parameter state</a>.
+
+ <dt>EOF
+ <dd><a href="#parse-error">Parse error</a>. Reprocess the current input character in the
+ <a href="#data-state">data state</a>.
+
+ <dt>Anything else
+ <dd>Switch to the <a href="#doctype-bogus-comment-state">DOCTYPE bogus comment state</a>.
+ </dl>
+
+
+ <dt><dfn id="doctype-entity-parameter-state">DOCTYPE ENTITY parameter state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Stay in the current state.
+
+ <dt>EOF
+ <dd><a href="#parse-error">Parse error</a>. Reprocess the current input character in the
+ <a href="#data-state">data state</a>.
+
+ <dt>Anything else
+ <dd>Create an entity token with the name set to the current input character
+ and the value set to the empty string. Set the <a href="#entity-flag">entity flag</a> to
+ "parameter". Switch to the <a href="#doctype-entity-name-state">DOCTYPE ENTITY name state</a>.
+ </dl>
+
+
+ <dt><dfn id="doctype-entity-name-state">DOCTYPE ENTITY name state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Switch to the <a href="#doctype-entity-name-after-state">DOCTYPE ENTITY name after state</a>.
+
+ <dt>EOF
+ <dd><a href="#parse-error">Parse error</a>. Reconsume the current input character in the
+ <a href="#data-state">data state</a>.
+
+ <dt>Anything else
+ <dd>Append the current input character to the name of the entity.
+ </dl>
+
+
+ <dt><dfn id="doctype-entity-name-after-state">DOCTYPE ENTITY name after state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Stay in the current state.
+
+ <dt>U+0022 (<code>"</code>)
+ <dd>Switch to the <a href="#doctype-entity-value-double-quoted-state">DOCTYPE ENTITY value double quoted state</a>.
+
+ <dt>U+0027 (<code>'</code>)
+ <dd>Switch to the <a href="#doctype-entity-value-single-quoted-state">DOCTYPE ENTITY value single quoted state</a>.
+
+ <dt>EOF
+ <dd><a href="#parse-error">Parse error</a>. Reconsume the current input character in the
+ <a href="#data-state">data state</a>.
+
+ <dt>Anything else
+ <dd>Switch to the <a href="#doctype-entity-identifier-state">DOCTYPE ENTITY identifier state</a>.
+ </dl>
+
+
+ <dt><dfn id="doctype-entity-value-double-quoted-state">DOCTYPE ENTITY value double quoted state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <!-- XXX "%" -->
+
+ <dl class="switch">
+ <dt>U+0022 (<code>"</code>)
+ <dd>Switch to the <a href="#doctype-entity-value-after-state">DOCTYPE ENTITY value after state</a>.
+
+ <dt>U+0026 (<code>&</code>)
+ <dd class="XXX">... normalize numeric entities only
+
+ <dt>EOF
+ <dd><a href="#parse-error">Parse error</a>. Reconsume the current input character in the
+ <a href="#data-state">data state</a>.
+
+ <dt>Anything else
+ <dd>Append the current input character to the current entity token's
+ value.
+ </dl>
+
+
+ <dt><dfn id="doctype-entity-value-single-quoted-state">DOCTYPE ENTITY value single quoted state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <!-- "%" -->
+
+ <dl class="switch">
+ <dt>U+0027 (<code>'</code>)
+ <dd>Switch to the <a href="#doctype-entity-value-after-state">DOCTYPE ENTITY value after state</a>.
+
+ <dt>U+0026 (<code>&</code>)
+ <dd class="XXX">... normalize numeric entities only
+
+ <dt>EOF<!--xxx
+XXX parse error
+ self.currentToken == None-->
+ <dd>Switch to the <a href="#data-state">data state</a>.
+
+ <dt>Anything else
+ <dd>Append the current input character to the current entity token's
+ value.
+ </dl>
+
+
+ <dt><dfn id="doctype-entity-value-after-state">DOCTYPE ENTITY value after state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Stay in the current state.
+
+ <dt>U+003E (<code>></code>)
+ <dd><a href="#append-entity">Append an entity</a>. Switch to the <a href="#doctype-internal-subset-state">DOCTYPE internal
+ subset state</a>.
+
+ <dt>EOF
+ <dd><a href="#parse-error">Parse error</a>. Reconsume the current input character in the
+ <a href="#data-state">data state</a>.
+
+ <dt>Anything else
+ <dd>Stay in the current state.
+ </dl>
+
+
+ <dt><dfn id="doctype-entity-identifier-state">DOCTYPE ENTITY identifier state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+003E (<code>></code>)
+ <dd class="XXX"> append entity ...
+
+ <dt>U+0022 (<code>"</code>)
+ <dd>Switch to the <a href="#doctype-entity-identifier-double-quoted-state">DOCTYPE ENTITY identifier double quoted state</a>.
+
+ <dt>U+0027 (<code>'</code>)
+ <dd>Switch to the <a href="#doctype-entity-identifier-single-quoted-state">DOCTYPE ENTITY identifier single quoted state</a>.
+
+ <dt>EOF
+ <dd><a href="#parse-error">Parse error</a>. Reconsume the current input character in the
+ <a href="#data-state">data state</a>.
+
+ <dt>Anything else
+ <dd>Stay in the current state.
+ </dl>
+
+
+ <dt><dfn id="doctype-entity-identifier-double-quoted-state">DOCTYPE ENTITY identifier double quoted state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+0022 (<code>"</code>)
+ <dd>Switch to the <a href="#doctype-entity-identifier-state">DOCTYPE ENTITY identifier state</a>.
+
+ <dt>EOF
+ <dd><a href="#parse-error">Parse error</a>. Reconsume the current input character in the
+ <a href="#data-state">data state</a>.
+
+ <dt>Anything else
+ <dd>Stay in the current state.
+ </dl>
+
+
+ <dt><dfn id="doctype-entity-identifier-single-quoted-state">DOCTYPE ENTITY identifier single quoted state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+0027 (<code>'</code>)
+ <dd>Switch to the <a href="#doctype-entity-identifier-state">DOCTYPE ENTITY identifier state</a>.
+
+ <dt>EOF
+ <dd><a href="#parse-error">Parse error</a>. Reconsume the current input character in the
+ <a href="#data-state">data state</a>.
+
+ <dt>Anything else
+ <dd>Stay in the current state.
+ </dl>
+
+
+ <dt><dfn id="doctype-attlist-state">DOCTYPE ATTLIST state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Switch to the <a href="#doctype-attlist-name-before-state">DOCTYPE ATTLIST name before state</a>.
+
+ <dt>EOF
+ <dd><a href="#parse-error">Parse error</a>. Reprocess the current input character in the
+ <a href="#data-state">data state</a>.
+
+ <dt>Anything else
+ <dd>Switch to the <a href="#doctype-bogus-comment-state">DOCTYPE bogus comment state</a>.
+ </dl>
+
+
+ <dt><dfn id="doctype-attlist-name-before-state">DOCTYPE ATTLIST name before state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Stay in the current state.
+
+ <dt>EOF
+ <dd><a href="#parse-error">Parse error</a>. Reprocess the current input character in the
+ <a href="#data-state">data state</a>.
+
+ <dt>Anything else
+ <dd class="XXX">...
+<!--xxx
+ self.currentToken = {"name":data, "attrs":[]}
+ <dd>Switch to the <span>DOCTYPE ATTLIST name state</span>.-->
+ </dl>
+
+
+ <dt><dfn id="doctype-attlist-name-state">DOCTYPE ATTLIST name state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Switch to the <a href="#doctype-attlist-name-after-state">DOCTYPE ATTLIST name after state</a>.
+
+ <dt>EOF
+ <!-- xxx
+XXX parse error
+ self.currentToken = None-->
+ <dd>Switch to the <a href="#data-state">data state</a>.
+
+ <dt>Anything else
+ <dd class="XXX">...
+ <!--<dd>Append the current input character to the tag name and stay in the current state.-->
+ </dl>
+
+
+ <dt><dfn id="doctype-attlist-name-after-state">DOCTYPE ATTLIST name after state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Stay in the current state.
+
+ <dt>U+003E (<code>></code>)<!-- xxx
+ self.currentToken = None-->
+ <dd>Switch to the <a href="#doctype-internal-subset-state">DOCTYPE internal subset state</a>.
+
+ <dt>EOF<!-- xxx
+XXX parse error
+ self.currentToken = None-->
+ <dd>Switch to the <a href="#data-state">data state</a>.
+
+ <dt>Anything else
+ <dd class="XXX">...
+ <!-- self.currentToken["attrs"].append({"name":data, "type":"",
+ "dv":""})
+ <dd>Switch to the <span>DOCTYPE ATTLIST attribute name state</span>.-->
+ </dl>
+
+
+ <dt><dfn id="doctype-attlist-attribute-name-state">DOCTYPE ATTLIST attribute name state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Switch to the <a href="#doctype-attlist-attribute-name-after-state">DOCTYPE ATTLIST attribute name after state</a>.
+
+ <dt>EOF<!-- xxx
+XXX parse error
+ self.currentToken = None-->
+ <dd>Switch to the <a href="#data-state">data state</a>.
+
+ <dt>Anything else
+ <dd class="XXX">...
+<!-- self.currentToken["attrs"][-1]["name"] += data-->
+ </dl>
+
+
+ <dt><dfn id="doctype-attlist-attribute-name-after-state">DOCTYPE ATTLIST attribute name after state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Stay in the current state.
+
+ <dt>EOF<!-- xxx
+XXX parse error
+ self.currentToken = None-->
+ <dd>Switch to the <a href="#data-state">data state</a>.
+
+ <dt>Anything else
+ <dd class="XXX">...<!--
+ self.currentToken["attrs"][-1]["type"] += data
+ <dd>Switch to the <span>DOCTYPE ATTLIST attribute type state</span>.-->
+ </dl>
+
+
+ <dt><dfn id="doctype-attlist-attribute-type-state">DOCTYPE ATTLIST attribute type state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Switch to the <a href="#doctype-attlist-attribute-type-after-state">DOCTYPE ATTLIST attribute type after state</a>.
+
+ <dt>EOF<!-- xxx
+XXX parse error
+ self.currentToken = None-->
+ <dd>Switch to the <a href="#data-state">data state</a>.
+
+ <dt>Anything else
+ <dd class="XXX">...
+<!-- self.currentToken["attrs"][-1]["type"] += data-->
+ </dl>
+
+
+ <dt><dfn id="doctype-attlist-attribute-type-after-state">DOCTYPE ATTLIST attribute type after state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Stay in the current state.
+
+ <dt>U+0023 (<code>#</code>)
+ <dd>Switch to the <a href="#doctype-attlist-attribute-declaration-before-state">DOCTYPE ATTLIST attribute declaration before state</a>.
+
+ <dt>EOF<!--
+XXX parse error
+ self.currentToken = None-->
+ <dd>Switch to the <a href="#data-state">data state</a>.
+
+ <dt>Anything else
+ <dd>Switch to the <a href="#doctype-bogus-comment-state">DOCTYPE bogus comment state</a>.
+ </dl>
+
+
+ <dt><dfn id="doctype-attlist-attribute-declaration-before-state">DOCTYPE ATTLIST attribute declaration before state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Switch to the <a href="#doctype-bogus-comment-state">DOCTYPE bogus comment state</a>.
+
+ <dt>EOF<!--xxx
+XXX parse error
+ self.currentToken = None-->
+ <dd>Switch to the <a href="#data-state">data state</a>.
+
+ <dt>Anything else
+ <dd>Switch to the <a href="#doctype-attlist-attribute-declaration-state">DOCTYPE ATTLIST attribute declaration
+ state</a>.
+ </dl>
+
+
+ <dt><dfn id="doctype-attlist-attribute-declaration-state">DOCTYPE ATTLIST attribute declaration state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Switch to the <a href="#doctype-attlist-attribute-declaration-after-state">DOCTYPE ATTLIST attribute declaration after state</a>.
+
+ <dt>EOF<!--xxx
+XXX parse error
+ self.currentToken = None-->
+ <dd>Switch to the <a href="#data-state">data state</a>.
+
+ <dt>Anything else
+ <dd>Stay in the current state.
+ </dl>
+
+
+ <dt><dfn id="doctype-attlist-attribute-declaration-after-state">DOCTYPE ATTLIST attribute declaration after state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Stay in the current state.
+
+ <dt>U+003E (<code>></code>)
+ <dd>Switch to the <a href="#doctype-internal-subset-state">DOCTYPE internal subset state</a>.
+
+ <dt>U+0022 (<code>"</code>)
+ <dd>Switch to the <a href="#doctype-attlist-attribute-value-double-quoted-state">DOCTYPE ATTLIST attribute value double quoted state</a>.
+
+ <dt>U+0027 (<code>'</code>)
+ <dd>Switch to the <a href="#doctype-attlist-attribute-value-single-quoted-state">DOCTYPE ATTLIST attribute value single quoted state</a>.
+
+ <dt>EOF<!--xxx
+XXX parse error
+ self.currentToken = None-->
+ <dd>Switch to the <a href="#data-state">data state</a>.
+
+ <dt>Anything else
+ <dd class="XXX"> ...
+<!-- self.currentToken["attrs"].append({"name":data, "type":"",
+ "dv":""})
+ <dd>Switch to the <span>DOCTYPE ATTLIST attribute name state</span>.-->
+ </dl>
+
+
+ <dt><dfn id="doctype-attlist-attribute-value-double-quoted-state">DOCTYPE ATTLIST attribute value double quoted state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+0022 (<code>"</code>)
+ <dd>Switch to the <a href="#doctype-attlist-name-after-state">DOCTYPE ATTLIST name after state</a>.
+<!--
+ <dt>U+0025 (<code>%</code>)
+ raise NotSupportedError
+-->
+ <dt>U+0026 (<code>&</code>)
+ <dd class="XXX">...
+
+ <dt>Anything else
+ <dd class="XXX"> ...
+ <!-- self.currentToken["attrs"][-1]["dv"] += data-->
+ </dl>
+
+
+ <dt><dfn id="doctype-attlist-attribute-value-single-quoted-state">DOCTYPE ATTLIST attribute value single quoted state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+0027 (<code>'</code>)
+ <dd>Switch to the <a href="#doctype-attlist-name-after-state">DOCTYPE ATTLIST name after state</a>.
+<!--
+ <dt>U+0025 (<code>%</code>)
+ raise NotSupportedError
+-->
+ <dt>U+0026 (<code>&</code>)
+ <dd class="XXX">...
+
+ <dt>Anything else
+ <dd class="XXX"> ...
+ <!-- self.currentToken["attrs"][-1]["dv"] += data-->
+ </dl>
+
+
+ <dt><dfn id="doctype-notation-state">DOCTYPE NOTATION state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Switch to the <a href="#doctype-notation-identifier-state">DOCTYPE NOTATION identifier state</a>.
+
+ <dt>EOF
+ <dd><a href="#parse-error">Parse error</a>. Reprocess the current input character in the
+ <a href="#data-state">data state</a>.
+
+ <dt>Anything else
+ <dd>Switch to the <a href="#doctype-bogus-comment-state">DOCTYPE bogus comment state</a>.
+ </dl>
+
+
+ <dt><dfn id="doctype-notation-identifier-state">DOCTYPE NOTATION identifier state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+003E (<code>></code>)
+ <dd>Switch to the <a href="#doctype-internal-subset-state">DOCTYPE internal subset state</a>.
+
+ <dt>U+0022 (<code>"</code>)
+ <dd>Switch to the <a href="#doctype-notation-identifier-double-quoted-state">DOCTYPE NOTATION identifier double quoted state</a>.
+
+ <dt>U+0027 (<code>'</code>)
+ <dd>Switch to the <a href="#doctype-notation-identifier-single-quoted-state">DOCTYPE NOTATION identifier single quoted state</a>.
+
+ <dt>EOF
+ <dd><a href="#parse-error">Parse error</a>. Reprocess the current input character in the
+ <a href="#data-state">data state</a>.
+
+ <dt>Anything else
+ <dd>Stay in the current state.
+ </dl>
+
+
+ <dt><dfn id="doctype-notation-identifier-double-quoted-state">DOCTYPE NOTATION identifier double quoted state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+0022 (<code>"</code>)
+ <dd>Switch to the <a href="#doctype-notation-identifier-state">DOCTYPE NOTATION identifier state</a>.
+
+ <dt>EOF
+ <dd><a href="#parse-error">Parse error</a>. Reprocess the current input character in the
+ <a href="#data-state">data state</a>.
+
+ <dt>Anything else
+ <dd>Stay in the current state.
+ </dl>
+
+
+ <dt><dfn id="doctype-notation-identifier-single-quoted-state">DOCTYPE NOTATION identifier single quoted state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+0027 (<code>'</code>)
+ <dd>Switch to the <a href="#doctype-notation-identifier-state">DOCTYPE NOTATION identifier state</a>.
+
+ <dt>EOF
+ <dd><a href="#parse-error">Parse error</a>. Reprocess the current input character in the
+ <a href="#data-state">data state</a>.
+
+ <dt>Anything else
+
+ <dd>Stay in the current state.
+ </dl>
+
+
+ <dt><dfn id="doctype-pi-state">DOCTYPE pi state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+003F (<code>?</code>)
+ <dd>Switch to the <a href="#doctype-pi-after-state">DOCTYPE pi after state</a>.
+
+ <dt>EOF
+ <dd><a href="#parse-error">Parse error</a>. Reprocess the current input character in the
+ <a href="#data-state">data state</a>.
+
+ <dt>Anything else
+ <dd>Stay in the current state.
+ </dl>
+
+
+ <dt><dfn id="doctype-pi-after-state">DOCTYPE pi after state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+003E (<code>></code>)
+ <dd>Switch to the <a href="#doctype-internal-subset-state">DOCTYPE internal subset state</a>.
+
+ <dt>U+003F (<code>?</code>)
+ <dd>Stay in the current state.
+
+ <dt>EOF
+ <dd><a href="#parse-error">Parse error</a>. Reprocess the current input character in the
+ <a href="#data-state">data state</a>.
+
+ <dt>Anything else
+ <dd>Switch to the <a href="#doctype-pi-state">DOCTYPE pi state</a>.
+ </dl>
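+
+ <div class="example">
+ <p>As a rough, non-normative illustration, the DOCTYPE pi state and the
+ DOCTYPE pi after state together skip ahead to the closing <code>?></code>.
+ The function name <code>skip_doctype_pi</code>, its arguments, and its return
+ convention are assumptions made for this sketch and are not defined by this
+ specification.

```python
def skip_doctype_pi(text, pos):
    """Skip a processing instruction inside the internal subset.

    Returns (new_pos, hit_eof). hit_eof is True when the input ended before
    "?>" was seen, which corresponds to the parse error in the EOF entries.
    """
    state = "pi"
    while pos < len(text):
        ch = text[pos]
        pos += 1
        if state == "pi":
            if ch == "?":
                state = "pi after"
            # Anything else: stay in the current state.
        else:  # "pi after"
            if ch == ">":
                # The caller would now be in the DOCTYPE internal subset state.
                return pos, False
            if ch != "?":
                state = "pi"
            # A further "?" stays in the pi after state.
    return pos, True
```

+ <p>Calling the hypothetical <code>skip_doctype_pi("x y??>rest", 0)</code>
+ leaves the position just after the <code>></code>, so the remaining input is
+ <code>rest</code>.
+ </div>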
+
+
+ <dt><dfn id="doctype-bogus-comment-state">DOCTYPE bogus comment state</dfn>
+ <dd><p>Consume every character up to the first U+003E (<code>></code>) or
+ EOF, whichever comes first. Emit a comment token whose data is the
+ concatenation of all those consumed characters. Then consume the next input
+ character and switch to the <a href="#doctype-internal-subset-state">DOCTYPE internal subset state</a>
+ reprocessing the EOF character if that was the character consumed.
+
+ <dt><dfn id="tag-name-state">Tag name state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Switch to the <a href="#tag-attribute-name-before-state">tag attribute name before state</a>.
+
+ <dt>U+003E (<code>></code>)
+ <dd>Emit the current token and then switch to the <a href="#data-state">data state</a>.
+
+ <dt>EOF
+ <dd><a href="#parse-error">Parse error</a>. Emit the current token and then reprocess the
+ current input character in the <a href="#data-state">data state</a>.
+
+ <dt>U+002F (<code>/</code>)
+ <dd>Switch to the <a href="#empty-tag-state">empty tag state</a>.
+
+ <dt>Anything else
+ <dd>Append the current input character to the tag name and stay in the
+ current state.
+ </dl>
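+
+ <div class="example">
+ <p>The tag name state can be sketched as a small loop. This is an
+ illustration under assumptions, not a reference implementation: the function
+ name <code>run_tag_name_state</code> and the five-item return value are
+ invented for the example, and states are represented by their names as
+ strings.

```python
WHITESPACE = "\t\n "  # U+0009, U+000A, U+0020


def run_tag_name_state(text, pos, tag_name=""):
    """Consume characters in the tag name state until it is left.

    Returns (tag_name, next_state, pos, emit, parse_error), where emit says
    whether the current token is emitted on leaving the state.
    """
    while pos < len(text):
        ch = text[pos]
        pos += 1
        if ch in WHITESPACE:
            return tag_name, "tag attribute name before state", pos, False, False
        if ch == ">":
            return tag_name, "data state", pos, True, False
        if ch == "/":
            return tag_name, "empty tag state", pos, False, False
        # Anything else: append to the tag name and stay in this state.
        tag_name += ch
    # EOF: parse error, emit the token, reprocess EOF in the data state.
    return tag_name, "data state", pos, True, True
```

+ </div>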
+
+
+ <dt><dfn id="empty-tag-state">Empty tag state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+003E (<code>></code>)
+ <dd>Emit the current tag token as an empty tag token and then switch to the
+ <a href="#data-state">data state</a>.
+
+ <dt>Anything else
+ <dd><a href="#parse-error">Parse error</a>. Reprocess the current input character in the
+ <a href="#tag-attribute-name-before-state">tag attribute name before state</a>.
+ </dl>
+
+
+ <dt><dfn id="tag-attribute-name-before-state">Tag attribute name before state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+
+ <dd>Stay in the current state.
+
+ <dt>U+003E (<code>></code>)
+ <dd>Emit the current token and then switch to the <a href="#data-state">data state</a>.
+
+ <dt>U+002F (<code>/</code>)
+ <dd>Switch to the <a href="#empty-tag-state">empty tag state</a>.
+
+ <dt>U+003A (<code>:</code>)
+ <dd><a href="#parse-error">Parse error</a>. Stay in the current state.
+
+ <dt>EOF
+ <dd><a href="#parse-error">Parse error</a>. Emit the current token and then reprocess the
+ current input character in the <a href="#data-state">data state</a>.
+
+ <dt>Anything else
+ <dd>Start a new attribute in the current tag token. Set that attribute's
+ name to the current input character and its value to the empty string and
+ then switch to the <a href="#tag-attribute-name-state">tag attribute name state</a>.
+ </dl>
+
+
+ <dt><dfn id="tag-attribute-name-state">Tag attribute name state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+003D (<code>=</code>)
+ <dd>Switch to the <a href="#tag-attribute-value-before-state">tag attribute value before state</a>.
+
+ <dt>U+003E (<code>></code>)
+ <dd>Emit the current token as start tag token. Switch to the <a href="#data-state">data
+ state</a>.
+
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Switch to the <a href="#tag-attribute-name-after-state">tag attribute name after state</a>.
+
+ <dt>U+002F (<code>/</code>)
+ <dd>Switch to the <a href="#empty-tag-state">empty tag state</a>.
+
+ <dt>EOF
+ <dd><a href="#parse-error">Parse error</a>. Emit the current token as start tag token and
+ then reprocess the current input character in the <a href="#data-state">data
+ state</a>.
+
+ <dt>Anything else
+ <dd>Append the current input character to the current attribute's name.
+ Stay in the current state.
+ </dl>
+
+ <p>When the user agent leaves this state (and before emitting the tag token,
+ if appropriate), the complete attribute's name must be
+ compared to the other attributes on the same token; if there is already an
+ attribute on the token with the exact same name, then this is a parse error
+ and the new attribute must be dropped, along with the
+ value that gets associated with it (if any).
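+
+ <div class="example">
+ <p>A minimal sketch of the duplicate check described above, assuming a tag
+ token that stores attributes as a list of name/value pairs. The class
+ <code>TagToken</code> and all of its method names are hypothetical; the point
+ is only that the comparison happens when the name is complete, and that the
+ value of a dropped attribute is discarded as well.

```python
class TagToken:
    def __init__(self, name):
        self.name = name
        self.attributes = []     # [name, value] pairs in source order
        self.parse_errors = 0
        self._dropping = False   # True while the current attribute is a duplicate

    def start_attribute(self, first_char):
        # "Start a new attribute": name is the first character, value is empty.
        self.attributes.append([first_char, ""])
        self._dropping = False

    def append_to_name(self, char):
        self.attributes[-1][0] += char

    def leave_name_state(self):
        # Compare the complete name against the earlier attributes on the token.
        name = self.attributes[-1][0]
        if any(existing == name for existing, _ in self.attributes[:-1]):
            self.parse_errors += 1
            self.attributes.pop()    # drop the new attribute
            self._dropping = True    # and ignore whatever value follows

    def append_to_value(self, char):
        if not self._dropping:
            self.attributes[-1][1] += char
```

+ <p>With this sketch, feeding the attributes <code>id="1" id="2" b="3"</code>
+ leaves <code>id="1"</code> and <code>b="3"</code> on the token and records one
+ parse error.
+ </div>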
+
+
+ <dt><dfn id="tag-attribute-name-after-state">Tag attribute name after state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Stay in the current state.
+
+ <dt>U+003D (<code>=</code>)
+ <dd>Switch to the <a href="#tag-attribute-value-before-state">tag attribute value before state</a>.
+
+ <dt>U+003E (<code>></code>)
+ <dd>Emit the current token and then switch to the <a href="#data-state">data state</a>.
+
+ <dt>U+002F (<code>/</code>)
+ <dd>Switch to the <a href="#empty-tag-state">empty tag state</a>.
+
+ <dt>EOF
+ <dd><a href="#parse-error">Parse error</a>. Emit the current token and then reprocess the
+ current input character in the <a href="#data-state">data state</a>.
+
+ <dt>Anything else
+ <dd>Start a new attribute in the current tag token. Set that attribute's
+ name to the current input character and its value to the empty string and
+ then switch to the <a href="#tag-attribute-name-state">tag attribute name state</a>.
+ </dl>
+
+
+ <dt><dfn id="tag-attribute-value-before-state">Tag attribute value before state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Stay in the current state.
+
+ <dt>U+0022 (<code>"</code>)
+ <dd>Switch to the <a href="#tag-attribute-value-double-quoted-state">tag attribute value double quoted state</a>.
+
+ <dt>U+0027 (<code>'</code>)
+ <dd>Switch to the <a href="#tag-attribute-value-single-quoted-state">tag attribute value single quoted state</a>.
+
+ <dt>U+0026 (<code>&</code>)
+ <dd>Reprocess the input character in the <a href="#tag-attribute-value-unquoted-state">tag attribute value unquoted
+ state</a>.
+
+ <dt>U+003E (<code>></code>)
+ <dd>Emit the current token and then switch to the <a href="#data-state">data state</a>.
+
+ <dt>EOF
+ <dd><a href="#parse-error">Parse error</a>. Emit the current token and then reprocess the
+ current input character in the <a href="#data-state">data state</a>.
+
+ <dt>Anything else
+ <dd>Append the current input character to the current attribute's value and
+ then switch to the <a href="#tag-attribute-value-unquoted-state">tag attribute value unquoted state</a>.
+ </dl>
+
+
+ <dt><dfn id="tag-attribute-value-double-quoted-state">Tag attribute value double quoted state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+0022 (<code>"</code>)
+ <dd>Switch to the <a href="#tag-attribute-name-before-state">tag attribute name before state</a>.
+
+ <dt>U+0026 (<code>&</code>)
+ <dd class="XXX">...
+
+ <dt>EOF
+ <dd><a href="#parse-error">Parse error</a>. Emit the current token and then reprocess the
+ current input character in the <a href="#data-state">data state</a>.
+
+ <dt>Anything else
+ <dd>Append the input character to the current attribute's value. Stay in
+ the current state.
+ </dl>
+
+
+ <dt><dfn id="tag-attribute-value-single-quoted-state">Tag attribute value single quoted state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+0027 (<code>'</code>)
+ <dd>Switch to the <a href="#tag-attribute-name-before-state">tag attribute name before state</a>.
+
+ <dt>U+0026 (<code>&</code>)
+ <dd class="XXX">...
+
+ <dt>EOF
+ <dd><a href="#parse-error">Parse error</a>. Emit the current token and then reprocess the
+ current input character in the <a href="#data-state">data state</a>.
+
+ <dt>Anything else
+ <dd>Append the input character to the current attribute's value. Stay in
+ the current state.
+ </dl>
+
+
+ <dt><dfn id="tag-attribute-value-unquoted-state">Tag attribute value unquoted state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Switch to the <a href="#tag-attribute-name-before-state">tag attribute name before state</a>.
+
+ <dt>U+0026 (<code>&</code>)
+ <dd class="XXX">...
+
+ <dt>U+003E (<code>></code>)
+ <dd>Emit the current token as start tag token and then switch to the
+ <a href="#data-state">data state</a>.
+
+ <dt>EOF
+ <dd><a href="#parse-error">Parse error</a>. Emit the current token as start tag token and
+ then reprocess the current input character in the
+ <a href="#data-state">data state</a>.
+
+ <dt>Anything else
+ <dd>Append the input character to the current attribute's value. Stay in
+ the current state.
+ </dl>
+
+
+ <dt><dfn id="bogus-comment-state">Bogus comment state</dfn>
+
+ <dd><p>Consume every character up to the first U+003E (<code>></code>) or
+ EOF, whichever comes first. Emit a comment token whose data is the
+ concatenation of all those consumed characters. Then consume the next input
+ character and switch to the <a href="#data-state">data state</a> reprocessing the EOF
+ character if that was the character consumed.
+ </dl>
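+
+<div class="example">
+<p>The bogus comment state differs from the other states in that it consumes
+a run of characters in one step. A non-normative sketch, in which the function
+name <code>bogus_comment</code> and its return value are assumptions made for
+illustration:

```python
def bogus_comment(text, pos):
    """Consume a bogus comment starting at pos.

    Returns (comment_data, new_pos, hit_eof). comment_data is the data of the
    comment token to emit. When hit_eof is True the caller reprocesses EOF in
    the state it switches to.
    """
    end = text.find(">", pos)
    if end == -1:
        # No ">" before the end of input: everything left is comment data.
        return text[pos:], len(text), True
    # Consume the ">" itself as well, so new_pos points just past it.
    return text[pos:end], end + 1, False
```

+<p>The DOCTYPE bogus comment state can be sketched the same way; only the
+state switched to afterwards differs.
+</div>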
+
+
+
+<h3 id="tree-construction"><span class="secno">3.4 </span>Tree construction</h3>
+
+<p>The input to the tree construction stage is a sequence of tokens from the
+<span>tokenization</span> stage. The output of this stage is a tree model
+represented by a <code>Document</code> object.
+
+<p>The tree construction stage passes through several phases. The initial
+phase is the <a href="#start-phase">start phase</a>.
+
+<p>The <dfn id="stack-of-open-elements">stack of open elements</dfn> contains all elements whose
+closing tag has not yet been encountered. Once the first start tag token in
+the <a href="#start-phase">start phase</a> is encountered, it contains one open
+element. The rest of the elements are added during the
+<a href="#main-phase">main phase</a>.
+
+<p>The <dfn id="current-element">current element</dfn> is the bottommost node in the
+<a href="#stack-of-open-elements">stack of open elements</a>.
+
+<p>The <a href="#stack-of-open-elements">stack of open elements</a> is said to
+<dfn id="have-an-element-in-scope">have an element in scope</dfn> if the target element is in the
+<a href="#stack-of-open-elements">stack of open elements</a>.
+
+<p>When the steps below require the user agent to
+<dfn id="append-a-character">append a character</dfn> to a node, the user agent must collect it
+and all subsequent consecutive characters that would be appended to that
+node and insert a single <code>Text</code> node whose data is the
+concatenation of all those characters.
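+
+<div class="example">
+<p>One way to obtain the same tree, sketched under assumptions, is to extend a
+trailing <code>Text</code> node instead of buffering characters first. The
+classes below are simplified stand-ins for DOM nodes, and
+<code>append_character</code> is a hypothetical helper name.

```python
class Text:
    def __init__(self, data):
        self.data = data


class Element:
    def __init__(self, name):
        self.name = name
        self.children = []


def append_character(node, char):
    """Append char to node, merging it into a trailing Text node if present."""
    if node.children and isinstance(node.children[-1], Text):
        node.children[-1].data += char
    else:
        node.children.append(Text(char))
```

+<p>Consecutive calls therefore never produce two adjacent <code>Text</code>
+nodes; a new one is only created after some other node has been appended.
+</div>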
+
+<p class="XXX">Need to define
+<dfn id="create-an-element-for-the-token">create an element for the token</dfn>...
+<!-- namespaces and such -->
+
+
+<p>When the steps below require the user agent to
+<dfn id="insert-an-element">insert an element</dfn> for a token the user agent must
+<a href="#create-an-element-for-the-token">create an element for the token</a> and then append it to the
+<a href="#current-element">current element</a> and push it into the
+<a href="#stack-of-open-elements">stack of open elements</a> so that it becomes the new
+<a href="#current-element">current element</a>.
+
+
+<dl class="switch">
+ <dt><dfn id="start-phase">Start phase</dfn>
+ <dd>
+ <p>Each token emitted from the tokenization stage must be
+ processed as follows until the algorithm below switches to a different
+ phase:
+
+ <dl class="switch">
+ <dt>A start tag token
+
+ <dd><p><a href="#create-an-element-for-the-token">Create an element for the token</a> and then append it to
+ the <code>Document</code> node and push it into the
+ <a href="#stack-of-open-elements">stack of open elements</a>. This element is the root element and
+ the first <a href="#current-element">current element</a>. Then switch to the
+ <a href="#main-phase">main phase</a>.
+
+ <dt>An empty tag token
+
+ <dd><p><a href="#create-an-element-for-the-token">Create an element for the token</a> and append it to the
+ <code>Document</code> node. Then switch to the <a href="#end-phase">end phase</a>.
+
+ <dt>A comment token
+
+ <dd><p>Append a <code>Comment</code> node to the <code>Document</code> node
+ with the <code>data</code> attribute set to the data given in the
+ token.
+
+ <dt>A processing instruction token
+
+ <dd><p>Append a <code>ProcessingInstruction</code> node to the
+ <code>Document</code> node with the <code>target</code> and <code>data</code>
+ attributes set to the target and data given in the token.
+
+ <dt>An end-of-file token
+
+ <dd><p><a href="#parse-error">Parse error</a>. Reprocess the token in the
+ <a href="#end-phase">end phase</a>.
+
+ <dt>Anything else
+
+ <dd><p><a href="#parse-error">Parse error</a>. Ignore the token.
+ </dl>
+
+ <dt><dfn id="main-phase">Main phase</dfn>
+ <dd>
+ <p>Once a start tag token has been encountered (as detailed in the
+ previous phase) each token must be processed using the following steps until
+ further notice:
+
+ <dl class="switch">
+ <dt>A character token
+
+ <dd><p><a href="#append-a-character">Append a character</a> to the <a href="#current-element">current
+ element</a>.
+
+ <dt>A start tag token
+
+ <dd><p><a href="#insert-an-element">Insert an element</a> for the token.
+
+ <dt>An empty tag token
+
+ <dd><p><a href="#create-an-element-for-the-token">Create an element for the token</a> and append it to the
+ <a href="#current-element">current element</a>.
+
+ <dt>An end tag token
+
+ <dd>
+ <p>If the tag name of the <a href="#current-element">current element</a> does not match the tag
+ name of the end tag token this is a <a href="#parse-error">parse error</a>.
+
+ <p>If there is an <span>element in scope</span> with the same tag name as
+ that of the token pop nodes from the <a href="#stack-of-open-elements">stack of open elements</a>
+ until the first such element has been popped from the stack.
+
+ <p>If there are no more elements on the
+ <a href="#stack-of-open-elements">stack of open elements</a> at this point switch to the
+ <a href="#end-phase">end phase</a>.
+
+ <dt>A short end tag token
+
+ <dd><p>Pop an element from the <a href="#stack-of-open-elements">stack of open elements</a>. If
+ there are no more elements on the <a href="#stack-of-open-elements">stack of open elements</a>
+ switch to the <a href="#end-phase">end phase</a>.
+
+ <dt>A comment token
+
+ <dd><p>Append a <code>Comment</code> node to the <a href="#current-element">current element</a>
+ with the <code>data</code> attribute set to the data given in the
+ token.
+
+ <dt>A processing instruction token
+
+ <dd><p>Append a <code>ProcessingInstruction</code> node to the <a href="#current-element">current
+ element</a> with the <code>target</code> and <code>data</code> attributes
+ set to the target and data given in the token.
+
+ <dt>An end-of-file token
+
+ <dd><p><a href="#parse-error">Parse error</a>. Reprocess the token in the
+ <a href="#end-phase">end phase</a>.
+ </dl>
+
+ <dt><dfn id="end-phase">End phase</dfn>
+ <dd>
+ <p>Tokens in the end phase must be handled as follows:
+
+ <dl class="switch">
+ <dt>A comment token
+
+ <dd><p>Append a <code>Comment</code> node to the <code>Document</code> node
+ with the <code>data</code> attribute set to the data given in the
+ token.
+
+ <dt>A processing instruction token
+
+ <dd><p>Append a <code>ProcessingInstruction</code> node to the
+ <code>Document</code> node with the <code>target</code> and <code>data</code>
+ attributes set to the target and data given in the token.
+
+ <dt>An end-of-file token
+
+ <dd><p><a href="#stop-parsing">Stop parsing</a>.
+
+ <dt>Anything else
+
+ <dd><p><a href="#parse-error">Parse error</a>. Ignore the token.
+ </dl>
+</dl>
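+
+<div class="example">
+<p>The three phases above can be sketched as a single loop over tokens. This
+is an illustration only: tokens are modelled as tuples such as
+<code>("start", name)</code> or <code>("comment", data)</code>, the
+<code>Node</code> class stands in for DOM nodes, and namespaces and attributes
+are left out. None of these names come from this specification.

```python
class Node:
    def __init__(self, kind, name=None, data=None):
        self.kind = kind      # "document", "element", "text", "comment" or "pi"
        self.name = name
        self.data = data
        self.children = []


def build_tree(tokens):
    """Run the start, main and end phases. Returns (document, parse_errors)."""
    document = Node("document")
    stack = []                # stack of open elements; stack[-1] is the current element
    phase = "start"
    errors = 0

    for token in tokens:
        kind = token[0]
        # Nodes go to the current element in the main phase, else to the Document.
        parent = stack[-1] if phase == "main" else document
        if kind == "comment":
            parent.children.append(Node("comment", data=token[1]))
        elif kind == "pi":
            parent.children.append(Node("pi", name=token[1], data=token[2]))
        elif kind == "eof":
            if phase != "end":
                errors += 1   # EOF in the start or main phase is a parse error
            break             # reprocessing EOF in the end phase stops parsing
        elif phase == "start":
            if kind == "start":
                root = Node("element", name=token[1])
                document.children.append(root)
                stack.append(root)
                phase = "main"
            elif kind == "empty":
                document.children.append(Node("element", name=token[1]))
                phase = "end"
            else:
                errors += 1   # anything else: parse error, ignore the token
        elif phase == "main":
            if kind == "char":
                last = parent.children[-1] if parent.children else None
                if last is not None and last.kind == "text":
                    last.data += token[1]
                else:
                    parent.children.append(Node("text", data=token[1]))
            elif kind == "start":
                element = Node("element", name=token[1])
                parent.children.append(element)
                stack.append(element)
            elif kind == "empty":
                parent.children.append(Node("element", name=token[1]))
            elif kind == "end":
                if stack[-1].name != token[1]:
                    errors += 1
                if any(e.name == token[1] for e in stack):
                    # Pop until the first matching element has been popped.
                    while stack.pop().name != token[1]:
                        pass
            elif kind == "short_end":
                stack.pop()
            if not stack:
                phase = "end"
        else:
            errors += 1       # end phase, anything else: parse error, ignore
    return document, errors
```

+<p>For the token sequence corresponding to
+<code><!--c--><a>hi<b></a><!--z--></code>, this sketch closes
+<code>b</code> implicitly when <code></a></code> arrives, counts one parse
+error, and appends the trailing comment to the <code>Document</code> node.
+</div>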
+
+<p>Once the user agent <dfn id="stop-parsing" title="stop parsing">stops parsing</dfn> the
+document, it must follow these steps:
+
+<ol class="XXX">
+</ol>
+
+
+<h2 class="no-num" id="references">References</h2>
+
+<div id="anolis-references-normative"><dl><dt id="refsRFC2119">[RFC2119]
+<dd><cite><a href="http://tools.ietf.org/html/rfc2119">Key words for use in RFCs to Indicate Requirement Levels</a></cite>, Scott Bradner. IETF.
+
+</dl></div>
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/Overview.src.html Mon Feb 20 15:54:48 2012 +0100
@@ -0,0 +1,1915 @@
+<!doctype html>
+<html lang="en">
+<meta charset=utf-8>
+<title>XML-ER</title>
+<style>
+ pre.idl { border:solid thin; background:#eee; color:#000; padding:0.5em }
+ pre.idl :link, pre.idl :visited { color:inherit; background:transparent }
+ pre code { color:inherit; background:transparent }
+ div.example { margin-left:1em; padding-left:1em; border-left:double; color:#222; background:#fcfcfc }
+ .note { margin-left:2em; font-weight:bold; font-style:italic; color:#008000 }
+ p.note::before { content:"Note: " }
+ .XXX { padding:.5em; border:solid #f00 }
+ dfn { font-weight:bold; font-style:normal }
+ code { color:orangered }
+ code :link, code :visited { color:inherit }
+ dl.switch { padding-left: 2em; }
+ dl.switch dt { text-indent: -1.5em; }
+ dl.switch dt:before { content: '\21AA'; padding: 0 0.5em 0 0; display: inline-block; width: 1em; text-align: right; line-height: 0.5em; }
+</style>
+<link rel="stylesheet" href="http://www.w3.org/StyleSheets/TR/base">
+
+<div class=head>
+
+ <h1>XML-ER</h1>
+ <h2 class="no-num no-toc">[DATE: 01 Jan 1901]</h2>
+
+ <dl>
+ <dt>This Version:
+ <dd><a href="http://dvcs.w3.org/hg/xml-er/raw-file/tip/Overview.html">http://dvcs.w3.org/hg/xml-er/raw-file/tip/Overview.html</a>
+
+ <dt>Participate:
+ <dd><a href="mailto:public-xml-er@w3.org">public-xml-er</a> (<a href="http://lists.w3.org/Archives/Public/public-xml-er/">archives</a>)
+ <!-- XXX
+ <dd><a href="https://www.w3.org/Bugs/Public/enter_bug.cgi?product=WebAppsWG&component=DOM">File a bug</a>
+ -->
+ <dd class=dontpublish><a href="http://wiki.whatwg.org/wiki/IRC">IRC: #whatwg on Freenode</a>
+
+ <dt>Editor:
+ <dd><a href="http://annevankesteren.nl/">Anne van Kesteren</a>
+ (<a href="http://www.opera.com/">Opera Software ASA</a>)
+ <<a href="mailto:annevk@opera.com">annevk@opera.com</a>>
+ </dl>
+
+ <p class="dontpublish copyright"><a rel="license" href="http://creativecommons.org/publicdomain/zero/1.0/"><img src="http://i.creativecommons.org/p/zero/1.0/80x15.png" alt="CC0"></a>
+ To the extent possible under law, the editors have waived all copyright and
+ related or neighboring rights to this work. In addition, as of
+ [DATE: 01 Jan 1901], the editors have made this specification available
+ under the
+ <a rel=license href="http://www.openwebfoundation.org/legal/the-owf-1-0-agreements/owfa-1-0">Open Web Foundation Agreement Version 1.0</a>,
+ which is available at
+ http://www.openwebfoundation.org/legal/the-owf-1-0-agreements/owfa-1-0.
+</div>
+
+<h2 class="no-num no-toc">See also</h2>
+
+<ul>
+ <li><a href="http://www.w3.org/community/xml-er/wiki/Charter">Charter</a>
+ <li><a href="http://www.w3.org/community/xml-er/wiki/Requirements">Requirements</a>
+ <li><a href="http://www.w3.org/community/xml-er/wiki/Issues">Issues</a>
+</ul>
+
+
+<h2 class="no-num no-toc">Table of contents</h2>
+
+<!--toc-->
+
+
+<h2>Conformance</h2>
+<p>All diagrams, examples, and notes in this specification are
+non-normative, as are all sections explicitly marked non-normative.
+Everything else in this specification is normative.
+
+<p>The key words "MUST", "MUST NOT", "REQUIRED", <!--"SHALL", "SHALL
+NOT",--> "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
+"OPTIONAL" in the normative parts of this document are to be
+interpreted as described in RFC2119. For readability, these words do
+not appear in all uppercase letters in this specification.
+<span data-anolis-ref>RFC2119</span>
+
+
+<h2>Writing XML documents</h2>
+
+<p class=XXX>...
+
+
+<h2>Parsing XML documents</h2>
+
+<p>This section and its subsections define the <dfn>XML parser</dfn>.
+
+<p>This specification defines the parsing rules for XML documents, whether
+they are syntactically correct or not. Certain points in the parsing
+algorithm are said to be <dfn title="parse error">parse errors</dfn>. The
+handling for parse errors is well-defined: user agents must either act as
+described below when encountering such problems, or must terminate
+processing at the first error that they encounter for which they do not wish
+to apply the rules described below.
+<!-- XXX -->
+
+
+<h3>Overview</h3>
+
+<p>The input to the XML parsing process consists of a stream of octets which
+is converted to a stream of code points, which in turn are tokenized, and
+finally those tokens are used to construct a tree.
+
+
+<h3>Input stream</h3>
+
+<p>The stream of Unicode characters that constitutes the input to the
+tokenization stage will be initially seen by the user agent as a stream of
+octets (typically coming over the network or from the local file system).
+The octets encode Unicode code points according to a particular encoding,
+which the user agent must use to decode the octets into code points.
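+
+<div class=example>
+<p>As a rough illustration of the decoding step, the Python sketch below turns
+chunks of octets into code points using the incremental decoder from the
+standard <code>codecs</code> module. The function name
+<code>decode_octets</code>, the UTF-8 default, and the choice to replace
+undecodable octets with U+FFFD are assumptions made for this example only;
+this specification does not define them, and the sketch does not detect the
+encoding.

```python
import codecs


def decode_octets(chunks, encoding="utf-8"):
    """Yield code points from an iterable of byte chunks.

    The encoding is supplied by the caller; finding it is out of scope here.
    """
    # errors="replace" is an assumption of this sketch, not a requirement.
    decoder = codecs.getincrementaldecoder(encoding)(errors="replace")
    for chunk in chunks:
        # Incomplete multi-byte sequences are buffered until the remaining
        # octets arrive in a later chunk.
        yield from decoder.decode(chunk)
    # Flush anything still buffered at the end of the input.
    yield from decoder.decode(b"", final=True)


# U+00E9 takes two octets in UTF-8; here it is split across two chunks.
chunks = [b"<a>\xc3", b"\xa9</a>"]
code_points = list(decode_octets(chunks))
```

+<p>Decoding incrementally means a code point whose octets straddle two
+network reads still reaches the tokenizer as a single character.
+</div>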
+
+<p class=XXX>Define how to find the encoding...
+
+
+<h3>Tokenization</h3>
+
+<p>Implementations must act as if they used the following
+ state machine to tokenize XML. The state machine must
+ start in the <span>data state</span>. Most states consume a single character,
+ which can have various side effects, and then either switch the state machine
+ to a new state to reconsume the same character, switch it to a new state (to
+ consume the next character), or stay in the same state (to consume the next
+ character). Some states have more complicated behaviour and can consume
+ several characters before switching to another state.
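+
+ <div class=example>
+ <p>The three kinds of transition described above (switch and consume the next
+ character, switch and reconsume the same character, stay in the same state)
+ can be sketched in Python as follows. This is an illustration only, not a
+ conforming tokenizer: the class and method names, the tuple representation of
+ tokens, and the use of <code>None</code> for EOF are assumptions of the
+ example, and only the data, tag, end tag, end tag name and end tag name after
+ states are covered.

```python
EOF = None  # stands for end of input in this sketch
WHITESPACE = {"\t", "\n", " "}


class Tokenizer:
    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.state = self.data_state  # the machine starts in the data state
        self.tokens = []
        self.parse_errors = 0
        self.current = None  # end tag token under construction

    def run(self):
        while not (self.tokens and self.tokens[-1] == ("end-of-file",)):
            # Consume the next input character and hand it to the current state.
            char = self.text[self.pos] if self.pos < len(self.text) else EOF
            self.pos += 1
            self.state(char)
        return self.tokens

    def reconsume_in(self, state):
        # Switch to a new state that consumes the same character again.
        self.state = state
        self.pos -= 1

    def emit_current(self):
        self.tokens.append(tuple(self.current))
        self.current = None

    def data_state(self, char):
        # U+0026 (&) is not handled here; it is emitted like any other character.
        if char == "<":
            self.state = self.tag_state
        elif char is EOF:
            self.tokens.append(("end-of-file",))
        else:
            self.tokens.append(("character", char))  # stay in this state

    def tag_state(self, char):
        if char == "/":
            self.state = self.end_tag_state
        elif char is EOF or char in WHITESPACE or char in ":<>":
            self.parse_errors += 1
            self.tokens.append(("character", "<"))
            self.reconsume_in(self.data_state)
        else:
            raise NotImplementedError("start tags, PIs and declarations are omitted")

    def end_tag_state(self, char):
        if char == ">":
            self.tokens.append(("short end tag",))
            self.state = self.data_state
        elif char is EOF or char in WHITESPACE or char in "<:":
            self.parse_errors += 1
            self.tokens.append(("character", "<"))
            self.tokens.append(("character", "/"))
            self.reconsume_in(self.data_state)
        else:
            self.current = ["end tag", char]
            self.state = self.end_tag_name_state

    def end_tag_name_state(self, char):
        if char is EOF:
            self.parse_errors += 1
            self.emit_current()
            self.reconsume_in(self.data_state)
        elif char in WHITESPACE:
            self.state = self.end_tag_name_after_state
        elif char == ">":
            self.emit_current()
            self.state = self.data_state
        else:
            self.current[1] += char  # stay in this state

    def end_tag_name_after_state(self, char):
        if char == ">":
            self.emit_current()
            self.state = self.data_state
        elif char is EOF:
            self.parse_errors += 1
            self.emit_current()
            self.reconsume_in(self.data_state)
        elif char not in WHITESPACE:
            self.parse_errors += 1  # stay in this state


tokens = Tokenizer("a</b >c</>").run()
```

+ <p>For the input <code>a&lt;/b >c&lt;/></code> the sketch produces a character
+ token, an end tag token named <code>b</code>, another character token, a
+ short end tag token and an end-of-file token, without any parse errors.
+ </div>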
+
+ <p>The output of the tokenization stage is a series of zero or more of the
+ following tokens: start tag, empty tag, end tag, short end tag, comment,
+ character, processing instruction and end-of-file. Start and empty tag tokens
+ have a tag name and a list of attributes, each of which has a name and a
+ value. End tags have a tag name. Comment and character tokens have data.
+ Processing instructions have a target and data.
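+
+ <div class=example>
+ <p>One possible in-memory shape for these tokens, sketched with Python
+ dataclasses. The class and field names are assumptions of this example; the
+ processing instruction fields are called <code>target</code> and
+ <code>data</code> here to match the pi states and the tree construction
+ stage.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Attribute:
    name: str
    value: str = ""


@dataclass
class StartTag:
    name: str
    attributes: List[Attribute] = field(default_factory=list)


@dataclass
class EmptyTag:
    name: str
    attributes: List[Attribute] = field(default_factory=list)


@dataclass
class EndTag:
    name: str


@dataclass
class ShortEndTag:
    pass


@dataclass
class Comment:
    data: str = ""


@dataclass
class Character:
    data: str  # a single code point


@dataclass
class ProcessingInstruction:
    target: str
    data: str = ""


@dataclass
class EndOfFile:
    pass


# A hand-written token series for the input: <a href="x">h</>
tokens = [
    StartTag("a", [Attribute("href", "x")]),
    Character("h"),
    ShortEndTag(),
    EndOfFile(),
]
```

+ <p>Using <code>default_factory</code> gives every tag token its own attribute
+ list, so appending an attribute to one token never affects another.
+ </div>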
+
+ <p>The tokenization stage also uses a <dfn>list of entities</dfn> and a
+ <dfn>list of parameter entities</dfn>. Both lists are populated with tokens
+ consisting of a name and value during the tokenization stage and are also used
+ within this stage.
+
+ <p>Whenever the steps below indicate that the user agent has to
+ <dfn id="append-entity">append an entity</dfn>, the entity has to be appended
+ to the <span>list of entities</span>, unless the entity flag has been set to
+ "parameter", in which case it has to be appended to the <span>list of
+ parameter entities</span>. The <dfn>entity flag</dfn> has two values: "normal" and
+ "parameter". Its default value is "normal". It is set to "normal" after an
+ entity has been appended.
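+
+ <div class=example>
+ <p>The "append an entity" step and the entity flag can be sketched in Python
+ as follows. The class name <code>EntityLists</code>, its attribute names and
+ the tuple representation of an entity are assumptions of this example.

```python
class EntityLists:
    def __init__(self):
        self.entities = []            # list of entities
        self.parameter_entities = []  # list of parameter entities
        self.entity_flag = "normal"   # default value

    def append_entity(self, name, value):
        # The flag decides which of the two lists receives the entity.
        if self.entity_flag == "parameter":
            self.parameter_entities.append((name, value))
        else:
            self.entities.append((name, value))
        # The flag is set back to "normal" after every append.
        self.entity_flag = "normal"


lists = EntityLists()
lists.append_entity("copy", "\u00a9")      # <!ENTITY copy "...">
lists.entity_flag = "parameter"            # set when "%" precedes the name
lists.append_entity("common", "INCLUDE")   # <!ENTITY % common "INCLUDE">
lists.append_entity("reg", "\u00ae")       # flag is "normal" again
```

+ <p>Because the flag is reset on every append, a parameter entity declaration
+ never causes the declaration that follows it to land in the wrong list.
+ </div>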
+
+ <p>The tokenization stage also has a <dfn>list of attribute declarations</dfn>
+ each consisting of a tag name and a list of attributes which consist of an
+ attribute name, type and default value.
+
+ <dl>
+ <dt><dfn>Data state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+0026 (<code>&</code>)
+
+ <dd class=XXX>...
+
+ <dt>U+003C (<code><</code>)
+ <dd>Switch to the <span>tag state</span>.
+
+ <dt>EOF
+
+ <dd>Emit an end-of-file token.
+
+ <dt>Anything else
+
+ <dd>Emit the input character as a character token. Stay in this state.
+ </dl>
+
+
+ <dt><dfn>Tag state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+002F (<code>/</code>)
+
+ <dd>Switch to the <span>end tag state</span>.
+
+ <dt>U+003F (<code>?</code>)
+
+ <dd>Switch to the <span>pi state</span>.
+
+ <dt>U+0021 (<code>!</code>)
+ <dd>Switch to the <span>markup declaration state</span>.
+
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dt>U+003A (<code>:</code>)
+ <dt>U+003C (<code><</code>)
+ <dt>U+003E (<code>></code>)
+ <dt>EOF
+
+ <dd><span>Parse error</span>. Emit a U+003C (<code><</code>) character token.
+ Reconsume the current input character in the <span>data state</span>.
+
+ <dt>Anything else
+
+ <dd>Create a new tag token and set its name to the input character, then
+ switch to the <span>tag name state</span>.
+ </dl>
+
+
+ <dt><dfn>End tag state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+003E (<code>></code>)
+
+ <dd>Emit a short end tag token and then switch to the <span>data
+ state</span>.
+
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dt>U+003C (<code><</code>)
+ <dt>U+003A (<code>:</code>)
+ <dt>EOF
+
+ <dd><span>Parse error</span>. Emit a U+003C (<code><</code>) character
+ token and a U+002F (<code>/</code>) character token. Reconsume the current
+ input character in the <span>data state</span>.
+
+ <dt>Anything else
+
+ <dd>Create an end tag token and set its name to the input character, then
+ switch to the <span>end tag name state</span>.
+ </dl>
+
+
+ <dt><dfn>End tag name state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+
+ <dd>Switch to the <span>end tag name after state</span>.
+
+ <dt>EOF
+
+ <dd><span>Parse error</span>. Emit the current token and then reprocess the
+ current input character in the <span>data state</span>.
+
+ <dt>U+003E (<code>></code>)
+ <dd>Emit the current token and then switch to the <span>data
+ state</span>.
+
+ <dt>Anything else
+
+ <dd>Append the current input character to the tag name and stay in the
+ current state.
+ </dl>
+
+
+ <dt><dfn>End tag name after state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+003E (<code>></code>)
+ <dd>Emit the current token and then switch to the <span>data state</span>.
+
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Stay in the current state.
+
+ <dt>EOF
+ <dd><span>Parse error</span>. Emit the current token and then reprocess the
+ current input character in the <span>data state</span>.
+
+ <dt>Anything else
+ <dd><span>Parse error</span>. Stay in the current state.
+ </dl>
+
+
+ <dt><dfn>Pi state</dfn>
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dt>EOF
+ <dd><span>Parse error</span>. Reprocess the current input character in the
+ <span>bogus comment state</span>.
+
+ <dt>Anything else
+ <dd>Create a new processing instruction token. Set target to the current
+ input character and data to the empty string. Then switch to the <span>pi
+ target state</span>.
+ </dl>
+
+
+ <dt><dfn>Pi target state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Switch to the <span>pi target after state</span>.
+
+ <dt>EOF
+ <dd><span>Parse error</span>. Emit the current token and then reprocess the
+ current input character in the <span>data state</span>.
+
+ <dt>U+003F (<code>?</code>)
+ <dd>Switch to the <span>pi after state</span>.
+
+ <dt>Anything else
+ <dd>Append the current input character to the pi's target and stay in the
+ current state.
+ </dl>
+
+
+ <dt><dfn>Pi target after state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Stay in the current state.
+
+ <dt>Anything else
+ <dd>Reprocess the current input character in the <span>pi data
+ state</span>.
+ </dl>
+
+
+ <dt><dfn>Pi data state</dfn>
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+003F (<code>?</code>)
+ <dd>Switch to the <span>pi after state</span>.
+
+ <dt>EOF
+ <dd><span>Parse error</span>. Emit the current token and then reprocess the
+ current input character in the <span>data state</span>.
+
+ <dt>Anything else
+ <dd>Append the current input character to the pi's data and stay in the
+ current state.
+ </dl>
+
+
+ <dt><dfn>Pi after state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+003E (<code>></code>)
+ <dd>Emit the current token and then switch to the <span>data state</span>.
+
+ <dt>U+003F (<code>?</code>)
+ <dd>Append the current input character to the pi's data and stay in the
+ current state.
+
+ <dt>Anything else
+ <dd>Reprocess the current input character in the <span>pi data
+ state</span>.
+ </dl>
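+
+ <div class=example>
+ <p>The five pi states can be followed end to end in the Python sketch below.
+ It is an illustration under stated assumptions, not a reference
+ implementation: the function name <code>tokenize_pi</code>, the string state
+ labels, the return value, and the use of <code>None</code> for EOF are
+ invented for the example. It starts right after <code>&lt;?</code> has been
+ consumed, appends to the token's target in the pi target state, and raises
+ an exception where the rules hand over to the bogus comment state.

```python
WHITESPACE = {"\t", "\n", " "}


def tokenize_pi(text):
    """Run the pi states over the text that follows "<?".

    Returns (target, data, consumed), where consumed is the number of
    characters read before control goes back to the data state.
    """
    state, target, data, pos = "pi", "", "", 0
    while True:
        char = text[pos] if pos < len(text) else None  # None stands for EOF
        pos += 1
        if state == "pi":
            if char is None or char in WHITESPACE:
                # Parse error; the bogus comment state is outside this sketch.
                raise ValueError("processing instruction without a target")
            target, state = char, "pi target"
        elif state == "pi target":
            if char is None:
                return target, data, pos - 1  # parse error, token is emitted
            if char in WHITESPACE:
                state = "pi target after"
            elif char == "?":
                state = "pi after"
            else:
                target += char  # stay in this state
        elif state == "pi target after":
            if char is None or char not in WHITESPACE:
                state, pos = "pi data", pos - 1  # reprocess in the pi data state
        elif state == "pi data":
            if char is None:
                return target, data, pos - 1  # parse error, token is emitted
            if char == "?":
                state = "pi after"
            else:
                data += char  # stay in this state
        elif state == "pi after":
            if char == ">":
                return target, data, pos  # token is emitted
            if char == "?":
                data += char  # stay in this state
            else:
                state, pos = "pi data", pos - 1  # reprocess in the pi data state


text = 'xml-stylesheet href="a.css"?>rest'
target, data, consumed = tokenize_pi(text)
```

+ <p>After the call, <code>text[consumed:]</code> is the input that the data
+ state sees next.
+ </div>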
+
+
+ <dt><dfn>Markup declaration state</dfn>
+ <dd>
+ <p>If the next two characters are both U+002D (<code>-</code>)
+ characters, consume those two characters, create a comment token whose
+ data is the empty string and then switch to the
+ <span>comment state</span>.
+
+ <p>Otherwise, if the next seven characters are an exact match for
+ "<code title>[CDATA[</code>", then consume those characters and switch
+ to the <span>CDATA state</span>.
+
+ <p>Otherwise, if the next seven characters are an exact match for
+ "<code title>DOCTYPE</code>", then this is a <span>parse error</span>.
+ Consume those characters and switch to the
+ <span>DOCTYPE state</span>.
+ <!-- XXX make them legal? -->
+
+ <p>Otherwise, this is a <span>parse error</span>. Switch to the
+ <span>bogus comment state</span>.
+
+ <dt><dfn>Comment state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+002D (<code>-</code>)
+ <dd>Switch to the <span>comment dash state</span>.
+
+ <dt>EOF
+ <dd><span>Parse error</span>. Emit the comment token and then reprocess the
+ current input character in the <span>data state</span>.
+
+ <dt>Anything else
+ <dd>Append the current input character to the comment token's data. Stay in
+ the current state.
+ </dl>
+
+
+ <dt><dfn>Comment dash state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+002D (<code>-</code>)
+ <dd>Switch to the <span>comment end state</span>.
+
+ <dt>EOF
+ <dd><span>Parse error</span>. Emit the comment token and then reprocess the
+ current input character in the <span>data state</span>.
+
+ <dt>Anything else
+ <dd>Append a U+002D (<code>-</code>) and the current input character to the
+ comment token's data. Switch to the <span>comment state</span>.
+ </dl>
+
+
+ <dt><dfn>Comment end state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+003E (<code>></code>)
+ <dd>Emit the comment token. Switch to the <span>data state</span>.
+
+ <dt>U+002D (<code>-</code>)
+ <dd>Append the current input character to the comment token's data. Stay in
+ the current state.
+
+ <dt>EOF
+ <dd><span>Parse error</span>. Emit the comment token and then reprocess the
+ current input character in the <span>data state</span>.
+
+ <dt>Anything else
+ <dd>Append two U+002D (<code>-</code>) characters and the current input
+ character to the comment token's data. Switch to the <span>comment
+ state</span>.
+ </dl>
+
+
+ <dt><dfn>CDATA state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+005D (<code>]</code>)
+ <dd>Switch to the <span>CDATA bracket state</span>.
+
+ <dt>EOF
+ <dd><span>Parse error</span>. Reprocess the current input character in the
+ <span>data state</span>.
+
+ <dt>Anything else
+ <dd>Emit the current input character as a character token. Stay in the
+ current state.
+ </dl>
+
+
+ <dt><dfn>CDATA bracket state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+005D (<code>]</code>)
+ <dd>Switch to the <span>CDATA end state</span>.
+
+ <dt>EOF
+ <dd><span>Parse error</span>. Reprocess the current input character in the
+ <span>data state</span>.
+
+ <dt>Anything else
+ <dd>Emit a U+005D (<code>]</code>) character as a character token and also
+ emit the current input character as a character token. Switch to the
+ <span>CDATA state</span>.
+ </dl>
+
+
+ <dt><dfn>CDATA end state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+003E (<code>></code>)
+ <dd>Switch to the <span>data state</span>.
+
+ <dt>U+005D (<code>]</code>)
+ <dd>Emit the current input character as a character token. Stay in the
+ current state.
+
+ <dt>EOF
+ <dd><span>Parse error</span>. Reconsume the current input character in the
+ <span>data state</span>.
+
+ <dt>Anything else
+ <dd>Emit two U+005D (<code>]</code>) characters as character tokens and
+ also emit the current input character as a character token. Switch to the
+ <span>CDATA state</span>.
+ </dl>
+
+
+ <dt><dfn>DOCTYPE state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Switch to the <span>DOCTYPE root name before state</span>.
+
+ <dt>EOF
+ <dd><span>Parse error</span>. Reprocess the current input character in the
+ <span>data state</span>.
+
+ <dt>Anything else
+ <dd>Reprocess the current input character in the <span>bogus comment
+ state</span>.
+ </dl>
+
+
+ <dt><dfn>DOCTYPE root name before state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Stay in the current state.
+
+ <dt>U+003E (<code>></code>)
+ <dd>Switch to the <span>data state</span>.
+
+ <dt>EOF
+ <dd><span>Parse error</span>. Switch to the <span>data state</span>.
+
+ <dt>Anything else
+ <dd>Switch to the <span>DOCTYPE root name state</span>.
+ </dl>
+
+
+ <dt><dfn>DOCTYPE root name state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Switch to the <span>DOCTYPE root name after state</span>.
+
+ <dt>U+003E (<code>></code>)
+ <dd>Switch to the <span>data state</span>.
+
+ <dt>U+005B (<code>[</code>)
+ <dd>Switch to the <span>DOCTYPE internal subset state</span>.
+
+ <dt>EOF
+ <dd><span>Parse error</span>. Reprocess the current input character in the
+ <span>data state</span>.
+
+ <dt>Anything else
+ <dd>Stay in the current state.
+ </dl>
+
+
+ <dt><dfn>DOCTYPE root name after state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+003E (<code>></code>)
+ <dd>Switch to the <span>data state</span>.
+
+ <dt>U+0022 (<code>"</code>)
+ <dd>Switch to the <span>DOCTYPE identifier double quoted state</span>.
+
+ <dt>U+0027 (<code>'</code>)
+ <dd>Switch to the <span>DOCTYPE identifier single quoted state</span>.
+
+ <dt>U+005B (<code>[</code>)
+ <dd>Switch to the <span>DOCTYPE internal subset state</span>.
+
+ <dt>EOF
+ <dd><span>Parse error</span>. Reprocess the current input character in the
+ <span>data state</span>.
+
+ <dt>Anything else
+ <dd>Stay in the current state.
+ </dl>
+
+
+ <dt><dfn>DOCTYPE identifier double quoted state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+0022 (<code>"</code>)
+ <dd>Switch to the <span>DOCTYPE root name after state</span>.
+
+ <dt>EOF
+ <dd><span>Parse error</span>. Reprocess the current input character in the
+ <span>data state</span>.
+
+ <dt>Anything else
+ <dd>Stay in the current state.
+ </dl>
+
+
+ <dt><dfn>DOCTYPE identifier single quoted state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+0027 (<code>'</code>)
+ <dd>Switch to the <span>DOCTYPE root name after state</span>.
+
+ <dt>EOF
+ <dd><span>Parse error</span>. Reprocess the current input character in the
+ <span>data state</span>.
+
+ <dt>Anything else
+ <dd>Stay in the current state.
+ </dl>
+
+
+ <dt><dfn>DOCTYPE internal subset state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+003C (<code><</code>)
+ <dd>Switch to the <span>DOCTYPE tag state</span>.
+
+ <dt>EOF
+ <dd><span>Parse error</span>. Reprocess the current input character in the
+ <span>data state</span>.
+
+ <dt>U+0025 (<code>%</code>)
+ <dd class=XXX> consume parameter entity
+
+ <dt>U+005D (<code>]</code>)
+ <dd>Switch to the <span>DOCTYPE internal subset after state</span>.
+
+ <dt>Anything else
+ <dd>Stay in the current state.
+ </dl>
+
+
+ <dt><dfn>DOCTYPE internal subset after state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+003E (<code>></code>)
+ <dd>Switch to the <span>data state</span>.
+
+ <dt>EOF
+ <dd><span>Parse error</span>. Reprocess the current input character in the
+ <span>data state</span>.
+
+ <dt>Anything else
+ <dd>Stay in the current state.
+ </dl>
+
+
+ <dt><dfn>DOCTYPE tag state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+0021 (<code>!</code>)
+ <dd>Switch to the <span>DOCTYPE markup declaration state</span>.
+
+ <dt>U+003F (<code>?</code>)
+ <dd>Switch to the <span>DOCTYPE pi state</span>.
+
+ <dt>EOF
+ <dd><span>Parse error</span>. Reprocess the current input character in the
+ <span>data state</span>.
+
+ <dt>Anything else
+ <dd>Switch to the <span>DOCTYPE bogus comment state</span>.
+ </dl>
+
+
+ <dt><dfn>DOCTYPE markup declaration state</dfn>
+ <dd>
+ <p>If the next two characters are both U+002D (<code>-</code>) characters,
+ then consume those characters and switch to the <span>DOCTYPE comment
+ state</span>.
+
+ <p>Otherwise, if the next six characters are an exact match for "ENTITY",
+ then consume those characters and switch to the <span>DOCTYPE ENTITY
+ state</span>.
+
+ <p>Otherwise, if the next seven characters are an exact match for "ATTLIST",
+ then consume those characters and switch to the <span>DOCTYPE ATTLIST
+ state</span>.
+
+ <p>Otherwise, if the next eight characters are an exact match for
+ "NOTATION", then consume those characters and switch to the <span>DOCTYPE
+ NOTATION state</span>.
+
+ <p>Otherwise, switch to the <span>DOCTYPE bogus comment state</span>.
+ <!-- xxx parse error somewhere here? -->
+
+
+ <dt><dfn>DOCTYPE comment state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+002D (<code>-</code>)
+ <dd>Switch to the <span>DOCTYPE comment dash state</span>.
+
+ <dt>EOF
+ <dd><span>Parse error</span>. Reprocess the current input character in the
+ <span>data state</span>.
+
+ <dt>Anything else
+ <dd>Stay in the current state.
+ </dl>
+
+
+ <dt><dfn>DOCTYPE comment dash state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+002D (<code>-</code>)
+ <dd>Switch to the <span>DOCTYPE comment end state</span>.
+
+ <dt>EOF
+ <dd><span>Parse error</span>. Reprocess the current input character in the
+ <span>data state</span>.
+
+ <dt>Anything else
+ <dd>Switch to the <span>DOCTYPE comment state</span>.
+ </dl>
+
+
+ <dt><dfn>DOCTYPE comment end state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+003E (<code>></code>)
+ <dd>Switch to the <span>DOCTYPE internal subset state</span>.
+
+ <dt>U+002D (<code>-</code>)
+ <dd>Switch to the <span>DOCTYPE comment dash state</span>.
+
+ <dt>EOF
+ <dd><span>Parse error</span>. Reprocess the current input character in the
+ <span>data state</span>.
+
+ <dt>Anything else
+
+ <dd>Switch to the <span>DOCTYPE comment state</span>.
+ </dl>
+
+
+ <dt><dfn>DOCTYPE ENTITY state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Switch to the <span>DOCTYPE ENTITY type before state</span>.
+
+ <dt>EOF
+ <dd><span>Parse error</span>. Reprocess the current input character in the
+ <span>data state</span>.
+
+ <dt>Anything else
+ <dd>Switch to the <span>DOCTYPE bogus comment state</span>.
+ </dl>
+
+
+ <dt><dfn>DOCTYPE ENTITY type before state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Stay in the current state.
+
+ <dt>U+0025 (<code>%</code>)
+ <dd>Switch to the <span>DOCTYPE ENTITY parameter before state</span>.
+
+ <dt>EOF
+ <dd><span>Parse error</span>. Reprocess the current input character in the
+ <span>data state</span>.
+
+ <dt>Anything else
+ <dd>Create an entity token with the name set to the current input character
+ and the value set to the empty string. Then switch to the <span>DOCTYPE
+ ENTITY name state</span>.
+ </dl>
+
+
+ <dt><dfn>DOCTYPE ENTITY parameter before state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Switch to the <span>DOCTYPE ENTITY parameter state</span>.
+
+ <dt>EOF
+ <dd><span>Parse error</span>. Reprocess the current input character in the
+ <span>data state</span>.
+
+ <dt>Anything else
+ <dd>Switch to the <span>DOCTYPE bogus comment state</span>.
+ </dl>
+
+
+ <dt><dfn>DOCTYPE ENTITY parameter state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Stay in the current state.
+
+ <dt>EOF
+ <dd><span>Parse error</span>. Reprocess the current input character in the
+ <span>data state</span>.
+
+ <dt>Anything else
+ <dd>Create an entity token with the name set to the current input character
+ and the value set to the empty string. Set the <span>entity flag</span> to
+ "parameter". Switch to the <span>DOCTYPE ENTITY name state</span>.
+ </dl>
+
+
+ <dt><dfn>DOCTYPE ENTITY name state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Switch to the <span>DOCTYPE ENTITY name after state</span>.
+
+ <dt>EOF
+ <dd><span>Parse error</span>. Reconsume the current input character in the
+ <span>data state</span>.
+
+ <dt>Anything else
+ <dd>Append the current input character to the name of the entity.
+ </dl>
+
+
+ <dt><dfn>DOCTYPE ENTITY name after state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Stay in the current state.
+
+ <dt>U+0022 (<code>"</code>)
+ <dd>Switch to the <span>DOCTYPE ENTITY value double quoted state</span>.
+
+ <dt>U+0027 (<code>'</code>)
+ <dd>Switch to the <span>DOCTYPE ENTITY value single quoted state</span>.
+
+ <dt>EOF
+ <dd><span>Parse error</span>. Reconsume the current input character in the
+ <span>data state</span>.
+
+ <dt>Anything else
+ <dd>Switch to the <span>DOCTYPE ENTITY identifier state</span>.
+ </dl>
+
+
+ <dt><dfn>DOCTYPE ENTITY value double quoted state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <!-- XXX "%" -->
+
+ <dl class="switch">
+ <dt>U+0022 (<code>"</code>)
+ <dd>Switch to the <span>DOCTYPE ENTITY value after state</span>.
+
+ <dt>U+0026 (<code>&</code>)
+ <dd class=XXX>... normalize numeric entities only
+
+ <dt>EOF
+ <dd><span>Parse error</span>. Reconsume the current input character in the
+ <span>data state</span>.
+
+ <dt>Anything else
+ <dd>Append the current input character to the current entity token's
+ value.
+ </dl>
+
+
+ <dt><dfn>DOCTYPE ENTITY value single quoted state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <!-- "%" -->
+
+ <dl class="switch">
+ <dt>U+0027 (<code>'</code>)
+ <dd>Switch to the <span>DOCTYPE ENTITY value after state</span>.
+
+ <dt>U+0026 (<code>&</code>)
+ <dd class=XXX>... normalize numeric entities only
+
+ <dt>EOF<!--xxx
+XXX parse error
+ self.currentToken == None-->
+ <dd>Switch to the <span>data state</span>.
+
+ <dt>Anything else
+ <dd>Append the current input character to the current entity token's
+ value.
+ </dl>
+
+
+ <dt><dfn>DOCTYPE ENTITY value after state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Stay in the current state.
+
+ <dt>U+003E (<code>></code>)
+ <dd><span>Append an entity</span>. Switch to the <span>DOCTYPE internal
+ subset state</span>.
+
+ <dt>EOF
+ <dd><span>Parse error</span>. Reconsume the current input character in the
+ <span>data state</span>.
+
+ <dt>Anything else
+ <dd>Stay in the current state.
+ </dl>
+
+
+ <dt><dfn>DOCTYPE ENTITY identifier state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+003E (<code>></code>)
+ <dd class=XXX> append entity ...
+
+ <dt>U+0022 (<code>"</code>)
+ <dd>Switch to the <span>DOCTYPE ENTITY identifier double quoted state</span>.
+
+ <dt>U+0027 (<code>'</code>)
+ <dd>Switch to the <span>DOCTYPE ENTITY identifier single quoted state</span>.
+
+ <dt>EOF
+ <dd><span>Parse error</span>. Reconsume the current input character in the
+ <span>data state</span>.
+
+ <dt>Anything else
+ <dd>Stay in the current state.
+ </dl>
+
+
+ <dt><dfn>DOCTYPE ENTITY identifier double quoted state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+0022 (<code>"</code>)
+ <dd>Switch to the <span>DOCTYPE ENTITY identifier state</span>.
+
+ <dt>EOF
+ <dd><span>Parse error</span>. Reconsume the current input character in the
+ <span>data state</span>.
+
+ <dt>Anything else
+ <dd>Stay in the current state.
+ </dl>
+
+
+ <dt><dfn>DOCTYPE ENTITY identifier single quoted state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+0027 (<code>'</code>)
+ <dd>Switch to the <span>DOCTYPE ENTITY identifier state</span>.
+
+ <dt>EOF
+ <dd><span>Parse error</span>. Reconsume the current input character in the
+ <span>data state</span>.
+
+ <dt>Anything else
+ <dd>Stay in the current state.
+ </dl>
+
+
+ <dt><dfn>DOCTYPE ATTLIST state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Switch to the <span>DOCTYPE ATTLIST name before state</span>.
+
+ <dt>EOF
+ <dd><span>Parse error</span>. Reprocess the current input character in the
+ <span>data state</span>.
+
+ <dt>Anything else
+ <dd>Switch to the <span>DOCTYPE bogus comment state</span>.
+ </dl>
+
+
+ <dt><dfn>DOCTYPE ATTLIST name before state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Stay in the current state.
+
+ <dt>EOF
+ <dd><span>Parse error</span>. Reprocess the current input character in the
+ <span>data state</span>.
+
+ <dt>Anything else
+ <dd class=XXX>...
+<!--xxx
+ self.currentToken = {"name":data, "attrs":[]}
+ <dd>Switch to the <span>DOCTYPE ATTLIST name state</span>.-->
+ </dl>
+
+
+ <dt><dfn>DOCTYPE ATTLIST name state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Switch to the <span>DOCTYPE ATTLIST name after state</span>.
+
+ <dt>EOF
+ <!-- xxx
+XXX parse error
+ self.currentToken = None-->
+ <dd>Switch to the <span>data state</span>.
+
+ <dt>Anything else
+ <dd class=XXX>...
+ <!--<dd>Append the current input character to the tag name and stay in the current state.-->
+ </dl>
+
+
+ <dt><dfn>DOCTYPE ATTLIST name after state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Stay in the current state.
+
+ <dt>U+003E (<code>></code>)<!-- xxx
+ self.currentToken = None-->
+ <dd>Switch to the <span>DOCTYPE internal subset state</span>.
+
+ <dt>EOF<!-- xxx
+XXX parse error
+ self.currentToken = None-->
+ <dd>Switch to the <span>data state</span>.
+
+ <dt>Anything else
+ <dd class=XXX>...
+ <!-- self.currentToken["attrs"].append({"name":data, "type":"",
+ "dv":""})
+ <dd>Switch to the <span>DOCTYPE ATTLIST attribute name state</span>.-->
+ </dl>
+
+
+ <dt><dfn>DOCTYPE ATTLIST attribute name state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Switch to the <span>DOCTYPE ATTLIST attribute name after state</span>.
+
+ <dt>EOF<!-- xxx
+XXX parse error
+ self.currentToken = None-->
+ <dd>Switch to the <span>data state</span>.
+
+ <dt>Anything else
+ <dd class=XXX>...
+<!-- self.currentToken["attrs"][-1]["name"] += data-->
+ </dl>
+
+
+ <dt><dfn>DOCTYPE ATTLIST attribute name after state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Stay in the current state.
+
+ <dt>EOF<!-- xxx
+XXX parse error
+ self.currentToken = None-->
+ <dd>Switch to the <span>data state</span>.
+
+ <dt>Anything else
+ <dd class=XXX>...<!--
+ self.currentToken["attrs"][-1]["type"] += data
+ <dd>Switch to the <span>DOCTYPE ATTLIST attribute type state</span>.-->
+ </dl>
+
+
+ <dt><dfn>DOCTYPE ATTLIST attribute type state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Switch to the <span>DOCTYPE ATTLIST attribute type after state</span>.
+
+ <dt>EOF<!-- xxx
+XXX parse error
+ self.currentToken = None-->
+ <dd>Switch to the <span>data state</span>.
+
+ <dt>Anything else
+ <dd class=XXX>...
+<!-- self.currentToken["attrs"][-1]["type"] += data-->
+ </dl>
+
+
+ <dt><dfn>DOCTYPE ATTLIST attribute type after state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Stay in the current state.
+
+ <dt>U+0023 (<code>#</code>)
+ <dd>Switch to the <span>DOCTYPE ATTLIST attribute declaration before state</span>.
+
+ <dt>EOF<!--
+XXX parse error
+ self.currentToken = None-->
+ <dd>Switch to the <span>data state</span>.
+
+ <dt>Anything else
+ <dd>Switch to the <span>DOCTYPE bogus comment state</span>.
+ </dl>
+
+
+ <dt><dfn>DOCTYPE ATTLIST attribute declaration before state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Switch to the <span>DOCTYPE bogus comment state</span>.
+
+ <dt>EOF<!--xxx
+XXX parse error
+ self.currentToken = None-->
+ <dd>Switch to the <span>data state</span>.
+
+ <dt>Anything else
+ <dd>Switch to the <span>DOCTYPE ATTLIST attribute declaration
+ state</span>.
+ </dl>
+
+
+ <dt><dfn>DOCTYPE ATTLIST attribute declaration state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Switch to the <span>DOCTYPE ATTLIST attribute declaration after state</span>.
+
+ <dt>EOF<!--xxx
+XXX parse error
+ self.currentToken = None-->
+ <dd>Switch to the <span>data state</span>.
+
+ <dt>Anything else
+ <dd>Stay in the current state.
+ </dl>
+
+
+ <dt><dfn>DOCTYPE ATTLIST attribute declaration after state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Stay in the current state.
+
+ <dt>U+003E (<code>></code>)
+ <dd>Switch to the <span>DOCTYPE internal subset state</span>.
+
+ <dt>U+0022 (<code>"</code>)
+ <dd>Switch to the <span>DOCTYPE ATTLIST attribute value double quoted state</span>.
+
+ <dt>U+0027 (<code>'</code>)
+ <dd>Switch to the <span>DOCTYPE ATTLIST attribute value single quoted state</span>.
+
+ <dt>EOF<!--xxx
+XXX parse error
+ self.currentToken = None-->
+ <dd>Switch to the <span>data state</span>.
+
+ <dt>Anything else
+ <dd class=XXX> ...
+<!-- self.currentToken["attrs"].append({"name":data, "type":"",
+ "dv":""})
+ <dd>Switch to the <span>DOCTYPE ATTLIST attribute name state</span>.-->
+ </dl>
+
+
+ <dt><dfn>DOCTYPE ATTLIST attribute value double quoted state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+0022 (<code>"</code>)
+ <dd>Switch to the <span>DOCTYPE ATTLIST name after state</span>.
+<!--
+ <dt>U+0025 (<code>%</code>)
+ raise NotSupportedError
+-->
+ <dt>U+0026 (<code>&</code>)
+ <dd class=XXX>...
+
+ <dt>Anything else
+ <dd class=XXX> ...
+ <!-- self.currentToken["attrs"][-1]["dv"] += data-->
+ </dl>
+
+
+ <dt><dfn>DOCTYPE ATTLIST attribute value single quoted state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+0027 (<code>'</code>)
+ <dd>Switch to the <span>DOCTYPE ATTLIST name after state</span>.
+<!--
+ <dt>U+0025 (<code>%</code>)
+ raise NotSupportedError
+-->
+ <dt>U+0026 (<code>&</code>)
+ <dd class=XXX>...
+
+ <dt>Anything else
+ <dd class=XXX> ...
+ <!-- self.currentToken["attrs"][-1]["dv"] += data-->
+ </dl>
+
+
+ <dt><dfn>DOCTYPE NOTATION state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Switch to the <span>DOCTYPE NOTATION identifier state</span>.
+
+ <dt>EOF
+ <dd><span>Parse error</span>. Reprocess the current input character in the
+ <span>data state</span>.
+
+ <dt>Anything else
+ <dd>Switch to the <span>DOCTYPE bogus comment state</span>.
+ </dl>
+
+
+ <dt><dfn>DOCTYPE NOTATION identifier state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+003E (<code>></code>)
+ <dd>Switch to the <span>DOCTYPE internal subset state</span>.
+
+ <dt>U+0022 (<code>"</code>)
+ <dd>Switch to the <span>DOCTYPE NOTATION identifier double quoted state</span>.
+
+ <dt>U+0027 (<code>'</code>)
+ <dd>Switch to the <span>DOCTYPE NOTATION identifier single quoted state</span>.
+
+ <dt>EOF
+ <dd><span>Parse error</span>. Reprocess the current input character in the
+ <span>data state</span>.
+
+ <dt>Anything else
+ <dd>Stay in the current state.
+ </dl>
+
+
+ <dt><dfn>DOCTYPE NOTATION identifier double quoted state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+0022 (<code>"</code>)
+ <dd>Switch to the <span>DOCTYPE NOTATION identifier state</span>.
+
+ <dt>EOF
+ <dd><span>Parse error</span>. Reprocess the current input character in the
+ <span>data state</span>.
+
+ <dt>Anything else
+ <dd>Stay in the current state.
+ </dl>
+
+
+ <dt><dfn>DOCTYPE NOTATION identifier single quoted state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+0027 (<code>'</code>)
+ <dd>Switch to the <span>DOCTYPE NOTATION identifier state</span>.
+
+ <dt>EOF
+ <dd><span>Parse error</span>. Reprocess the current input character in the
+ <span>data state</span>.
+
+ <dt>Anything else
+
+ <dd>Stay in the current state.
+ </dl>
+
+
+ <dt><dfn>DOCTYPE pi state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+003F (<code>?</code>)
+ <dd>Switch to the <span>DOCTYPE pi after state</span>.
+
+ <dt>EOF
+ <dd><span>Parse error</span>. Reprocess the current input character in the
+ <span>data state</span>.
+
+ <dt>Anything else
+ <dd>Stay in the current state.
+ </dl>
+
+
+ <dt><dfn>DOCTYPE pi after state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+003E (<code>></code>)
+ <dd>Switch to the <span>DOCTYPE internal subset state</span>.
+
+ <dt>U+003F (<code>?</code>)
+ <dd>Stay in the current state.
+
+ <dt>EOF
+ <dd><span>Parse error</span>. Reprocess the current input character in the
+ <span>data state</span>.
+
+ <dt>Anything else
+ <dd>Switch to the <span>DOCTYPE pi state</span>.
+ </dl>
+
+
+ <dt><dfn>DOCTYPE bogus comment state</dfn>
+ <dd><p>Consume every character up to the first U+003E (<code>></code>) or
+ EOF, whichever comes first. Emit a comment token whose data is the
+ concatenation of all those consumed characters. Then consume the next input
+ character and switch to the <span>DOCTYPE internal subset state</span>,
+ reprocessing the EOF character if that was the character consumed.
+
+ <dt><dfn>Tag name state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Switch to the <span>tag attribute name before state</span>.
+
+ <dt>U+003E (<code>></code>)
+ <dd>Emit the current token and then switch to the <span>data state</span>.
+
+ <dt>EOF
+ <dd><span>Parse error</span>. Emit the current token and then reprocess the
+ current input character in the <span>data state</span>.
+
+ <dt>U+002F (<code>/</code>)
+ <dd>Switch to the <span>empty tag state</span>.
+
+ <dt>Anything else
+ <dd>Append the current input character to the tag name and stay in the
+ current state.
+ </dl>
+
+
+ <dt><dfn>Empty tag state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+003E (<code>></code>)
+ <dd>Emit the current tag token as an empty tag token and then switch to the
+ <span>data state</span>.
+
+ <dt>Anything else
+ <dd><span>Parse error</span>. Reprocess the current input character in the
+ <span>tag attribute name before state</span>.
+ </dl>
+
+
+ <dt><dfn>Tag attribute name before state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+
+ <dd>Stay in the current state.
+
+ <dt>U+003E (<code>></code>)
+ <dd>Emit the current token and then switch to the <span>data state</span>.
+
+ <dt>U+002F (<code>/</code>)
+ <dd>Switch to the <span>Empty tag state</span>.
+
+ <dt>U+003A (<code>:</code>)
+ <dd><span>Parse error</span>. Stay in the current state.
+
+ <dt>EOF
+ <dd><span>Parse error</span>. Emit the current token and then reprocess the
+ current input character in the <span>data state</span>.
+
+ <dt>Anything else
+ <dd>Start a new attribute in the current tag token. Set that attribute's
+ name to the current input character and its value to the empty string and
+ then switch to the <span>tag attribute name state</span>.
+ </dl>
+
+
+ <dt><dfn>Tag attribute name state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+003D (<code>=</code>)
+ <dd>Switch to the <span>tag attribute value before state</span>.
+
+ <dt>U+003E (<code>></code>)
+ <dd>Emit the current token as start tag token. Switch to the <span>data
+ state</span>.
+
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Switch to the <span>tag attribute name after state</span>.
+
+ <dt>U+002F (<code>/</code>)
+ <dd>Switch to the <span>Empty tag state</span>.
+
+ <dt>EOF
+ <dd><span>Parse error</span>. Emit the current token as start tag token and
+ then reprocess the current input character in the <span>data
+ state</span>.
+
+ <dt>Anything else
+ <dd>Append the current input character to the current attribute's name.
+ Stay in the current state.
+ </dl>
+
+ <p>When the user agent leaves this state (and before emitting the tag token,
+ if appropriate), the complete attribute's name must be
+ compared to the other attributes on the same token; if there is already an
+ attribute on the token with the exact same name, then this is a parse error
+ and the new attribute must be dropped, along with the
+ value that gets associated with it (if any).
+
+
+ <dt><dfn>Tag attribute name after state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Stay in the current state.
+
+ <dt>U+003D (<code>=</code>)
+ <dd>Switch to the <span>tag attribute value before state</span>.
+
+ <dt>U+003E (<code>></code>)
+ <dd>Emit the current token and then switch to the <span>data state</span>.
+
+ <dt>U+002F (<code>/</code>)
+ <dd>Switch to the <span>empty tag state</span>.
+
+ <dt>EOF
+ <dd><span>Parse error</span>. Emit the current token and then reprocess the
+ current input character in the <span>data state</span>.
+
+ <dt>Anything else
+ <dd>Start a new attribute in the current tag token. Set that attribute's
+ name to the current input character and its value to the empty string and
+ then switch to the <span>tag attribute name state</span>.
+ </dl>
+
+
+ <dt><dfn>Tag attribute value before state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Stay in the current state.
+
+ <dt>U+0022 (<code>"</code>)
+ <dd>Switch to the <span>tag attribute value double quoted state</span>.
+
+ <dt>U+0027 (<code>'</code>)
+ <dd>Switch to the <span>tag attribute value single quoted state</span>.
+
+ <dt>U+0026 (<code>&</code>)
+ <dd>Reprocess the current input character in the <span>tag attribute value
+ unquoted state</span>.
+
+ <dt>U+003E (<code>></code>)
+ <dd>Emit the current token and then switch to the <span>data state</span>.
+
+ <dt>EOF
+ <dd><span>Parse error</span>. Emit the current token and then reprocess the
+ current input character in the <span>data state</span>.
+
+ <dt>Anything else
+ <dd>Append the current input character to the current attribute's value and
+ then switch to the <span>tag attribute value unquoted state</span>.
+ </dl>
+
+
+ <dt><dfn>Tag attribute value double quoted state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+0022 (<code>"</code>)
+ <dd>Switch to the <span>tag attribute name before state</span>.
+
+ <dt>U+0026 (<code>&</code>)
+ <dd class=XXX>...
+
+ <dt>EOF
+ <dd><span>Parse error</span>. Emit the current token and then reprocess the
+ current input character in the <span>data state</span>.
+
+ <dt>Anything else
+ <dd>Append the input character to the current attribute's value. Stay in
+ the current state.
+ </dl>
+
+
+ <dt><dfn>Tag attribute value single quoted state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+
+ <dl class="switch">
+ <dt>U+0027 (<code>'</code>)
+ <dd>Switch to the <span>tag attribute name before state</span>.
+
+ <dt>U+0026 (<code>&</code>)
+ <dd class=XXX>...
+
+ <dt>EOF
+ <dd><span>Parse error</span>. Emit the current token and then reprocess the
+ current input character in the <span>data state</span>.
+
+ <dt>Anything else
+ <dd>Append the input character to the current attribute's value. Stay in
+ the current state.
+ </dl>
+
+
+ <dt><dfn>Tag attribute value unquoted state</dfn>
+
+ <dd>
+ <p>Consume the <span>next input character</span>:
+ <dl class="switch">
+ <dt>U+0009
+ <dt>U+000A
+ <dt>U+0020
+ <dd>Switch to the <span>tag attribute name before state</span>.
+
+ <dt>U+0026 (<code>&</code>)
+ <dd class=XXX>...
+
+ <dt>U+003E (<code>></code>)
+ <dd>Emit the current token as start tag token and then switch to the
+ <span>data state</span>.
+
+ <dt>EOF
+ <dd><span>Parse error</span>. Emit the current token as start tag token and
+ then reprocess the current input character in the
+ <span>data state</span>.
+
+ <dt>Anything else
+ <dd>Append the input character to the current attribute's value. Stay in
+ the current state.
+ </dl>
+
+
+ <dt><dfn>Bogus comment state</dfn>
+
+ <dd><p>Consume every character up to the first U+003E (<code>></code>) or
+ EOF, whichever comes first. Emit a comment token whose data is the
+ concatenation of all those consumed characters. Then consume the next input
+ character and switch to the <span>data state</span>, reprocessing the EOF
+ character if that was the character consumed.
+ </dl>
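+
+<div class="example">
+ <p>The tag states above can be sketched as a small state machine. This is a
+ non-normative illustration, not a reference implementation. The function
+ name <code>tokenize_tag</code>, the dictionary shape of the token and the
+ error counter are assumptions made for this example. Because handling of
+ U+0026 in attribute values is still open in this draft, the sketch keeps the
+ character literally, and it assumes the tag token starts out as a start tag
+ token.

```python
WHITESPACE = {"\t", "\n", " "}  # U+0009, U+000A, U+0020
EOF = None  # stand-in for the end-of-file marker


def tokenize_tag(source):
    """Run the tag states over `source`, the text that follows '<'.

    Returns (token, parse_error_count). Hypothetical helper for illustration.
    """
    token = {"type": "start tag", "name": "", "attrs": []}
    errors = 0
    attr = None  # attribute currently being built
    state = "tag name"
    pos = 0
    while True:
        c = source[pos] if pos < len(source) else EOF
        pos += 1
        leaving_name = c is EOF or c in WHITESPACE or c in "=>/"
        if state == "attribute name" and leaving_name:
            # Leaving the attribute name state: a duplicate name is a parse
            # error and the attribute (with any later value) is dropped.
            if any(a["name"] == attr["name"] for a in token["attrs"]):
                errors += 1  # attr stays detached from the token
            else:
                token["attrs"].append(attr)
        if c is EOF and state != "empty tag":
            return token, errors + 1  # parse error, then emit the token
        if c == ">" and state not in ("double quoted", "single quoted"):
            if state == "empty tag":
                token["type"] = "empty tag"
            return token, errors
        if state == "tag name":
            if c in WHITESPACE:
                state = "attribute name before"
            elif c == "/":
                state = "empty tag"
            else:
                token["name"] += c
        elif state == "empty tag":
            errors += 1  # parse error, reprocess the character
            pos -= 1
            state = "attribute name before"
        elif state in ("attribute name before", "attribute name after"):
            if c in WHITESPACE:
                pass
            elif c == "/":
                state = "empty tag"
            elif c == "=" and state == "attribute name after":
                state = "attribute value before"
            elif c == ":" and state == "attribute name before":
                errors += 1
            else:
                attr = {"name": c, "value": ""}
                state = "attribute name"
        elif state == "attribute name":
            if c == "=":
                state = "attribute value before"
            elif c in WHITESPACE:
                state = "attribute name after"
            elif c == "/":
                state = "empty tag"
            else:
                attr["name"] += c
        elif state == "attribute value before":
            if c in WHITESPACE:
                pass
            elif c == '"':
                state = "double quoted"
            elif c == "'":
                state = "single quoted"
            elif c == "&":
                pos -= 1  # reprocess in the unquoted state
                state = "unquoted"
            else:
                attr["value"] += c
                state = "unquoted"
        elif state in ("double quoted", "single quoted"):
            quote = '"' if state == "double quoted" else "'"
            if c == quote:
                state = "attribute name before"
            else:
                attr["value"] += c  # assumption: '&' is kept literally
        elif state == "unquoted":
            if c in WHITESPACE:
                state = "attribute name before"
            else:
                attr["value"] += c  # assumption: '&' is kept literally
```

+ <p>Under these assumptions, the input <code>a b="1" c=2 b='x'/&gt;</code>
+ yields an empty tag token named <code>a</code> with the attributes
+ <code>b</code> and <code>c</code>, plus one parse error for the duplicate
+ <code>b</code>.
+</div>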
+
+
+
+<h3>Tree construction</h3>
+
+<p>The input to the tree construction stage is a sequence of tokens from the
+<span>tokenization</span> stage. The output of this stage is a tree model
+represented by a <code>Document</code> object.
+
+<p>The tree construction stage passes through several phases. The initial
+phase is the <span>start phase</span>.
+
+<p>The <dfn>stack of open elements</dfn> contains all elements whose end
+tag has not yet been encountered. Once the first start tag token in
+the <span>start phase</span> is encountered it will contain one open
+element. The rest of the elements are added during the
+<span>main phase</span>.
+
+<p>The <dfn>current element</dfn> is the bottommost node in the
+<span>stack of open elements</span>.
+
+<p>The <span>stack of open elements</span> is said to
+<dfn>have an element in scope</dfn> if the target element is in the
+<span>stack of open elements</span>.
+
+<p>When the steps below require the user agent to
+<dfn>append a character</dfn> to a node, the user agent must collect it
+and all subsequent consecutive characters that would be appended to that
+node and insert a single <code>Text</code> node whose data is the
+concatenation of all those characters.
+
+<p class=XXX>Need to define
+<dfn>create an element for the token</dfn>...
+<!-- namespaces and such -->
+
+
+<p>When the steps below require the user agent to
+<dfn>insert an element</dfn> for a token the user agent must
+<span>create an element for the token</span> and then append it to the
+<span>current element</span> and push it into the
+<span>stack of open elements</span> so that it becomes the new
+<span>current element</span>.
+
+
+<dl class=switch>
+ <dt><dfn>Start phase</dfn>
+ <dd>
+ <p>Each token emitted from the tokenization stage must be
+ processed as follows until the algorithm below switches to a different
+ phase:
+
+ <dl class="switch">
+ <dt>A start tag token
+
+ <dd><p><span>Create an element for the token</span> and then append it to
+ the <code>Document</code> node and push it into the
+ <span>stack of open elements</span>. This element is the root element and
+ the first <span>current element</span>. Then switch to the
+ <span>main phase</span>.
+
+ <dt>An empty tag token
+
+ <dd><p><span>Create an element for the token</span> and append it to the
+ <code>Document</code> node. Then switch to the <span>end phase</span>.
+
+ <dt>A comment token
+
+ <dd><p>Append a <code>Comment</code> node to the <code>Document</code> node
+ with the <code>data</code> attribute set to the data given in the
+ token.
+
+ <dt>A processing instruction token
+
+ <dd><p>Append a <code>ProcessingInstruction</code> node to the
+ <code>Document</code> node with the <code>target</code> and <code>data</code>
+ attributes set to the target and data given in the token.
+
+ <dt>An end-of-file token
+
+ <dd><p><span>Parse error</span>. Reprocess the token in the
+ <span>end phase</span>.
+
+ <dt>Anything else
+
+ <dd><p><span>Parse error</span>. Ignore the token.
+ </dl>
+
+ <dt><dfn>Main phase</dfn>
+ <dd>
+ <p>Once a start tag token has been encountered (as detailed in the
+ previous phase) each token must be processed using the following steps until
+ further notice:
+
+ <dl class="switch">
+ <dt>A character token
+
+ <dd><p><span>Append a character</span> to the <span>current
+ element</span>.
+
+ <dt>A start tag token
+
+ <dd><p><span>Insert an element</span> for the token.
+
+ <dt>An empty tag token
+
+ <dd><p><span>Create an element for the token</span> and append it to the
+ <span>current element</span>.
+
+ <dt>An end tag token
+
+ <dd>
+ <p>If the tag name of the <span>current element</span> does not match the tag
+ name of the end tag token, this is a <span>parse error</span>.
+
+ <p>If there is an <span>element in scope</span> with the same tag name as
+ that of the token pop nodes from the <span>stack of open elements</span>
+ until the first such element has been popped from the stack.
+
+ <p>If there are no more elements on the
+ <span>stack of open elements</span> at this point switch to the
+ <span>end phase</span>.
+
+ <dt>A short end tag token
+
+ <dd><p>Pop an element from the <span>stack of open elements</span>. If
+ there are no more elements on the <span>stack of open elements</span>
+ switch to the <span>end phase</span>.
+
+ <dt>A comment token
+
+ <dd><p>Append a <code>Comment</code> node to the <span>current element</span>
+ with the <code>data</code> attribute set to the data given in the
+ token.
+
+ <dt>A processing instruction token
+
+ <dd><p>Append a <code>ProcessingInstruction</code> node to the <span>current
+ element</span> with the <code>target</code> and <code>data</code> attributes
+ set to the target and data given in the token.
+
+ <dt>An end-of-file token
+
+ <dd><p><span>Parse error</span>. Reprocess the token in the
+ <span>end phase</span>.
+ </dl>
+
+ <dt><dfn>End phase</dfn>
+ <dd>
+ <p>Tokens in the end phase must be handled as follows:
+
+ <dl class="switch">
+ <dt>A comment token
+
+ <dd><p>Append a <code>Comment</code> node to the <code>Document</code> node
+ with the <code>data</code> attribute set to the data given in the
+ token.
+
+ <dt>A processing instruction token
+
+ <dd><p>Append a <code>ProcessingInstruction</code> node to the
+ <code>Document</code> node with the <code>target</code> and <code>data</code>
+ attributes set to the target and data given in the token.
+
+ <dt>An end-of-file token
+
+ <dd><p><span>Stop parsing</span>.
+
+ <dt>Anything else
+
+ <dd><p><span>Parse error</span>. Ignore the token.
+ </dl>
+</dl>
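+
+<div class="example">
+ <p>The three phases above can be sketched as follows. This is a
+ non-normative illustration under stated assumptions: tokens are plain
+ tuples such as <code>("start", "a")</code> or <code>("char", "h")</code>,
+ the <code>Node</code> class is a minimal stand-in rather than the DOM
+ interfaces, and <code>build_tree</code> is a hypothetical name. Namespaces
+ and attributes are omitted because creating an element for a token is not
+ yet defined in this draft.

```python
class Node:
    """Minimal stand-in for a DOM node (not the real DOM interfaces)."""

    def __init__(self, kind, name=None, data=""):
        self.kind, self.name, self.data = kind, name, data
        self.children = []


def build_tree(tokens):
    """Return (document, parse_error_count) for a list of token tuples."""
    document = Node("document")
    stack = []  # stack of open elements; stack[-1] is the current element
    errors = 0
    phase = "start"
    for tok in tokens:
        kind = tok[0]
        if kind == "eof":
            if phase != "end":
                errors += 1  # parse error, then reprocess in the end phase
            break  # stop parsing
        parent = stack[-1] if phase == "main" else document
        if kind == "comment":
            parent.children.append(Node("comment", data=tok[1]))
        elif kind == "pi":
            parent.children.append(Node("pi", name=tok[1], data=tok[2]))
        elif kind in ("start", "empty") and phase != "end":
            element = Node("element", name=tok[1])
            parent.children.append(element)
            if kind == "start":
                stack.append(element)  # becomes the current element
                phase = "main"
            elif phase == "start":
                phase = "end"  # an empty root element completes the tree
        elif phase != "main":
            errors += 1  # anything else in the start or end phase is ignored
        elif kind == "char":
            last = parent.children[-1] if parent.children else None
            if last is not None and last.kind == "text":
                last.data += tok[1]  # coalesce consecutive characters
            else:
                parent.children.append(Node("text", data=tok[1]))
        elif kind == "end":
            if stack[-1].name != tok[1]:
                errors += 1  # end tag does not match the current element
            if any(e.name == tok[1] for e in stack):
                while stack.pop().name != tok[1]:
                    pass  # pop until the matching element has been popped
        elif kind == "short-end":
            stack.pop()
        if phase == "main" and not stack:
            phase = "end"
    return document, errors
```

+ <p>In this sketch a mismatched end tag still closes every element above the
+ matching one, so the token sequence start <code>a</code>, start
+ <code>b</code>, end <code>a</code> produces one parse error and leaves the
+ stack of open elements empty.
+</div>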
+
+<p>Once the user agent <dfn title="stop parsing">stops parsing</dfn> the
+document, it must follow these steps:
+
+<ol class=XXX>
+</ol>
+
+
+<h2 class="no-num">References</h2>
+
+<div id=anolis-references-normative></div>
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/README.markdown Mon Feb 20 15:54:48 2012 +0100
@@ -0,0 +1,1 @@
+The repository for [XML-ER](http://dvcs.w3.org/hg/xml-er/raw-file/tip/Overview.html).