DOM Parsing and Serialization

DOMParser, XMLSerializer, innerHTML, and similar APIs

W3C Editor's Draft

This version:
https://w3c.github.io/DOM-Parsing/
Latest published version:
https://www.w3.org/TR/DOM-Parsing/
Latest editor's draft:
https://w3c.github.io/DOM-Parsing/
Editor:
(Microsoft)
Test Suites
http://w3c-test.org/domparsing/
http://w3c-test.org/html/syntax/
Participate
We are on GitHub.
Bugzilla Bug list.
GitHub Issues.
Commit history.
Mailing list.

Abstract

This specification defines APIs for the parsing and serializing of HTML and XML-based DOM nodes for web applications.

Status of This Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.

This document was published by the Web Platform Working Group as an Editor's Draft.

Comments regarding this document are welcome. Please send them to www-dom@w3.org (subscribe, archives) with DOM-Parsing at the start of your email's subject.

Publication as an Editor's Draft does not imply endorsement by the W3C Membership.

This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

This document is governed by the 15 September 2020 W3C Process Document.

Candidate Recommendation Exit Criteria

This specification will not advance to Proposed Recommendation before the spec's test suite is completed and two or more independent implementations pass each test, although no single implementation must pass each test. We expect to meet these criteria no sooner than 24 October 2014. The group will also create an Implementation Report.

Conformance

As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.

The IDL fragments in this specification must be interpreted as required for conforming IDL fragments, as described in the Web IDL specification. [WEBIDL]

Requirements phrased in the imperative as part of algorithms (such as "strip any leading space characters" or "return false and terminate these steps") are to be interpreted with the meaning of the key word ("must", "should", "may", etc) used in introducing the algorithm.

Conformance requirements phrased as algorithms or specific steps may be implemented in any manner, so long as the end result is equivalent. (In particular, the algorithms defined in this specification are intended to be easy to follow, and not intended to be performant.)

User agents may impose implementation-specific limits on otherwise unconstrained inputs, e.g. to prevent denial of service attacks, to guard against running out of memory, or to work around platform-specific limitations.

When a method or an attribute is said to call another method or attribute, the user agent must invoke its internal API for that attribute or method so that e.g. the author can't change the behavior by overriding attributes or methods with custom properties or functions in ECMAScript. [ECMA-262]

Unless otherwise stated, string comparisons are done in a case-sensitive manner.

If an algorithm calls into another algorithm, any exception that is thrown by the latter (unless it is explicitly caught) must cause the former to terminate, and the exception to be propagated up to its caller.

Extensibility

Vendor-specific proprietary extensions to this specification are strongly discouraged. Authors must not use such extensions, as doing so reduces interoperability and fragments the user base, allowing only users of specific user agents to access the content in question.

If vendor-specific extensions are needed, the members should be prefixed by vendor-specific strings to prevent clashes with future versions of this specification. Extensions must be defined so that the use of extensions neither contradicts nor causes the non-conformance of functionality defined in the specification.

When vendor-neutral extensions to this specification are needed, either this specification can be updated accordingly, or an extension specification can be written that overrides the requirements in this specification. Such an extension specification becomes an applicable specification for the purposes of conformance requirements in this specification.

1. Introduction

A document object model (DOM) is an in-memory representation of various types of Nodes where each Node is connected in a tree. The [HTML5] and [DOM4] specifications describe DOM and its Nodes in greater detail.

Parsing is the term used for converting a string representation of a DOM into an actual DOM, and Serializing is the term used to transform a DOM back into a string. This specification concerns itself with defining various APIs for both parsing and serializing a DOM.

For example, the innerHTML API is a common way to both parse and serialize a DOM. If a particular Node has the following in-memory DOM:
HTMLDivElement (nodeName: "div")
┃
┣━ HTMLSpanElement (nodeName: "span")
┃  ┃
┃  ┗━ Text (data: "some ")
┃
┗━ HTMLElement (nodeName: "em")
   ┃
   ┗━ Text (data: "text!")
And the HTMLDivElement node is stored in a variable myDiv, then to serialize myDiv's children simply get (read) the Element's innerHTML property (this triggers the serialization):
var serializedChildren = myDiv.innerHTML;
// serializedChildren has the value:
// "<span>some </span><em>text!</em>"

To parse new children for myDiv from a string (replacing its existing children), simply set the innerHTML property (this triggers parsing of the assigned string):

myDiv.innerHTML = "<span>new</span><em>children!</em>";

This specification describes two flavors of parsing and serializing: HTML and XML (with XHTML being a type of XML). Each follows the rules of its respective markup language. The above example shows HTML parsing and serialization. The specific algorithms for HTML parsing and serializing are defined in the [HTML5] specification. This specification contains the algorithm for XML serializing. The grammar for XML parsing is described in the [XML10] specification.

Round-tripping a DOM means to serialize and then immediately parse the serialized string back into a DOM. Ideally, this process does not result in any data loss with respect to the identity and attributes of the Node in the DOM. Round-tripping is especially tricky for an XML serialization, which must be concerned with preserving the Node's namespace identity in the serialization (whereas namespaces are ignored in HTML).

Consider the XML serialization of the following in-memory DOM:
Element (nodeName: "root")
┃
┗━ HTMLScriptElement (nodeName: "script")
   ┃
   ┗━ Text (data: "alert('hello world')")
An XML serialization must include the HTMLScriptElement Node's namespace in order to preserve the identity of the script element, and to allow the serialized string to round-trip through an XML parser. Assuming that root is in a variable named root:
var xmlSerialization = new XMLSerializer().serializeToString(root);
// xmlSerialization has the value:
// "<root><script xmlns="http://www.w3.org/1999/xhtml">alert('hello world')</script></root>"

The term context object means the object on which the API being discussed was called.

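The following terms are understood to represent their respective namespaces in this specification (and make it easier to read):

  • The HTML namespace is http://www.w3.org/1999/xhtml
  • The XML namespace is http://www.w3.org/XML/1998/namespace
  • The XMLNS namespace is http://www.w3.org/2000/xmlns/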

2. APIs for parsing and serializing DOM

2.1 The DOMParser interface

The definition of DOMParser has moved to the HTML Standard.

2.2 The XMLSerializer interface

[Exposed=Window]
interface XMLSerializer {
  constructor();
  DOMString serializeToString(Node root);
};
xmlserializer = new XMLSerializer ()
Constructs a new XMLSerializer object.
string = xmlserializer . serializeToString ( root )
Serializes root into a string using an XML serialization. Throws a TypeError exception if root is not a Node or an Attr object.

The XMLSerializer() constructor must return a new XMLSerializer object.

The serializeToString(root) method must produce an XML serialization of root passing a value of false for the require well-formed parameter, and return the result.

2.3 The InnerHTML mixin

interface mixin InnerHTML {
  [CEReactions] attribute [LegacyNullToEmptyString] DOMString innerHTML;
};

Element includes InnerHTML;
ShadowRoot includes InnerHTML;

The innerHTML IDL attribute represents the markup of the element's contents.

element . innerHTML [ = value ]
Returns a fragment of HTML or XML that represents the element's contents.

Can be set, to replace the contents of the element with nodes parsed from the given string.

In the case of an XML document, throws an "InvalidStateError" DOMException if the element cannot be serialized to XML, or a "SyntaxError" DOMException if the given string is not well-formed.

On getting, return the result of invoking the fragment serializing algorithm on the context object providing true for the require well-formed flag (this might throw an exception instead of returning a string).

On setting, these steps must be run:

  1. Let context element be the context object's host if the context object is a ShadowRoot object, or the context object otherwise.
  2. Let fragment be the result of invoking the fragment parsing algorithm with the new value as markup, and with context element.
  3. If the context object is a template element, then let context object be the template's template contents (a DocumentFragment).
    Note

    Setting innerHTML on a template element will replace all the nodes in its template contents (template.content) rather than its children.

  4. Replace all with fragment within the context object.

2.4 Extensions to the Element interface

partial interface Element {
  [CEReactions] attribute [LegacyNullToEmptyString] DOMString outerHTML;
  [CEReactions] undefined insertAdjacentHTML(DOMString position, DOMString text);
};

The outerHTML IDL attribute represents the markup of the Element and its contents.

element . outerHTML [ = value ]
Returns a fragment of HTML or XML that represents the element and its contents.

Can be set, to replace the element with nodes parsed from the given string.

In the case of an XML document, throws an "InvalidStateError" DOMException if the element cannot be serialized to XML, or a "SyntaxError" DOMException if the given string is not well-formed.

Throws a "NoModificationAllowedError" DOMException if the parent of the element is a Document.

On getting, return the result of invoking the fragment serializing algorithm on a fictional node whose only child is the context object providing true for the require well-formed flag (this might throw an exception instead of returning a string).

On setting, the following steps must be run:

  1. Let parent be the context object's parent.
  2. If parent is null, terminate these steps. There would be no way to obtain a reference to the nodes created even if the remaining steps were run.
  3. If parent is a Document, throw a "NoModificationAllowedError" DOMException.
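  4. If parent is a DocumentFragment, let parent be a new Element with:
    • body as its local name,
    • the HTML namespace as its namespace, and
    • the context object's node document as its node document.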
  5. Let fragment be the result of invoking the fragment parsing algorithm with the new value as markup, and parent as the context element.
  6. Replace the context object with fragment within the context object's parent.
element . insertAdjacentHTML ( position, text )
Parses the given string text as HTML or XML and inserts the resulting nodes into the tree in the position given by the position argument, as follows:
"beforebegin"
Before the element itself (i.e., after element's previous sibling)
"afterbegin"
Just inside the element, before its first child.
"beforeend"
Just inside the element, after its last child.
"afterend"
After the element itself (i.e., before element's next sibling)

Throws a "SyntaxError" DOMException if the arguments have invalid values (e.g., in the case of an XML document, if the given string is not well-formed).

Throws a "NoModificationAllowedError" DOMException if the given position isn't possible (e.g. inserting elements after the root element of a Document).

The insertAdjacentHTML(position, text) method must run these steps:

  1. Use the first matching item from this list:
    If position is an ASCII case-insensitive match for the string "beforebegin"
    If position is an ASCII case-insensitive match for the string "afterend"
    Let context be the context object's parent.

    If context is null or a Document, throw a "NoModificationAllowedError" DOMException.

    If position is an ASCII case-insensitive match for the string "afterbegin"
    If position is an ASCII case-insensitive match for the string "beforeend"
    Let context be the context object.
    Otherwise
    Throw a "SyntaxError" DOMException.
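  2. If context is not an Element or the following are all true:
    • context's node document is an HTML document,
    • context's local name is "html", and
    • context's namespace is the HTML namespace;

    let context be a new Element with
    • body as its local name,
    • the HTML namespace as its namespace, and
    • the context object's node document as its node document.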

  3. Let fragment be the result of invoking the fragment parsing algorithm with text as markup, and context as the context element.
  4. Use the first matching item from this list:
    If position is an ASCII case-insensitive match for the string "beforebegin"
    Insert fragment into the context object's parent before the context object.
    If position is an ASCII case-insensitive match for the string "afterbegin"
    Insert fragment into the context object before its first child.
    If position is an ASCII case-insensitive match for the string "beforeend"
    Append fragment to the context object.
    If position is an ASCII case-insensitive match for the string "afterend"
    Insert fragment into the context object's parent before the context object's next sibling.
Note

No special handling for template elements is included in the above "afterbegin" and "beforeend" cases. As with other direct Node-manipulation APIs (and unlike innerHTML), insertAdjacentHTML does not include any special handling for template elements. In most cases you will wish to use template.content.insertAdjacentHTML instead of directly manipulating the child nodes of a template element.
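
The position matching in the steps above can be sketched in plain JavaScript. This is an illustrative sketch rather than part of the specification: resolveInsertion and domError are hypothetical helper names, nodes are modeled as plain objects with parent and isDocument properties, and a plain Error carrying a name stands in for a DOMException.

```javascript
// Illustrative sketch only; not part of this specification.
// Nodes are modeled as plain objects: { parent, isDocument }.

// Stand-in for a DOMException with the given name (assumption of this sketch).
function domError(name) {
  const error = new Error(name);
  error.name = name;
  return error;
}

// Returns the context node used for fragment parsing (step 1) together with
// the normalized position that selects the insertion in step 4.
function resolveInsertion(element, position) {
  // ASCII case-insensitive match: only A-Z are folded to lowercase.
  const key = position.replace(/[A-Z]/g, (c) => c.toLowerCase());
  switch (key) {
    case "beforebegin":
    case "afterend": {
      const context = element.parent;
      if (context === null || context.isDocument) {
        throw domError("NoModificationAllowedError");
      }
      return { context, where: key };
    }
    case "afterbegin":
    case "beforeend":
      return { context: element, where: key };
    default:
      throw domError("SyntaxError");
  }
}
```

In this model, the sibling positions ("beforebegin", "afterend") parse in the context of the parent, while the child positions ("afterbegin", "beforeend") parse in the context of the element itself.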

2.5 Extensions to the Range interface

partial interface Range {
  [CEReactions, NewObject] DocumentFragment createContextualFragment(DOMString fragment);
};
docFragment = range . createContextualFragment ( fragment )
Returns a DocumentFragment, created from the markup string fragment using range's start node as the context in which fragment is parsed.

The createContextualFragment(fragment) method must run these steps:

  1. Let node be the context object's start node.

    Let element be as follows, depending on node's interface:

    Document
    DocumentFragment
    null
    Element
    node
    Text
    Comment
    node's parent element
    DocumentType
    ProcessingInstruction
    [DOM4] prevents this case.
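  2. If either element is null or the following are all true:
    • element's node document is an HTML document,
    • element's local name is "html", and
    • element's namespace is the HTML namespace;

    let element be a new Element with
    • body as its local name,
    • the HTML namespace as its namespace, and
    • the context object's node document as its node document.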

  3. Let fragment node be the result of invoking the fragment parsing algorithm with fragment as markup, and element as the context element.
  4. Unmark all scripts in fragment node as "already started" and as "parser-inserted".
  5. Return the value of fragment node.
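
The choice of element in step 1 can be sketched as a switch on the start node's type. This is a minimal sketch under stated assumptions: contextElementFor is a hypothetical helper name, nodes are plain objects carrying the DOM's numeric nodeType values and a parentElement property, and throwing a TypeError for the remaining node types is a choice made only for this sketch.

```javascript
// Illustrative sketch only; not part of this specification.
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;
const COMMENT_NODE = 8;
const DOCUMENT_NODE = 9;
const DOCUMENT_FRAGMENT_NODE = 11;

// Picks the element used as parsing context for a range's start node.
function contextElementFor(node) {
  switch (node.nodeType) {
    case DOCUMENT_NODE:
    case DOCUMENT_FRAGMENT_NODE:
      return null;
    case ELEMENT_NODE:
      return node;
    case TEXT_NODE:
    case COMMENT_NODE:
      return node.parentElement;
    default:
      // DocumentType and ProcessingInstruction: the text above says [DOM4]
      // prevents this case, so the sketch treats it as unreachable.
      throw new TypeError("unexpected start node type");
  }
}
```

A null result is what triggers the fallback in step 2, where a new body element is used as the context instead.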

3. Algorithms for parsing and serializing

3.1 Parsing

The following steps form the fragment parsing algorithm, whose arguments are a markup string and a context element:

  1. If the context element's node document is an HTML document: let algorithm be the HTML fragment parsing algorithm.

    If the context element's node document is an XML document: let algorithm be the XML fragment parsing algorithm.

  2. Let new children be the result of invoking algorithm with markup as the input, and context element as the context element.
  3. Let fragment be a new DocumentFragment whose node document is context element's node document.
  4. Append each Node in new children to fragment (in tree order).
    Note

    This ensures the node document for the new nodes is correct.

  5. Return the value of fragment.

3.2 Serializing

The following steps form the fragment serializing algorithm, whose arguments are a Node node and a flag require well-formed:

  1. Let context document be the value of node's node document.
  2. If context document is an HTML document, return an HTML serialization of node.
  3. Otherwise, context document is an XML document; return an XML serialization of node passing the flag require well-formed.
    Note

    The XML serialization defined in this document conforms to the requirements of the XML fragment serialization algorithm defined in [HTML5].

To produce an HTML serialization of a Node node, the user agent must run the HTML fragment serialization algorithm on node and return the string produced.

3.2.1 XML Serialization

An XML serialization differs from an HTML serialization in the following ways:

  • Elements and attributes will always be serialized such that their namespaceURI is preserved. In some cases this means that an existing prefix, prefix declaration attribute or default namespace declaration attribute might be dropped, substituted or changed. An HTML serialization does not attempt to preserve the namespaceURI.
  • Elements not in the HTML namespace that contain no children are serialized using the empty-element tag syntax (i.e., according to the XML EmptyElemTag production).

Otherwise, the algorithm for producing an XML serialization is designed to produce a serialization that is compatible with the HTML parser. For example, elements in the HTML namespace that contain no child nodes are serialized with an explicit begin and end tag rather than using the empty-element tag syntax.

Note

Per [DOM4], Attr objects do not inherit from Node, and thus cannot be serialized by the XML serialization algorithm. An attempt to serialize an Attr object will result in an empty string.

To produce an XML serialization of a Node node given a flag require well-formed, run the following steps:

  1. Let namespace be a context namespace with value null. The context namespace tracks the XML serialization algorithm's current default namespace. The context namespace is changed when either an Element Node has a default namespace declaration, or the algorithm generates a default namespace declaration for the Element Node to match its own namespace. The algorithm assumes no namespace (null) to start.
  2. Let prefix map be a new namespace prefix map.
  3. Add the XML namespace with prefix value "xml" to prefix map.
  4. Let prefix index be a generated namespace prefix index with value 1. The generated namespace prefix index is used to generate a new unique prefix value when no suitable existing namespace prefix is available to serialize a node's namespaceURI (or the namespaceURI of one of node's attributes). See the generate a prefix algorithm.
  5. Return the result of running the XML serialization algorithm on node passing the context namespace namespace, namespace prefix map prefix map, generated namespace prefix index reference to prefix index, and the flag require well-formed. If an exception occurs during the execution of the algorithm, then catch that exception and throw an "InvalidStateError" DOMException.
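
The setup performed by these five steps can be sketched as follows. This is a hedged illustration, not the normative algorithm: produceXmlSerialization is a hypothetical name, the serializeNode callback stands in for the XML serialization algorithm, the prefix map is modeled as a JavaScript Map from namespace to a list of prefixes, and a plain Error with its name set stands in for the "InvalidStateError" DOMException.

```javascript
// Illustrative sketch only; not part of this specification.
const XML_NAMESPACE = "http://www.w3.org/XML/1998/namespace";

function produceXmlSerialization(node, requireWellFormed, serializeNode) {
  const namespace = null;                  // step 1: context namespace
  const prefixMap = new Map();             // step 2: namespace prefix map
  prefixMap.set(XML_NAMESPACE, ["xml"]);   // step 3: seed the "xml" prefix
  const prefixIndex = { value: 1 };        // step 4: object so it is shared by reference
  try {
    // step 5: run the serialization with the initial state.
    return serializeNode(node, namespace, prefixMap, prefixIndex, requireWellFormed);
  } catch (cause) {
    // Any exception is caught and replaced by an "InvalidStateError".
    const error = new Error("node could not be serialized");
    error.name = "InvalidStateError";
    error.cause = cause;
    throw error;
  }
}
```

Wrapping prefix index in an object mirrors the "reference to prefix index" wording: every nested call sees increments made by earlier calls.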

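Each of the following algorithms for producing an XML serialization of a DOM node takes as input a node to serialize and the following arguments:

  • A context namespace namespace
  • A namespace prefix map prefix map
  • A generated namespace prefix index prefix index
  • The require well-formed flag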

The XML serialization algorithm produces an XML serialization of an arbitrary DOM node node based on the node's interface type. Each referenced algorithm is to be passed the arguments as they were received by the caller and return its result to the caller. Re-throw any exceptions. If node's interface is:

Element
Run the algorithm for XML serializing an Element node node.
Document
Run the algorithm for XML serializing a Document node node.
Comment
Run the algorithm for XML serializing a Comment node node.
Text
Run the algorithm for XML serializing a Text node node.
DocumentFragment
Run the algorithm for XML serializing a DocumentFragment node node.
DocumentType
Run the algorithm for XML serializing a DocumentType node node.
ProcessingInstruction
Run the algorithm for XML serializing a ProcessingInstruction node node.
An Attr object
Return an empty string.
Anything else
Throw a TypeError. Only Nodes and Attr objects can be serialized by this algorithm.
Each of the above referenced algorithms is detailed in the sections that follow.
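
The dispatch described above can be sketched in plain JavaScript. This sketch makes several assumptions: xmlSerialize is a hypothetical name, serializers is a hypothetical object mapping interface names to functions, and the DOM's numeric nodeType values are used as an approximation of the interface checks (CDATASection nodes, for instance, are not modeled).

```javascript
// Illustrative sketch only; not part of this specification.
const HANDLER_BY_NODE_TYPE = new Map([
  [1, "Element"],
  [3, "Text"],
  [7, "ProcessingInstruction"],
  [8, "Comment"],
  [9, "Document"],
  [10, "DocumentType"],
  [11, "DocumentFragment"],
]);
const ATTRIBUTE_NODE = 2;

function xmlSerialize(node, serializers, ...args) {
  const isObject = node !== null && typeof node === "object";
  if (isObject && node.nodeType === ATTRIBUTE_NODE) {
    return ""; // An Attr object serializes to the empty string.
  }
  const name = isObject ? HANDLER_BY_NODE_TYPE.get(node.nodeType) : undefined;
  if (name === undefined) {
    throw new TypeError("Only Nodes and Attr objects can be serialized");
  }
  // Arguments are forwarded unchanged; exceptions propagate to the caller.
  return serializers[name](node, ...args);
}
```
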
3.2.1.1 XML serializing an Element node
The algorithm for producing an XML serialization of a DOM node of type Element is as follows:
  1. If the require well-formed flag is set (its value is true), and this node's localName attribute contains the character ":" (U+003A COLON) or does not match the XML Name production, then throw an exception; the serialization of this node would not be a well-formed element.
  2. Let markup be the string "<" (U+003C LESS-THAN SIGN).
  3. Let qualified name be an empty string.
  4. Let skip end tag be a boolean flag with value false.
  5. Let ignore namespace definition attribute be a boolean flag with value false.
  6. Given prefix map, copy a namespace prefix map and let map be the result.
  7. Let local prefixes map be an empty map. The map has unique Node prefix strings as its keys, with corresponding namespaceURI Node values as the map's key values (in this map, the null namespace is represented by the empty string).
    Note

    This map is local to each element. It is used to ensure there are no conflicting prefixes should a new namespace prefix attribute need to be generated. It is also used to enable skipping of duplicate prefix definitions when writing an element's attributes: the map allows the algorithm to distinguish between a prefix in the namespace prefix map that might be locally-defined (to the current Element) and one that is not.

  8. Let local default namespace be the result of recording the namespace information for node given map and local prefixes map.
    Note

    The above step will update map with any found namespace prefix definitions, add the found prefix definitions to the local prefixes map and return a local default namespace value defined by a default namespace attribute if one exists. Otherwise it returns null.

  9. Let inherited ns be a copy of namespace.
  10. Let ns be the value of node's namespaceURI attribute.
  11. If inherited ns is equal to ns, then:
    1. If local default namespace is not null, then set ignore namespace definition attribute to true.
    2. If ns is the XML namespace, then append to qualified name the concatenation of the string "xml:" and the value of node's localName.
    3. Otherwise, append to qualified name the value of node's localName. The node's prefix if it exists, is dropped.
    4. Append the value of qualified name to markup.
  12. Otherwise, inherited ns is not equal to ns (the node's own namespace is different from the context namespace of its parent). Run these sub-steps:
    1. Let prefix be the value of node's prefix attribute.
    2. Let candidate prefix be the result of retrieving a preferred prefix string prefix from map given namespace ns.
      Note

      The above may return null if no namespace key ns exists in map.

    3. If the value of prefix matches "xmlns", then run the following steps:
      1. If the require well-formed flag is set, then throw an error. An Element with prefix "xmlns" will not legally round-trip in a conforming XML parser.
      2. Let candidate prefix be the value of prefix.
    4. Found a suitable namespace prefix: if candidate prefix is not null (a namespace prefix is defined which maps to ns), then:
      Note

      The following may serialize a different prefix than the Element's existing prefix if it already had one. However, the retrieving a preferred prefix string algorithm already tried to match the existing prefix if possible.

      1. Append to qualified name the concatenation of candidate prefix, ":" (U+003A COLON), and node's localName. There exists on this node or the node's ancestry a namespace prefix definition that defines the node's namespace.
      2. If the local default namespace is not null (there exists a locally-defined default namespace declaration attribute) and its value is not the XML namespace, then let inherited ns get the value of local default namespace unless the local default namespace is the empty string in which case let it get null (the context namespace is changed to the declared default, rather than this node's own namespace).
        Note

        Any default namespace definitions or namespace prefixes that define the XML namespace are omitted when serializing this node's attributes.

      3. Append the value of qualified name to markup.
    5. Otherwise, if prefix is not null, then:
      Note

      By this step, there is no namespace or prefix mapping declaration in this node (or any parent node visited by this algorithm) that defines prefix; otherwise the step labelled Found a suitable namespace prefix would have been followed. The sub-steps that follow will create a new namespace prefix declaration for prefix and ensure that prefix does not conflict with an existing namespace prefix declaration of the same localName in node's attribute list.

      1. If the local prefixes map contains a key matching prefix, then let prefix be the result of generating a prefix providing as input map, ns, and prefix index.
      2. Add prefix to map given namespace ns.
      3. Append to qualified name the concatenation of prefix, ":" (U+003A COLON), and node's localName.
      4. Append the value of qualified name to markup.
      5. Append the following to markup, in the order listed:
        Note

        The following serializes a namespace prefix declaration for prefix which was just added to the map.

        1. " " (U+0020 SPACE);
        2. The string "xmlns:";
        3. The value of prefix;
        4. "="" (U+003D EQUALS SIGN, U+0022 QUOTATION MARK);
        5. The result of serializing an attribute value given ns and the require well-formed flag as input;
        6. """ (U+0022 QUOTATION MARK).
        7. If local default namespace is not null (there exists a locally-defined default namespace declaration attribute), then let inherited ns get the value of local default namespace unless the local default namespace is the empty string in which case let it get null.
    6. Otherwise, if local default namespace is null, or local default namespace is not null and its value is not equal to ns, then:
      Note

      At this point, the namespace for this node still needs to be serialized, but there's no prefix (or candidate prefix) available; the following uses the default namespace declaration to define the namespace, optionally replacing an existing default declaration if present.

      1. Set the ignore namespace definition attribute flag to true.
      2. Append to qualified name the value of node's localName.
      3. Let the value of inherited ns be ns.
        Note

        The new default namespace will be used in the serialization to define this node's namespace and act as the context namespace for its children.

      4. Append the value of qualified name to markup.
      5. Append the following to markup, in the order listed:
        Note

        The following serializes the new (or replacement) default namespace definition.

        1. " " (U+0020 SPACE);
        2. The string "xmlns";
        3. "="" (U+003D EQUALS SIGN, U+0022 QUOTATION MARK);
        4. The result of serializing an attribute value given ns and the require well-formed flag as input;
        5. """ (U+0022 QUOTATION MARK).
    7. Otherwise, the node has a local default namespace that matches ns. Append to qualified name the value of node's localName, let the value of inherited ns be ns, and append the value of qualified name to markup.
      Note

      All of the combinations where ns is not equal to inherited ns are handled above such that node will be serialized preserving its original namespaceURI.

  13. Append to markup the result of the XML serialization of node's attributes given map, prefix index, local prefixes map, ignore namespace definition attribute flag, and require well-formed flag.
  14. If ns is the HTML namespace, and the node's list of children is empty, and the node's localName matches any one of the following void elements: "area", "base", "basefont", "bgsound", "br", "col", "embed", "frame", "hr", "img", "input", "keygen", "link", "menuitem", "meta", "param", "source", "track", "wbr"; then append the following to markup, in the order listed:
    1. " " (U+0020 SPACE);
    2. "/" (U+002F SOLIDUS).
    and set the skip end tag flag to true.
  15. If ns is not the HTML namespace, and the node's list of children is empty, then append "/" (U+002F SOLIDUS) to markup and set the skip end tag flag to true.
  16. Append ">" (U+003E GREATER-THAN SIGN) to markup.
  17. If the value of skip end tag is true, then return the value of markup and skip the remaining steps. The node is a leaf-node.
  18. If ns is the HTML namespace, and the node's localName matches the string "template", then this is a template element. Append to markup the result of XML serializing a DocumentFragment node given the template element's template contents (a DocumentFragment), providing inherited ns, map, prefix index, and the require well-formed flag.
    Note

    This allows template content to round-trip, given the rules for parsing XHTML documents.

  19. Otherwise, append to markup the result of running the XML serialization algorithm on each of node's children, in tree order, providing inherited ns, map, prefix index, and the require well-formed flag.
  20. Append the following to markup, in the order listed:
    1. "</" (U+003C LESS-THAN SIGN, U+002F SOLIDUS);
    2. The value of qualified name;
    3. ">" (U+003E GREATER-THAN SIGN).
  21. Return the value of markup.
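
Steps 14 through 17, which decide how a start tag is closed, can be sketched as a small pure function. This is an illustration under stated assumptions: closeStartTag is a hypothetical helper name, and the element is described by its namespace, local name, and a boolean saying whether it has children.

```javascript
// Illustrative sketch only; not part of this specification.
const HTML_NAMESPACE = "http://www.w3.org/1999/xhtml";
const VOID_ELEMENTS = new Set([
  "area", "base", "basefont", "bgsound", "br", "col", "embed", "frame",
  "hr", "img", "input", "keygen", "link", "menuitem", "meta", "param",
  "source", "track", "wbr",
]);

// Returns the text that closes the start tag and whether the end tag is skipped.
function closeStartTag(ns, localName, hasChildren) {
  let markup = "";
  let skipEndTag = false;
  // Step 14: childless HTML void elements get " /".
  if (ns === HTML_NAMESPACE && !hasChildren && VOID_ELEMENTS.has(localName)) {
    markup += " /";
    skipEndTag = true;
  }
  // Step 15: childless elements outside the HTML namespace get "/".
  if (ns !== HTML_NAMESPACE && !hasChildren) {
    markup += "/";
    skipEndTag = true;
  }
  // Step 16: every start tag ends with ">".
  markup += ">";
  return { markup, skipEndTag };
}
```

Note that a childless non-void HTML element such as div yields only ">" and keeps its end tag, which matches the requirement that the output stay compatible with the HTML parser.
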
3.2.1.1.1 Recording the namespace

The following algorithm will update the namespace prefix map with any found namespace prefix definitions, add the found prefix definitions to the local prefixes map, and return a local default namespace value defined by a default namespace attribute if one exists. Otherwise it returns null.

When recording the namespace information for an Element element, given a namespace prefix map map and a local prefixes map (initially empty), the user agent must run the following steps:

  1. Let default namespace attr value be null.
  2. Main: For each attribute attr in element's attributes, in the order they are specified in the element's attribute list:
    Note

    The following conditional steps find namespace prefixes. Only attributes in the XMLNS namespace are considered (e.g., attributes made to look like namespace declarations via setAttribute("xmlns:pretend-prefix", "pretend-namespace") are not included).

    1. Let attribute namespace be the value of attr's namespaceURI value.
    2. Let attribute prefix be the value of attr's prefix.
    3. If the attribute namespace is the XMLNS namespace, then:
      1. If attribute prefix is null, then attr is a default namespace declaration. Set the default namespace attr value to attr's value and stop running these steps, returning to Main to visit the next attribute.
      2. Otherwise, the attribute prefix is not null and attr is a namespace prefix definition. Run the following steps:
        1. Let prefix definition be the value of attr's localName.
        2. Let namespace definition be the value of attr's value.
        3. If namespace definition is the XML namespace, then stop running these steps, and return to Main to visit the next attribute.
          Note

          XML namespace definitions in prefixes are completely ignored (in order to avoid unnecessary work when there might be prefix conflicts). XML namespaced elements are always handled uniformly by prefixing (and overriding if necessary) the element's localname with the reserved "xml" prefix.

        4. If namespace definition is the empty string (the declarative form of having no namespace), then let namespace definition be null instead.
        5. If prefix definition is found in map given the namespace namespace definition, then stop running these steps, and return to Main to visit the next attribute.
          Note

          This step avoids adding duplicate prefix definitions for the same namespace in the map. This has the side-effect of avoiding later serialization of duplicate namespace prefix declarations in any descendant nodes.

        6. Add the prefix prefix definition to map given namespace namespace definition.
        7. Add the value of prefix definition as a new key to the local prefixes map, with the namespace definition as the key's value, replacing the value of null with the empty string if applicable.
  3. Return the value of default namespace attr value.
    Note

    The empty string is a legitimate return value and is not converted to null.
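
The steps above can be sketched as follows. This is a minimal, non-normative illustration: the `AttrLike` shape, the function name, and the use of `Map` for both maps are assumptions made for the example, not definitions from this specification.

```typescript
// Hypothetical names throughout; only the two namespace URIs are fixed by XML.
const XMLNS_NS = "http://www.w3.org/2000/xmlns/";
const XML_NS = "http://www.w3.org/XML/1998/namespace";

// Assumed minimal stand-in for a DOM Attr.
interface AttrLike {
  namespaceURI: string | null;
  prefix: string | null;
  localName: string;
  value: string;
}

// Keys are namespace URIs (null means "no namespace"); values are ordered prefix lists.
type NamespacePrefixMap = Map<string | null, string[]>;

function recordNamespaceInformation(
  attributes: AttrLike[],
  map: NamespacePrefixMap,
  localPrefixes: Map<string, string>
): string | null {
  let defaultNamespaceAttrValue: string | null = null;
  for (const attr of attributes) {
    // Only attributes in the XMLNS namespace are considered.
    if (attr.namespaceURI !== XMLNS_NS) continue;
    if (attr.prefix === null) {
      // Default namespace declaration.
      defaultNamespaceAttrValue = attr.value;
      continue;
    }
    const prefixDefinition = attr.localName;
    let namespaceDefinition: string | null = attr.value;
    // Definitions of the XML namespace are ignored entirely.
    if (namespaceDefinition === XML_NS) continue;
    if (namespaceDefinition === "") namespaceDefinition = null;
    const candidates = map.get(namespaceDefinition);
    // Skip prefixes already recorded for this namespace.
    if (candidates !== undefined && candidates.indexOf(prefixDefinition) !== -1) continue;
    if (candidates === undefined) map.set(namespaceDefinition, [prefixDefinition]);
    else candidates.push(prefixDefinition);
    // The local prefixes map stores the empty string in place of null.
    localPrefixes.set(prefixDefinition, namespaceDefinition === null ? "" : namespaceDefinition);
  }
  return defaultNamespaceAttrValue;
}
```

Note that an attribute created with `setAttribute("xmlns:pretend-prefix", ...)` has a null `namespaceURI` in this model and is therefore skipped by the first check in the loop.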

3.2.1.1.2 The Namespace Prefix Map

A namespace prefix map is a map that associates namespaceURI and namespace prefix lists, where namespaceURI values are the map's unique keys (which can include the null value representing no namespace), and ordered lists of associated prefix values are the map's key values. The namespace prefix map will be populated by previously seen namespaceURIs and all their previously encountered prefix associations for a given node and its ancestors.

Note

The last seen prefix for a given namespaceURI is at the end of its respective list. The list is searched for a prefix that matches the preferred prefix, and if no match is found, then the last prefix in the list is used. See copy a namespace prefix map and retrieve a preferred prefix string for additional details.

To copy a namespace prefix map map means to copy the map's keys into a new empty namespace prefix map, and to copy each of the values in the namespace prefix list associated with each key into a new list, which should be associated with the respective key in the new map.

To retrieve a preferred prefix string preferred prefix from the namespace prefix map map given a namespace ns, the user agent should:

  1. Let candidates list be the result of retrieving a list from map where there exists a key in map that matches the value of ns; if there is no such key, then stop running these steps and return the null value.
  2. Otherwise, for each prefix value prefix in candidates list, iterating from beginning to end:
    Note

    There will always be at least one prefix value in the list.

    1. If prefix matches preferred prefix, then stop running these steps and return prefix.
    2. If prefix is the last item in the candidates list, then stop running these steps and return prefix.

To check if a prefix string prefix is found in a namespace prefix map map given a namespace ns, the user agent should:

  1. Let candidates list be the result of retrieving a list from map where there exists a key in map that matches the value of ns; if there is no such key, then stop running these steps and return false.
  2. If the value of prefix occurs at least once in candidates list, return true, otherwise return false.

To add a prefix string prefix to the namespace prefix map map given a namespace ns, the user agent should:

  1. Let candidates list be the result of retrieving a list from map where there exists a key in map that matches the value of ns; if there is no such key, then let candidates list be null.
  2. If candidates list is null, then create a new list with prefix as the only item in the list, and associate that list with a new key ns in map.
  3. Otherwise, append prefix to the end of candidates list.
    Note

    The steps in retrieve a preferred prefix string use the list to track the most recently used (MRU) prefix associated with a given namespace, which will be the prefix at the end of the list. This list may contain duplicates of the same prefix value seen earlier (and that's OK).
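
The four namespace prefix map operations described in this section can be sketched as follows. This is a non-normative illustration; the type alias and function names are assumptions chosen for the example, and a `Map` with a `null` key stands in for the "no namespace" entry.

```typescript
// Hypothetical names; keys are namespace URIs (null means "no namespace"),
// values are ordered lists of prefixes seen for that namespace.
type NamespacePrefixMap = Map<string | null, string[]>;

// Copy the keys, giving each key its own fresh copy of the prefix list.
function copyPrefixMap(map: NamespacePrefixMap): NamespacePrefixMap {
  const copy: NamespacePrefixMap = new Map();
  map.forEach((prefixes, ns) => {
    copy.set(ns, prefixes.slice());
  });
  return copy;
}

// Return the preferred prefix if it is listed for ns, otherwise the most
// recently added prefix for ns, or null when ns has no entry.
function retrievePreferredPrefix(
  map: NamespacePrefixMap,
  preferredPrefix: string | null,
  ns: string | null
): string | null {
  const candidates = map.get(ns);
  if (candidates === undefined) return null;
  if (preferredPrefix !== null && candidates.indexOf(preferredPrefix) !== -1) {
    return preferredPrefix;
  }
  const last = candidates[candidates.length - 1];
  return last === undefined ? null : last;
}

// True when prefix occurs at least once in the list for ns.
function isPrefixFound(map: NamespacePrefixMap, prefix: string, ns: string | null): boolean {
  const candidates = map.get(ns);
  return candidates !== undefined && candidates.indexOf(prefix) !== -1;
}

// Append prefix to the list for ns, creating the list if needed.
// Duplicates are allowed, matching the note above.
function addPrefix(map: NamespacePrefixMap, prefix: string, ns: string | null): void {
  const candidates = map.get(ns);
  if (candidates === undefined) map.set(ns, [prefix]);
  else candidates.push(prefix);
}
```

Copying the lists rather than sharing them means that prefixes added while serializing one subtree do not leak into the map used for its siblings.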

3.2.1.1.3 Serializing an Element's attributes

The XML serialization of the attributes of an Element element together with a namespace prefix map map, a generated namespace prefix index prefix index reference, a local prefixes map, an ignore namespace definition attribute flag, and a require well-formed flag, is the result of the following algorithm:

  1. Let result be the empty string.
  2. Let localname set be a new empty namespace localname set. This localname set will contain tuples of unique attribute namespaceURI and localName pairs, and is populated as each attr is processed. This set is used to [optionally] enforce the well-formed constraint that an element cannot have two attributes with the same namespaceURI and localName. This can occur when two otherwise identical attributes on the same element differ only by their prefix values.
  3. Loop: For each attribute attr in element's attributes, in the order they are specified in the element's attribute list:
    1. If the require well-formed flag is set (its value is true), and the localname set contains a tuple whose values match those of a new tuple consisting of attr's namespaceURI attribute and localName attribute, then throw an exception; the serialization of this attr would fail to produce a well-formed element serialization.
    2. Create a new tuple consisting of attr's namespaceURI attribute and localName attribute, and add it to the localname set.
    3. Let attribute namespace be the value of attr's namespaceURI value.
    4. Let candidate prefix be null.
    5. If attribute namespace is not null, then run these sub-steps:
      1. Let candidate prefix be the result of retrieving a preferred prefix string from map given namespace attribute namespace with preferred prefix being attr's prefix value.
      2. If the value of attribute namespace is the XMLNS namespace, then run these steps:
        1. If any of the following are true, then stop running these steps and goto Loop to visit the next attribute:
        2. If the require well-formed flag is set (its value is true), and the value of attr's value attribute matches the XMLNS namespace, then throw an exception; the serialization of this attribute would produce invalid XML because the XMLNS namespace is reserved and cannot be applied as an element's namespace via XML parsing.
          Note

          DOM APIs do allow creation of elements in the XMLNS namespace but with strict qualifications.

        3. If the require well-formed flag is set (its value is true), and the value of attr's value attribute is the empty string, then throw an exception; namespace prefix declarations cannot be used to undeclare a namespace (use a default namespace declaration instead).
        4. If the attr's prefix matches the string "xmlns", then let candidate prefix be the string "xmlns".
      3. Otherwise, the attribute namespace is not the XMLNS namespace. Run these steps:
        1. Let candidate prefix be the result of generating a prefix providing map, attribute namespace, and prefix index as input.
        2. Append the following to result, in the order listed:
          1. " " (U+0020 SPACE);
          2. The string "xmlns:";
          3. The value of candidate prefix;
          4. "="" (U+003D EQUALS SIGN, U+0022 QUOTATION MARK);
          5. The result of serializing an attribute value given attribute namespace and the require well-formed flag as input;
          6. """ (U+0022 QUOTATION MARK).
    6. Append a " " (U+0020 SPACE) to result.
    7. If candidate prefix is not null, then append to result the concatenation of candidate prefix with ":" (U+003A COLON).
    8. If the require well-formed flag is set (its value is true), and this attr's localName attribute contains the character ":" (U+003A COLON) or does not match the XML Name production or equals "xmlns" and attribute namespace is null, then throw an exception; the serialization of this attr would not be a well-formed attribute.
    9. Append the following strings to result, in the order listed:
      1. The value of attr's localName;
      2. "="" (U+003D EQUALS SIGN, U+0022 QUOTATION MARK);
      3. The result of serializing an attribute value given attr's value attribute and the require well-formed flag as input;
      4. """ (U+0022 QUOTATION MARK).
  4. Return the value of result.

When serializing an attribute value given an attribute value and require well-formed flag, the user agent must run the following steps:

  1. If the require well-formed flag is set (its value is true), and attribute value contains characters that are not matched by the XML Char production, then throw an exception; the serialization of this attribute value would fail to produce a well-formed element serialization.
  2. If attribute value is null, then return the empty string.
  3. Otherwise, attribute value is a string. Return the value of attribute value, first replacing any occurrences of the following:
    1. "&" with "&amp;"
    2. """ with "&quot;"
    3. "<" with "&lt;"
    4. ">" with "&gt;"
    Note

    This matches behavior present in browsers, and goes above and beyond the grammar requirement in the XML specification's AttValue production by also replacing ">" characters.
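
A minimal sketch of the attribute value steps above follows. It is illustrative only: the function name is hypothetical, a plain `Error` stands in for the exception, and the regular expression is one way to express the XML Char production over UTF-16 code units (a valid surrogate pair represents a character in the range U+10000 to U+10FFFF).

```typescript
// XML Char: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF].
// Written without the "u" flag, so supplementary characters appear as surrogate pairs.
const XML_CHARS = /^(?:[\t\n\r\u0020-\uD7FF\uE000-\uFFFD]|[\uD800-\uDBFF][\uDC00-\uDFFF])*$/;

// Hypothetical helper name; Error is an assumption standing in for the thrown exception.
function serializeAttributeValue(value: string | null, requireWellFormed: boolean): string {
  if (requireWellFormed && value !== null && !XML_CHARS.test(value)) {
    throw new Error("attribute value contains characters outside the XML Char production");
  }
  if (value === null) return "";
  // "&" is replaced first so the entities produced below are not escaped again.
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}
```

With the require well-formed flag unset, the value is escaped and returned even if it contains characters that XML does not allow.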

3.2.1.1.4 Generating namespace prefixes

To generate a prefix given a namespace prefix map map, a string new namespace, and a reference to a generated namespace prefix index prefix index, the user agent must run the following steps:

  1. Let generated prefix be the concatenation of the string "ns" and the current numerical value of prefix index.
  2. Let the value of prefix index be incremented by one.
  3. Add the generated prefix to map given the namespace new namespace.
  4. Return the value of generated prefix.
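
The prefix generation steps can be sketched as follows. The names are hypothetical, and the `{ value: number }` wrapper is an assumption used here to model the "reference" to the prefix index, so that the increment is visible to the caller.

```typescript
// Hypothetical names; keys are namespace URIs, values are ordered prefix lists.
type NamespacePrefixMap = Map<string | null, string[]>;

// Assumed wrapper object so the index can be passed by reference.
interface PrefixIndex {
  value: number;
}

function generatePrefix(
  map: NamespacePrefixMap,
  newNamespace: string,
  prefixIndex: PrefixIndex
): string {
  const generatedPrefix = "ns" + prefixIndex.value;
  prefixIndex.value += 1;
  // Add the generated prefix to the map for the new namespace.
  const candidates = map.get(newNamespace);
  if (candidates === undefined) map.set(newNamespace, [generatedPrefix]);
  else candidates.push(generatedPrefix);
  return generatedPrefix;
}
```

Because the index is shared across a whole serialization run, each generated prefix in that run is distinct.
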
3.2.1.2 XML serializing a Document node
The algorithm for producing an XML serialization of a DOM node of type Document is as follows:

If the require well-formed flag is set (its value is true), and this node has no documentElement (the documentElement attribute's value is null), then throw an exception; the serialization of this node would not be a well-formed document.

Otherwise, run the following steps:

  1. Let serialized document be an empty string.
  2. For each child child of node, in tree order, run the XML serialization algorithm on the child passing along the provided arguments, and append the result to serialized document.
    Note

    This will serialize any number of ProcessingInstruction and Comment nodes both before and after the Document's documentElement node, including at most one DocumentType node. (Text nodes are not allowed as children of the Document.)

  3. Return the value of serialized document.
3.2.1.3 XML serializing a Comment node
The algorithm for producing an XML serialization of a DOM node of type Comment is as follows:

If the require well-formed flag is set (its value is true), and node's data contains characters that are not matched by the XML Char production, contains "--" (two adjacent U+002D HYPHEN-MINUS characters), or ends with a "-" (U+002D HYPHEN-MINUS) character, then throw an exception; the serialization of this node's data would not be well-formed.

Otherwise, return the concatenation of "<!--", node's data, and "-->".
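
As a non-normative sketch, the Comment rule might look like this. The function name is hypothetical, a plain `Error` stands in for the exception, and the regular expression is one way to express the XML Char production over UTF-16 code units.

```typescript
// XML Char production, with supplementary characters matched as surrogate pairs.
const XML_CHARS = /^(?:[\t\n\r\u0020-\uD7FF\uE000-\uFFFD]|[\uD800-\uDBFF][\uDC00-\uDFFF])*$/;

// Hypothetical helper; takes the Comment node's data directly.
function serializeComment(data: string, requireWellFormed: boolean): string {
  if (
    requireWellFormed &&
    (!XML_CHARS.test(data) ||
      data.indexOf("--") !== -1 ||
      data.charAt(data.length - 1) === "-")
  ) {
    throw new Error("comment data would not serialize to a well-formed comment");
  }
  return "<!--" + data + "-->";
}
```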

3.2.1.4 XML serializing a Text node
The algorithm for producing an XML serialization of a DOM node of type Text is as follows:
  1. If the require well-formed flag is set (its value is true), and node's data contains characters that are not matched by the XML Char production, then throw an exception; the serialization of this node's data would not be well-formed.
  2. Let markup be the value of node's data.
  3. Replace any occurrences of "&" in markup by "&amp;".
  4. Replace any occurrences of "<" in markup by "&lt;".
  5. Replace any occurrences of ">" in markup by "&gt;".
  6. Return the value of markup.
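
The Text node steps can be sketched as follows, under the same assumptions as elsewhere in these examples: a hypothetical function name, a plain `Error` for the exception, and a regular expression standing in for the XML Char production.

```typescript
// XML Char production, with supplementary characters matched as surrogate pairs.
const XML_CHARS = /^(?:[\t\n\r\u0020-\uD7FF\uE000-\uFFFD]|[\uD800-\uDBFF][\uDC00-\uDFFF])*$/;

// Hypothetical helper; takes the Text node's data directly.
function serializeText(data: string, requireWellFormed: boolean): string {
  if (requireWellFormed && !XML_CHARS.test(data)) {
    throw new Error("text data contains characters outside the XML Char production");
  }
  // "&" must be replaced first so that "&lt;" and "&gt;" are not escaped again.
  return data
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}
```

Unlike attribute values, quotation marks are left untouched in text content.
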
3.2.1.5 XML serializing a DocumentFragment node
The algorithm for producing an XML serialization of a DOM node of type DocumentFragment is as follows:
  1. Let markup be the empty string.
  2. For each child child of node, in tree order, run the XML serialization algorithm on the child given namespace, prefix map, a reference to prefix index, and flag require well-formed. Concatenate the result to markup.
  3. Return the value of markup.
3.2.1.6 XML serializing a DocumentType node
The algorithm for producing an XML serialization of a DOM node of type DocumentType is as follows:
  1. If the require well-formed flag is true and the node's publicId attribute contains characters that are not matched by the XML PubidChar production, then throw an exception; the serialization of this node would not be a well-formed document type declaration.
  2. If the require well-formed flag is true and the node's systemId attribute contains characters that are not matched by the XML Char production or contains both a """ (U+0022 QUOTATION MARK) and a "'" (U+0027 APOSTROPHE), then throw an exception; the serialization of this node would not be a well-formed document type declaration.
  3. Let markup be an empty string.
  4. Append the string "<!DOCTYPE" to markup.
  5. Append " " (U+0020 SPACE) to markup.
  6. Append the value of the node's name attribute to markup. For a node belonging to an HTML document, the value will be all lowercase.
  7. If the node's publicId is not the empty string then append the following, in the order listed, to markup:
    1. " " (U+0020 SPACE);
    2. The string "PUBLIC";
    3. " " (U+0020 SPACE);
    4. """ (U+0022 QUOTATION MARK);
    5. The value of the node's publicId attribute;
    6. """ (U+0022 QUOTATION MARK).
  8. If the node's systemId is not the empty string and the node's publicId is set to the empty string, then append the following, in the order listed, to markup:
    1. " " (U+0020 SPACE);
    2. The string "SYSTEM".
  9. If the node's systemId is not the empty string then append the following, in the order listed, to markup:
    1. " " (U+0020 SPACE);
    2. """ (U+0022 QUOTATION MARK);
    3. The value of the node's systemId attribute;
    4. """ (U+0022 QUOTATION MARK).
  10. Append ">" (U+003E GREATER-THAN SIGN) to markup.
  11. Return the value of markup.
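
A non-normative sketch of the DocumentType steps follows. The `DoctypeLike` shape and function name are assumptions for the example, a plain `Error` stands in for the exception, and the two regular expressions are one way to express the XML Char and PubidChar productions.

```typescript
// XML Char production, with supplementary characters matched as surrogate pairs.
const XML_CHARS = /^(?:[\t\n\r\u0020-\uD7FF\uE000-\uFFFD]|[\uD800-\uDBFF][\uDC00-\uDFFF])*$/;
// PubidChar: #x20 | #xD | #xA | [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%]
const PUBID_CHARS = /^[\u0020\r\na-zA-Z0-9\-'()+,.\/:=?;!*#@$_%]*$/;

// Assumed minimal stand-in for a DOM DocumentType node.
interface DoctypeLike {
  name: string;
  publicId: string;
  systemId: string;
}

function serializeDocumentType(node: DoctypeLike, requireWellFormed: boolean): string {
  if (requireWellFormed && !PUBID_CHARS.test(node.publicId)) {
    throw new Error("publicId contains characters outside the PubidChar production");
  }
  const hasBothQuotes =
    node.systemId.indexOf('"') !== -1 && node.systemId.indexOf("'") !== -1;
  if (requireWellFormed && (!XML_CHARS.test(node.systemId) || hasBothQuotes)) {
    throw new Error("systemId cannot be serialized as a well-formed literal");
  }
  let markup = "<!DOCTYPE " + node.name;
  if (node.publicId !== "") markup += ' PUBLIC "' + node.publicId + '"';
  // SYSTEM is only emitted when there is a system identifier but no public one.
  if (node.systemId !== "" && node.publicId === "") markup += " SYSTEM";
  if (node.systemId !== "") markup += ' "' + node.systemId + '"';
  return markup + ">";
}
```
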
3.2.1.7 XML serializing a ProcessingInstruction node
The algorithm for producing an XML serialization of a DOM node of type ProcessingInstruction is as follows:
  1. If the require well-formed flag is set (its value is true), and node's target contains a ":" (U+003A COLON) character or is an ASCII case-insensitive match for the string "xml", then throw an exception; the serialization of this node's target would not be well-formed.
  2. If the require well-formed flag is set (its value is true), and node's data contains characters that are not matched by the XML Char production or contains the string "?>" (U+003F QUESTION MARK, U+003E GREATER-THAN SIGN), then throw an exception; the serialization of this node's data would not be well-formed.
  3. Let markup be the concatenation of the following, in the order listed:
    1. "<?" (U+003C LESS-THAN SIGN, U+003F QUESTION MARK);
    2. The value of node's target;
    3. " " (U+0020 SPACE);
    4. The value of node's data;
    5. "?>" (U+003F QUESTION MARK, U+003E GREATER-THAN SIGN).
  4. Return the value of markup.
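
The ProcessingInstruction steps can be sketched as follows. As with the other examples, this is illustrative rather than normative: the function name is hypothetical, a plain `Error` stands in for the exception, and the regular expression is one way to express the XML Char production.

```typescript
// XML Char production, with supplementary characters matched as surrogate pairs.
const XML_CHARS = /^(?:[\t\n\r\u0020-\uD7FF\uE000-\uFFFD]|[\uD800-\uDBFF][\uDC00-\uDFFF])*$/;

// Hypothetical helper; takes the node's target and data directly.
function serializeProcessingInstruction(
  target: string,
  data: string,
  requireWellFormed: boolean
): string {
  // ASCII case-insensitive comparison against "xml".
  const isXmlTarget = /^[xX][mM][lL]$/.test(target);
  if (requireWellFormed && (target.indexOf(":") !== -1 || isXmlTarget)) {
    throw new Error("processing instruction target would not be well-formed");
  }
  if (requireWellFormed && (!XML_CHARS.test(data) || data.indexOf("?>") !== -1)) {
    throw new Error("processing instruction data would not be well-formed");
  }
  return "<?" + target + " " + data + "?>";
}
```

A target such as "xml-stylesheet" is accepted, since only an exact match for "xml" is rejected.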

A. Dependencies

This document uses terms defined in the following specifications: the HTML specification [HTML5], the DOM specification [DOM4] [DOM], XML [XML10], the ECMAScript specification [ECMA-262] (commonly "JavaScript"), and the Web IDL specification [WEBIDL].

B. Revision History

The following is an informative summary of the changes since the last publication of this specification. A complete revision history of the Editor's Drafts of this specification can be found at the W3C Github Repository and older revisions at the W3C Mercurial server.

C. Acknowledgements

We acknowledge with gratitude the original work of Ms2ger and others at the WHATWG, who created and maintained the original DOM Parsing and Serialization Living Standard upon which this specification is based.

Thanks to C. Scott Ananian, Victor Costan, Aryeh Gregor, Anne van Kesteren, Arkadiusz Michalski, Simon Pieters, Henri Sivonen, Josh Soref and Boris Zbarsky, for their useful comments.

Special thanks to Ian Hickson for first defining the innerHTML and outerHTML attributes and the insertAdjacentHTML method in [HTML5], and for his useful comments.

D. References

D.1 Normative references

[DOM4]
DOM Standard. Anne van Kesteren. WHATWG. Living Standard. URL: https://dom.spec.whatwg.org/
[ECMA-262]
ECMAScript Language Specification. Ecma International. URL: https://tc39.es/ecma262/
[HTML]
HTML Standard. Anne van Kesteren; Domenic Denicola; Ian Hickson; Philip Jägenstedt; Simon Pieters. WHATWG. Living Standard. URL: https://html.spec.whatwg.org/multipage/
[HTML5]
HTML5. Ian Hickson; Robin Berjon; Steve Faulkner; Travis Leithead; Erika Doyle Navara; Theresa O'Connor; Silvia Pfeiffer. W3C. 27 March 2018. W3C Recommendation. URL: https://www.w3.org/TR/html5/
[WEBIDL]
Web IDL. Boris Zbarsky. W3C. 15 December 2016. W3C Editor's Draft. URL: https://heycam.github.io/webidl/
[XML10]
Extensible Markup Language (XML) 1.0 (Fifth Edition). Tim Bray; Jean Paoli; Michael Sperberg-McQueen; Eve Maler; François Yergeau et al. W3C. 26 November 2008. W3C Recommendation. URL: https://www.w3.org/TR/xml/