Abstract

This specification defines various APIs for programmatic access to HTML and generic XML parsers by web applications for use in parsing and serializing DOM nodes.

Status of This Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This specification is based on the original work of the DOM Parsing and Serialization Living Specification, though it has diverged in terms of supported features, normative requirements, and algorithm specificity. As appropriate, relevant fixes from the living standard are incorporated into this document.

This document was published by the Web Applications Working Group as a Last Call Working Draft. This document is intended to become a W3C Recommendation. If you wish to make comments regarding this document, please send them to www-dom@w3.org with DOM-Parsing at the start of your email's subject. The Last Call comment period ends 07 January 2014. All comments are welcome.

Publication as a Last Call Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This is a Last Call Working Draft and thus the Working Group has determined that this document has satisfied the relevant technical requirements and is sufficiently stable to advance through the Technical Recommendation process.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

Issues

Issue 1

Open issues that appear throughout the remainder of this document will be highlighted like this.

1. Conformance

As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.

The key words MUST, MUST NOT, REQUIRED, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL in this specification are to be interpreted as described in [RFC2119].

Requirements phrased in the imperative as part of algorithms (such as "strip any leading space characters" or "return false and terminate these steps") are to be interpreted with the meaning of the key word ("must", "should", "may", etc.) used in introducing the algorithm.

Conformance requirements phrased as algorithms or specific steps may be implemented in any manner, so long as the end result is equivalent. (In particular, the algorithms defined in this specification are intended to be easy to follow, and not intended to be performant.)

User agents may impose implementation-specific limits on otherwise unconstrained inputs, e.g. to prevent denial of service attacks, to guard against running out of memory, or to work around platform-specific limitations.

When a method or an attribute is said to call another method or attribute, the user agent must invoke its internal API for that attribute or method so that e.g. the author can't change the behavior by overriding attributes or methods with custom properties or functions in ECMAScript.

Unless otherwise stated, string comparisons are done in a case-sensitive manner.

If an algorithm calls into another algorithm, any exception that is thrown by the latter (unless it is explicitly caught) must cause the former to terminate, and the exception to be propagated up to its caller.

1.1 Dependencies

The IDL fragments in this specification must be interpreted as required for conforming IDL fragments, as described in the Web IDL specification. [WEBIDL]

Some of the terms used in this specification are defined in [DOM4], [HTML5], and [XML10].

1.2 Extensibility

Vendor-specific proprietary extensions to this specification are strongly discouraged. Authors must not use such extensions, as doing so reduces interoperability and fragments the user base, allowing only users of specific user agents to access the content in question.

If vendor-specific extensions are needed, the members should be prefixed by vendor-specific strings to prevent clashes with future versions of this specification. Extensions must be defined so that the use of extensions neither contradicts nor causes the non-conformance of functionality defined in the specification.

When vendor-neutral extensions to this specification are needed, either this specification can be updated accordingly, or an extension specification can be written that overrides the requirements in this specification. When someone applying this specification to their activities decides that they will recognise the requirements of such an extension specification, it becomes an applicable specification for the purposes of conformance requirements in this specification.

2. Terminology

The term context object means the object on which the method or attribute being discussed was called.

3. Parsing and serializing Nodes

3.1 Parsing

The following steps form the fragment parsing algorithm, whose arguments are a markup string and a context element.

  1. If the context element's node document is an HTML document: let algorithm be the HTML fragment parsing algorithm.

    If the context element's node document is an XML document: let algorithm be the XML fragment parsing algorithm.

  2. Invoke algorithm with markup as the input, and context element as the context element.
  3. Let new children be the nodes returned.
  4. Let fragment be a new DocumentFragment whose node document is context element's node document.
  5. Append each node in new children to fragment (in order).
    Note

    This ensures the node document for the new nodes is correct.

  6. Return fragment.
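
For illustration only, the fragment parsing steps can be sketched in TypeScript over a deliberately simplified node model. SketchDocument, SketchNode, SketchFragment, FragmentParser and the parsers argument are hypothetical stand-ins for the DOM types and for the HTML and XML fragment parsing algorithms; none of them is a real API.

```typescript
// Hypothetical, minimal stand-ins for the DOM types the algorithm touches.
interface SketchDocument {
  kind: "html" | "xml";
}

interface SketchNode {
  name: string;
  nodeDocument: SketchDocument;
}

interface SketchFragment {
  nodeDocument: SketchDocument;
  children: SketchNode[];
}

// Hypothetical signature shared by the HTML and XML fragment parsing algorithms.
type FragmentParser = (markup: string, context: SketchNode) => SketchNode[];

function fragmentParse(
  markup: string,
  contextElement: SketchNode,
  parsers: { html: FragmentParser; xml: FragmentParser },
): SketchFragment {
  const doc = contextElement.nodeDocument;
  // Step 1: choose the parser from the context element's node document.
  const algorithm = doc.kind === "html" ? parsers.html : parsers.xml;
  // Steps 2-3: run it and keep the nodes it returns.
  const newChildren = algorithm(markup, contextElement);
  // Step 4: the fragment belongs to the context element's node document.
  const fragment: SketchFragment = { nodeDocument: doc, children: [] };
  // Step 5: appending adopts each node, so its node document becomes doc.
  for (const child of newChildren) {
    child.nodeDocument = doc;
    fragment.children.push(child);
  }
  // Step 6.
  return fragment;
}
```

In a user agent the node document update in step 5 happens as a side effect of appending (adoption); the sketch performs it explicitly so the effect is visible.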

3.2 Serializing

To serialize a Node node, the user agent must run the following steps:

  1. Let document be node's node document.
  2. If document is an HTML document, return an HTML serialization of node.
  3. Otherwise, document is an XML document.
  4. Let context namespace be null.
  5. Let prefix list be an empty list. The prefix list will contain strings that represent a history of namespace prefixes [XML-NAMES] that have been serialized by the XML serialization algorithm for a subtree.
  6. Return an XML serialization of node, providing context namespace as the namespace and prefix list as the prefixes.
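
The dispatch in these steps might be sketched as follows. SketchNode is a hypothetical one-field node model, and the htmlSerialize and xmlSerialize callbacks are hypothetical placeholders for the HTML fragment serialization algorithm and the XML serialization defined in this section.

```typescript
// Hypothetical node model: only whether node's node document is an HTML document.
interface SketchNode {
  isHTMLDocument: boolean;
}

// Hypothetical signatures for the two serializations.
type HtmlSerializer = (node: SketchNode) => string;
type XmlSerializer = (
  node: SketchNode,
  namespace: string | null,
  prefixes: string[],
) => string;

function serialize(
  node: SketchNode,
  htmlSerialize: HtmlSerializer,
  xmlSerialize: XmlSerializer,
): string {
  // Step 2: HTML documents use the HTML fragment serialization algorithm.
  if (node.isHTMLDocument) {
    return htmlSerialize(node);
  }
  // Steps 3-6: XML documents start with a null context namespace and an
  // empty prefix list, which the XML serialization fills in as it recurses.
  const contextNamespace: string | null = null;
  const prefixList: string[] = [];
  return xmlSerialize(node, contextNamespace, prefixList);
}
```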

To produce an HTML serialization of a Node node, the user agent must run the HTML fragment serialization algorithm [HTML5] on node and return the string produced.

To produce an XML serialization of a Node node given a context namespace namespace and prefix list prefixes, the user agent must run the appropriate steps, depending on node's interface:

Note

The following steps for serializing a node belonging to an XML document are designed to produce a serialization that is compatible with the HTML parser. For example, elements in the XHTML namespace that contain no child nodes are serialized with an explicit begin and end tag rather than using the XML self-closing syntax. Exceptions to this rule occur when an XHTML element's equivalent HTML element is a void element that would be auto-closed by the HTML parser.

Element

Run the following algorithm:

  1. Let markup be an empty string.
  2. Let list be a copy of the prefixes array.
  3. Let prefix be the value of node's prefix attribute.
  4. Let ns be the value of node's namespaceURI attribute.
  5. Let a skip end tag flag have the value false.
  6. Append "<" (U+003C LESS-THAN SIGN) to markup.
  7. If prefix is not null then append the following to markup:
    1. The value of prefix;
    2. ":" (U+003A COLON).
  8. Append the value of node's localName attribute to markup.
  9. If namespace is not equal to ns (that is, the node's namespace differs from its parent's), and prefix is not null, then run these sub-steps:
    Note

    These steps determine whether a namespace prefix is serialized for this node.

    1. If list contains the value of prefix, then abort these sub-steps. This namespace prefix was already serialized.
    2. Add the value of prefix to list.
    3. If node has an attribute whose name attribute value is equal to the concatenation of the string "xmlns:" with the value of prefix, abort these sub-steps. The prefix namespace definition will be serialized later as part of the XML serialization of node's attributes.
    4. Append the following to markup, in order:
      1. " " (U+0020 SPACE);
      2. The string "xmlns:";
      3. The value of prefix;
      4. "="" (U+003D EQUALS SIGN, U+0022 QUOTATION MARK);
      5. The value of ns;
      6. """ (U+0022 QUOTATION MARK);
  10. If namespace is not equal to ns, and prefix is null, then run these sub-steps:
    Note

    These steps determine whether a default namespace is serialized for this node.

    1. If node has an attribute whose name attribute value is equal to "xmlns", abort these sub-steps. The default namespace will be serialized later as part of the XML serialization of node's attributes.
    2. Append the following to markup, in order:
      1. " " (U+0020 SPACE);
      2. The string "xmlns";
      3. "="" (U+003D EQUALS SIGN, U+0022 QUOTATION MARK);
      4. The value of ns;
      5. """ (U+0022 QUOTATION MARK);
  11. Append to markup the result of the XML serialization of node's attributes, passing list as the prefixes.
  12. If the value of ns is the string "http://www.w3.org/1999/xhtml", and the node's list of children is empty, and the node's tagName matches any one of the following void elements: "area", "base", "br", "col", "embed", "hr", "img", "input", "keygen", "link", "menuitem", "meta", "param", "source", "track", "wbr"; then append the following to markup, in order:
    1. " " (U+0020 SPACE);
    2. "/" (U+002F SOLIDUS);
    and set the skip end tag flag to true.
  13. If the value of ns is not the string "http://www.w3.org/1999/xhtml", and the node's list of children is empty, then append "/" (U+002F SOLIDUS) to markup and set the skip end tag flag to true.
  14. Append ">" (U+003E GREATER-THAN SIGN) to markup.
  15. If the value of the skip end tag flag is true, then return the value of markup and skip the remaining steps. The node is a leaf node.
  16. Append to markup the result of performing an XML serialization of each of node's children, in order, providing the value of ns for the namespace and list for the prefixes.
  17. Append "</" (U+003C LESS-THAN SIGN, U+002F SOLIDUS) to markup.
  18. If the value of prefix is not null, then append the following to markup, in order:
    1. The value of prefix;
    2. ":" (U+003A COLON).
  19. Append the value of node's localName attribute to markup.
  20. Append ">" (U+003E GREATER-THAN SIGN) to markup.
  21. Return the value of markup.
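
The Element steps are the core of the XML serialization, so a worked sketch may help. The TypeScript below assumes a hypothetical node model (SketchElement, SketchText, SketchAttr) containing only elements and text, and inlines a simplified attribute serializer. Writing a null ns as the empty string is an assumption, since the steps do not say what to output in that case. It is a reading aid, not a conforming implementation.

```typescript
const XHTML_NS = "http://www.w3.org/1999/xhtml";
const VOID_ELEMENTS = [
  "area", "base", "br", "col", "embed", "hr", "img", "input", "keygen",
  "link", "menuitem", "meta", "param", "source", "track", "wbr",
];

// Hypothetical, minimal node model: only elements and text nodes.
interface SketchAttr {
  name: string; // qualified name, e.g. "class" or "xmlns:svg"
  value: string;
}
interface SketchElement {
  kind: "element";
  prefix: string | null;
  namespaceURI: string | null;
  localName: string;
  attributes: SketchAttr[];
  children: SketchNode[];
}
interface SketchText {
  kind: "text";
  data: string;
}
type SketchNode = SketchElement | SketchText;

// Simplified attribute serialization that records serialized xmlns: prefixes.
function serializeAttributes(el: SketchElement, prefixes: string[]): string {
  let result = "";
  for (const attr of el.attributes) {
    const value = attr.value.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
    result += " " + attr.name + '="' + value + '"';
    if (attr.name.slice(0, 6) === "xmlns:") prefixes.push(attr.name.slice(6));
  }
  return result;
}

function serializeNode(node: SketchNode, namespace: string | null, prefixes: string[]): string {
  if (node.kind === "text") {
    return node.data.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
  }
  return serializeElement(node, namespace, prefixes);
}

function serializeElement(node: SketchElement, namespace: string | null, prefixes: string[]): string {
  const list = prefixes.slice(); // step 2: a copy, so siblings are unaffected
  const prefix = node.prefix;
  const ns = node.namespaceURI;
  const nsText = ns ?? ""; // assumption: a null ns is written as ""
  // In an XML document, tagName is this qualified name.
  const qualifiedName = prefix !== null ? prefix + ":" + node.localName : node.localName;
  const hasAttr = (name: string): boolean => node.attributes.some((a) => a.name === name);
  let markup = "<" + qualifiedName; // steps 6-8
  let skipEndTag = false;

  // Step 9: declare the prefix unless it is already in list or an
  // xmlns:prefix attribute will declare it during step 11.
  if (namespace !== ns && prefix !== null && list.indexOf(prefix) === -1) {
    list.push(prefix);
    if (!hasAttr("xmlns:" + prefix)) markup += " xmlns:" + prefix + '="' + nsText + '"';
  }
  // Step 10: declare the default namespace unless an xmlns attribute will.
  if (namespace !== ns && prefix === null && !hasAttr("xmlns")) {
    markup += ' xmlns="' + nsText + '"';
  }
  markup += serializeAttributes(node, list); // step 11

  const empty = node.children.length === 0;
  if (ns === XHTML_NS && empty && VOID_ELEMENTS.indexOf(qualifiedName) !== -1) {
    markup += " /"; // step 12: a void element the HTML parser auto-closes
    skipEndTag = true;
  } else if (ns !== XHTML_NS && empty) {
    markup += "/"; // step 13: XML self-closing syntax
    skipEndTag = true;
  }
  markup += ">"; // step 14
  if (skipEndTag) return markup; // step 15

  for (const child of node.children) markup += serializeNode(child, ns, list); // step 16
  return markup + "</" + qualifiedName + ">"; // steps 17-21
}
```

Passing list (a copy of prefixes) down to the children, rather than prefixes itself, is what keeps a prefix declared on one element from suppressing the declaration on its siblings.
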
Document

Return the result of concatenating the following, in order:

  1. The string produced by running the steps to produce a DocumentType serialization of node's doctype attribute;
  2. The string produced by an XML serialization of node's documentElement attribute, providing null as the namespace and an empty list as prefixes.
Comment
  1. Let markup be the concatenation of "<!--", node's data, and "-->".
  2. If markup matches the Comment production, return markup. Otherwise, throw a DOMException with name InvalidStateError.
CDATASection
  1. Let markup be the concatenation of "<![CDATA[", node's data, and "]]>".
  2. Return markup.
Note

CDATASection objects may be created by the historical document.createCDATASection API, or as a result of parsing an XML document.

Text
  1. Let markup be node's data.
  2. Replace any occurrences of "&" in markup by "&amp;".
  3. Replace any occurrences of "<" in markup by "&lt;".
  4. Replace any occurrences of ">" in markup by "&gt;".
  5. Return markup.
DocumentFragment
  1. Let markup be the empty string.
  2. For each child of node, in order, produce an XML serialization of the child and concatenate the result to markup.
  3. Return markup.
DocumentType
Run the steps to produce a DocumentType serialization of node and return the string this produced.
ProcessingInstruction
  1. Let markup be the concatenation of "<?", node's data, and "?>".
  2. Return markup.
Note

ProcessingInstruction objects may be created by the historical document.createProcessingInstruction API, or as a result of parsing an XML document.
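
The leaf-node cases for Comment, CDATASection and Text are small enough to sketch together. In this hedged TypeScript illustration LeafNode is a hypothetical type, a plain Error stands in for a DOMException named InvalidStateError, and the Comment check only approximates the [XML10] Comment production (no "--" in the data and no trailing "-"); it does not validate the Char production.

```typescript
// Hypothetical leaf-node model carrying only the data each step reads.
type LeafNode =
  | { kind: "comment"; data: string }
  | { kind: "cdata"; data: string }
  | { kind: "text"; data: string };

// Approximates the XML 1.0 Comment production: the data must not contain
// "--" and must not end in "-". The Char production is not checked here.
function matchesCommentProduction(data: string): boolean {
  return data.indexOf("--") === -1 && data.charAt(data.length - 1) !== "-";
}

function serializeLeaf(node: LeafNode): string {
  switch (node.kind) {
    case "comment":
      if (!matchesCommentProduction(node.data)) {
        // Stand-in for a DOMException whose name is InvalidStateError.
        throw new Error("InvalidStateError");
      }
      return "<!--" + node.data + "-->";
    case "cdata":
      // No check: data containing "]]>" yields ill-formed output.
      return "<![CDATA[" + node.data + "]]>";
    case "text":
      // Replace "&" first so the entities added below are not re-escaped.
      return node.data.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
  }
}
```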

To produce a DocumentType serialization of a Node node, the user agent must return the result of the following algorithm:

  1. Let markup be an empty string.
  2. Append the string "<!DOCTYPE" to markup.
  3. Append " " (U+0020 SPACE) to markup.
  4. Append the value of the node's name attribute to markup. For a node belonging to an HTML document, the value will be all lowercase.
  5. If the node's publicId is not the empty string then append the following, in order, to markup:
    1. " " (U+0020 SPACE);
    2. The string "PUBLIC";
    3. " " (U+0020 SPACE);
    4. """ (U+0022 QUOTATION MARK);
    5. The value of the node's publicId attribute;
    6. """ (U+0022 QUOTATION MARK);
  6. If the node's systemId is not the empty string and the node's publicId is set to the empty string, then append the following, in order, to markup:
    1. " " (U+0020 SPACE);
    2. The string "SYSTEM";
  7. If the node's systemId is not the empty string then append the following, in order, to markup:
    1. " " (U+0020 SPACE);
    2. """ (U+0022 QUOTATION MARK);
    3. The value of the node's systemId attribute;
    4. """ (U+0022 QUOTATION MARK);
  8. Optional: if the node has a (historical) internalSubset and the internalSubset attribute's value is a non-empty string, then append the following, in order, to markup:
    1. " " (U+0020 SPACE);
    2. "[" (U+005B LEFT SQUARE BRACKET);
    3. The value of the node's internalSubset attribute;
    4. "]" (U+005D RIGHT SQUARE BRACKET);
    Note

    A node belonging to an HTML document will never have an internalSubset because any internalSubset markup is ignored by the parser.

  9. Append ">" (U+003E GREATER-THAN SIGN) to markup.
  10. Return markup.
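
A minimal TypeScript sketch of the DocumentType serialization, assuming a hypothetical SketchDoctype object that exposes name, publicId, systemId and, optionally, the historical internalSubset. As the introductory sentence requires, it returns markup once ">" has been appended.

```typescript
// Hypothetical stand-in for a DocumentType node.
interface SketchDoctype {
  name: string;
  publicId: string;
  systemId: string;
  internalSubset?: string; // historical attribute; step 8 is optional
}

function serializeDoctype(node: SketchDoctype): string {
  let markup = "<!DOCTYPE " + node.name; // steps 1-4
  if (node.publicId !== "") {
    markup += ' PUBLIC "' + node.publicId + '"'; // step 5
  }
  if (node.systemId !== "" && node.publicId === "") {
    markup += " SYSTEM"; // step 6
  }
  if (node.systemId !== "") {
    markup += ' "' + node.systemId + '"'; // step 7
  }
  if (node.internalSubset !== undefined && node.internalSubset !== "") {
    markup += " [" + node.internalSubset + "]"; // step 8
  }
  return markup + ">"; // step 9, then return markup
}
```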

The XML serialization of the attributes of an element element together with a prefix list prefixes is the result of the following algorithm:

  1. Let result be the empty string.
  2. For each attribute attr in element's attributes, in order:
    1. Append the following strings to result:
      1. " " (U+0020 SPACE);
      2. attr's name;
      3. "="" (U+003D EQUALS SIGN, U+0022 QUOTATION MARK);
      4. attr's value, replacing any occurrences of the following, in order:
        1. "&" with "&amp;"
        2. """ with "&quot;"
      5. """ (U+0022 QUOTATION MARK).
    2. If the first six characters of the value of attr's name attribute case-sensitively match the string "xmlns:", then:
      1. Let prefix definition be the result of trimming "xmlns:" from the beginning of the value of attr's name.
      2. Add the value of prefix definition to prefixes. Since this namespace prefix definition has been serialized, it is unnecessary to serialize it again if subsequently encountered in element's children.
  3. Return result.
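
The attribute steps can be sketched as follows, assuming a hypothetical SketchAttr shape with a qualified name and a value. Note the replacement order: escaping "&" before the quotation mark keeps the ampersand of a freshly inserted "&quot;" from being escaped a second time.

```typescript
// Hypothetical attribute shape: a qualified name and a value.
interface SketchAttr {
  name: string; // e.g. "class" or "xmlns:svg"
  value: string;
}

function escapeAttrValue(value: string): string {
  // Replace "&" before '"' so "&quot;" is not turned into "&amp;quot;".
  return value.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
}

function serializeAttributes(attributes: SketchAttr[], prefixes: string[]): string {
  let result = "";
  for (const attr of attributes) {
    result += " " + attr.name + '="' + escapeAttrValue(attr.value) + '"';
    // Record serialized prefix declarations so descendants do not repeat them.
    if (attr.name.slice(0, 6) === "xmlns:") {
      prefixes.push(attr.name.slice(6));
    }
  }
  return result;
}
```

The prefixes argument is mutated on purpose: the Element steps pass in their own copy (list), so recorded prefixes are visible to that element's descendants only.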

4. The DOMParser interface

enum SupportedType {
    "text/html",
    "text/xml",
    "application/xml",
    "application/xhtml+xml",
    "image/svg+xml"
};

The DOMParser() constructor must return a new DOMParser object.

[Constructor]
interface DOMParser {
    Document parseFromString (DOMString str, SupportedType type);
};

4.1 Methods

parseFromString

The parseFromString(str, type) method must run these steps, depending on type:

"text/html"

Parse str with an HTML parser, and return the newly created document.

The scripting flag must be set to "disabled".

Note

meta elements are not taken into account for the encoding used, as a Unicode stream is passed into the parser.

Note

script elements get marked unexecutable and the contents of noscript get parsed as markup.

"text/xml"
"application/xml"
"application/xhtml+xml"
"image/svg+xml"
  1. Parse str with a namespace-enabled XML parser.
  2. If the previous step didn't return an error, return the newly created document and terminate these steps.
  3. Otherwise, throw a DOMException with name SyntaxError.
    Note

    Some UAs do not throw an exception, but rather return a minimal well-formed XML document that describes the error. In these cases, the error document's root element will be named parsererror and its namespace will be set to "http://www.mozilla.org/newlayout/xml/parsererror.xml".

In any case, the returned document's content type must be the type argument. Additionally, the document must have a URL value equal to the URL of the active document and a location value of null.

Note

The returned document's encoding is the default, UTF-8.

Parameter  Type           Nullable  Optional  Description
str        DOMString      no        no
type       SupportedType  no        no
Return type: Document
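
Which parser parseFromString() uses depends only on type. The TypeScript sketch below models that choice; SupportedType mirrors the IDL enum, while parserFor and SUPPORTED_TYPES are hypothetical names. It also reflects the Web IDL rule that a string outside the enum is rejected with a TypeError before the method's steps run.

```typescript
// The SupportedType enum from the IDL above, as a string union.
type SupportedType =
  | "text/html"
  | "text/xml"
  | "application/xml"
  | "application/xhtml+xml"
  | "image/svg+xml";

const SUPPORTED_TYPES: SupportedType[] = [
  "text/html",
  "text/xml",
  "application/xml",
  "application/xhtml+xml",
  "image/svg+xml",
];

// Which parser parseFromString() uses for a given type argument.
function parserFor(type: string): "html" | "xml" {
  // Web IDL rejects strings that are not enum values with a TypeError
  // before the method body runs; enum values are case-sensitive.
  if (SUPPORTED_TYPES.indexOf(type as SupportedType) === -1) {
    throw new TypeError('"' + type + '" is not a valid SupportedType value');
  }
  return type === "text/html" ? "html" : "xml";
}
```

In a browser, new DOMParser().parseFromString("<p>hi</p>", "text/html") follows the HTML branch, while the same call with "application/xml" goes through the namespace-enabled XML parser.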

5. The XMLSerializer interface

The XMLSerializer() constructor must return a new XMLSerializer object.

[Constructor]
interface XMLSerializer {
    DOMString serializeToString (Node root);
};

5.1 Methods

serializeToString
The serializeToString(root) method must produce an XML serialization of root and return the result.

Parameter  Type  Nullable  Optional  Description
root       Node  no        no
Return type: DOMString

6. Extensions to the Element interface

partial interface Element {
    [TreatNullAs=EmptyString]
                attribute DOMString innerHTML;
    [TreatNullAs=EmptyString]
                attribute DOMString outerHTML;
    void insertAdjacentHTML (DOMString position, DOMString text);
};

6.1 Attributes

innerHTML of type DOMString

The innerHTML IDL attribute represents the markup of the Element's contents.

element . innerHTML [ = value ]

Returns a fragment of HTML or XML that represents the element's contents.

Can be set, to replace the contents of the element with nodes parsed from the given string.

In the case of an XML document, will throw a DOMException with name InvalidStateError if the Element cannot be serialized to XML, and a DOMException with name SyntaxError if the given string is not well-formed.

On getting, if the context object's node document is an HTML document, then the attribute must return the result of running the HTML fragment serialization algorithm on the context object; otherwise, the context object's node document is an XML document, and the attribute must return the result of running the XML fragment serialization algorithm on the context object instead (this might throw an exception instead of returning a string).

On setting, these steps must be run:

  1. Let fragment be the result of invoking the fragment parsing algorithm with the new value as markup, and the context object as the context element.
  2. Replace all with fragment within the context object.
outerHTML of type DOMString

The outerHTML IDL attribute represents the markup of the Element and its contents.

element . outerHTML [ = value ]

Returns a fragment of HTML or XML that represents the element and its contents.

Can be set, to replace the element with nodes parsed from the given string.

In the case of an XML document, will throw a DOMException with name InvalidStateError if the element cannot be serialized to XML, and a DOMException with name SyntaxError if the given string is not well-formed.

Throws a DOMException with name NoModificationAllowedError if the parent of the element is the Document node.

On getting, if the context object's node document is an HTML document, then the attribute must return the result of running the HTML fragment serialization algorithm on a fictional node whose only child is the context object; otherwise, the context object's node document is an XML document, and the attribute must return the result of running the XML fragment serialization algorithm on that fictional node instead (this might throw an exception instead of returning a string).

On setting, the following steps must be run:

  1. Let parent be the context object's parent.
  2. If parent is null, terminate these steps. There would be no way to obtain a reference to the nodes created even if the remaining steps were run.
  3. If parent is a Document, throw a DOMException with name NoModificationAllowedError and terminate these steps.
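  4. If parent is a DocumentFragment, let parent be a new Element with body as its local name, the HTML namespace as its namespace, and the context object's node document as its node document.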
  5. Let fragment be the result of invoking the fragment parsing algorithm with the new value as markup, and parent as the context element.
  6. Replace the context object with fragment within the context object's parent.
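
The checks in steps 1 to 4 of the outerHTML setter can be sketched in TypeScript as below. The node model is hypothetical, a plain Error stands in for the DOMException, and the DocumentFragment case is written out, as an assumption, as a body element in the HTML namespace that belongs to the context object's node document.

```typescript
const HTML_NS = "http://www.w3.org/1999/xhtml";

// Hypothetical stand-ins carrying only what the setter inspects.
interface SketchDocument {
  isHTML: boolean;
}
interface SketchElement {
  kind: "element";
  localName: string;
  namespaceURI: string | null;
  nodeDocument: SketchDocument;
}
type SketchParent = SketchElement | { kind: "document" } | { kind: "fragment" };

// Steps 1-4 of the outerHTML setter: returns the element to use as the
// fragment parser's context, or null when the setter does nothing.
function outerHTMLContext(parent: SketchParent | null, nodeDocument: SketchDocument): SketchElement | null {
  if (parent === null) return null; // step 2: the new nodes would be unreachable
  if (parent.kind === "document") {
    // Step 3: stand-in for a DOMException named NoModificationAllowedError.
    throw new Error("NoModificationAllowedError");
  }
  if (parent.kind === "fragment") {
    // Step 4 (assumed completion): parse as if inside an HTML body element.
    return { kind: "element", localName: "body", namespaceURI: HTML_NS, nodeDocument };
  }
  return parent;
}
```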

6.2 Methods

insertAdjacentHTML
element . insertAdjacentHTML(position, text)

Parses the given string text as HTML or XML and inserts the resulting nodes into the tree in the position given by the position argument, as follows:

"beforebegin"
Before the element itself.
"afterbegin"
Just inside the element, before its first child.
"beforeend"
Just inside the element, after its last child.
"afterend"
After the element itself.

Throws a SyntaxError exception if the arguments have invalid values (e.g., in the case of an XML document, if the given string is not well-formed).

Throws a DOMException with name NoModificationAllowedError if the given position isn't possible (e.g. inserting elements after the root element of a Document).

The insertAdjacentHTML(position, text) method must run these steps:

  1. Use the first matching item from this list:
    If position is an ASCII case-insensitive match for the string "beforebegin"
    If position is an ASCII case-insensitive match for the string "afterend"
      Let context be the context object's parent.
      If context is null or a document, throw a DOMException with name NoModificationAllowedError and terminate these steps.
    If position is an ASCII case-insensitive match for the string "afterbegin"
    If position is an ASCII case-insensitive match for the string "beforeend"
      Let context be the context object.
    Otherwise
      Throw a SyntaxError exception.

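  2. If context is not an Element, or if all of the following are true: context's node document is an HTML document, context's local name is "html", and context's namespace is the HTML namespace; then let context be a new Element with body as its local name, the HTML namespace as its namespace, and the context object's node document as its node document.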

  3. Let fragment be the result of invoking the fragment parsing algorithm with text as markup, and context as the context element.
  4. Use the first matching item from this list:
    If position is an ASCII case-insensitive match for the string "beforebegin"
      Insert fragment into the context object's parent before the context object.
    If position is an ASCII case-insensitive match for the string "afterbegin"
      Insert fragment into the context object before its first child.
    If position is an ASCII case-insensitive match for the string "beforeend"
      Append fragment to the context object.
    If position is an ASCII case-insensitive match for the string "afterend"
      Insert fragment into the context object's parent before the context object's next sibling.

Parameter  Type       Nullable  Optional  Description
position   DOMString  no        no
text       DOMString  no        no
Return type: void
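
A TypeScript sketch of how steps 1 and 2 choose the context element. The node model and function names are hypothetical, plain Errors stand in for DOMExceptions, and the body-element substitution in step 2 is written out as an assumption (non-Element contexts, and the html root element of an HTML document, become a body element in the HTML namespace). asciiLowercase is used instead of toLowerCase() because an ASCII case-insensitive match folds only A to Z.

```typescript
const HTML_NS = "http://www.w3.org/1999/xhtml";

// Hypothetical, minimal node model; not the real DOM interfaces.
interface SketchDocument {
  isHTML: boolean;
}
interface SketchElement {
  kind: "element";
  localName: string;
  namespaceURI: string | null;
  nodeDocument: SketchDocument;
  parent: SketchParent;
}
type SketchParent = SketchElement | { kind: "document" | "fragment" } | null;

// "ASCII case-insensitive match" folds only A-Z, unlike toLowerCase().
function asciiLowercase(s: string): string {
  return s.replace(/[A-Z]/g, (c) => String.fromCharCode(c.charCodeAt(0) + 32));
}

// Steps 1-2 of insertAdjacentHTML(): pick the fragment parser's context element.
function insertAdjacentContext(target: SketchElement, position: string): SketchElement {
  const pos = asciiLowercase(position);
  let context: SketchParent;
  if (pos === "beforebegin" || pos === "afterend") {
    context = target.parent;
    if (context === null || context.kind === "document") {
      throw new Error("NoModificationAllowedError"); // stand-in DOMException
    }
  } else if (pos === "afterbegin" || pos === "beforeend") {
    context = target;
  } else {
    throw new Error("SyntaxError"); // stand-in DOMException
  }
  // Step 2 (assumed completion): non-elements and the html root element of
  // an HTML document are replaced by a body element in the HTML namespace.
  if (
    context.kind !== "element" ||
    (context.nodeDocument.isHTML && context.localName === "html" && context.namespaceURI === HTML_NS)
  ) {
    return { kind: "element", localName: "body", namespaceURI: HTML_NS, nodeDocument: target.nodeDocument, parent: null };
  }
  return context;
}
```

For example, in a browser, list.insertAdjacentHTML("beforeend", "<li>new</li>") parses the markup with the list element itself as the context and appends the result.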

7. Extensions to the Range interface

partial interface Range {
    DocumentFragment createContextualFragment (DOMString fragment);
};

7.1 Methods

createContextualFragment
fragment = range . createContextualFragment(fragment)
Returns a DocumentFragment, created from the markup string given.

The createContextualFragment(fragment) method must run these steps:

  1. Let node be the context object's start node.

    Let element be as follows, depending on node's interface:

    Document
    DocumentFragment
      null
    Element
      node
    Text
    Comment
      node's parent element
    DocumentType
    ProcessingInstruction
      [DOM4] prevents this case.
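  2. If element is null, or if all of the following are true: element's node document is an HTML document, element's local name is "html", and element's namespace is the HTML namespace; then let element be a new Element with "body" as its local name, the HTML namespace as its namespace, and the context object's node document as its node document.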

  3. Let fragment node be the result of invoking the fragment parsing algorithm with fragment as markup, and element as the context element.
  4. Unmark all scripts in fragment node as "already started".
  5. Return fragment node.

Parameter  Type       Nullable  Optional  Description
fragment   DOMString  no        no
Return type: DocumentFragment
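
Steps 1 and 2 can be sketched in TypeScript under similar assumptions: a hypothetical node model, rangeDocument standing in for the context object's node document, and the step 2 fallback written out as a body element in the HTML namespace. DocumentType and ProcessingInstruction start nodes are left out, following the note in the steps above that [DOM4] prevents those cases.

```typescript
const HTML_NS = "http://www.w3.org/1999/xhtml";

// Hypothetical, minimal node model; not the real DOM interfaces.
interface SketchDocument {
  isHTML: boolean;
}
interface SketchElement {
  kind: "element";
  localName: string;
  namespaceURI: string | null;
  nodeDocument: SketchDocument;
}
type StartNode =
  | SketchElement
  | { kind: "document" | "fragment" }
  | { kind: "text" | "comment"; parentElement: SketchElement | null };

// Steps 1-2 of createContextualFragment(): choose the context element.
// rangeDocument plays the role of "the context object's node document".
function contextualElement(start: StartNode, rangeDocument: SketchDocument): SketchElement {
  let element: SketchElement | null;
  if (start.kind === "element") {
    element = start;
  } else if (start.kind === "text" || start.kind === "comment") {
    element = start.parentElement;
  } else {
    element = null; // Document or DocumentFragment
  }
  // Step 2 (assumed completion): fall back to an HTML body element.
  if (
    element === null ||
    (element.nodeDocument.isHTML && element.localName === "html" && element.namespaceURI === HTML_NS)
  ) {
    return { kind: "element", localName: "body", namespaceURI: HTML_NS, nodeDocument: rangeDocument };
  }
  return element;
}
```

The sketch stops at choosing the context; step 4 (unmarking scripts as "already started") is what allows scripts in the returned fragment to run once the fragment is inserted into a document.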

A. Acknowledgements

Thanks to Ms2ger [Mozilla] for maintaining the initial drafts of this specification and for its continued improvement in the Living Standard.

Thanks to Anne van Kesteren, Aryeh Gregor, Boris Zbarsky, Henri Sivonen, Simon Pieters and timeless for their useful comments.

Special thanks to Ian Hickson for defining the innerHTML and outerHTML attributes, and the insertAdjacentHTML() method in [HTML5] and his useful comments.

B. References

B.1 Normative references

[HTML5]
Robin Berjon; Steve Faulkner; Travis Leithead; Erika Doyle Navara; Edward O'Connor; Silvia Pfeiffer. HTML5. 6 August 2013. W3C Candidate Recommendation. URL: http://www.w3.org/TR/html5/
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Internet RFC 2119. URL: http://www.ietf.org/rfc/rfc2119.txt
[WEBIDL]
Cameron McCormack. Web IDL. 19 April 2012. W3C Candidate Recommendation. URL: http://www.w3.org/TR/WebIDL/

B.2 Informative references

[DOM4]
Anne van Kesteren; Aryeh Gregor; Ms2ger; Alex Russell; Robin Berjon. W3C DOM4. 7 November 2013. W3C Working Draft. URL: http://www.w3.org/TR/dom/
[XML-NAMES]
Tim Bray; Dave Hollander; Andrew Layman; Richard Tobin; Henry Thompson et al. Namespaces in XML 1.0 (Third Edition). 8 December 2009. W3C Recommendation. URL: http://www.w3.org/TR/xml-names/
[XML10]
Tim Bray; Jean Paoli; Michael Sperberg-McQueen; Eve Maler; François Yergeau et al. Extensible Markup Language (XML) 1.0 (Fifth Edition). 26 November 2008. W3C Recommendation. URL: http://www.w3.org/TR/xml/