Make whitespace canonicalization less bad
author Aryeh Gregor <AryehGregor+gitcommit@gmail.com>
Tue, 21 Jun 2011 15:28:14 -0600
changeset 307 006aba32edd1
parent 306 fa09e48f4239
child 308 fb902b8019df
editcommands.html
implementation.js
source.html
tests.js
--- a/editcommands.html	Tue Jun 21 12:30:52 2011 -0600
+++ b/editcommands.html	Tue Jun 21 15:28:14 2011 -0600
@@ -2877,6 +2877,78 @@
   <li>Return <var title="">buffer</var>.
 </ol>
 
+<p>To <dfn id=canonicalize-whitespace>canonicalize whitespace</dfn> at (<var title="">node</var>,
+<var title="">offset</var>):
+
+<p class=XXX>This algorithm fails in all kinds of common cases, like any
+non-text node, or whitespace that spans multiple text nodes.  Needs lots of
+fixing.
+
+<ol>
+  <li>If <var title="">node</var> is not a <code class=external data-anolis-spec=domcore><a href=http://dvcs.w3.org/hg/domcore/raw-file/tip/Overview.html#text>Text</a></code> node, or is not
+  <a href=#editable>editable</a>, or its <a class=external data-anolis-spec=domcore href=http://dvcs.w3.org/hg/domcore/raw-file/tip/Overview.html#concept-tree-parent title=concept-tree-parent>parent</a>'s <a href=http://www.w3.org/TR/CSS21/cascade.html#computed-value>computed value</a> for "white-space" is
+  "pre" or "pre-wrap", abort these steps.
+
+  <!-- First go to the beginning of the current whitespace run. -->
+  <li>Let <var title="">start offset</var> equal <var title="">offset</var>.
+
+  <li>While <var title="">start offset</var> is positive and the (<var title="">start
+  offset</var> &minus; 1)st <a href=http://es5.github.com/#x8.4>element</a> of <var title="">node</var>'s <code class=external data-anolis-spec=domcore title=dom-CharacterData-data><a href=http://dvcs.w3.org/hg/domcore/raw-file/tip/Overview.html#dom-characterdata-data>data</a></code> is a
+  space (0x0020) or non-breaking space (0x00A0), subtract one from <var title="">start
+  offset</var>.
+
+  <!-- Now collapse any consecutive spaces. -->
+  <li>Let <var title="">end offset</var> equal <var title="">start offset</var>.
+
+  <li>While <var title="">end offset</var> is less than <var title="">node</var>'s <a href=http://es5.github.com/#x15.5.5.1>length</a>,
+  and the <var title="">end offset</var>th <a href=http://es5.github.com/#x8.4>element</a> of <var title="">node</var>'s <code class=external data-anolis-spec=domcore title=dom-CharacterData-data><a href=http://dvcs.w3.org/hg/domcore/raw-file/tip/Overview.html#dom-characterdata-data>data</a></code> is
+  0x0020 or 0x00A0:
+
+  <ol>
+    <li>Let <var title="">length</var> equal zero.
+
+    <li>While <var title="">end offset</var> plus <var title="">length</var> is less than
+    <var title="">node</var>'s <code class=external data-anolis-spec=domcore title=dom-CharacterData-length><a href=http://dvcs.w3.org/hg/domcore/raw-file/tip/Overview.html#dom-characterdata-length>length</a></code>, and the (<var title="">end offset</var> +
+    <var title="">length</var>)th <a href=http://es5.github.com/#x8.4>element</a> of <var title="">node</var>'s <code class=external data-anolis-spec=domcore title=dom-CharacterData-data><a href=http://dvcs.w3.org/hg/domcore/raw-file/tip/Overview.html#dom-characterdata-data>data</a></code> is 0x0020,
+    add one to <var title="">length</var>.
+
+    <li>If <var title="">length</var> is greater than one, call <code class=external data-anolis-spec=domcore title=dom-CharacterData-deleteData><a href=http://dvcs.w3.org/hg/domcore/raw-file/tip/Overview.html#dom-characterdata-deletedata>deleteData(<var title="">end
+    offset</var> + 1, <var title="">length</var> &minus; 1)</a></code> on <var title="">node</var>.
+
+    <li>Add one to <var title="">end offset</var>.
+  </ol>
+
+  <!-- Now replace with the canonical sequence. -->
+  <li>Let <var title="">replacement whitespace</var> be the <a href=#canonical-space-sequence>canonical space
+  sequence</a> of length <var title="">end offset</var> minus <var title="">start
+  offset</var>.  <var title="">non-breaking start</var> is true if <var title="">start
+  offset</var> is zero and false otherwise, and <var title="">non-breaking end</var> is
+  true if <var title="">end offset</var> is <var title="">node</var>'s <code class=external data-anolis-spec=domcore title=dom-CharacterData-length><a href=http://dvcs.w3.org/hg/domcore/raw-file/tip/Overview.html#dom-characterdata-length>length</a></code> and false
+  otherwise.
+
+  <li>While <var title="">start offset</var> is less than <var title="">end offset</var>:
+
+  <ol>
+    <li>Remove the first <a href=http://es5.github.com/#x8.4>element</a> from <var title="">replacement whitespace</var>, and
+    let <var title="">element</var> be that <a href=http://es5.github.com/#x8.4>element</a>.
+
+    <li>If <var title="">element</var> is not the same as the <var title="">start offset</var>th
+    <a href=http://es5.github.com/#x8.4>element</a> of <var title="">node</var>'s <code class=external data-anolis-spec=domcore title=dom-CharacterData-data><a href=http://dvcs.w3.org/hg/domcore/raw-file/tip/Overview.html#dom-characterdata-data>data</a></code>:
+
+    <ol>
+      <!-- We need to insert then delete, so that we don't change range
+      boundary points. -->
+      <li>Call <code class=external data-anolis-spec=domcore title=dom-CharacterData-insertData><a href=http://dvcs.w3.org/hg/domcore/raw-file/tip/Overview.html#dom-characterdata-insertdata>insertData(<var title="">start offset</var>, <var title="">element</var>)</a></code> on
+      <var title="">node</var>.
+
+      <li>Call <code class=external data-anolis-spec=domcore title=dom-CharacterData-deleteData><a href=http://dvcs.w3.org/hg/domcore/raw-file/tip/Overview.html#dom-characterdata-deletedata>deleteData(<var title="">start offset</var> + 1, 1)</a></code> on
+      <var title="">node</var>.
+    </ol>
+
+    <li>Add one to <var title="">start offset</var>.
+  </ol>
+</ol>
+
 
 <h3 id=allowed-children><span class=secno>7.3 </span>Allowed children</h3>
 
@@ -3267,9 +3339,15 @@
 
   <!-- This is based on deleteContents() in DOM Range. -->
   <li>If <var title="">start node</var> and <var title="">end node</var> are the same, and
-  <var title="">start node</var> is an <a href=#editable>editable</a> <code class=external data-anolis-spec=domcore><a href=http://dvcs.w3.org/hg/domcore/raw-file/tip/Overview.html#text>Text</a></code> node, call
-  <code class=external data-anolis-spec=domcore title=dom-CharacterData-deleteData><a href=http://dvcs.w3.org/hg/domcore/raw-file/tip/Overview.html#dom-characterdata-deletedata>deleteData(<var title="">start offset</var>, <var title="">end offset</var> &minus;
-  <var title="">start offset</var>)</a></code> on <var title="">start node</var>.
+  <var title="">start node</var> is an <a href=#editable>editable</a> <code class=external data-anolis-spec=domcore><a href=http://dvcs.w3.org/hg/domcore/raw-file/tip/Overview.html#text>Text</a></code> node:
+
+  <ol>
+    <li>Call <code class=external data-anolis-spec=domcore title=dom-CharacterData-deleteData><a href=http://dvcs.w3.org/hg/domcore/raw-file/tip/Overview.html#dom-characterdata-deletedata>deleteData(<var title="">start offset</var>, <var title="">end offset</var>
+    &minus; <var title="">start offset</var>)</a></code> on <var title="">start node</var>.
+
+    <li><a href=#canonicalize-whitespace>Canonicalize whitespace</a> at (<var title="">start node</var>,
+    <var title="">start offset</var>).
+  </ol>
 
   <li>Otherwise:
 
@@ -4316,6 +4394,11 @@
   collapsed?  WebKit seems to do some normalization on the range before
   deciding whether it's collapsed, and that sounds like a good idea.
 
+  <li><a href=#canonicalize-whitespace>Canonicalize whitespace</a> at (<a href=#active-range>active range</a>'s
+  <a class=external data-anolis-spec=domrange href=http://html5.org/specs/dom-range.html#concept-range-start title=concept-range-start>start</a> <a class=external data-anolis-spec=domrange href=http://html5.org/specs/dom-range.html#concept-boundary-point-node title=concept-boundary-point-node>node</a>, <a href=#active-range>active range</a>'s <a class=external data-anolis-spec=domrange href=http://html5.org/specs/dom-range.html#concept-range-start title=concept-range-start>start</a> <a class=external data-anolis-spec=domrange href=http://html5.org/specs/dom-range.html#concept-boundary-point-offset title=concept-boundary-point-offset>offset</a>).
+  <!-- Needed so that if there are multiple consecutive spaces we backspace
+  over all at once. -->
+
   <li>Let <var title="">node</var> and <var title="">offset</var> be the <a href=#active-range>active
   range</a>'s <a class=external data-anolis-spec=domrange href=http://html5.org/specs/dom-range.html#concept-range-start title=concept-range-start>start</a> <a class=external data-anolis-spec=domrange href=http://html5.org/specs/dom-range.html#concept-boundary-point-node title=concept-boundary-point-node>node</a> and <a class=external data-anolis-spec=domrange href=http://html5.org/specs/dom-range.html#concept-boundary-point-offset title=concept-boundary-point-offset>offset</a>.
 
@@ -4802,6 +4885,9 @@
   collapsed?  WebKit seems to do some normalization on the range before
   deciding whether it's collapsed, and that sounds like a good idea.
 
+  <li><a href=#canonicalize-whitespace>Canonicalize whitespace</a> at (<a href=#active-range>active range</a>'s
+  <a class=external data-anolis-spec=domrange href=http://html5.org/specs/dom-range.html#concept-range-start title=concept-range-start>start</a> <a class=external data-anolis-spec=domrange href=http://html5.org/specs/dom-range.html#concept-boundary-point-node title=concept-boundary-point-node>node</a>, <a href=#active-range>active range</a>'s <a class=external data-anolis-spec=domrange href=http://html5.org/specs/dom-range.html#concept-range-start title=concept-range-start>start</a> <a class=external data-anolis-spec=domrange href=http://html5.org/specs/dom-range.html#concept-boundary-point-offset title=concept-boundary-point-offset>offset</a>).
+
   <li>Let <var title="">node</var> and <var title="">offset</var> be the <a href=#active-range>active
   range</a>'s <a class=external data-anolis-spec=domrange href=http://html5.org/specs/dom-range.html#concept-range-start title=concept-range-start>start</a> <a class=external data-anolis-spec=domrange href=http://html5.org/specs/dom-range.html#concept-boundary-point-node title=concept-boundary-point-node>node</a> and <a class=external data-anolis-spec=domrange href=http://html5.org/specs/dom-range.html#concept-boundary-point-offset title=concept-boundary-point-offset>offset</a>.
 
@@ -5661,106 +5747,12 @@
   and that <a class=external data-anolis-spec=domcore href=http://dvcs.w3.org/hg/domcore/raw-file/tip/Overview.html#concept-tree-child title=concept-tree-child>child</a> is a <code class=external data-anolis-spec=domcore><a href=http://dvcs.w3.org/hg/domcore/raw-file/tip/Overview.html#text>Text</a></code> node, set <var title="">node</var> to that <a class=external data-anolis-spec=domcore href=http://dvcs.w3.org/hg/domcore/raw-file/tip/Overview.html#concept-tree-child title=concept-tree-child>child</a>,
   then set <var title="">offset</var> to zero.
 
-  <li>If <var title="">node</var> is a <code class=external data-anolis-spec=domcore><a href=http://dvcs.w3.org/hg/domcore/raw-file/tip/Overview.html#text>Text</a></code> node whose <a class=external data-anolis-spec=domcore href=http://dvcs.w3.org/hg/domcore/raw-file/tip/Overview.html#concept-tree-parent title=concept-tree-parent>parent</a>'s <a href=http://www.w3.org/TR/CSS21/cascade.html#computed-value>computed value</a>
-  for "white-space" is neither "pre" nor "pre-wrap":
-
-  <p class=XXX>This also needs to take visually adjoining text nodes into
-  account, even if their parents are in different elements.  When inserting "a"
-  in "&lt;a href=/&gt;foo&nbsp;&lt;/a&gt;[] ", for instance, we need to convert the
-  nbsp to a regular space.  This kind of thing is just not feasible using pure
-  DOM stuff, though, so the current definition is a bad hack that will often
-  fail in real-world cases.  Suggestions for how to improve it appreciated.
-
-  <ol>
-    <li>Let <var title="">leading space</var> equal zero.
-
-    <li>Let <var title="">start offset</var> equal <var title="">offset</var> minus one.
-
-    <li>While <var title="">start offset</var> is nonnegative and the
-    <var title="">start offset</var>th <a href=http://es5.github.com/#x8.4>element</a> of <var title="">node</var>'s <code class=external data-anolis-spec=domcore title=dom-CharacterData-data><a href=http://dvcs.w3.org/hg/domcore/raw-file/tip/Overview.html#dom-characterdata-data>data</a></code> is a
-    space (U+0020) or non-breaking space (U+00A0):
-
-    <ol>
-      <li>If the <var title="">start offset</var>th <a href=http://es5.github.com/#x8.4>element</a> of <var title="">node</var>'s
-      <code class=external data-anolis-spec=domcore title=dom-CharacterData-data><a href=http://dvcs.w3.org/hg/domcore/raw-file/tip/Overview.html#dom-characterdata-data>data</a></code> is a non-breaking space (U+00A0), or the <a href=http://es5.github.com/#x8.4>element</a> before it
-      is not a space (U+0020), add one to <var title="">leading space</var>.
-
-      <li>Subtract one from <var title="">start offset</var>.
-    </ol>
-
-    <li>Add one to <var title="">start offset</var>.
-
-    <li>Let <var title="">trailing space</var> equal zero.
-
-    <li>Let <var title="">end offset</var> equal <var title="">offset</var>.
-
-    <li>While <var title="">end offset</var> is less than <var title="">node</var>'s <code class=external data-anolis-spec=domcore title=dom-CharacterData-length><a href=http://dvcs.w3.org/hg/domcore/raw-file/tip/Overview.html#dom-characterdata-length>length</a></code>
-    and the <var title="">end offset</var>th <a href=http://es5.github.com/#x8.4>element</a> of <var title="">node</var>'s <code class=external data-anolis-spec=domcore title=dom-CharacterData-data><a href=http://dvcs.w3.org/hg/domcore/raw-file/tip/Overview.html#dom-characterdata-data>data</a></code>
-    is a space (U+0020) or non-breaking space (U+00A0):
-
-    <ol>
-      <li>If the <var title="">end offset</var>th <a href=http://es5.github.com/#x8.4>element</a> of <var title="">node</var>'s
-      <code class=external data-anolis-spec=domcore title=dom-CharacterData-data><a href=http://dvcs.w3.org/hg/domcore/raw-file/tip/Overview.html#dom-characterdata-data>data</a></code> is a non-breaking space (U+00A0), or the <a href=http://es5.github.com/#x8.4>element</a> before it
-      is not a space (U+0020), add one to <var title="">trailing space</var>.
-      <!-- If we're between two spaces that are collapsed together, this means
-      we're effectively at the end of the collapsed run.  This shouldn't happen
-      with user-created selections, of course. -->
-
-      <li>Add one to <var title="">end offset</var>.
-    </ol>
-
-    <li>Set <var title="">initial nbsp</var> to true if <var title="">start offset</var> is 0,
-    false otherwise.
-
-    <li>Set <var title="">final nbsp</var> to true if <var title="">end offset</var> is the
-    <code class=external data-anolis-spec=domcore title=dom-CharacterData-length><a href=http://dvcs.w3.org/hg/domcore/raw-file/tip/Overview.html#dom-characterdata-length>length</a></code> of <var title="">node</var>, false otherwise.
-
-    <p class=XXX>These are wrong in zillions of common cases.  Per XXX above,
-    the fix here is not obvious at all.
-
-    <li>If <var title="">value</var> is a space (U+0020):
-
-    <ol>
-      <li>Let <var title="">new trailing space</var> be the <a href=#canonical-space-sequence>canonical space
-      sequence</a> of length <var title="">leading space</var> plus <var title="">trailing
-      space</var> plus one, with <var title="">non-breaking start</var> equal to
-      <var title="">initial nbsp</var> and <var title="">non-breaking end</var> equal to
-      <var title="">final nbsp</var>.
-
-      <li>Remove the first <var title="">leading space</var> <a href=http://es5.github.com/#x8.4>elements</a> from <var title="">new
-      trailing space</var>, and let <var title="">new leading space</var> be the result.
-
-      <li>Remove the first <a href=http://es5.github.com/#x8.4>element</a> from <var title="">new trailing space</var>, and
-      let <var title="">value</var> be the result.
-    </ol>
-
-    <li>Otherwise:
-
-    <ol>
-      <li>Let <var title="">new leading space</var> be the <a href=#canonical-space-sequence>canonical space
-      sequence</a> of length <var title="">leading space</var>, with
-      <var title="">non-breaking start</var> equal to <var title="">initial nbsp</var> and
-      <var title="">non-breaking end</var> equal to false.
-
-      <li>Let <var title="">new trailing space</var> be the <a href=#canonical-space-sequence>canonical space
-      sequence</a> of length <var title="">trailing space</var>, with
-      <var title="">non-breaking start</var> equal to false and <var title="">non-breaking
-      end</var> equal to <var title="">final nbsp</var>.
-    </ol>
-
-    <li>Call <code class=external data-anolis-spec=domcore title=dom-CharacterData-replaceData><a href=http://dvcs.w3.org/hg/domcore/raw-file/tip/Overview.html#dom-characterdata-replacedata>replaceData(<var title="">start offset</var>, <var title="">offset</var> &minus;
-    <var title="">start offset</var>, <var title="">new leading space</var>)</a></code> on <var title="">node</var>.
-
-    <li>Subtract <var title="">offset</var> from <var title="">end offset</var>, then add
-    <var title="">start offset</var> plus <var title="">leading space</var> to <var title="">end
-    offset</var>.
-
-    <li>Set <var title="">offset</var> to <var title="">start offset</var> plus <var title="">leading
-    space</var>.
-
-    <li>Call <code class=external data-anolis-spec=domcore title=dom-CharacterData-replaceData><a href=http://dvcs.w3.org/hg/domcore/raw-file/tip/Overview.html#dom-characterdata-replacedata>replaceData(<var title="">offset</var>, <var title="">end offset</var> &minus;
-    <var title="">offset</var>, <var title="">new trailing space</var>)</a></code> on <var title="">node</var>.
-  </ol>
+  <li>If <var title="">value</var> is a space (U+0020), and either <var title="">node</var> is an
+  <code class=external data-anolis-spec=domcore><a href=http://dvcs.w3.org/hg/domcore/raw-file/tip/Overview.html#element>Element</a></code> whose <a href=http://www.w3.org/TR/CSS21/cascade.html#computed-value>computed value</a> for "white-space" is neither "pre" nor
+  "pre-wrap" or <var title="">node</var> is not an <code class=external data-anolis-spec=domcore><a href=http://dvcs.w3.org/hg/domcore/raw-file/tip/Overview.html#element>Element</a></code> but its <a class=external data-anolis-spec=domcore href=http://dvcs.w3.org/hg/domcore/raw-file/tip/Overview.html#concept-tree-parent title=concept-tree-parent>parent</a> is an
+  <code class=external data-anolis-spec=domcore><a href=http://dvcs.w3.org/hg/domcore/raw-file/tip/Overview.html#element>Element</a></code> whose <a href=http://www.w3.org/TR/CSS21/cascade.html#computed-value>computed value</a> for "white-space" is neither "pre" nor
+  "pre-wrap", set <var title="">value</var> to a non-breaking space (U+00A0).
+  <!-- This may change to a space when we canonicalize. -->
 
   <li>If <var title="">node</var> is a <code class=external data-anolis-spec=domcore><a href=http://dvcs.w3.org/hg/domcore/raw-file/tip/Overview.html#text>Text</a></code> node:
 
@@ -5773,6 +5765,12 @@
     <li>Call <code class=external data-anolis-spec=domrange title=dom-Selection-collapse><a href=http://html5.org/specs/dom-range.html#dom-selection-collapse>collapse(<var title="">node</var>, <var title="">offset</var>)</a></code> on the
     <a class=external data-anolis-spec=domrange href=http://html5.org/specs/dom-range.html#context-object>context object</a>'s <code class=external data-anolis-spec=domrange><a href=http://html5.org/specs/dom-range.html#selection>Selection</a></code>.
 
+    <li><a href=#canonicalize-whitespace>Canonicalize whitespace</a> at (<var title="">node</var>,
+    <var title="">offset</var> &minus; 1).
+
+    <li><a href=#canonicalize-whitespace>Canonicalize whitespace</a> at (<var title="">node</var>,
+    <var title="">offset</var>).
+
     <li>Abort these steps.
   </ol>
 
@@ -5781,9 +5779,6 @@
   <li>If <var title="">node</var> has only one <a class=external data-anolis-spec=domcore href=http://dvcs.w3.org/hg/domcore/raw-file/tip/Overview.html#concept-tree-child title=concept-tree-child>child</a>, which is a <a href=#collapsed-line-break>collapsed
   line break</a>, remove its <a class=external data-anolis-spec=domcore href=http://dvcs.w3.org/hg/domcore/raw-file/tip/Overview.html#concept-tree-child title=concept-tree-child>child</a> from it.
 
-  <li>If <var title="">value</var> is a space (U+0020), set <var title="">value</var> to a
-  non-breaking space (U+00A0).
-
   <p class=XXX>This is wrong in all sorts of cases, like
   "foo&lt;b&gt;[]&lt;/b&gt;bar".  As above, this is hard to get right without heavy
   CSS involvement.
--- a/implementation.js	Tue Jun 21 12:30:52 2011 -0600
+++ b/implementation.js	Tue Jun 21 15:28:14 2011 -0600
@@ -3280,6 +3280,85 @@
 	return buffer;
 }
 
+function canonicalizeWhitespace(node, offset) {
+	// "If node is not a Text node, or is not editable, or its parent's
+	// computed value for "white-space" is "pre" or "pre-wrap", abort these
+	// steps."
+	if (node.nodeType != Node.TEXT_NODE
+	|| !isEditable(node)
+	|| ["pre", "pre-wrap"].indexOf(getComputedStyle(node.parentNode).whiteSpace) != -1) {
+		return;
+	}
+
+	// "Let start offset equal offset."
+	var startOffset = offset;
+
+	// "While start offset is positive and the (start offset − 1)st element of
+	// node's data is a space (0x0020) or non-breaking space (0x00A0), subtract
+	// one from start offset."
+	while (startOffset > 0
+	&& /[ \xa0]/.test(node.data[startOffset - 1])) {
+		startOffset--;
+	}
+
+	// "Let end offset equal start offset."
+	var endOffset = startOffset;
+
+	// "While end offset is less than node's length, and the end offsetth
+	// element of node's data is 0x0020 or 0x00A0:"
+	while (endOffset < node.length
+	&& /[ \xa0]/.test(node.data[endOffset])) {
+		// "Let length equal zero."
+		var length = 0;
+
+		// "While end offset plus length is less than node's length, and the
+		// (end offset + length)th element of node's data is 0x0020, add one to
+		// length."
+		while (endOffset + length < node.length
+		&& node.data[endOffset + length] == " ") {
+			length++;
+		}
+
+		// "If length is greater than one, call deleteData(end offset + 1,
+		// length − 1) on node."
+		if (length > 1) {
+			node.deleteData(endOffset + 1, length - 1);
+		}
+
+		// "Add one to end offset."
+		endOffset++;
+	}
+
+	// "Let replacement whitespace be the canonical space sequence of length
+	// end offset minus start offset. non-breaking start is true if start
+	// offset is zero and false otherwise, and non-breaking end is true if end
+	// offset is node's length and false otherwise."
+	var replacementWhitespace = canonicalSpaceSequence(endOffset - startOffset,
+		startOffset == 0,
+		endOffset == node.length);
+
+	// "While start offset is less than end offset:"
+	while (startOffset < endOffset) {
+		// "Remove the first element from replacement whitespace, and let
+		// element be that element."
+		var element = replacementWhitespace[0];
+		replacementWhitespace = replacementWhitespace.slice(1);
+
+		// "If element is not the same as the start offsetth element of node's
+		// data:"
+		if (element != node.data[startOffset]) {
+			// "Call insertData(start offset, element) on node."
+			node.insertData(startOffset, element);
+
+			// "Call deleteData(start offset + 1, 1) on node."
+			node.deleteData(startOffset + 1, 1);
+		}
+
+		// "Add one to start offset."
+		startOffset++;
+	}
+}
+
 //@}
 
 ///// Allowed children /////
@@ -3752,13 +3831,17 @@
 	}
 
 	// "If start node and end node are the same, and start node is an editable
-	// Text node, call deleteData(start offset, end offset − start offset) on
-	// start node."
+	// Text node:"
 	if (startNode == endNode
 	&& isEditable(startNode)
 	&& startNode.nodeType == Node.TEXT_NODE) {
+		// "Call deleteData(start offset, end offset − start offset) on start
+		// node."
 		startNode.deleteData(startOffset, endOffset - startOffset);
 
+		// "Canonicalize whitespace at (start node, start offset)."
+		canonicalizeWhitespace(startNode, startOffset);
+
 	// "Otherwise:"
 	} else {
 		// "If start node is an editable Text node, call deleteData() on it,
@@ -4577,6 +4660,10 @@
 			return;
 		}
 
+		// "Canonicalize whitespace at (active range's start node, active
+		// range's start offset)."
+		canonicalizeWhitespace(getActiveRange().startContainer, getActiveRange().startOffset);
+
 		// "Let node and offset be the active range's start node and offset."
 		var node = getActiveRange().startContainer;
 		var offset = getActiveRange().startOffset;
@@ -5022,6 +5109,10 @@
 			return;
 		}
 
+		// "Canonicalize whitespace at (active range's start node, active
+		// range's start offset)."
+		canonicalizeWhitespace(getActiveRange().startContainer, getActiveRange().startOffset);
+
 		// "Let node and offset be the active range's start node and offset."
 		var node = getActiveRange().startContainer;
 		var offset = getActiveRange().startOffset;
@@ -5795,113 +5886,16 @@
 			offset = 0;
 		}
 
-		// "If node is a Text node whose parent's computed value for
-		// "white-space" is neither "pre" nor "pre-wrap":"
-		if (node.nodeType == Node.TEXT_NODE
-		&& ["pre", "pre-wrap"].indexOf(getComputedStyle(node.parentNode).whiteSpace) == -1) {
-			// "Let leading space equal zero."
-			var leadingSpace = 0;
-
-			// "Let start offset equal offset minus one."
-			var startOffset = offset - 1;
-
-			// "While start offset is nonnegative and the start offsetth
-			// element of node's data is a space (U+0020) or non-breaking space
-			// (U+00A0):"
-			while (startOffset >= 0
-			&& /[ \xa0]/.test(node.data[startOffset])) {
-				// "If the start offsetth element of node's data is a
-				// non-breaking space (U+00A0), or the element before it is not
-				// a space (U+0020), add one to leading space."
-				if (node.data[startOffset] == "\xa0"
-				|| node.data[startOffset - 1] !== " ") {
-					leadingSpace++;
-				}
-
-				// "Subtract one from start offset."
-				startOffset--;
-			}
-
-			// "Add one to start offset."
-			startOffset++;
-
-			// "Let trailing space equal zero."
-			var trailingSpace = 0;
-
-			// "Let end offset equal offset."
-			var endOffset = offset;
-
-			// "While end offset is less than node's length and the end
-			// offsetth element of node's data is a space (U+0020) or
-			// non-breaking space (U+00A0):"
-			while (endOffset < node.length
-			&& /[ \xa0]/.test(node.data[endOffset])) {
-				// "If the end offsetth element of node's data is a
-				// non-breaking space (U+00A0), or the element before it is not
-				// a space (U+0020), add one to trailing space."
-				if (node.data[endOffset] == "\xa0"
-				|| node.data[endOffset - 1] !== " ") {
-					trailingSpace++;
-				}
-
-				// "Add one to end offset."
-				endOffset++;
-			}
-
-			// "Set initial nbsp to true if start offset is 0, false
-			// otherwise."
-			var initialNbsp = startOffset == 0;
-
-			// "Set final nbsp to true if end offset is the length of node,
-			// false otherwise."
-			var finalNbsp = endOffset == node.length;
-
-			// "If value is a space (U+0020):"
-			if (value == " ") {
-				// "Let new trailing space be the canonical space sequence of
-				// length leading space plus trailing space plus one, with
-				// non-breaking start equal to initial nbsp and non-breaking
-				// end equal to final nbsp."
-				var newTrailingSpace = canonicalSpaceSequence(leadingSpace + trailingSpace + 1, initialNbsp, finalNbsp);
-
-				// "Remove the first leading space elements from new trailing
-				// space, and let new leading space be the result."
-				var newLeadingSpace = newTrailingSpace.slice(0, leadingSpace);
-				newTrailingSpace = newTrailingSpace.slice(leadingSpace);
-
-				// "Remove the first element from new trailing space, and let
-				// value be the result."
-				value = newTrailingSpace[0];
-				newTrailingSpace = newTrailingSpace.slice(1);
-
-			// "Otherwise:"
-			} else {
-				// "Let new leading space be the canonical space sequence of
-				// length leading space, with non-breaking start equal to
-				// initial nbsp and non-breaking end equal to false."
-				var newLeadingSpace = canonicalSpaceSequence(leadingSpace, initialNbsp, false);
-
-				// "Let new trailing space be the canonical space sequence of
-				// length trailing space, with non-breaking start equal to
-				// false and non-breaking end equal to final nbsp."
-				var newTrailingSpace = canonicalSpaceSequence(trailingSpace, false, finalNbsp);
-			}
-
-			// "Call replaceData(start offset, offset − start offset, new
-			// leading space) on node."
-			node.replaceData(startOffset, offset - startOffset, newLeadingSpace);
-
-			// "Subtract offset from end offset, then add start offset plus
-			// leading space to end offset."
-			endOffset -= offset;
-			endOffset += startOffset + leadingSpace;
-
-			// "Set offset to start offset plus leading space."
-			offset = startOffset + leadingSpace;
-
-			// "Call replaceData(offset, end offset − offset, new trailing
-			// space) on node."
-			node.replaceData(offset, endOffset - offset, newTrailingSpace);
+		// "If value is a space (U+0020), and either node is an Element whose
+		// computed value for "white-space" is neither "pre" nor "pre-wrap" or
+		// node is not an Element but its parent is an Element whose computed
+		// value for "white-space" is neither "pre" nor "pre-wrap", set value
+		// to a non-breaking space (U+00A0)."
+		var refElement = node.nodeType == Node.ELEMENT_NODE ? node : node.parentNode;
+		if (value == " "
+		&& refElement.nodeType == Node.ELEMENT_NODE
+		&& ["pre", "pre-wrap"].indexOf(getComputedStyle(refElement).whiteSpace) == -1) {
+			value = "\xa0";
 		}
 
 		// "If node is a Text node:"
@@ -5916,6 +5910,12 @@
 			getActiveRange().setStart(node, offset);
 			getActiveRange().setEnd(node, offset);
 
+			// "Canonicalize whitespace at (node, offset − 1)."
+			canonicalizeWhitespace(node, offset - 1);
+
+			// "Canonicalize whitespace at (node, offset)."
+			canonicalizeWhitespace(node, offset);
+
 			// "Abort these steps."
 			return;
 		}
@@ -5929,12 +5929,6 @@
 			node.removeChild(node.firstChild);
 		}
 
-		// "If value is a space (U+0020), set value to a non-breaking space
-		// (U+00A0)."
-		if (value == " ") {
-			value = "\xa0";
-		}
-
 		// "Let text be the result of calling createTextNode(value) on the
 		// context object."
 		var text = document.createTextNode(value);
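
The first half of canonicalizeWhitespace() added above (walk back to the start of the whitespace run, then collapse consecutive U+0020 characters) can be tried out without a DOM. The following is only a sketch on a plain string: collapseSpaceRun is a hypothetical name that does not appear in this changeset, and string slicing stands in for deleteData(), so it models nothing about ranges or editability.

```javascript
// Hypothetical string model of the "start offset" and "end offset" loops in
// canonicalizeWhitespace().  Not part of implementation.js.
function collapseSpaceRun(data, offset) {
	var isSpace = function(ch) { return ch === " " || ch === "\xa0"; };

	// Walk back to the beginning of the current whitespace run.
	var startOffset = offset;
	while (startOffset > 0 && isSpace(data[startOffset - 1])) {
		startOffset--;
	}

	// Walk forward over the run, collapsing each stretch of consecutive
	// U+0020 characters down to a single one.  U+00A0 is never removed.
	var endOffset = startOffset;
	while (endOffset < data.length && isSpace(data[endOffset])) {
		var length = 0;
		while (endOffset + length < data.length
		&& data[endOffset + length] === " ") {
			length++;
		}
		if (length > 1) {
			// Stand-in for node.deleteData(endOffset + 1, length - 1).
			data = data.slice(0, endOffset + 1) + data.slice(endOffset + length);
		}
		endOffset++;
	}

	return { data: data, startOffset: startOffset, endOffset: endOffset };
}

// Three spaces between "a" and "b" collapse to one.
var collapsed = collapseSpaceRun("a   b", 2);
// collapsed.data is "a b", collapsed.startOffset is 1, collapsed.endOffset is 2
```

After this step, end offset minus start offset is the length of the collapsed run, which is the length the real function passes to canonicalSpaceSequence().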
--- a/source.html	Tue Jun 21 12:30:52 2011 -0600
+++ b/source.html	Tue Jun 21 15:28:14 2011 -0600
@@ -2861,6 +2861,78 @@
 
   <li>Return <var>buffer</var>.
 </ol>
+
+<p>To <dfn>canonicalize whitespace</dfn> at (<var>node</var>,
+<var>offset</var>):
+
+<p class=XXX>This algorithm fails in all kinds of common cases, like any
+non-text node, or whitespace that spans multiple text nodes.  Needs lots of
+fixing.
+
+<ol>
+  <li>If <var>node</var> is not a [[text]] node, or is not
+  <span>editable</span>, or its [[parent]]'s [[compval]] for "white-space" is
+  "pre" or "pre-wrap", abort these steps.
+
+  <!-- First go to the beginning of the current whitespace run. -->
+  <li>Let <var>start offset</var> equal <var>offset</var>.
+
+  <li>While <var>start offset</var> is positive and the (<var>start
+  offset</var> &minus; 1)st [[strel]] of <var>node</var>'s [[cddata]] is a
+  space (0x0020) or non-breaking space (0x00A0), subtract one from <var>start
+  offset</var>.
+
+  <!-- Now collapse any consecutive spaces. -->
+  <li>Let <var>end offset</var> equal <var>start offset</var>.
+
+  <li>While <var>end offset</var> is less than <var>node</var>'s [[cdlength]],
+  and the <var>end offset</var>th [[strel]] of <var>node</var>'s [[cddata]] is
+  0x0020 or 0x00A0:
+
+  <ol>
+    <li>Let <var>length</var> equal zero.
+
+    <li>While <var>end offset</var> plus <var>length</var> is less than
+    <var>node</var>'s [[cdlength]], and the (<var>end offset</var> +
+    <var>length</var>)th [[strel]] of <var>node</var>'s [[cddata]] is 0x0020,
+    add one to <var>length</var>.
+
+    <li>If <var>length</var> is greater than one, call [[deletedata|<var>end
+    offset</var> + 1, <var>length</var> &minus; 1]] on <var>node</var>.
+
+    <li>Add one to <var>end offset</var>.
+  </ol>
+
+  <!-- Now replace with the canonical sequence. -->
+  <li>Let <var>replacement whitespace</var> be the <span>canonical space
+  sequence</span> of length <var>end offset</var> minus <var>start
+  offset</var>.  <var>non-breaking start</var> is true if <var>start
+  offset</var> is zero and false otherwise, and <var>non-breaking end</var> is
+  true if <var>end offset</var> is <var>node</var>'s [[cdlength]] and false
+  otherwise.
+
+  <li>While <var>start offset</var> is less than <var>end offset</var>:
+
+  <ol>
+    <li>Remove the first [[strel]] from <var>replacement whitespace</var>, and
+    let <var>element</var> be that [[strel]].
+
+    <li>If <var>element</var> is not the same as the <var>start offset</var>th
+    [[strel]] of <var>node</var>'s [[cddata]]:
+
+    <ol>
+      <!-- We need to insert then delete, so that we don't change range
+      boundary points. -->
+      <li>Call [[insertdata|<var>start offset</var>, <var>element</var>]] on
+      <var>node</var>.
+
+      <li>Call [[deletedata|<var>start offset</var> + 1, 1]] on
+      <var>node</var>.
+    </ol>
+
+    <li>Add one to <var>start offset</var>.
+  </ol>
+</ol>
 <!-- @} -->
 
 <h3>Allowed children</h3>
@@ -3253,9 +3325,15 @@
 
   <!-- This is based on deleteContents() in DOM Range. -->
   <li>If <var>start node</var> and <var>end node</var> are the same, and
-  <var>start node</var> is an <span>editable</span> [[text]] node, call
-  [[deletedata|<var>start offset</var>, <var>end offset</var> &minus;
-  <var>start offset</var>]] on <var>start node</var>.
+  <var>start node</var> is an <span>editable</span> [[text]] node:
+
+  <ol>
+    <li>Call [[deletedata|<var>start offset</var>, <var>end offset</var>
+    &minus; <var>start offset</var>]] on <var>start node</var>.
+
+    <li><span>Canonicalize whitespace</span> at (<var>start node</var>,
+    <var>start offset</var>).
+  </ol>
 
   <li>Otherwise:
 
@@ -4319,6 +4397,11 @@
   collapsed?  WebKit seems to do some normalization on the range before
   deciding whether it's collapsed, and that sounds like a good idea.
 
+  <li><span>Canonicalize whitespace</span> at (<span>active range</span>'s
+  [[startnode]], <span>active range</span>'s [[startoffset]]).
+  <!-- Needed so that if there are multiple consecutive spaces we backspace
+  over all at once. -->
+
   <li>Let <var>node</var> and <var>offset</var> be the <span>active
   range</span>'s [[rangestart]] [[bpnode]] and [[bpoffset]].
 
@@ -4806,6 +4889,9 @@
   collapsed?  WebKit seems to do some normalization on the range before
   deciding whether it's collapsed, and that sounds like a good idea.
 
+  <li><span>Canonicalize whitespace</span> at (<span>active range</span>'s
+  [[startnode]], <span>active range</span>'s [[startoffset]]).
+
   <li>Let <var>node</var> and <var>offset</var> be the <span>active
   range</span>'s [[rangestart]] [[bpnode]] and [[bpoffset]].
 
@@ -5679,106 +5765,12 @@
   and that [[child]] is a [[text]] node, set <var>node</var> to that [[child]],
   then set <var>offset</var> to zero.
 
-  <li>If <var>node</var> is a [[text]] node whose [[parent]]'s [[compval]]
-  for "white-space" is neither "pre" nor "pre-wrap":
-
-  <p class=XXX>This also needs to take visually adjoining text nodes into
-  account, even if their parents are in different elements.  When inserting "a"
-  in "&lt;a href=/>foo&nbsp;&lt;/a>[] ", for instance, we need to convert the
-  nbsp to a regular space.  This kind of thing is just not feasible using pure
-  DOM stuff, though, so the current definition is a bad hack that will often
-  fail in real-world cases.  Suggestions for how to improve it appreciated.
-
-  <ol>
-    <li>Let <var>leading space</var> equal zero.
-
-    <li>Let <var>start offset</var> equal <var>offset</var> minus one.
-
-    <li>While <var>start offset</var> is nonnegative and the
-    <var>start offset</var>th [[strel]] of <var>node</var>'s [[cddata]] is a
-    space (U+0020) or non-breaking space (U+00A0):
-
-    <ol>
-      <li>If the <var>start offset</var>th [[strel]] of <var>node</var>'s
-      [[cddata]] is a non-breaking space (U+00A0), or the [[strel]] before it
-      is not a space (U+0020), add one to <var>leading space</var>.
-
-      <li>Subtract one from <var>start offset</var>.
-    </ol>
-
-    <li>Add one to <var>start offset</var>.
-
-    <li>Let <var>trailing space</var> equal zero.
-
-    <li>Let <var>end offset</var> equal <var>offset</var>.
-
-    <li>While <var>end offset</var> is less than <var>node</var>'s [[cdlength]]
-    and the <var>end offset</var>th [[strel]] of <var>node</var>'s [[cddata]]
-    is a space (U+0020) or non-breaking space (U+00A0):
-
-    <ol>
-      <li>If the <var>end offset</var>th [[strel]] of <var>node</var>'s
-      [[cddata]] is a non-breaking space (U+00A0), or the [[strel]] before it
-      is not a space (U+0020), add one to <var>trailing space</var>.
-      <!-- If we're between two spaces that are collapsed together, this means
-      we're effectively at the end of the collapsed run.  This shouldn't happen
-      with user-created selections, of course. -->
-
-      <li>Add one to <var>end offset</var>.
-    </ol>
-
-    <li>Set <var>initial nbsp</var> to true if <var>start offset</var> is 0,
-    false otherwise.
-
-    <li>Set <var>final nbsp</var> to true if <var>end offset</var> is the
-    [[cdlength]] of <var>node</var>, false otherwise.
-
-    <p class=XXX>These are wrong in zillions of common cases.  Per XXX above,
-    the fix here is not obvious at all.
-
-    <li>If <var>value</var> is a space (U+0020):
-
-    <ol>
-      <li>Let <var>new trailing space</var> be the <span>canonical space
-      sequence</span> of length <var>leading space</var> plus <var>trailing
-      space</var> plus one, with <var>non-breaking start</var> equal to
-      <var>initial nbsp</var> and <var>non-breaking end</var> equal to
-      <var>final nbsp</var>.
-
-      <li>Remove the first <var>leading space</var> [[strels]] from <var>new
-      trailing space</var>, and let <var>new leading space</var> be the result.
-
-      <li>Remove the first [[strel]] from <var>new trailing space</var>, and
-      let <var>value</var> be the result.
-    </ol>
-
-    <li>Otherwise:
-
-    <ol>
-      <li>Let <var>new leading space</var> be the <span>canonical space
-      sequence</span> of length <var>leading space</var>, with
-      <var>non-breaking start</var> equal to <var>initial nbsp</var> and
-      <var>non-breaking end</var> equal to false.
-
-      <li>Let <var>new trailing space</var> be the <span>canonical space
-      sequence</span> of length <var>trailing space</var>, with
-      <var>non-breaking start</var> equal to false and <var>non-breaking
-      end</var> equal to <var>final nbsp</var>.
-    </ol>
-
-    <li>Call [[replacedata|<var>start offset</var>, <var>offset</var> &minus;
-    <var>start offset</var>, <var>new leading space</var>]] on <var>node</var>.
-
-    <li>Subtract <var>offset</var> from <var>end offset</var>, then add
-    <var>start offset</var> plus <var>leading space</var> to <var>end
-    offset</var>.
-
-    <li>Set <var>offset</var> to <var>start offset</var> plus <var>leading
-    space</var>.
-
-    <li>Call [[replacedata|<var>offset</var>, <var>end offset</var> &minus;
-    <var>offset</var>, <var>new trailing space</var>]] on <var>node</var>.
-  </ol>
+  <li>If <var>value</var> is a space (U+0020), and either <var>node</var> is an
+  [[element]] whose [[compval]] for "white-space" is neither "pre" nor
+  "pre-wrap" or <var>node</var> is not an [[element]] but its [[parent]] is an
+  [[element]] whose [[compval]] for "white-space" is neither "pre" nor
+  "pre-wrap", set <var>value</var> to a non-breaking space (U+00A0).
+  <!-- This may change to a space when we canonicalize. -->
 
   <li>If <var>node</var> is a [[text]] node:
 
@@ -5791,6 +5783,12 @@
     <li>Call [[selcollapse|<var>node</var>, <var>offset</var>]] on the
     [[contextobject]]'s [[selection]].
 
+    <li><span>Canonicalize whitespace</span> at (<var>node</var>,
+    <var>offset</var> &minus; 1).
+
+    <li><span>Canonicalize whitespace</span> at (<var>node</var>,
+    <var>offset</var>).
+
     <li>Abort these steps.
   </ol>
 
@@ -5799,9 +5797,6 @@
   <li>If <var>node</var> has only one [[child]], which is a <span>collapsed
   line break</span>, remove its [[child]] from it.
 
-  <li>If <var>value</var> is a space (U+0020), set <var>value</var> to a
-  non-breaking space (U+00A0).
-
   <p class=XXX>This is wrong in all sorts of cases, like
   "foo&lt;b>[]&lt;/b>bar".  As above, this is hard to get right without heavy
   CSS involvement.
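
The "canonicalize whitespace" steps added to source.html above can be sketched on a plain string instead of a Text node. This is only an illustration under stated assumptions: `canonicalizeWhitespaceInString` and `sketchSpaceSequence` are hypothetical names, the editable and "white-space" checks from the first step are omitted, and `sketchSpaceSequence` is a simple stand-in, not the spec's definition of the canonical space sequence (which lies outside this hunk). Because strings have no range boundary points, the insert-then-delete trick is replaced by a single splice.

```javascript
// Stand-in for the "canonical space sequence" (an assumption, not the spec's
// definition): alternate nbsp and space so that no two regular spaces are
// adjacent, and honor the two non-breaking flags at the ends.
function sketchSpaceSequence(n, nonBreakingStart, nonBreakingEnd) {
	var out = "";
	for (var i = 0; i < n; i++) {
		var nbsp = (i % 2 == 0) == nonBreakingStart;
		out += nbsp ? "\xa0" : " ";
	}
	if (n > 0 && nonBreakingEnd) {
		out = out.slice(0, -1) + "\xa0";
	}
	return out;
}

// Hypothetical string-based version of the algorithm; returns the new data.
function canonicalizeWhitespaceInString(data, offset, spaceSequence) {
	function isSpace(ch) {
		return ch === " " || ch === "\xa0";
	}

	// Go to the beginning of the current whitespace run.
	var startOffset = offset;
	while (startOffset > 0 && isSpace(data[startOffset - 1])) {
		startOffset--;
	}

	// Collapse any consecutive regular spaces into one.
	var endOffset = startOffset;
	while (endOffset < data.length && isSpace(data[endOffset])) {
		var length = 0;
		while (endOffset + length < data.length
		&& data[endOffset + length] === " ") {
			length++;
		}
		if (length > 1) {
			data = data.slice(0, endOffset + 1) + data.slice(endOffset + length);
		}
		endOffset++;
	}

	// Replace the run with the canonical sequence.
	var replacement = spaceSequence(endOffset - startOffset,
		startOffset === 0, endOffset === data.length);
	return data.slice(0, startOffset) + replacement + data.slice(endOffset);
}
```

For example, `canonicalizeWhitespaceInString("foo  bar", 4, sketchSpaceSequence)` collapses the two spaces to give `"foo bar"`, and a lone nbsp between two words is turned back into a regular space.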
--- a/tests.js	Tue Jun 21 12:30:52 2011 -0600
+++ b/tests.js	Tue Jun 21 15:28:14 2011 -0600
@@ -307,6 +307,15 @@
 		'<a href=/>foo</a>[]bar',
 		'foo<a href=/>[]bar</a>',
 
+		'foo &nbsp;[]bar',
+		'foo&nbsp; []bar',
+		'foo&nbsp;&nbsp;[]bar',
+		'foo  []bar',
+		'<b>foo </b>&nbsp;[]bar',
+		'<b>foo&nbsp;</b> []bar',
+		'<b>foo&nbsp;</b>&nbsp;[]bar',
+		'<b>foo </b> []bar',
+
 		// Tables with collapsed selection
 		'foo<table><tr><td>[]bar</table>baz',
 		'foo<table><tr><td>bar</table>[]baz',
@@ -934,6 +943,15 @@
 		'<a href=/>foo[]</a>bar',
 		'foo[]<a href=/>bar</a>',
 
+		'foo[] &nbsp;bar',
+		'foo[]&nbsp; bar',
+		'foo[]&nbsp;&nbsp;bar',
+		'foo[]  bar',
+		'<b>foo[] </b>&nbsp;bar',
+		'<b>foo[]&nbsp;</b> bar',
+		'<b>foo[]&nbsp;</b>&nbsp;bar',
+		'<b>foo[] </b> bar',
+
 		// Tables with collapsed selection
 		'foo[]<table><tr><td>bar</table>baz',
 		'foo<table><tr><td>bar[]</table>baz',