Wed, 13 Feb 2013 18:35:14 -0800
[css3-syntax] Rewrite the charset determining step to be a little clearer, and closer to my normal language.
css3-syntax/Overview.html
css3-syntax/Overview.src.html
--- a/css3-syntax/Overview.html Wed Feb 13 18:26:41 2013 -0800
+++ b/css3-syntax/Overview.html Wed Feb 13 18:35:14 2013 -0800
@@ -536,21 +536,21 @@
 id=decode>decode</dfn></a> are defined in the <a
 href="http://encoding.spec.whatwg.org/">Encoding Standard</a>.

+ <p> First, determine the fallback encoding:
+
 <ol>
- <li> Let <var>encoding</var> be utf-8.
-
 <li> If HTTP or equivalent protocol defines an encoding (e.g. via the
 charset parameter of the Content-Type header), <a
 href="#get-an-encoding"><i>get an encoding</i></a> for the specified
- value. If that does not return failure, set <var>encoding</var> to the
- return value and jump to the last step of this algorithm.
-
- <li> Check the byte stream. If the first several bytes match the hex
- sequence
- <pre>40 63 68 61 72 73 65 74 20 22 (XX)* 22 3B</pre>
+ value. If that does not return failure, use the return value as the
+ fallback encoding.
+
+ <li> Otherwise, check the byte stream. If the first several bytes match
+ the hex sequence
+ <pre>40 63 68 61 72 73 65 74 20 22 (not 22)* 22 3B</pre>
 then <a href="#get-an-encoding"><i>get an encoding</i></a> for the
- sequence of XX bytes, decoded per <code>windows-1252</code>, and let
- <var>temp</var> be the return value.
+ sequence of <code>(not 22)*</code> bytes, decoded per
+ <code>windows-1252</code>.
 <p class=note> Note: Anything ASCII-compatible will do, so using
 <code>windows-1252</code> is fine.

@@ -558,29 +558,32 @@
 the string "<code>@charset "…";</code>", where the "…" is the
 sequence of bytes corresponding to the encoding's name.

- <p> If <var>temp</var> is <code>utf-16</code> or <code>utf-16be</code>,
- set <var>temp</var> to <code>utf-8</code>. If <var>temp</var> is not
- failure, set <var>encoding</var> to it and jump to the last step.
+ <p> If the return value was <code>utf-16</code> or <code>utf-16be</code>,
+ use <code>utf-8</code> as the fallback encoding; if it was anything else
+ except failure, use the return value as the fallback encoding.

 <p class=note> This mimics HTML <code><meta></code> behavior.

- <li> <a href="#get-an-encoding"><i>Get an encoding</i></a> for the value
- of the <code>charset</code> attribute on the <code><link></code>
- element or <code><?xml-stylesheet?></code> processing instruction that
- caused the style sheet to be included, if any. If that does not return
- failure, set <var>encoding</var> to the return value and jump to the last
- step.
-
- <li> Set <var>encoding</var> to the encoding of the referring style sheet
- or document, if any.
-
- <li> <a href="#decode"><i>Decode</i></a> the byte stream using fallback
- encoding <var>encoding</var>.
- <p class=note> Note: the <a href="#decode"><i>decode</i></a> algorithm
- lets the byte order mark (BOM) take precedence, hence the usage of the
- term "fallback" above.
+ <li> Otherwise, <a href="#get-an-encoding"><i>get an encoding</i></a> for
+ the value of the <code>charset</code> attribute on the
+ <code><link></code> element or <code><?xml-stylesheet?></code>
+ processing instruction that caused the style sheet to be included, if
+ any. If that does not return failure, use the return value as the
+ fallback encoding.
+
+ <li> Otherwise, if the referring style sheet or document has an encoding,
+ use that as the fallback encoding.
+
+ <li> Otherwise, use <code>utf-8</code> as the fallback encoding.
 </ol>

+ <p> Then, <a href="#decode"><i>decode</i></a> the byte stream using the
+ fallback encoding.
+
+ <p class=note> Note: the <a href="#decode"><i>decode</i></a> algorithm lets
+ the byte order mark (BOM) take precedence, hence the usage of the term
+ "fallback" above.
+
 <p class=issue> Anne says that steps 4/5 should be an input to this
 algorithm from the specs that define importing stylesheet, to make the
 algorithm as a whole cleaner. Perhaps abstract it into the concept of an
--- a/css3-syntax/Overview.src.html Wed Feb 13 18:26:41 2013 -0800
+++ b/css3-syntax/Overview.src.html Wed Feb 13 18:35:14 2013 -0800
@@ -256,25 +256,23 @@
 and <a href="http://encoding.spec.whatwg.org/#decode"><dfn>decode</dfn></a>
 are defined in the <a href="http://encoding.spec.whatwg.org/">Encoding Standard</a>.

+ <p>
+ First, determine the fallback encoding:
+
 <ol>
 <li>
- Let <var>encoding</var> be utf-8.
-
- <li>
 If HTTP or equivalent protocol defines an encoding (e.g. via the charset parameter of the Content-Type header),
 <i>get an encoding</i> for the specified value.
 If that does not return failure,
- set <var>encoding</var> to the return value
- and jump to the last step of this algorithm.
+ use the return value as the fallback encoding.

 <li>
- Check the byte stream. If the first several bytes match the hex sequence
-
- <pre>40 63 68 61 72 73 65 74 20 22 (XX)* 22 3B</pre>
-
- then <i>get an encoding</i> for the sequence of XX bytes,
- decoded per <code>windows-1252</code>,
- and let <var>temp</var> be the return value.
+ Otherwise, check the byte stream. If the first several bytes match the hex sequence
+
+ <pre>40 63 68 61 72 73 65 74 20 22 (not 22)* 22 3B</pre>
+
+ then <i>get an encoding</i> for the sequence of <code>(not 22)*</code> bytes,
+ decoded per <code>windows-1252</code>.

 <p class='note'>
 Note: Anything ASCII-compatible will do, so using <code>windows-1252</code> is fine.
@@ -286,32 +284,33 @@
 where the "…" is the sequence of bytes corresponding to the encoding's name.

 <p>
- If <var>temp</var> is <code>utf-16</code> or <code>utf-16be</code>,
- set <var>temp</var> to <code>utf-8</code>.
- If <var>temp</var> is not failure,
- set <var>encoding</var> to it
- and jump to the last step.
+ If the return value was <code>utf-16</code> or <code>utf-16be</code>,
+ use <code>utf-8</code> as the fallback encoding;
+ if it was anything else except failure,
+ use the return value as the fallback encoding.

 <p class='note'>
 This mimics HTML <code><meta></code> behavior.

 <li>
- <i>Get an encoding</i> for the value of the <code>charset</code> attribute on the <code><link></code> element or <code><?xml-stylesheet?></code> processing instruction that caused the style sheet to be included, if any.
+ Otherwise, <i>get an encoding</i> for the value of the <code>charset</code> attribute on the <code><link></code> element or <code><?xml-stylesheet?></code> processing instruction that caused the style sheet to be included, if any.
 If that does not return failure,
- set <var>encoding</var> to the return value
- and jump to the last step.
+ use the return value as the fallback encoding.

 <li>
- Set <var>encoding</var> to the encoding of the referring style sheet or document,
- if any.
+ Otherwise, if the referring style sheet or document has an encoding,
+ use that as the fallback encoding.

 <li>
- <i>Decode</i> the byte stream using fallback encoding <var>encoding</var>.
-
- <p class='note'>
- Note: the <i>decode</i> algorithm lets the byte order mark (BOM) take precedence,
- hence the usage of the term "fallback" above.
+ Otherwise, use <code>utf-8</code> as the fallback encoding.
 </ol>
+
+ <p>
+ Then, <i>decode</i> the byte stream using the fallback encoding.
+
+ <p class='note'>
+ Note: the <i>decode</i> algorithm lets the byte order mark (BOM) take precedence,
+ hence the usage of the term "fallback" above.

 <p class='issue'>
 Anne says that steps 4/5 should be an input to this algorithm from the specs that define importing stylesheet,
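
For readers who want to see the rewritten "Otherwise, ..." chain as control flow, here is a rough Python sketch of the steps in this changeset. It is not part of the commit and not a conforming implementation. The function names (`get_an_encoding`, `determine_fallback_encoding`, `decode_with_fallback`) and parameter names are hypothetical, Python's codec registry stands in for the Encoding Standard's label table, and Python's codec names `utf-16` and `utf-16-be` stand in for the spec's `utf-16` and `utf-16be`. The sketch also assumes that a failed lookup at one step falls through to the next step, which is how the pre-rewrite wording read.

```python
import codecs
import re
from typing import Optional

# Hex 40 63 68 61 72 73 65 74 20 22 (not 22)* 22 3B, i.e. @charset "<name>";
CHARSET_RULE = re.compile(rb'@charset "([^"]*)";')

# Assumption: Python codec names standing in for the spec's utf-16 / utf-16be.
UTF16_NAMES = {"utf-16", "utf-16-be"}


def get_an_encoding(label: str) -> Optional[str]:
    """Hypothetical stand-in for "get an encoding"; None means failure."""
    try:
        return codecs.lookup(label.strip()).name
    except (LookupError, ValueError):
        return None


def determine_fallback_encoding(
    stream: bytes,
    protocol_charset: Optional[str] = None,   # e.g. Content-Type charset
    link_charset: Optional[str] = None,       # charset on <link> / <?xml-stylesheet?>
    referrer_encoding: Optional[str] = None,  # referring style sheet or document
) -> str:
    # Step 1: encoding declared by HTTP or an equivalent protocol.
    if protocol_charset is not None:
        enc = get_an_encoding(protocol_charset)
        if enc is not None:
            return enc

    # Step 2: @charset rule at the very start of the byte stream.
    match = CHARSET_RULE.match(stream)
    if match:
        # Any ASCII-compatible decoding works for the label bytes.
        label = match.group(1).decode("cp1252", errors="replace")
        enc = get_an_encoding(label)
        if enc in UTF16_NAMES:
            return "utf-8"  # the bytes were readable as ASCII, so not UTF-16
        if enc is not None:
            return enc

    # Step 3: charset attribute on the including element or instruction.
    if link_charset is not None:
        enc = get_an_encoding(link_charset)
        if enc is not None:
            return enc

    # Step 4: encoding of the referring style sheet or document.
    if referrer_encoding is not None:
        return referrer_encoding

    # Step 5: default.
    return "utf-8"


def decode_with_fallback(stream: bytes, fallback: str) -> str:
    """Decode, letting a byte order mark take precedence over the fallback."""
    boms = (
        (b"\xef\xbb\xbf", "utf-8"),
        (b"\xfe\xff", "utf-16-be"),
        (b"\xff\xfe", "utf-16-le"),
    )
    for bom, name in boms:
        if stream.startswith(bom):
            return stream[len(bom):].decode(name, errors="replace")
    return stream.decode(fallback, errors="replace")
```

A caller would chain the two, for example `decode_with_fallback(data, determine_fallback_encoding(data, protocol_charset="utf-8"))`. Returning early from each step mirrors the "Otherwise" wording: the first step that yields an encoding wins, and `utf-8` is reached only when every earlier step fails.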