[css3-syntax] Rewrite the charset determining step to be a little clearer, and closer to my normal language.

Wed, 13 Feb 2013 18:35:14 -0800

author: Tab Atkins Jr. <jackalmage@gmail.com>
date: Wed, 13 Feb 2013 18:35:14 -0800
changeset: 7465:550ce5a48ffb
parent: 7464:a1add5340f0f
child: 7466:b15ae85d44af

css3-syntax/Overview.html
css3-syntax/Overview.src.html
     1.1 --- a/css3-syntax/Overview.html	Wed Feb 13 18:26:41 2013 -0800
     1.2 +++ b/css3-syntax/Overview.html	Wed Feb 13 18:35:14 2013 -0800
     1.3 @@ -536,21 +536,21 @@
     1.4     id=decode>decode</dfn></a> are defined in the <a
     1.5     href="http://encoding.spec.whatwg.org/">Encoding Standard</a>.
     1.6  
     1.7 +  <p> First, determine the fallback encoding:
     1.8 +
     1.9    <ol>
    1.10 -   <li> Let <var>encoding</var> be utf-8.
    1.11 -
    1.12     <li> If HTTP or equivalent protocol defines an encoding (e.g. via the
    1.13      charset parameter of the Content-Type header), <a
    1.14      href="#get-an-encoding"><i>get an encoding</i></a> for the specified
    1.15 -    value. If that does not return failure, set <var>encoding</var> to the
    1.16 -    return value and jump to the last step of this algorithm.
    1.17 -
    1.18 -   <li> Check the byte stream. If the first several bytes match the hex
    1.19 -    sequence
    1.20 -    <pre>40 63 68 61 72 73 65 74 20 22 (XX)* 22 3B</pre>
    1.21 +    value. If that does not return failure, use the return value as the
    1.22 +    fallback encoding.
    1.23 +
    1.24 +   <li> Otherwise, check the byte stream. If the first several bytes match
    1.25 +    the hex sequence
    1.26 +    <pre>40 63 68 61 72 73 65 74 20 22 (not 22)* 22 3B</pre>
    1.27      then <a href="#get-an-encoding"><i>get an encoding</i></a> for the
    1.28 -    sequence of XX bytes, decoded per <code>windows-1252</code>, and let
    1.29 -    <var>temp</var> be the return value.
    1.30 +    sequence of <code>(not 22)*</code> bytes, decoded per
    1.31 +    <code>windows-1252</code>.
    1.32      <p class=note> Note: Anything ASCII-compatible will do, so using
    1.33       <code>windows-1252</code> is fine.
    1.34  
    1.35 @@ -558,29 +558,32 @@
    1.36       the string "<code>@charset "…";</code>", where the "…" is the
    1.37       sequence of bytes corresponding to the encoding's name.
    1.38  
    1.39 -    <p> If <var>temp</var> is <code>utf-16</code> or <code>utf-16be</code>,
    1.40 -     set <var>temp</var> to <code>utf-8</code>. If <var>temp</var> is not
    1.41 -     failure, set <var>encoding</var> to it and jump to the last step.
    1.42 +    <p> If the return value was <code>utf-16</code> or <code>utf-16be</code>,
    1.43 +     use <code>utf-8</code> as the fallback encoding; if it was anything else
    1.44 +     except failure, use the return value as the fallback encoding.
    1.45  
    1.46      <p class=note> This mimics HTML <code>&lt;meta></code> behavior.
    1.47  
    1.48 -   <li> <a href="#get-an-encoding"><i>Get an encoding</i></a> for the value
    1.49 -    of the <code>charset</code> attribute on the <code>&lt;link></code>
    1.50 -    element or <code>&lt;?xml-stylesheet?></code> processing instruction that
    1.51 -    caused the style sheet to be included, if any. If that does not return
    1.52 -    failure, set <var>encoding</var> to the return value and jump to the last
    1.53 -    step.
    1.54 -
    1.55 -   <li> Set <var>encoding</var> to the encoding of the referring style sheet
    1.56 -    or document, if any.
    1.57 -
    1.58 -   <li> <a href="#decode"><i>Decode</i></a> the byte stream using fallback
    1.59 -    encoding <var>encoding</var>.
    1.60 -    <p class=note> Note: the <a href="#decode"><i>decode</i></a> algorithm
    1.61 -     lets the byte order mark (BOM) take precedence, hence the usage of the
    1.62 -     term "fallback" above.
    1.63 +   <li> Otherwise, <a href="#get-an-encoding"><i>get an encoding</i></a> for
    1.64 +    the value of the <code>charset</code> attribute on the
    1.65 +    <code>&lt;link></code> element or <code>&lt;?xml-stylesheet?></code>
    1.66 +    processing instruction that caused the style sheet to be included, if
    1.67 +    any. If that does not return failure, use the return value as the
    1.68 +    fallback encoding.
    1.69 +
    1.70 +   <li> Otherwise, if the referring style sheet or document has an encoding,
    1.71 +    use that as the fallback encoding.
    1.72 +
    1.73 +   <li> Otherwise, use <code>utf-8</code> as the fallback encoding.
    1.74    </ol>
    1.75  
    1.76 +  <p> Then, <a href="#decode"><i>decode</i></a> the byte stream using the
    1.77 +   fallback encoding.
    1.78 +
    1.79 +  <p class=note> Note: the <a href="#decode"><i>decode</i></a> algorithm lets
    1.80 +   the byte order mark (BOM) take precedence, hence the usage of the term
    1.81 +   "fallback" above.
    1.82 +
    1.83    <p class=issue> Anne says that steps 4/5 should be an input to this
    1.84     algorithm from the specs that define importing stylesheet, to make the
    1.85     algorithm as a whole cleaner. Perhaps abstract it into the concept of an
     2.1 --- a/css3-syntax/Overview.src.html	Wed Feb 13 18:26:41 2013 -0800
     2.2 +++ b/css3-syntax/Overview.src.html	Wed Feb 13 18:35:14 2013 -0800
     2.3 @@ -256,25 +256,23 @@
     2.4  		and <a href="http://encoding.spec.whatwg.org/#decode"><dfn>decode</dfn></a>
     2.5  		are defined in the <a href="http://encoding.spec.whatwg.org/">Encoding Standard</a>.
     2.6  
     2.7 +	<p>
     2.8 +		First, determine the fallback encoding:
     2.9 +
    2.10  	<ol>
    2.11  		<li>
    2.12 -			Let <var>encoding</var> be utf-8.
    2.13 -
    2.14 -		<li>
    2.15  			If HTTP or equivalent protocol defines an encoding (e.g. via the charset parameter of the Content-Type header),
    2.16  			<i>get an encoding</i> for the specified value.
    2.17  			If that does not return failure,
    2.18 -			set <var>encoding</var> to the return value
    2.19 -			and jump to the last step of this algorithm.
    2.20 +			use the return value as the fallback encoding.
    2.21  
    2.22  		<li>
    2.23 -			Check the byte stream. If the first several bytes match the hex sequence
    2.24 -
    2.25 -			<pre>40 63 68 61 72 73 65 74 20 22 (XX)* 22 3B</pre>
    2.26 -
    2.27 -			then <i>get an encoding</i> for the sequence of XX bytes,
    2.28 -			decoded per <code>windows-1252</code>,
    2.29 -			and let <var>temp</var> be the return value.
    2.30 +			Otherwise, check the byte stream. If the first several bytes match the hex sequence
    2.31 +
    2.32 +			<pre>40 63 68 61 72 73 65 74 20 22 (not 22)* 22 3B</pre>
    2.33 +
    2.34 +			then <i>get an encoding</i> for the sequence of <code>(not 22)*</code> bytes,
    2.35 +			decoded per <code>windows-1252</code>.
    2.36  
    2.37  			<p class='note'>
    2.38  				Note: Anything ASCII-compatible will do, so using <code>windows-1252</code> is fine.
    2.39 @@ -286,32 +284,33 @@
    2.40  				where the "…" is the sequence of bytes corresponding to the encoding's name.
    2.41  
    2.42  			<p>
    2.43 -				If <var>temp</var> is <code>utf-16</code> or <code>utf-16be</code>,
    2.44 -				set <var>temp</var> to <code>utf-8</code>.
    2.45 -				If <var>temp</var> is not failure,
    2.46 -				set <var>encoding</var> to it
    2.47 - 				and jump to the last step.
    2.48 +				If the return value was <code>utf-16</code> or <code>utf-16be</code>,
    2.49 +				use <code>utf-8</code> as the fallback encoding;
    2.50 +				if it was anything else except failure,
    2.51 +				use the return value as the fallback encoding.
    2.52  
    2.53  			<p class='note'>
    2.54  				This mimics HTML <code>&lt;meta></code> behavior.
    2.55  
    2.56  		<li>
    2.57 -			<i>Get an encoding</i> for the value of the <code>charset</code> attribute on the <code>&lt;link></code> element or <code>&lt;?xml-stylesheet?></code> processing instruction that caused the style sheet to be included, if any.
    2.58 +			Otherwise, <i>get an encoding</i> for the value of the <code>charset</code> attribute on the <code>&lt;link></code> element or <code>&lt;?xml-stylesheet?></code> processing instruction that caused the style sheet to be included, if any.
    2.59  			If that does not return failure,
    2.60 -			set <var>encoding</var> to the return value
    2.61 -			and jump to the last step.
    2.62 +			use the return value as the fallback encoding.
    2.63  
    2.64  		<li>
    2.65 -			Set <var>encoding</var> to the encoding of the referring style sheet or document,
    2.66 -			if any.
    2.67 +			Otherwise, if the referring style sheet or document has an encoding,
    2.68 +			use that as the fallback encoding.
    2.69  
    2.70  		<li>
    2.71 -			<i>Decode</i> the byte stream using fallback encoding <var>encoding</var>.
    2.72 -
    2.73 -			<p class='note'>
    2.74 -				Note: the <i>decode</i> algorithm lets the byte order mark (BOM) take precedence,
    2.75 -				hence the usage of the term "fallback" above.
    2.76 +			Otherwise, use <code>utf-8</code> as the fallback encoding.
    2.77  	</ol>
    2.78 +		
    2.79 +	<p>
    2.80 +		Then, <i>decode</i> the byte stream using the fallback encoding.
    2.81 +
    2.82 +	<p class='note'>
    2.83 +		Note: the <i>decode</i> algorithm lets the byte order mark (BOM) take precedence,
    2.84 +		hence the usage of the term "fallback" above.
    2.85  
    2.86  	<p class='issue'>
    2.87  		Anne says that steps 4/5 should be an input to this algorithm from the specs that define importing stylesheet,
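
For readers who want to see the rewritten steps end to end, here is a minimal Python sketch of "determine the fallback encoding" as the new text describes it. It is an illustration under stated assumptions, not the spec's algorithm: `LABELS`, `get_an_encoding`, `determine_fallback_encoding` and its parameter names are hypothetical, the label table is a tiny stand-in for the Encoding Standard's full table, and the sketch assumes that a step which returns failure falls through to the next "Otherwise" step, as in the pre-change wording.

```python
import re

# Hypothetical, heavily truncated stand-in for the Encoding Standard's
# label table. The returned names follow the wording used in this diff.
LABELS = {
    "utf-8": "utf-8",
    "utf8": "utf-8",
    "utf-16": "utf-16",
    "utf-16be": "utf-16be",
    "windows-1252": "windows-1252",
    "latin1": "windows-1252",
    "iso-8859-1": "windows-1252",
}

# Byte pattern from the diff: 40 63 68 61 72 73 65 74 20 22 (not 22)* 22 3B
CHARSET_RULE = re.compile(rb'@charset "([^"]*)";')


def get_an_encoding(label):
    """Stand-in for 'get an encoding': an encoding name, or None for failure."""
    if label is None:
        return None
    return LABELS.get(label.strip(" \t\n\f\r").lower())


def determine_fallback_encoding(data, http_charset=None, link_charset=None,
                                referrer_encoding=None):
    # Step 1: encoding defined by HTTP or an equivalent protocol.
    encoding = get_an_encoding(http_charset)
    if encoding is not None:
        return encoding

    # Step 2: an @charset rule at the very start of the byte stream.
    match = CHARSET_RULE.match(data)
    if match:
        # Any ASCII-compatible decoding works for the label bytes.
        label = match.group(1).decode("windows-1252", errors="replace")
        encoding = get_an_encoding(label)
        if encoding in ("utf-16", "utf-16be"):
            # The rule was readable as ASCII-compatible bytes, so the
            # stream cannot really be UTF-16; use utf-8 instead.
            return "utf-8"
        if encoding is not None:
            return encoding

    # Step 3: charset attribute on <link> or <?xml-stylesheet?>.
    encoding = get_an_encoding(link_charset)
    if encoding is not None:
        return encoding

    # Step 4: encoding of the referring style sheet or document.
    if referrer_encoding is not None:
        return referrer_encoding

    # Step 5: default.
    return "utf-8"
```

The sketch only picks the fallback. The subsequent "decode" step, where a byte order mark takes precedence over the fallback, is left to the Encoding Standard and is not modelled here.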
