# HG changeset patch
# User Tab Atkins Jr.
# Date 1360809314 28800
# Node ID 550ce5a48ffb04e92fbe1778a650f2ec0e74df14
# Parent a1add5340f0fc68536cd9c8c38ec518b0c8806d4
[css3-syntax] Rewrite the charset determining step to be a little clearer, and closer to my normal language.

diff -r a1add5340f0f -r 550ce5a48ffb css3-syntax/Overview.html
--- a/css3-syntax/Overview.html Wed Feb 13 18:26:41 2013 -0800
+++ b/css3-syntax/Overview.html Wed Feb 13 18:35:14 2013 -0800
@@ -536,21 +536,21 @@
 decode are defined in the Encoding Standard.
+First, determine the fallback encoding:
+
-  1. Let encoding be utf-8.
-
   2. If HTTP or equivalent protocol defines an encoding (e.g. via the
      charset parameter of the Content-Type header), get an encoding for the specified
-     value. If that does not return failure, set encoding to the
-     return value and jump to the last step of this algorithm.
-
-  3. Check the byte stream. If the first several bytes match the hex
-     sequence
-       40 63 68 61 72 73 65 74 20 22 (XX)* 22 3B
+     value. If that does not return failure, use the return value as the
+     fallback encoding.
+
+  4. Otherwise, check the byte stream. If the first several bytes match
+     the hex sequence
+       40 63 68 61 72 73 65 74 20 22 (not 22)* 22 3B
      then get an encoding for the
-     sequence of XX bytes, decoded per windows-1252, and let
-     temp be the return value.
+     sequence of (not 22)* bytes, decoded per
+     windows-1252.

     Note: Anything ASCII-compatible will do, so using windows-1252 is fine.
@@ -558,29 +558,32 @@
     the string "@charset "…";", where the "…" is the sequence of bytes
     corresponding to the encoding's name.
-    If temp is utf-16 or utf-16be,
-    set temp to utf-8. If temp is not
-    failure, set encoding to it and jump to the last step.
+    If the return value was utf-16 or utf-16be,
+    use utf-8 as the fallback encoding; if it was anything else
+    except failure, use the return value as the fallback encoding.
     This mimics HTML <meta> behavior.
-  5. Get an encoding for the value
-     of the charset attribute on the <link>
-     element or <?xml-stylesheet?> processing instruction that
-     caused the style sheet to be included, if any. If that does not return
-     failure, set encoding to the return value and jump to the last
-     step.
-
-  6. Set encoding to the encoding of the referring style sheet
-     or document, if any.
-
-  7. Decode the byte stream using fallback
-     encoding encoding.
-    Note: the decode algorithm
-    lets the byte order mark (BOM) take precedence, hence the usage of the
-    term "fallback" above.
+  8. Otherwise, get an encoding for
+     the value of the charset attribute on the
+     <link> element or <?xml-stylesheet?>
+     processing instruction that caused the style sheet to be included, if
+     any. If that does not return failure, use the return value as the
+     fallback encoding.
+
+  9. Otherwise, if the referring style sheet or document has an encoding,
+     use that as the fallback encoding.
+
+  10. Otherwise, use utf-8 as the fallback encoding.
+Then, decode the byte stream using the
+fallback encoding.
+
+Note: the decode algorithm lets
+the byte order mark (BOM) take precedence, hence the usage of the term
+"fallback" above.
+

 Anne says that steps 4/5 should be an input to this algorithm from the specs that define importing stylesheet, to make the algorithm as a whole cleaner. Perhaps abstract it into the concept of an
diff -r a1add5340f0f -r 550ce5a48ffb css3-syntax/Overview.src.html
--- a/css3-syntax/Overview.src.html Wed Feb 13 18:26:41 2013 -0800
+++ b/css3-syntax/Overview.src.html Wed Feb 13 18:35:14 2013 -0800
@@ -256,25 +256,23 @@
 and decode are defined in the Encoding Standard.
+
+First, determine the fallback encoding:
+
   1.
-    Let encoding be utf-8.
-
-  2.
     If HTTP or equivalent protocol defines an encoding (e.g. via the charset parameter of the Content-Type header),
     get an encoding for the specified value.
     If that does not return failure,
-    set encoding to the return value
-    and jump to the last step of this algorithm.
+    use the return value as the fallback encoding.
   3.
-    Check the byte stream. If the first several bytes match the hex sequence
-
-      40 63 68 61 72 73 65 74 20 22 (XX)* 22 3B
-
-    then get an encoding for the sequence of XX bytes,
-    decoded per windows-1252,
-    and let temp be the return value.
+    Otherwise, check the byte stream. If the first several bytes match the hex sequence
+
+      40 63 68 61 72 73 65 74 20 22 (not 22)* 22 3B
+
+    then get an encoding for the sequence of (not 22)* bytes,
+    decoded per windows-1252.

     Note: Anything ASCII-compatible will do, so using windows-1252 is fine.
@@ -286,32 +284,33 @@
     where the "…" is the sequence of bytes corresponding to the encoding's name.
-    If temp is utf-16 or utf-16be,
-    set temp to utf-8.
-    If temp is not failure,
-    set encoding to it
-    and jump to the last step.
+    If the return value was utf-16 or utf-16be,
+    use utf-8 as the fallback encoding;
+    if it was anything else except failure,
+    use the return value as the fallback encoding.
     This mimics HTML <meta> behavior.
   4.
-    Get an encoding for the value of the charset attribute on the <link> element or <?xml-stylesheet?> processing instruction that caused the style sheet to be included, if any.
+    Otherwise, get an encoding for the value of the charset attribute on the <link> element or <?xml-stylesheet?> processing instruction that caused the style sheet to be included, if any.
     If that does not return failure,
-    set encoding to the return value
-    and jump to the last step.
+    use the return value as the fallback encoding.
   5.
-    Set encoding to the encoding of the referring style sheet or document,
-    if any.
+    Otherwise, if the referring style sheet or document has an encoding,
+    use that as the fallback encoding.
   6.
-    Decode the byte stream using fallback encoding encoding.
-
-
-    Note: the decode algorithm lets the byte order mark (BOM) take precedence,
-    hence the usage of the term "fallback" above.
+    Otherwise, use utf-8 as the fallback encoding.
+
+
+Then, decode the byte stream using the fallback encoding.
+
+
+Note: the decode algorithm lets the byte order mark (BOM) take precedence,
+hence the usage of the term "fallback" above.
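
The rewritten steps can be read as a chain of "otherwise" checks. The following is a rough Python sketch of that reading, not text from the spec. The names `LABELS`, `get_an_encoding`, `sniff_at_charset` and `determine_fallback_encoding` are hypothetical, and the tiny label table only stands in for the Encoding Standard's real "get an encoding" lookup. The sketch also assumes that a step whose lookup returns failure falls through to the next step, as in the old "jump to the last step" wording. The final decode step and its BOM handling are left out.

```python
import re
from typing import Optional

# Hypothetical, deliberately tiny label table. The Encoding Standard's real
# table is much larger; this one exists only to make the sketch runnable.
LABELS = {
    "utf-8": "utf-8",
    "utf8": "utf-8",
    "utf-16": "utf-16",
    "utf-16be": "utf-16be",
    "windows-1252": "windows-1252",
    "latin1": "windows-1252",
    "shift_jis": "shift_jis",
}


def get_an_encoding(label: Optional[str]) -> Optional[str]:
    """Stand-in for "get an encoding"; None plays the role of failure."""
    if label is None:
        return None
    return LABELS.get(label.strip(" \t\n\f\r").lower())


# Same bytes as the hex pattern 40 63 68 61 72 73 65 74 20 22 (not 22)* 22 3B,
# i.e. '@charset "', any bytes other than '"', then '";'.
AT_CHARSET = re.compile(rb'@charset "([^"]*)";')


def sniff_at_charset(data: bytes) -> Optional[str]:
    """Return the (not 22)* bytes decoded per windows-1252, if the stream
    starts with the @charset pattern; otherwise None."""
    m = AT_CHARSET.match(data)  # match() only matches at the very start
    if m is None:
        return None
    # Python's cp1252 leaves a few bytes undefined, hence errors="replace".
    return m.group(1).decode("cp1252", errors="replace")


def determine_fallback_encoding(
    data: bytes,
    http_charset: Optional[str] = None,
    link_charset: Optional[str] = None,
    referrer_encoding: Optional[str] = None,
) -> str:
    # HTTP or equivalent protocol.
    enc = get_an_encoding(http_charset)
    if enc is not None:
        return enc

    # Otherwise, the @charset byte pattern at the start of the stream.
    label = sniff_at_charset(data)
    if label is not None:
        enc = get_an_encoding(label)
        if enc in ("utf-16", "utf-16be"):
            return "utf-8"  # mimics HTML <meta> behavior
        if enc is not None:
            return enc

    # Otherwise, the charset attribute of <link> / <?xml-stylesheet?>.
    enc = get_an_encoding(link_charset)
    if enc is not None:
        return enc

    # Otherwise, the encoding of the referring style sheet or document.
    if referrer_encoding is not None:
        return referrer_encoding

    # Otherwise, utf-8.
    return "utf-8"
```

Under these assumptions, `determine_fallback_encoding(b'@charset "utf-16be"; a{}')` yields `"utf-8"`, and an HTTP charset of `"UTF-8"` wins over a conflicting `@charset` rule in the byte stream.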

Anne says that steps 4/5 should be an input to this algorithm from the specs that define importing stylesheet,