Initial commit of Tracking Selection Lists Editor's Draft
author Andy Zeigler <andyzei@microsoft.com>
Tue, 03 Jan 2012 16:10:42 -0800
changeset 1 e5ea88440930
parent 0 b15192cf47f4
child 2 8ffb7f2dcdff
ED-tracking-tsl-20120103.html
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/ED-tracking-tsl-20120103.html	Tue Jan 03 16:10:42 2012 -0800
@@ -0,0 +1,368 @@
+<!DOCTYPE html>
+<html>
+<head>
+<title>Tracking Selection Lists</title>
+<meta charset="utf-8"/>
+<script src='http://dev.w3.org/2009/dap/ReSpec.js/js/respec.js' class='remove'></script>
+<script class='remove'>
+var respecConfig = {
+specStatus:           "ED",
+shortName:            "url-filtering",
+edDraftURI:           "http://www.la-grange.net/2011/11/09/tracking/url-filtering-20111109",
+extraCSS:             ["http://dev.w3.org/2009/dap/ReSpec.js/css/respec.css"],
+editors:  [
+{ name: "Andy Zeigler", url: "http://www.andyzeigler.com/",
+company: "Microsoft", companyURL: "http://www.microsoft.com/" },
+{ name: "Karl Dubost", url: "http://www.la-grange.net/karl/",
+company: "Opera Software", companyURL: "http://www.opera.com/" },
+],
+wg:           "Tracking Protection Working Group",
+wgURI:        "http://www.w3.org/2011/tracking-protection/",
+wgPublicList: "public-privacy",
+wgPatentURI:  "",
+};
+</script>
+</head>
+<body>
+
+<section id='abstract'>
+<p>The Tracking Selection Lists specification defines a format for interchangeable lists that block or allow Web tracking elements, and the expected user agent interpretation of this format.</p>
+
+<p>A <dfn id="selection-list">selection list</dfn> contains parts of <a href="#dfn-third-party-uri">third-party URIs</a> that a browser may access automatically when they are referenced within a web page that a user deliberately visits. Rules in a selection list may change the way the user agent handles third-party content. By limiting requests to these sites and blocking the resources they serve, a <a href="#selection-list">selection list</a> limits the information other sites can collect about a user.</p>
+
+</section>
+
+<section>
+<h2>Conformance</h2>
+
+<p>As well as sections marked as non-normative, all authoring
+guidelines, diagrams, examples, and notes in this specification are
+non-normative. Everything else in this specification is normative. </p>
+
+<p>The key words MUST, MUST NOT, REQUIRED, SHOULD, SHOULD NOT,
+RECOMMENDED, MAY, and OPTIONAL in this specification are to be
+interpreted as described in [[!RFC2119]].</p>
+</section>
+
+<section>
+<h2>Introduction</h2>
+
+<p>This section is non-normative.</p>
+
+<p>Today, consumers share information with more websites than the ones they see in their browser's address bar. This is inherent in the design of the web, and it has potentially unintended consequences: as consumers visit one site, many other sites receive information about their activities. For example, when a web page includes a third-party image file, such as a “web beacon”, the request for that image can send IP address information, cookies, and referrer data to the third party. A third-party script can have additional impact on user privacy, since it can collect arbitrary data from the first-party web page.</p>
+
+<p>This situation results from how modern websites are built. A typical website brings together content from many other websites while appearing to the user as a single entity. Whenever the browser requests anything from another website (an image, a cookie, HTML, or a script that can execute), it necessarily provides information in order to receive content. By limiting requests to these sites, it is possible to limit the data available to them for collection and tracking.</p>
+
+<p>A selection list contains parts of third-party URIs that a browser may access automatically when referenced within a web page a user deliberately visits.</p>
+
+<p class="issue">Should TSLs also apply to 1<sup>st</sup>-party URIs? If so, there should probably be an option that does this – I think that by default, most of the rules you’d want to write are 3<sup>rd</sup>-party specific. There are valid use cases for 1<sup>st</sup>-party rules, such as CNAME’d DNS entries.</p>    
+
+<p class="issue">Karl Dubost: It assumes that tracking happens
+only through third-party URIs.</p>
+
+<p class="issue">[andyzei] There is a difference between “expected” and “unexpected” tracking. 1<sup>st</sup>-party tracking is expected. 3<sup>rd</sup>-party tracking is not.  </p>
+
+<p>Rules in a selection list may change the way the user agent handles third-party content. By limiting requests to these sites and blocking the resources they serve, the selection list limits the information other sites can collect about a user.</p>
+</section>
+
+<section>
+    <h2>Third-Party URIs</h2>
+
+<p>A <dfn id="dfn-third-party-uri">third-party URI</dfn> [[!URI]] is a URI whose second-level domain name differs from that of the top-level document. When a user agent sends an HTTP request for a document, the server's response may reference further URIs; those whose second-level domain name differs from that of the top-level document are third-party URIs with respect to that document.</p>
+
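+<p>A user agent must evaluate whether a URI is third-party with respect to the topmost document, not the document that references it. This applies both to URIs that load a sub-document, such as the <code>src</code> of an iframe, and to any URIs referenced from within a sub-document.</p>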
+
+<p class="issue">Not testable. This needs to be framed as something implementable.</p>
+
+<p class="issue">"MUST evaluate" doesn't mean anything in that context.</p>
+
+<p>For example, consider a top-level document whose URI is http://www.microsoft.com. This page might contain an iframe whose src URI is http://www.example.com. If the page at http://www.example.com contains an img element whose src is http://www.example.com/img.png, the URI http://www.example.com/img.png is a third-party URI: its second-level domain name (example.com) differs from that of the top-level page (microsoft.com), even though the img element belongs to the iframe's document.</p>
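+<p>The following JavaScript is a minimal sketch of this definition, not a normative algorithm. It assumes that the "second-level domain name" is simply the last two labels of the host name, as the text above describes; this misclassifies hosts under multi-label public suffixes such as <code>co.uk</code>, so a real user agent would likely need a public suffix list. The function names <code>secondLevelDomain</code> and <code>isThirdPartyUri</code> are hypothetical.</p>

```javascript
// Minimal sketch: is a URI third-party relative to the top-level document?
// Assumption: the "second-level domain" is just the last two host labels
// (no public suffix list), so "a.co.uk" and "b.co.uk" would look identical.
function secondLevelDomain(host) {
  const labels = host.toLowerCase().split(".").filter(Boolean);
  return labels.slice(-2).join(".");
}

function isThirdPartyUri(uri, topLevelUri) {
  const host = new URL(uri).hostname;
  const topHost = new URL(topLevelUri).hostname;
  return secondLevelDomain(host) !== secondLevelDomain(topHost);
}

// The example from the text: the image is referenced from an iframe, but it
// is judged against the topmost document (www.microsoft.com), not the iframe.
const topLevel = "http://www.microsoft.com/";
console.log(isThirdPartyUri("http://www.example.com/img.png", topLevel));      // true
console.log(isThirdPartyUri("http://images.microsoft.com/logo.png", topLevel)); // false
```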
+
+<p>A third-party download is any potential HTTP download request to a third-party URI. </p>
+
+<p>A user agent must apply a selection list to third-party URIs only.</p>
+
+<p class="issue">Not testable. Applying a filter list is not an operation in this case.</p>
+
+<p class="issue">Need to clearly define "apply a filter list" / rules and frame requirements to match.</p>
+
+<h3 id="blocking-downloads">Blocking Downloads</h3>
+
+<p>When a user agent requests a web page and receives a response containing a document, and the user or user agent has chosen to apply a selection list, the user agent must evaluate every third-party URI that can generate a download request against that selection list.</p>
+
+<div class="issue">
+<p>Incorrect sentence. Something along the lines of the following paragraph may be the real implementable requirement:</p>
+<blockquote><p>(When the user and/or user agent has activated the filter list mechanism,) for each HTTP response sent by the server, a user agent MUST drop any subsequent HTTP requests according to the rules defined in the filter list.</p></blockquote></div>
+
+<p>When a user agent blocks a download, it should fire any applicable DOM events that signal a download error, and it should report an exception, including the blocked URI, in the error console.</p>
+</section>
+
+
+<section>
+<h2>List Format</h2>
+
+<p>A selection list is a UTF-8 encoded text file that contains a header, comments, settings, and rules. Selection lists are parsed statelessly, line by line, so the order of the lines has no effect on the meaning of the file. The only exception is the header, which must be the first line of the file.</p>
+
+
+<pre class="example">FilterList
+#
+# Line 1 is a header. 
+#
+# Lines 2-11 are comments. 
+# As a comment, any line that starts with a # character is ignored.
+#
+# Any line that begins with a <code>:</code> character is a setting, 
+# which is a key-value pair.
+# The key-value pair <code>Expires = n</code> specifies 
+# to wait n days before checking for an update to the list.
+#
+# <strong>Using a setting</strong>.
+# Check for an update to the list in 3 days.
+
+<code>:Expires=3</code>  
+
+# <strong>Domain rule</strong>
+# Allow all URIs from the <code>example.com</code> domain name.
+
+<code>+d example.com</code>
+
+# <strong>Substring rule</strong>
+# Block any URI containing <code>spamspam</code>.
+
+<code>- spamspam</code>
+
+# <strong>Wildcard character</strong>
+# Block any URI that has a <code>foo</code> followed by a <code>bar</code>.
+
+<code>- foo*bar</code>
+
+# <strong>Domain rule</strong>
+# Block anything from <code>exampleexample.com</code>.
+
+<code>-d exampleexample.com</code>
+
+# <strong>Domain rule with optional path</strong>
+# Block any URI from <code>example.com</code> that contains the substring <code>bad.js</code> in the URI path.
+
+<code>-d example.com bad.js</code>
+</pre>
+
+<h3>Header</h3>
+
+<p>A selection list must start with a single line, known as the header, that contains the string <code>FilterList</code>. The header line may start with the UTF-8 Byte Order Mark (BOM) (EF BB BF), which is ignored. Until this feature is standardized, vendors may prefix the header string. For example, the Microsoft implementation uses <code>msFilterList</code>.</p>
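+<p>A minimal JavaScript sketch of the header check. The function <code>hasValidHeader</code> is hypothetical; the exact, case-sensitive comparison and the caller-supplied vendor prefix are assumptions, since this draft defines neither.</p>

```javascript
// Minimal sketch: check the header line of a selection list.
// Assumptions: the comparison is exact (case-sensitive), and the vendor
// prefix (for example "ms") is supplied by the caller.
const UTF8_BOM = "\uFEFF"; // the bytes EF BB BF decode to U+FEFF

function hasValidHeader(text, vendorPrefix = "") {
  // The header must be the first line; LF and CRLF line endings are accepted.
  let firstLine = text.split(/\r?\n/, 1)[0];
  if (firstLine.startsWith(UTF8_BOM)) {
    firstLine = firstLine.slice(UTF8_BOM.length); // a leading BOM is ignored
  }
  if (firstLine === "FilterList") return true;
  return vendorPrefix !== "" && firstLine === vendorPrefix + "FilterList";
}

console.log(hasValidHeader("FilterList\n# comment"));         // true
console.log(hasValidHeader("\uFEFFmsFilterList\r\n", "ms"));  // true
console.log(hasValidHeader("# comment\nFilterList"));         // false
```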
+
+
+<h3>Comments</h3>
+
+<p >A comment line must start with a number sign (#) character. </p>
+
+<h3>Settings</h3>
+
+<p class="issue">The settings format does not define what the browser should do with spaces. Much remains to be defined here to make the format implementable by browsers, specifically in terms of error recovery versus draconian error handling.</p>
+
+<p>The selection list format supports settings in the form of key-value pairs. A settings line begins with a colon (:) and contains two string values, a key and a value, separated by an equal sign (=). If a setting is not recognized, the user agent must ignore that setting.</p>
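+<p>A minimal JavaScript sketch of settings-line parsing. Because this draft does not yet define whitespace handling, the sketch assumes that whitespace around the key and the value is trimmed and that a malformed line is skipped; <code>parseSetting</code>, <code>KNOWN_SETTINGS</code>, and the unknown key <code>Colour</code> are hypothetical.</p>

```javascript
// Minimal sketch: parse a settings line such as ":Expires=3" or ": Expires = 10".
// Assumptions (not yet defined by this draft): whitespace around the key and
// value is trimmed, keys are case-sensitive, and a malformed line returns null
// so the caller can skip it rather than reject the whole list.
const KNOWN_SETTINGS = new Set(["Expires"]);

function parseSetting(line) {
  if (!line.startsWith(":")) return null;
  const eq = line.indexOf("=");
  if (eq === -1) return null;
  const key = line.slice(1, eq).trim();
  const value = line.slice(eq + 1).trim();
  if (key === "" || value === "") return null;
  // A setting the user agent does not recognize is ignored.
  if (!KNOWN_SETTINGS.has(key)) return null;
  return { key, value };
}

console.log(JSON.stringify(parseSetting(":Expires=3")));     // {"key":"Expires","value":"3"}
console.log(JSON.stringify(parseSetting(": Expires = 10"))); // {"key":"Expires","value":"10"}
console.log(JSON.stringify(parseSetting(":Colour=blue")));   // null
```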
+
+<h4>Expires</h4>
+
+<p class="issue">Karl Dubost: The value seems arbitrary. Is there any rationale behind this range?</p>
+
+<p class="issue">[andyzei] It was chosen by anticipating the useful time range that a customer might want to set the value to. Having it bounded makes it easier to test.</p>
+
+<pre>Expires = n</pre>
+
+<p>The Expires setting defines how frequently (every <code>n</code> days) the user agent checks for updates to the list.</p>
+<p>The value of <code>n</code> must be an integer between 1 and 30.</p>
+
+<p>The following list file requests that the user agent check for updates to the list every 10 days (or the next time the user agent is launched, if more than 10 days have passed).</p>
+
+
+<pre>FilterList
+: Expires = 10
++d example.org</pre>
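+<p>The range check and the update schedule can be sketched in JavaScript as follows. <code>parseExpiresDays</code> and <code>isUpdateDue</code> are hypothetical names, and treating an out-of-range or non-integer value as invalid (returning <code>null</code>) is an assumption, since this draft does not say how a user agent should recover.</p>

```javascript
// Minimal sketch: validate an Expires value and decide whether an update
// check is due. Assumption: an invalid value yields null; this draft does
// not specify a fallback.
const MS_PER_DAY = 24 * 60 * 60 * 1000;

function parseExpiresDays(value) {
  if (!/^\d+$/.test(value)) return null; // integers only
  const n = Number(value);
  return n >= 1 && n <= 30 ? n : null;   // allowed range is 1 to 30 days
}

function isUpdateDue(lastCheck, now, expiresDays) {
  // A user agent that was not running when the interval elapsed simply
  // finds the check overdue the next time it is launched.
  return now.getTime() - lastCheck.getTime() >= expiresDays * MS_PER_DAY;
}

const last = new Date("2012-01-03T00:00:00Z");
const days = parseExpiresDays("10");
console.log(isUpdateDue(last, new Date("2012-01-12T00:00:00Z"), days)); // false
console.log(isUpdateDue(last, new Date("2012-01-20T00:00:00Z"), days)); // true
```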
+
+<h3>Rules</h3>
+
+<p>Rules are the primary component of a selection list. A rule is a line in a selection list that changes the way the user agent handles third-party content.</p>
+
+<p>Rules are matched against the URI of each third-party download in a page. A URI whose second-level domain name differs from that of the URI in the address bar is a third-party URI.</p>
+
+<p>The basic format for a rule is as follows: </p>
+
+<pre>FilterList
+#
+# Allow rule (always a domain rule)
++d string [string]
+#
+# Block rules (a domain rule or a substring rule)
+-d string [string]
+- string</pre>
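+<p>A minimal JavaScript sketch of splitting a rule line into its parts, assuming that the strings in a rule are separated by whitespace and contain no spaces themselves. <code>parseRule</code> and its result fields are hypothetical; it rejects <code>+ example.org</code> because allow rules must be domain rules.</p>

```javascript
// Minimal sketch: split one rule line into its parts. The result field
// names are hypothetical; this draft only defines the textual format.
function parseRule(line) {
  // sign, optional "d", first string, optional second string
  const m = /^([+-])(d?)\s+(\S+)(?:\s+(\S+))?\s*$/.exec(line);
  if (m === null) return null;
  const [, sign, d, first, second] = m;
  const action = sign === "+" ? "allow" : "block";
  if (d === "d") {
    return { action, kind: "domain", domain: first, substring: second ?? null };
  }
  // Allow rules must be domain rules, and a substring rule has one string.
  if (action === "allow" || second !== undefined) return null;
  return { action, kind: "substring", substring: first };
}

console.log(JSON.stringify(parseRule("+d example.com file")));
// {"action":"allow","kind":"domain","domain":"example.com","substring":"file"}
console.log(JSON.stringify(parseRule("- spamspam")));
// {"action":"block","kind":"substring","substring":"spamspam"}
console.log(JSON.stringify(parseRule("+ example.org"))); // null
```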
+
+<h4>Allow Rules</h4>
+
+
+<p >Allow rules allow content from the specified entity to function within the instance of the user agent. Allow rules must begin with a plus sign (+). Allow rules must be domain rules. </p>
+
+<h4>Block Rules</h4>
+
+<p>Block rules prevent content from the specified entity from functioning within the instance of the user agent. Block rules must begin with a minus sign (-). Block rules may be either domain rules or substring rules.</p>
+
+<h4>Domain Rules</h4>
+
+<p>Domain rules allow or block content on a particular domain. Domain rules must begin with the string <code>+d</code> (to allow content) or the string <code>-d</code> (to block content). For allow rules, the user agent must evaluate the string specified in the domain part of the allow rule against the host name of the target URI, label by label, <strong>starting from the topmost domain label</strong>. An optional second string may be specified to further limit the scope; it is matched as a substring of the URI path.</p>
+
+<p>For example, the following allow domain rules allow the URI <code>http://www.subdomain.example.com/file.html</code>.</p>
+
+<pre>+d example.com
++d subdomain.example.com</pre>
+
+<p>The following allow domain rules, with the optional string, also allow the URI <code>http://www.subdomain.example.com/file.html</code>.</p>
+
+<pre>+d example.com file
++d example.com file.html
++d example.com html</pre>
+
+
+<p>The following allow domain rules fail to match and therefore do not allow the URI <code>http://www.subdomain.example.com/file.html</code>.</p>
+
+<pre>+d subdomain.example
+#  does not match starting at the topmost domain label
+#
++d othersubdomain.example.com
+#  not a complete match of specified domain labels
+#
++d example.com /path/file.html
+#  /path/file.html is not a substring of /file.html</pre>
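+<p>The allow-rule matching described above, including the optional string, can be sketched in JavaScript. This is a minimal sketch, assuming case-insensitive host comparison and that the optional string is matched against the URI path only (excluding the query string); <code>matchesAllowDomainRule</code> and <code>wildcardToRegExp</code> are hypothetical helpers.</p>

```javascript
// Minimal sketch: allow-rule domain matching, anchored at the topmost label.
// Assumptions: host names compare case-insensitively, and the optional
// string is matched against the URI path only (not the query string).
function wildcardToRegExp(pattern) {
  // Escape regex metacharacters except "*", then let "*" match any run of characters.
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(escaped.replace(/\*/g, ".*"));
}

function matchesAllowDomainRule(uri, ruleDomain, ruleSubstring = null) {
  const url = new URL(uri);
  const hostLabels = url.hostname.toLowerCase().split(".");
  const ruleLabels = ruleDomain.toLowerCase().split(".");
  if (ruleLabels.length > hostLabels.length) return false;
  // Align the rule with the rightmost (topmost) labels of the host.
  const offset = hostLabels.length - ruleLabels.length;
  const aligned = ruleLabels.every((label, i) => label === hostLabels[offset + i]);
  if (!aligned) return false;
  return ruleSubstring === null || wildcardToRegExp(ruleSubstring).test(url.pathname);
}

const uri = "http://www.subdomain.example.com/file.html";
console.log(matchesAllowDomainRule(uri, "example.com"));                    // true
console.log(matchesAllowDomainRule(uri, "example.com", "file.html"));       // true
console.log(matchesAllowDomainRule(uri, "subdomain.example"));              // false
console.log(matchesAllowDomainRule(uri, "example.com", "/path/file.html")); // false
```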
+
+<p>For block rules, the user agent must evaluate the string specified in the domain part of the block rule against <strong>any contiguous sequence of domain labels</strong> in the host name of the target URI.</p>
+
+<p>For example, the following block domain rules block the URI <code>http://www.subdomain.example.com/file.html</code>.</p>
+
+<pre>-d example.com
+-d subdomain.example.com</pre>
+
+
+<p>The following block domain rules, with the optional string, also block the URI <code>http://www.subdomain.example.com/file.html</code>.</p>
+
+<pre>-d example.com file
+-d example.com file.html
+-d example.com html
+-d subdomain.example</pre>
+
+
+<p>The following block domain rules fail to match and therefore do not block the URI <code>http://www.subdomain.example.com/file.html</code>.</p>
+
+<pre>#
+-d othersubdomain.example.com
+#  not contiguous domain labels
+#
+-d example.com /path/file.html
+#  "/path/file.html" is not a substring of /file.html
+#</pre>
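+<p>Block-rule domain matching differs only in where the rule's labels may sit in the host name. A minimal JavaScript sketch, assuming case-insensitive host comparison and that the optional string is matched against the URI path only; <code>matchesBlockDomainRule</code> and <code>wildcardToRegExp</code> are hypothetical helpers.</p>

```javascript
// Minimal sketch: block-rule domain matching against any contiguous labels.
// Assumptions: host names compare case-insensitively, and the optional
// string is matched against the URI path only (not the query string).
function wildcardToRegExp(pattern) {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(escaped.replace(/\*/g, ".*"));
}

function matchesBlockDomainRule(uri, ruleDomain, ruleSubstring = null) {
  const url = new URL(uri);
  const hostLabels = url.hostname.toLowerCase().split(".");
  const ruleLabels = ruleDomain.toLowerCase().split(".");
  // Slide the rule's labels across the host's labels, one label at a time.
  let found = false;
  for (let start = 0; start + ruleLabels.length <= hostLabels.length; start++) {
    if (ruleLabels.every((label, i) => label === hostLabels[start + i])) {
      found = true;
      break;
    }
  }
  if (!found) return false;
  return ruleSubstring === null || wildcardToRegExp(ruleSubstring).test(url.pathname);
}

const uri = "http://www.subdomain.example.com/file.html";
console.log(matchesBlockDomainRule(uri, "subdomain.example"));              // true
console.log(matchesBlockDomainRule(uri, "example.com", "html"));            // true
console.log(matchesBlockDomainRule(uri, "othersubdomain.example.com"));     // false
console.log(matchesBlockDomainRule(uri, "example.com", "/path/file.html")); // false
```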
+
+<h4>Substring Rules</h4>
+
+<p>Substring rules block content whose URI contains the specified string. For example, the following substring rules block the URI <code>http://www.example.com/test.html</code>.</p>
+
+<pre>- example
+- exam
+- test.html
+- ex*le</pre>
+
+<p>However, the following substring rule does not match and therefore does not block the URI <code>http://www.example.com/test.html</code>.</p>
+
+<pre>- test2</pre>
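+<p>A minimal JavaScript sketch of substring-rule matching against the full URI, with <code>*</code> translated into a regular-expression wildcard. Case-sensitive matching is an assumption, since this draft does not say; <code>matchesSubstringRule</code> and <code>wildcardToRegExp</code> are hypothetical helpers.</p>

```javascript
// Minimal sketch: a substring rule blocks a URI when its string (with "*"
// as a wildcard) occurs anywhere in the full URI text.
// Assumption: matching is case-sensitive.
function wildcardToRegExp(pattern) {
  // Escape regex metacharacters except "*", then let "*" match any run of characters.
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(escaped.replace(/\*/g, ".*"));
}

function matchesSubstringRule(uri, pattern) {
  return wildcardToRegExp(pattern).test(uri);
}

const target = "http://www.example.com/test.html";
for (const p of ["example", "exam", "test.html", "ex*le", "test2"]) {
  console.log(p, matchesSubstringRule(target, p));
}
// example true, exam true, test.html true, ex*le true, test2 false
```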
+
+<h4>The Wildcard Character</h4>
+
+
+<p>The wildcard character (*) may be used within a substring rule. The wildcard character must match zero or more characters of any kind.</p>
+
+<p>Wildcard characters are greedy, meaning the wildcard will match as much text as possible. </p>
+
+
+<p >The wildcard character must not be used in the string representing the domain within a domain rule. The wildcard character may be used in the optional string part of a domain rule. </p>
+
+<p >The following example is valid because the wildcard character is used in the optional string part of the domain rule. </p>
+
+<pre>+d example.com sub*string</pre>
+
+
+<p >The following rule is invalid because the wildcard character is used in the domain part of the domain rule. </p>
+
+<pre># Invalid!
++d domain*.com substring </pre>
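+<p>A minimal JavaScript sketch of both wildcard constraints: rejecting a wildcard in the domain part of a domain rule, and greedy matching in the optional string. It assumes the strings in a rule are whitespace-separated; <code>isValidDomainRule</code> and <code>wildcardToRegExp</code> are hypothetical helpers.</p>

```javascript
// Minimal sketch: enforce "no wildcard in the domain part" and show greedy
// matching. Assumption: rule strings are separated by whitespace.
function wildcardToRegExp(pattern) {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  // ".*" is a greedy quantifier in JavaScript regular expressions.
  return new RegExp(escaped.replace(/\*/g, ".*"));
}

function isValidDomainRule(line) {
  const m = /^[+-]d\s+(\S+)(?:\s+\S+)?\s*$/.exec(line);
  return m !== null && !m[1].includes("*"); // m[1] is the domain part
}

console.log(isValidDomainRule("+d example.com sub*string")); // true
console.log(isValidDomainRule("+d domain*.com substring"));  // false

// Greedy: "foo*bar" consumes up to the last "bar" it can reach.
console.log(wildcardToRegExp("foo*bar").exec("xfoo1bar2barx")[0]); // foo1bar2bar
```

+<p>Greediness only affects how much text a match consumes; whether a URI matches a rule at all, and therefore whether it is allowed or blocked, is the same with greedy and non-greedy matching.</p>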
+</section>
+
+<section>
+    <h2>Processing Selection Lists</h2>
+
+<h3>Processing a Selection list</h3>
+
+<p >Selection lists may contain allow rules, block rules, and even duplicate rules that match the same URI. </p>
+
+<p >When a user agent evaluates a URI against a selection list, it must follow this algorithm: </p>
+
+
+<ol> 
+    <li>All allow rules in the selection list must be processed first. No duplicate removal or other processing can be done on the rules.
+        <ul> 
+            <li>If the URI matches any allow rule, then the content at the URI must be allowed.</li>
+        </ul> 
+        </li> 
+    <li>All block rules must be processed. 
+        <ul> 
+            <li>If a URI matches any block rule, then the content at the URI must be blocked.</li> 
+        </ul> 
+        </li> 
+    <li>If no rule matches, then the content must be allowed.</li>
+</ol>
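+<p>The three steps above can be sketched in JavaScript as follows. This is a minimal sketch: <code>evaluate</code> is a hypothetical name, rule objects are assumed to carry an <code>action</code> of <code>"allow"</code> or <code>"block"</code>, and the <code>matches</code> callback stands in for the domain and substring matching defined in the Rules section. The demo uses a toy containment matcher only.</p>

```javascript
// Minimal sketch of the evaluation order: allow rules, then block rules,
// then the default. `matches(rule, uri)` is a hypothetical callback.
function evaluate(uri, rules, matches) {
  // Step 1: any matching allow rule allows the content.
  if (rules.some((r) => r.action === "allow" && matches(r, uri))) return "allow";
  // Step 2: otherwise, any matching block rule blocks it.
  if (rules.some((r) => r.action === "block" && matches(r, uri))) return "block";
  // Step 3: no rule matched, so the content is allowed.
  return "allow";
}

// Toy matcher for this demo only: plain substring containment.
const containsText = (rule, uri) => uri.includes(rule.text);
const rules = [
  { action: "block", text: "example.com" },
  { action: "allow", text: "example.com/ok" },
];
console.log(evaluate("http://www.example.com/ok.js", rules, containsText));  // allow
console.log(evaluate("http://www.example.com/ads.js", rules, containsText)); // block
console.log(evaluate("http://www.example.org/", rules, containsText));       // allow
```

+<p>The first URI matches both rules, and the allow rule wins, which is the precedence of allow rules over block rules stated below.</p>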
+
+<p >This algorithm effectively gives precedence to allow rules over block rules. </p>
+
+<h3>Processing Multiple Selection lists</h3>
+
+<p>If a user agent supports the use of multiple selection lists simultaneously, then all allow rules from all selection lists must be grouped together and all block rules from all selection lists must be grouped together, so that when the user agent evaluates a URI, it first evaluates all allow rules from all selection lists and then evaluates all block rules from all lists. User agents may remove duplicate rules provided that the meaning of the rules is maintained after their removal.</p>
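+<p>A minimal JavaScript sketch of this grouping, assuming rules are kept as raw text lines. <code>mergeLists</code> and the example host <code>tracker.example</code> are hypothetical. Identical lines are merged; this cannot change any result, because a URI is allowed (or blocked) as soon as any rule of that kind matches.</p>

```javascript
// Minimal sketch: group the rules of several selection lists so that every
// allow rule is evaluated before any block rule. Rules are raw lines here
// (an assumption); identical lines are merged, which is safe because
// evaluation only asks whether *any* rule of a kind matches.
function mergeLists(lists) {
  const allow = new Set();
  const block = new Set();
  for (const list of lists) {
    for (const line of list) {
      if (line.startsWith("+")) allow.add(line);
      else if (line.startsWith("-")) block.add(line);
      // Headers, comments, and settings are not rules and are skipped here.
    }
  }
  return { allow: [...allow], block: [...block] };
}

const listA = ["FilterList", "-d tracker.example", "+d example.com"];
const listB = ["FilterList", "-d tracker.example", "-d example.com"];
console.log(JSON.stringify(mergeLists([listA, listB])));
// {"allow":["+d example.com"],"block":["-d tracker.example","-d example.com"]}
```

+<p>Because all allow rules are evaluated first, the allow rule for <code>example.com</code> in the first list overrides the block rule for the same domain in the second list, the scenario raised in the Issues appendix.</p>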
+</section>
+
+
+<section class='appendix'>
+<h2>Augmented Backus-Naur Form</h2>
+
+<p>The following is an Augmented Backus-Naur Form (ABNF) [[ABNF]] grammar for the selection list format.</p>
+
+<pre>FilterList     =     Header [lines]
+Header         =     [UTF8BOM] "FilterList" EOL
+lines          =     line *(EOL [line])
+line           =     comment / key-value / rule
+comment        =     "#" *(VCHAR / WSP)
+key-value      =     ":" *WSP ALPHA 1*31(ALPHA/DIGIT) *WSP "=" *WSP 1*32(ALPHA/DIGIT)
+rule           =     allow-rule / block-rule
+allow-rule     =     "+" domain-exp
+block-rule     =     "-" (domain-exp / substring-exp)
+domain-exp     =     "d" 1*WSP string [substring-exp]
+substring-exp  =     1*WSP wcstring
+string         =     1*(ALPHA / DIGIT / "-" / ".")
+wcstring       =     1*VCHAR
+UTF8BOM        =     %xEF %xBB %xBF
+EOL            =     [CR] LF</pre>
+</section>
+
+<section id="glossary" class="appendix">
+<h2>Glossary</h2>
+</section>
+
+<section class="appendix">
+<h2>Issues</h2>
+
+<p class="issue">Interesting, but what is the purpose of allowing something if there are no disallow rules for the same domain? Or is it implied? In that case, an example combining block and allow rules is needed.</p>
+
+<p class="issue">[andyzei] The only time that that's interesting is when you have multiple TPLs. If you want to write a TPL that ensures that no other TPL that the user has installed can block your site, then you can do that. Agreed there should be a better example of this.</p>
+</section>
+
+<section class="appendix">
+<h2>Acknowledgements</h2>
+
+<p>Many thanks to Rich Tibbett (Opera Software) and Luca Venturi (Opera Software) for reviewing this document.</p>
+</section>
+
+</body>
+</html>