The Tracking Selection Lists specification defines a format for interchangeable lists for blocking or allowing Web tracking elements, and the expected user-agent interpretation of this format.

A selection list contains parts of third-party URIs that a browser may access automatically when referenced within a Web page that a user deliberately visits. Rules in a selection list may change the way the user agent handles third-party content. By limiting the calls to these Web sites and blocking resources from other Web pages, the selection list limits the information other sites can collect about a user.

Issue 2: "Third-party URIs" might be confusing when read alongside the two other Tracking Protection WG documents. The XLink definition doesn't help either. The compliance document defines a third party only vaguely: 'A "third party" is any party, in a specific network interaction, that cannot infer with high probability that the user knowingly and intentionally communicated with it.'

This document has not yet received consensus from the Tracking Protection WG to be published as a deliverable of this WG. The current document is a strawman proposal for the WG to better understand the benefits and caveats of this format.

Conformance

As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.

The key words MUST, MUST NOT, REQUIRED, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL in this specification are to be interpreted as described in [[!RFC2119]].

Introduction

This section is non-normative.

When accessing the Web, users download Web pages that are an agglomeration of multiple resources, both visible and invisible. By accessing these resources, users share information with the owners of these resources. This is inherent in the design of the Web and simply how the Web works, but it has potentially unintended consequences: as users visit one site, many other sites receive information about their activities. For example, when a Web page includes an image file from a different domain, the user's browser can send IP address information, cookies, and referrer data to that domain. A script can have an additional impact on user privacy and could collect arbitrary data from the initial Web page.

Issue 3: We could create a very simple infographic showing a simple HTTP request and its consequences to illustrate the paragraph above.

This situation results from how Web sites are built. Typically, a Web site today brings together content from many other Web sites while appearing to be a single entity. When the browser calls any other Web site to request anything (an image, a cookie, HTML, a script that can execute), the browser explicitly provides information in order to get information. By limiting requests to these sites, it is possible to limit the data available to them, including sites used for collection and tracking.

A selection list contains parts of third-party URIs that a browser may access automatically when referenced within a Web page a user deliberately visits.

Issue 4: Should TSLs also apply to 1st-party URIs? If so, there should probably be an option that does this – I think that by default, most of the rules you’d want to write are 3rd-party specific. There are valid use cases for 1st-party rules, such as CNAME’d DNS entries.

Issue 5: Karl Dubost: It assumes that tracking is done only through third-party URIs.

Issue 6: [andyzei] There is a difference between “expected” and “unexpected” tracking. 1st-party tracking is expected. 3rd-party tracking is not.  

Issue 7: [karl] We should stay entirely neutral with regard to the intent of blocking or not blocking the URLs.

Rules in a selection list may change the way the user agent handles third-party content. By limiting the calls to these Web sites and blocking resources from other Web pages, the selection list limits the information other sites can collect about a user.

Third-Party URIs

A third-party URI [[!URI]] is a URI whose (second-level) domain name differs from that of the top-level containing document. When a user agent sends an HTTP request to a URI, the server sends back a response that may contain further URIs. Among these URIs, those with a different second-level domain name are considered third-party URIs in that document.

A user agent must evaluate any URI that indicates a sub-document (such as an iframe), and any URI defined in a sub-document, as third-party with respect to the topmost document.

Issue 8: Not testable. This needs to be framed as something implementable.

Issue 9: "MUST evaluate" doesn't mean anything in that context.

For example, consider a top-level document whose URI is http://www.microsoft.com. This page might contain an iframe whose src URI is http://www.example.com. If the page at http://www.example.com contains an img element whose src is http://www.example.com/img.png, the URI http://www.example.com/img.png is a third-party URI, as its domain name differs from that of the top-level page.
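The comparison in this example can be sketched as follows. This is a minimal, hypothetical sketch: it assumes the "second-level domain" is simply the last two labels of the host, which gives wrong answers for public suffixes such as co.uk and for IP addresses (see Issue 18). Note that the URI is always compared against the top-level document, never against the iframe that contains it.

```python
from urllib.parse import urlsplit


def second_level_domain(uri: str) -> str:
    """Naively return the last two labels of the URI's host (no public-suffix handling)."""
    host = (urlsplit(uri).hostname or "").rstrip(".")
    return ".".join(host.split(".")[-2:])


def is_third_party(uri: str, top_level_uri: str) -> bool:
    """True if the URI's second-level domain differs from the top-level document's."""
    return second_level_domain(uri) != second_level_domain(top_level_uri)


# The img inside the example.com iframe is judged against the top-level page.
print(is_third_party("http://www.example.com/img.png", "http://www.microsoft.com"))
```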

A third-party download is any potential HTTP download request to a third-party URI.

A user-agent must apply a selection list to third-party URIs only.

Issue 10: Not testable. Applying a filter list is not an operation in this case.

Issue 11: Need to clearly define "apply a filter list" / rules and frame requirements to match.

Blocking Downloads

When a user agent issues a request for a Web page and receives an HTTP status code that returns a document, and the user or user agent has chosen to apply a selection list, all third-party URIs that can generate a download request must be evaluated against this selection list.

Issue 12:

Incorrect sentence. Something along the lines of the following paragraph may be the real implementable requirement.

(When the user and/or user agents have activated the filter list mechanism,) for each HTTP response sent by the server, a user agent MUST drop any subsequent HTTP requests according to the rules defined in the filter list.

When a user agent blocks a download, that user agent should fire any applicable DOM events pertaining to a download error, and an exception, including the URI value, SHOULD be reported in the error console.
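The dropping of subsequent requests can be sketched as a filter over the sub-request URIs found in a response. This is an illustrative sketch only: `filter_subrequests`, `is_third_party`, and `is_blocked` are hypothetical names, the two predicates are passed in rather than defined here, standard error stands in for the error console, and DOM event dispatch is omitted.

```python
import sys
from typing import Callable, Iterable, List


def filter_subrequests(
    top_level_uri: str,
    subrequest_uris: Iterable[str],
    is_third_party: Callable[[str, str], bool],  # hypothetical predicate
    is_blocked: Callable[[str], bool],           # hypothetical selection-list check
) -> List[str]:
    """Return the URIs that may still be requested; report the dropped ones."""
    allowed = []
    for uri in subrequest_uris:
        # Only third-party URIs are evaluated against the selection list.
        if is_third_party(uri, top_level_uri) and is_blocked(uri):
            print(f"Blocked by selection list: {uri}", file=sys.stderr)
            continue
        allowed.append(uri)
    return allowed
```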

List Format

A selection list is a UTF-8 encoded text file that contains a header, comments, settings, and rules. Selection lists are parsed in a stateless manner across lines, meaning that the ordering of the lines has no effect on the meaning of the file. The only exception to this is the header, which must be the first line of the file.

FilterList
#
# Line 1 is a header. 
#
# Lines 2-11 are comments. 
# As a comment, any line that starts with a # character is ignored.
#
# Any line that begins with a : character is a setting, 
# which is key-value pair.
# The key-value pair Expires = n specifies 
# to wait n days before checking for an update to the list.
#
# Using a setting.
# Check for an update to the list in 3 days.

:Expires=3  

# Domain rule
# Allow all URIs from the example.com domain name.

+d example.com

# Substring rule
# Block any URI containing spamspam.

- spamspam

# Wildcard character
# Block any URI that has a foo followed by a bar.

- foo*bar

# Domain rule
# Block anything from exampleexample.com.

-d exampleexample.com

# Domain rule with optional path
# Block any URI from example.com that contains the substring bad.js in the URI path.

-d example.com bad.js
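Because lines are interpreted independently of their order, a parser can classify each line on its own once the header has been checked. The following is a minimal sketch, not a reference parser: the `Rule` and `SelectionList` types are hypothetical, vendor-prefixed headers are not accepted, whitespace is handled leniently, and malformed lines are silently skipped, which is only one possible answer to the error-recovery question raised in Issue 13.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Rule:
    domain: Optional[str]   # set for domain rules (+d / -d), None for substring rules
    pattern: Optional[str]  # optional string of a domain rule, or the substring itself


@dataclass
class SelectionList:
    settings: Dict[str, str] = field(default_factory=dict)
    allow: List[Rule] = field(default_factory=list)
    block: List[Rule] = field(default_factory=list)


def parse_rule(body: str) -> Optional[Rule]:
    """Parse the text after the leading + or -; return None if malformed."""
    if body[:1] == "d" and body[1:2].isspace():
        parts = body[1:].split()
        if len(parts) not in (1, 2) or "*" in parts[0]:
            return None  # no wildcard allowed in the domain part
        return Rule(parts[0].lower(), parts[1] if len(parts) == 2 else None)
    parts = body.split()
    if body[:1].isspace() and len(parts) == 1:
        return Rule(None, parts[0])
    return None


def parse_selection_list(text: str) -> SelectionList:
    lines = text.splitlines()
    if not lines or lines[0].lstrip("\ufeff").strip() != "FilterList":
        raise ValueError("line 1 must be the FilterList header")
    result = SelectionList()
    for raw in lines[1:]:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no meaning
        if line.startswith(":"):
            key, sep, value = line[1:].partition("=")
            if sep:
                result.settings[key.strip()] = value.strip()
            continue
        rule = parse_rule(line[1:]) if line[0] in "+-" else None
        if rule is None:
            continue  # malformed line: skipped in this sketch
        if line[0] == "+":
            if rule.domain is not None:  # allow rules must be domain rules
                result.allow.append(rule)
        else:
            result.block.append(rule)
    return result
```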

Header

Selection lists must start with a single line, known as the header, that contains the string FilterList. The header line may start with the UTF-8 Byte Order Mark (BOM) (EF BB BF), which is ignored. Until this feature becomes standardized, vendors may use a prefix; for example, the Microsoft implementation uses msFilterList.
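A header check on the raw bytes of a list might look like the sketch below. The function name is hypothetical, and the set of accepted vendor prefixes is an assumption supplied by the caller; only the ms prefix mentioned above is used as a default.

```python
BOM = b"\xef\xbb\xbf"


def has_valid_header(data: bytes, vendor_prefixes=("ms",)) -> bool:
    """Return True if line 1 is the header, optionally preceded by a BOM or vendor prefix."""
    if data.startswith(BOM):
        data = data[len(BOM):]  # the BOM is ignored
    first_line = data.split(b"\n", 1)[0].rstrip(b"\r").decode("utf-8")
    accepted = {"FilterList"} | {prefix + "FilterList" for prefix in vendor_prefixes}
    return first_line in accepted
```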

Comments

A comment line must start with a number sign (#) character.

Settings

Issue 13: The settings format doesn't define what the browser should do with spaces. There are plenty of things to define here to make it implementable by browsers, specifically in terms of error recovery or draconian mode.

The selection list format supports settings in the form of key-value pairs. A settings line begins with a colon (:) and has two string values separated by an equal sign (=). If a setting is not recognized, the user agent must ignore that setting.
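One way to read a settings line is to mirror the key-value production of the ABNF grammar in a regular expression and then keep only recognized keys. In this sketch, the function names are hypothetical, Expires is assumed to be the only recognized setting, the key comparison is case-sensitive, and trailing whitespace is tolerated; Issue 13 leaves all of these choices open.

```python
import re
from typing import Dict, Iterable, Optional, Tuple

# ":" key *WSP "=" *WSP value, with the key and value lengths used in the ABNF.
SETTING = re.compile(r":([A-Za-z][A-Za-z0-9]{1,31})[ \t]*=[ \t]*([A-Za-z0-9]{1,32})")
KNOWN_SETTINGS = {"Expires"}  # assumption: the only setting defined in this draft


def parse_setting(line: str) -> Optional[Tuple[str, str]]:
    match = SETTING.fullmatch(line.rstrip())
    return (match.group(1), match.group(2)) if match else None


def collect_settings(lines: Iterable[str]) -> Dict[str, str]:
    settings = {}
    for line in lines:
        parsed = parse_setting(line)
        if parsed and parsed[0] in KNOWN_SETTINGS:
            settings[parsed[0]] = parsed[1]
        # Unrecognized or malformed settings are ignored.
    return settings
```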

Expires

Issue 14: Karl Dubost: The value seems arbitrary. Is there any rationale behind this range?

Issue 15: [andyzei] It was chosen by anticipating the useful time range that a customer might want to set the value to. Having it bounded makes it easier to test.

Expires = n

The Expires setting defines how often the user agent checks for updates to the list: once every n days.

The value of n must be an integer between 1 and 30.

The following list file requests that the user agent check every 10 days (or the next time the user agent is launched, if more than 10 days have passed) to see if there are updates to the list.

FilterList
:Expires = 10
+d example.org
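The update schedule can be sketched as follows, assuming the user agent records the date of its last check. Both function names are hypothetical, and raising an error for an out-of-range value is only one possible choice, since this draft does not say how an invalid Expires value is handled. Calling `update_due` at launch covers the case where the user agent was not running when the interval elapsed.

```python
from datetime import date, timedelta


def next_update_check(last_checked: date, expires_value: str) -> date:
    n = int(expires_value)
    if not 1 <= n <= 30:
        raise ValueError(f"Expires must be an integer between 1 and 30, got {n}")
    return last_checked + timedelta(days=n)


def update_due(today: date, last_checked: date, expires_value: str) -> bool:
    """True once n days have passed since the last check."""
    return today >= next_update_check(last_checked, expires_value)
```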

Rules

Issue 18: What is the behavior for IP addresses?

Rules are the primary component of a selection list. A rule is a line in a selection list that changes the way the user-agent handles third-party content.

Rules are matched against the URI of each third-party subdownload in a page. A URI that has a different second-level domain name than the URI in the address bar is a third-party URI.

The basic format for a rule is as follows:

FilterList
#
# Allow rule 
+d string [string]
#
# Block rule (domain or substring)
-d string [string]
- string

Allow Rules

Allow rules allow content from the specified entity to function within the instance of the user agent. Allow rules must begin with a plus sign (+). Allow rules must be domain rules.

Block Rules

Block rules block content from the specified entity from functioning within the instance of the user agent. Block rules must begin with a minus sign (-). Block rules may be either domain rules or substring rules.

Domain Rules

Domain rules allow or block content on a particular domain. Domain rules must begin with the string +d (to allow content) or the string -d (to block content). For allow rules, the user-agent must evaluate the string specified in the domain part of the allow rule against the target URI, starting from the topmost domain label (the rightmost label, such as com). An optional second string may be specified to further limit the scope of the rule.

For example, the following allow domain rules allow the URI, http://www.subdomain.example.com/file.html.

+d example.com
+d subdomain.example.com

The following allow domain rules, with the optional string, also allow the URI, http://www.subdomain.example.com/file.html.

+d example.com file
+d example.com file.html
+d example.com html

The following allow domain rules fail to match and therefore fail to allow the URI, http://www.subdomain.example.com/file.html.

+d subdomain.example
#  does not match starting at the topmost domain label
#
+d othersubdomain.example.com
#  not a complete match of specified domain labels
#
+d example.com /path/file.html
#  /path/file.html is not a substring of /file.html
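The allow-rule examples above can be reproduced by treating the domain part as a sequence of labels that must equal the host's rightmost labels. This is a sketch under stated assumptions: the function names are hypothetical, the optional string is searched for in the URI path only (as the /path/file.html example suggests), and a * in the optional string is translated to a regular-expression .* as described under The Wildcard Character.

```python
import re
from typing import Optional
from urllib.parse import urlsplit


def wildcard_search(pattern: str, text: str) -> bool:
    """Substring search in which each * matches zero or more characters."""
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.search(regex, text) is not None


def allow_domain_matches(domain: str, optional: Optional[str], uri: str) -> bool:
    """Match a +d rule: the rule's labels must be the host's rightmost labels."""
    parts = urlsplit(uri)
    host_labels = (parts.hostname or "").split(".")
    rule_labels = domain.lower().split(".")
    if host_labels[-len(rule_labels):] != rule_labels:
        return False
    # Assumption: the optional string is searched for in the URI path only.
    return optional is None or wildcard_search(optional, parts.path)
```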

For block rules, the user-agent must evaluate the string specified in the domain part of the block rule against any contiguous sequence of domain labels in the target URI.

For example, the following block domain rules block the URI, http://www.subdomain.example.com/file.html.

-d example.com
-d subdomain.example.com

The following block domain rules, with the optional string, also block the URI, http://www.subdomain.example.com/file.html.

-d example.com file
-d example.com file.html
-d example.com html
-d subdomain.example

The following block domain rules fail to match and therefore fail to block the URI, http://www.subdomain.example.com/file.html.

#
-d othersubdomain.example.com
#  not contiguous domain labels
#
-d example.com /path/file.html
#  "/path/file.html" is not a substring of /file.html
#
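For block rules, the only change from the allow-rule check is that the rule's labels may appear as any contiguous run of labels in the host, not only at its right-hand end. The sketch below uses the same hypothetical names and the same path-only assumption for the optional string, and it reproduces the examples above.

```python
import re
from typing import Optional
from urllib.parse import urlsplit


def wildcard_search(pattern: str, text: str) -> bool:
    """Substring search in which each * matches zero or more characters."""
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.search(regex, text) is not None


def block_domain_matches(domain: str, optional: Optional[str], uri: str) -> bool:
    """Match a -d rule: the rule's labels must appear as contiguous host labels."""
    parts = urlsplit(uri)
    host_labels = (parts.hostname or "").split(".")
    rule_labels = domain.lower().split(".")
    n = len(rule_labels)
    contiguous = any(
        host_labels[i:i + n] == rule_labels
        for i in range(len(host_labels) - n + 1)
    )
    if not contiguous:
        return False
    # Assumption: the optional string is searched for in the URI path only.
    return optional is None or wildcard_search(optional, parts.path)
```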

Substring Rules

Substring rules match a substring in a URI, blocking content. For example, the following substring rules block the URI, http://www.example.com/test.html.

- example
- exam
- test.html
- ex*le

However, the following substring rule does not match and therefore does not block the URI, http://www.example.com/test.html.

- test2
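A substring rule can be checked against the full URI with a plain substring search, extended so that * matches any run of characters. The function name is hypothetical, and matching is case-sensitive here because this draft does not say otherwise.

```python
import re


def substring_rule_matches(pattern: str, uri: str) -> bool:
    """True if the pattern occurs anywhere in the URI; * matches zero or more characters."""
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.search(regex, uri) is not None
```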

The Wildcard Character

The wildcard character (*) may be used within a substring rule. The wildcard character must match zero or more characters of any kind.

Wildcard characters are greedy, meaning the wildcard will match as much text as possible.

The wildcard character must not be used in the string representing the domain within a domain rule. The wildcard character may be used in the optional string part of a domain rule.

The following example is valid because the wildcard character is used in the optional string part of the domain rule.

+d example.com sub*string

The following rule is invalid because the wildcard character is used in the domain part of the domain rule.

# Invalid!
+d domain*.com substring 
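Translating * to the greedy regular-expression quantifier .* gives the "match as much text as possible" behavior, and a simple field check can reject rules that place the wildcard in the domain part. Both functions are hypothetical illustrations, and the field check assumes rules are separated by whitespace as in the examples above.

```python
import re


def compile_wildcard(pattern: str) -> "re.Pattern[str]":
    """Escape everything except *, which becomes the greedy .*."""
    return re.compile(".*".join(re.escape(piece) for piece in pattern.split("*")))


def is_valid_domain_rule(line: str) -> bool:
    fields = line.split()
    if len(fields) not in (2, 3) or fields[0] not in ("+d", "-d"):
        return False
    return "*" not in fields[1]  # the wildcard is allowed only in the optional string


# Greedy: the match runs to the last "bar", not the first.
print(compile_wildcard("foo*bar").search("xfoo1bar2bar").group(0))
```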

Processing Selection Lists

Processing a Selection list

Selection lists may contain allow rules, block rules, and even duplicate rules that match the same URI.

When a user agent evaluates a URI against a selection list, it must follow this algorithm:

  1. All allow rules in the selection list must be processed first. No duplicate removal or other processing can be done on the rules.
    • If the URI matches any allow rule, then the content at the URI must be allowed.
  2. All block rules must be processed.
    • If a URI matches any block rule, then the content at the URI must be blocked.
  3. If no rule matches, then the content must be allowed.

This algorithm effectively gives precedence to allow rules over block rules.
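The three steps can be sketched with each rule represented as a predicate over URIs. The `Decision` type and `evaluate` function are hypothetical; the matchers could be built from the domain and substring checks described earlier.

```python
from enum import Enum
from typing import Callable, Iterable

Matcher = Callable[[str], bool]


class Decision(Enum):
    ALLOW = "allow"
    BLOCK = "block"


def evaluate(uri: str, allow_rules: Iterable[Matcher], block_rules: Iterable[Matcher]) -> Decision:
    if any(rule(uri) for rule in allow_rules):
        return Decision.ALLOW  # step 1: any matching allow rule wins
    if any(rule(uri) for rule in block_rules):
        return Decision.BLOCK  # step 2: otherwise any matching block rule blocks
    return Decision.ALLOW      # step 3: no rule matched
```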

Processing Multiple Selection lists

If a user-agent supports the use of multiple selection lists simultaneously, then all allow rules from all selection lists must be grouped together and all block rules from all selection lists must be grouped together, such that when the user agent evaluates a URI it first evaluates all allow rules from all selection lists and then evaluates all block rules from all lists. User-agents may remove duplicate rules in lists provided that the meaning of the rules is maintained after the removal of duplicate rules.
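Grouping can be sketched as concatenating the allow rules and block rules of every list separately. In this sketch each list is a hypothetical (allow, block) pair of rule strings, and duplicates are removed in order; that preserves meaning because a URI is allowed or blocked as soon as any rule in a group matches, so a repeated rule never changes the result.

```python
from typing import Iterable, List, Tuple

RuleText = str  # assumption: rules kept as their normalized source text


def merge_lists(
    lists: Iterable[Tuple[List[RuleText], List[RuleText]]],
) -> Tuple[List[RuleText], List[RuleText]]:
    allow: List[RuleText] = []
    block: List[RuleText] = []
    for list_allow, list_block in lists:
        allow.extend(list_allow)
        block.extend(list_block)
    # Order-preserving duplicate removal.
    return list(dict.fromkeys(allow)), list(dict.fromkeys(block))
```

With the merged groups, an allow rule from one list takes precedence over a block rule for the same domain from another list.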

Augmented Backus-Naur Form

The following is an Augmented Backus-Naur Form (ABNF) [[ABNF]] grammar for the selection list format.

FilterList     =     Header [lines]
Header         =     [UTF8BOM] "FilterList" EOL
lines          =     line *(EOL [line])
line           =     comment / key-value / rule
comment        =     "#" *(VCHAR / WSP)
key-value      =     ":" ALPHA 1*31(ALPHA/DIGIT) *WSP "=" *WSP 1*32(ALPHA/DIGIT)
rule           =     allow-rule / block-rule
allow-rule     =     "+" domain-exp
block-rule     =     "-" (domain-exp / substring-exp)
domain-exp     =     "d" 1*WSP string [substring-exp]
substring-exp  =     1*WSP wcstring
string         =     1*(ALPHA / DIGIT / "." / "-")
wcstring       =     1*VCHAR   ; VCHAR includes the wildcard "*"
UTF8BOM        =     %xEF %xBB %xBF
EOL            =     [CR] LF
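A line validator can be derived from the grammar by translating each production into a regular expression. This sketch deliberately assumes that domain strings may contain "." and "-", that wildcard strings may contain any visible character, and that "-" applies to both block-rule alternatives; without those assumptions, examples in this document such as -d example.com bad.js would not validate. The names `PRODUCTIONS` and `classify` are hypothetical, and the header line is not handled.

```python
import re

WSP = r"[ \t]"
STRING = r"[A-Za-z0-9.-]+"          # assumption: dots and hyphens allowed
WCSTRING = r"[\x21-\x7e]+"          # VCHAR, which includes "*"
SUBSTRING_EXP = rf"{WSP}+{WCSTRING}"
DOMAIN_EXP = rf"d{WSP}+{STRING}(?:{SUBSTRING_EXP})?"

PRODUCTIONS = {
    "comment": re.compile(r"#[\x21-\x7e \t]*"),
    "key-value": re.compile(rf":[A-Za-z][A-Za-z0-9]{{1,31}}{WSP}*={WSP}*[A-Za-z0-9]{{1,32}}"),
    "allow-rule": re.compile(rf"\+{DOMAIN_EXP}"),
    "block-rule": re.compile(rf"-(?:{DOMAIN_EXP}|{SUBSTRING_EXP})"),
}


def classify(line: str) -> str:
    """Return the name of the first production the whole line matches."""
    for name, production in PRODUCTIONS.items():
        if production.fullmatch(line):
            return name
    return "invalid"
```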

Glossary

Issues

Issue 16: Interesting, but what is the purpose of allowing something if there is no block rule for the same domain? Or is it implied? In that case, an example with both a block rule and an allow rule is needed.

Issue 17: [andyzei] The only time that that's interesting is when you have multiple TPLs. If you want to write a TPL that ensures that no other TPL that the user has installed can block your site, then you can do that. Agreed there should be a better example of this.

Acknowledgements

Many thanks to Rich Tibbet (Opera Software), Luca Venturi (Opera Software) for reviewing this document.