Tracking Selection Lists

The Tracking Selection Lists specification defines a format for interchangeable lists for blocking or allowing Web tracking elements and expected user-agent interpretation of this format.

A selection list contains parts of third-party URIs that a browser may access automatically when referenced within a Web page that a user deliberately visits. Rules in a selection list may change the way the user agent handles third-party content. By limiting the calls to these Web sites and blocking resources from other Web pages, the selection list limits the information other sites can collect about a user.

Issue 2: The term "third-party URIs" might be confusing when read alongside the two other Tracking Protection WG documents. The XLink definition doesn't help either. The compliance document defines the third party only vaguely: A "third party" is any party, in a specific network interaction, that cannot infer with high probability that the user knowingly and intentionally communicated with it.

Introduction

This section is non-normative.

When accessing the Web, users download Web pages, each of which is an aggregation of multiple resources (visible and not visible). By accessing these resources, users share information with the owners of these resources. This is inherent in the design of the Web, and it has potentially unintended consequences. As users visit one site, many other sites receive information about their activities. For example, when a Web page includes an image file coming from a different domain, IP address information, cookies, and referrer data can be sent by the user's browser. A script can have additional impact on user privacy and could collect arbitrary data from the initial Web page.

Issue 3: We could create a very simple infographic showing a simple HTTP request and its consequences to illustrate the paragraph above.

This situation results from how Web sites are built. Typically, a Web site today might bring together content from many other Web sites while giving the impression that it is a single entity. When the browser calls any other Web site to request anything (an image, a cookie, HTML, a script that can execute), the browser explicitly provides information in order to get information. By limiting requests to these sites, it is possible to limit the data available to these sites, including data used for collection and tracking.

A selection list contains parts of third-party URIs that a browser may access automatically when referenced within a Web page a user deliberately visits.

Issue 4: Should TSLs also apply to 1st-party URIs? If so, there should probably be an option that does this – I think that by default, most of the rules you’d want to write are 3rd-party specific. There are valid use cases for 1st-party rules, such as CNAME’d DNS entries.

Issue 5: Karl Dubost: It assumes that tracking is made only through 3rd party uris.

Issue 6: [andyzei] There is a difference between “expected” and “unexpected” tracking. 1st-party tracking is expected. 3rd-party tracking is not.

Issue 7: [karl] We should stay entirely neutral with regard to the intent of blocking or not blocking the URLs.

Rules in a selection list may change the way the user agent handles third-party content. By limiting the calls to these Web sites and blocking resources from other Web pages, the selection list limits the information other sites can collect about a user.

Third-Party URIs

A third-party URI [[!URI]] is a URI with a (second-level) domain name that differs from that of the top-level containing document. When a user agent is sending an HTTP request to a URI, the server sends back a response possibly containing further URIs. Among these URIs, those with a different second-level domain name are considered, in that document, third-party URIs.

A user agent must evaluate any URIs that indicate a sub-document—such as an iframe or any URIs defined in any sub-documents—as third-party with respect to the topmost document.

Issue 8: Not testable. This needs to be framed as something implementable.

Issue 9: "MUST evaluate" doesn't mean anything in that context.

For example, consider a top-level document whose URI is http://www.microsoft.com. This page might contain an iframe whose src URI is http://www.example.com. If the page at http://www.example.com contains an img element whose src is http://www.example.com/img.png, the URI http://www.example.com/img.png is a third-party URI, as its domain name differs from that of the top-level page.
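The comparison described above can be sketched as follows. This is a minimal illustration, not part of the specification: the function names are hypothetical, and the sketch assumes that the "second-level domain name" is simply the last two labels of the host, which does not hold for hosts under suffixes such as co.uk.

```python
from urllib.parse import urlsplit


def second_level_domain(uri: str) -> str:
    """Return the last two labels of the host, e.g. 'example.com'.

    Assumption: a naive two-label rule. No public suffix list is
    consulted, so hosts such as 'example.co.uk' are not handled.
    """
    host = (urlsplit(uri).hostname or "").lower()
    labels = [label for label in host.split(".") if label]
    return ".".join(labels[-2:])


def is_third_party(uri: str, top_level_uri: str) -> bool:
    """True when the URI's second-level domain differs from that of the
    top-level containing document."""
    return second_level_domain(uri) != second_level_domain(top_level_uri)
```

Note that the second argument is always the URI of the topmost document, never that of an intermediate iframe, which is why the image in the example above is classified as third-party.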

A third-party download is any potential HTTP download request to a third-party URI.

A user-agent must apply a selection list to third-party URIs only.

Issue 10: Not testable. Apply a filter list is not an operation in this case.

Issue 11: Need to clearly define "apply a filter list" / rules and frame requirements to match.

Blocking Downloads

When a user agent issues a request for a Web page and receives an HTTP response that contains a document, and the user or user agent has chosen to apply a selection list, all third-party URIs that can generate a download request must be evaluated against this selection list.

Issue 12: Incorrect sentence. Maybe something along the lines of the following paragraph will be the real implementable requirement.

(When the user and/or user agents have activated the filter list mechanism,) for each HTTP response sent by server, a user agent MUST drop any subsequent HTTP requests according to the rules defined in the filter list.

When a user agent blocks a download, that user agent should fire any DOM events pertaining to a download error, if applicable, and should report an exception, along with the URI value, in the error console.

List Format

A selection list is a UTF-8 encoded text file that contains a header, comments, settings, and rules. Selection lists are parsed in a stateless manner across lines, meaning that the ordering of the lines has no effect on the meaning of the file. The only exception to this is the header, which must be the first line of the file.

FilterList
#
# Line 1 is a header. 
#
# Lines 2-14 are comments. 
# As a comment, any line that starts with a # character is ignored.
#
# Any line that begins with a : character is a setting, 
# which is a key-value pair.
# The key-value pair Expires = n specifies 
# to wait n days before checking for an update to the list.
#
# Using a setting.
# Check for an update to the list in 3 days.

:Expires=3  

# Domain rule
# Allow all URIs from the example.com domain name.

+d example.com

# Substring rule
# Block any URI containing spamspam.

- spamspam

# Wildcard character
# Block any URI that has a foo followed by a bar.

- foo*bar

# Domain rule
# Block anything from exampleexample.com.

-d exampleexample.com

# Domain rule with optional path
# Block any URI from example.com that contains the substring bad.js in the URI path.

-d example.com bad.js
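Because each line is interpreted independently of the others, a reader of this format can classify lines one at a time. The following sketch illustrates that idea; the function name and the kind labels are hypothetical, and skipping blank lines is an assumption drawn from the example above rather than a stated requirement.

```python
def classify_lines(text: str) -> list[tuple[str, str]]:
    """Pair each line of a selection list with its kind.

    Assumptions (not stated in the format description): surrounding
    whitespace is trimmed and blank lines are tolerated.
    """
    classified = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if number == 1:
            kind = "header"      # the first line is always the header
        elif not line:
            kind = "blank"
        elif line.startswith("#"):
            kind = "comment"
        elif line.startswith(":"):
            kind = "setting"
        elif line.startswith(("+", "-")):
            kind = "rule"
        else:
            kind = "unknown"
        classified.append((kind, line))
    return classified
```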

Header

Selection lists must start with a single line that contains the string FilterList and is known as the header. The header line may start with the UTF-8 Byte Order Mark (BOM) (EF BB BF), which is ignored. Until this feature becomes standardized, vendors may use a prefix. For example, the Microsoft implementation uses msFilterList.
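A header check along these lines might look like the sketch below. The function name and the allow_vendor_prefix parameter are hypothetical. The sketch also assumes that a vendor prefix is a run of ASCII letters, as in msFilterList, and that the header line carries no trailing whitespace; neither point is settled by the text above.

```python
import re

UTF8_BOM = b"\xef\xbb\xbf"

# Assumption: a vendor prefix is zero or more ASCII letters placed
# directly before "FilterList", as in "msFilterList".
PREFIXED_HEADER = re.compile(r"^[A-Za-z]*FilterList$")


def has_valid_header(data: bytes, allow_vendor_prefix: bool = True) -> bool:
    """Check that the first line of a selection list is the header."""
    if data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]          # the BOM is ignored
    first_line = data.split(b"\n", 1)[0].rstrip(b"\r")
    header = first_line.decode("utf-8", errors="replace")
    if header == "FilterList":
        return True
    return allow_vendor_prefix and PREFIXED_HEADER.match(header) is not None
```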

Comments

A comment line must start with a number sign (#) character.

Settings

Issue 13: The format of the settings doesn't define what the browser should do with the spaces. There are plenty of things to define here to make it implementable by browsers, specifically in terms of error recovery or draconian mode.

The selection list format supports settings in the form of key-value pairs. A settings line begins with a colon (:) and has two string values separated by an equals sign (=). If a setting is not recognized, the user agent must ignore that setting.
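One possible reading of a settings line is sketched below. The names parse_setting, recognized_settings, and KNOWN_SETTINGS are hypothetical. Whitespace handling is not defined by this document, so trimming spaces around the key and the value is purely an assumption of the sketch, as is treating keys as case-sensitive.

```python
from typing import Optional

# Only the Expires setting is described in this document.
KNOWN_SETTINGS = {"Expires"}


def parse_setting(line: str) -> Optional[tuple[str, str]]:
    """Split a settings line into (key, value), or return None.

    Assumption: spaces around the key and the value are trimmed.
    """
    if not line.startswith(":"):
        return None
    key, separator, value = line[1:].partition("=")
    if not separator or not key.strip():
        return None
    return key.strip(), value.strip()


def recognized_settings(lines: list[str]) -> dict[str, str]:
    """Collect settings, ignoring any key that is not recognized."""
    settings = {}
    for line in lines:
        parsed = parse_setting(line)
        if parsed is not None and parsed[0] in KNOWN_SETTINGS:
            settings[parsed[0]] = parsed[1]
    return settings
```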

Expires

Issue 14: Karl Dubost: The value seems arbitrary. Is there any rationale behind this range?

Issue 15: [andyzei] It was chosen by anticipating the useful time range that a customer might want to set the value to. Having it bounded makes it easier to test.

Expires = n

The Expires setting defines the interval, n days, at which the user-agent will check for updates to the list.

The value of n must be an integer between 1 and 30.

The following list file requests that the user agent check every 10 days (or the next time the user-agent is launched, if more than 10 days have passed) to see if there are updates to the list.

FilterList
: Expires = 10
+d example.org
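The range check and the resulting update date can be sketched as follows. The function names are hypothetical, and returning None for an out-of-range or non-numeric value is an assumption, since this document does not say how a user agent recovers from an invalid Expires value.

```python
from datetime import date, timedelta
from typing import Optional


def parse_expires(value: str) -> Optional[int]:
    """Return n when it is an integer between 1 and 30, else None."""
    text = value.strip()
    if not (text.isascii() and text.isdigit()):
        return None
    days = int(text)
    return days if 1 <= days <= 30 else None


def next_update_check(last_checked: date, days: int) -> date:
    """Earliest date on which the list should be checked again."""
    return last_checked + timedelta(days=days)
```

If the user agent is not running on that date, the check would simply happen at the next launch after it.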

Rules

Issue 18: What is the behavior for IP addresses?

Rules are the primary component of a selection list. A rule is a line in a selection list that changes the way the user-agent handles third-party content.

Rules are matched against the URI of each third-party subdownload in a page. A URI that has a different second-level domain name than the URI in the address bar is a third-party URI.

The basic format for a rule is as follows:

FilterList
#
# Allow rule 
+d string [string]
#
# Block rule
- string

Allow Rules

Allow rules allow content from the specified entity to function within the instance of the user agent. Allow rules must begin with a plus sign (+). Allow rules must be domain rules.

Block Rules

Block rules block content from the specified entity from functioning within the instance of the user agent. Block rules must begin with a minus sign (-). Block rules may be either domain rules or substring rules.

Domain Rules

Domain rules allow or block content on a particular domain. Domain rules must begin with the string +d (to allow content) or the string -d (to block content). For allow rules, the user-agent must evaluate the string specified in the domain part of the allow rule against the target URI, starting from the topmost domain label. An additional and optional string match may be specified to further limit the scope.

For example, the following allow domain rules allow the URI, http://www.subdomain.example.com/file.html.

+d example.com
+d subdomain.example.com

The following allow domain rules, with the optional string, also allow the URI, http://www.subdomain.example.com/file.html.

+d example.com file
+d example.com file.html
+d example.com html

The following allow domain rules fail to match and therefore fail to allow the URI, http://www.subdomain.example.com/file.html.

+d subdomain.example
#  does not match starting at the topmost domain label
#
+d othersubdomain.example.com
#  not a complete match of specified domain labels
#
+d example.com /path/file.html
#  /path/file.html is not a substring of /file.html
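The allow-rule examples above can be reproduced with a label-by-label comparison anchored at the topmost domain label. This is a sketch under stated assumptions: the function name is hypothetical, host comparison is assumed to be case-insensitive, and the optional string is treated as a literal substring of the URI path (wildcards in the optional string are not handled here).

```python
from urllib.parse import urlsplit


def allow_rule_matches(rule_domain: str, uri: str, optional: str = "") -> bool:
    """Match an allow domain rule against a target URI.

    The rule's labels must equal the rightmost labels of the host,
    that is, the match starts from the topmost domain label.
    """
    parts = urlsplit(uri)
    host_labels = (parts.hostname or "").lower().split(".")
    rule_labels = rule_domain.lower().split(".")
    if len(rule_labels) > len(host_labels):
        return False
    if host_labels[-len(rule_labels):] != rule_labels:
        return False
    # An empty optional string places no further limit on the match.
    return optional in parts.path
```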

For block rules, the user-agent must evaluate the string specified in the domain part of the block against any contiguous domain labels.

For example, the following block domain rules block the URI, http://www.subdomain.example.com/file.html.

-d example.com
-d subdomain.example.com

The following block domain rules, most of them with the optional string, also block the URI, http://www.subdomain.example.com/file.html.

-d example.com file
-d example.com file.html
-d example.com html
-d subdomain.example

The following block domain rules fail to match and therefore fail to block the URI, http://www.subdomain.example.com/file.html.

#
-d othersubdomain.example.com
#  not contiguous domain labels
#
-d example.com /path/file.html
#  "/path/file.html" is not a substring of /file.html
#
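For block rules, the domain part may match any contiguous run of labels rather than only the rightmost ones. A sketch of that looser comparison follows; the function name is hypothetical, and, as an assumption, the optional string is treated as a literal substring of the URI path with no wildcard handling.

```python
from urllib.parse import urlsplit


def block_rule_matches(rule_domain: str, uri: str, optional: str = "") -> bool:
    """Match a block domain rule against a target URI.

    The rule's labels may appear at any position in the host, as long
    as they are contiguous and each label matches completely.
    """
    parts = urlsplit(uri)
    host_labels = (parts.hostname or "").lower().split(".")
    rule_labels = rule_domain.lower().split(".")
    width = len(rule_labels)
    found = any(
        host_labels[start:start + width] == rule_labels
        for start in range(len(host_labels) - width + 1)
    )
    return found and optional in parts.path
```

Under this reading, -d subdomain.example blocks the URI even though the corresponding allow rule does not match it.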

Substring Rules

Substring rules match a substring in a URI, blocking content. For example, the following substring rules block the URI, http://www.example.com/test.html.

- example
- exam
- test.html
- ex*le

However, the following substring rule does not match and therefore does not block the URI, http://www.example.com/test.html.

- test2

The Wildcard Character

The wildcard character (*) may be used within a substring rule. The wildcard character must match 0 or more of any character.

Wildcard characters are greedy, meaning the wildcard will match as much text as possible.
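One way to implement this behavior is to translate a substring rule into a regular expression in which each wildcard becomes a greedy ".*" and every other character is matched literally. The function names below are hypothetical, and case-sensitive matching is an assumption, since this document does not address case.

```python
import re


def substring_rule_to_regex(pattern: str) -> re.Pattern:
    """Translate a substring rule into a compiled regular expression.

    Each * becomes a greedy '.*'; all other characters are escaped so
    that they match literally.
    """
    literal_pieces = [re.escape(piece) for piece in pattern.split("*")]
    return re.compile(".*".join(literal_pieces))


def substring_rule_matches(pattern: str, uri: str) -> bool:
    """True when the rule matches anywhere within the URI."""
    return substring_rule_to_regex(pattern).search(uri) is not None
```

Because the rule is searched for anywhere in the URI, greediness affects how much text a wildcard consumes but not whether the rule matches.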

The wildcard character must not be used in the string representing the domain within a domain rule. The wildcard character may be used in the optional string part of a domain rule.

The following example is valid because the wildcard character is used in the optional string part of the domain rule.

+d example.com sub*string

The following rule is invalid because the wildcard character is used in the domain part of the domain rule.

# Invalid!
+d domain*.com substring
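The validity constraints on rules can be checked while parsing a rule line. The sketch below uses a hypothetical Rule tuple and parse_rule function. It assumes that the parts of a rule are separated by whitespace and that an invalid rule is simply rejected; this document does not define error recovery.

```python
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    action: str             # "allow" or "block"
    domain: Optional[str]   # None for substring rules
    pattern: str            # optional string or substring, may contain *


def parse_rule(line: str) -> Optional[Rule]:
    """Parse one rule line, returning None when the rule is invalid."""
    fields = line.split()
    if not fields:
        return None
    marker, args = fields[0], fields[1:]
    if marker in ("+d", "-d"):
        # A domain rule has a domain and at most one optional string;
        # the wildcard must not appear in the domain part.
        if not 1 <= len(args) <= 2 or "*" in args[0]:
            return None
        action = "allow" if marker == "+d" else "block"
        return Rule(action, args[0], args[1] if len(args) == 2 else "")
    if marker == "-" and len(args) == 1:
        return Rule("block", None, args[0])
    # Anything else, including an allow rule that is not a domain rule,
    # is rejected.
    return None
```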

Conformance

Introduction

Third-Party URIs

Blocking Downloads

List Format

Header

Comments

Settings

Expires

Rules

Allow Rules

Block Rules

Domain Rules

Substring Rules

The Wildcard Character

Processing Selection Lists

Processing a Selection list

Processing Multiple Selection lists

Augmented Backus-Naur Form

Glossary

Issues

Acknowledgements