This document defines directives for the Content Security Policy mechanism to declare a set of input protections for a web resource's user interface, defines a non-normative set of heuristics for Web user agents to implement these input protections, and a reporting mechanism for when they are triggered.

This is the First Public Working Draft of the User Interface Safety Directives for Content Security Policy. [CSP]

Portions of the technology described in this document were originally developed as part of X-Frame-Options [XFRAMEOPTIONS], the ClearClick module of the Mozilla Firefox add-on NoScript [CLEARCLICK], and the InContext system implemented experimentally in Internet Explorer [INCONTEXT].

In addition to the documents in the W3C Web Application Security working group, the work on this document is also informed by the work of the IETF websec working group, particularly that working group's requirements document: draft-hodges-websec-framework-reqs.

Introduction

This document defines User Interface Safety directives for Content Security Policy, a mechanism web applications can use to mitigate some of the risks of User Interface (UI) Redressing [UI-REDRESS] and Clickjacking [UI-REDRESS] vulnerabilities that can lead to fraudulent actions not intended by the user.

Content Security Policy (CSP) is a declarative policy that lets the authors (or server administrators) of a web application restrict the locations from which an application can load resources. This document defines directives to restrict display of, or user interaction with, a resource when it is in an embedded context. A user agent may implement the core directives of CSP independently from the directives in this specification, but this specification uses the policy conveyance and reporting mechanisms described in CSP. The interpretation of terms imported into this document from CSP may vary depending on the version implemented by the user agent. For example, a source-expression in Content Security Policy 1.0 is at the granularity of an origin [ORIGIN] but may be more granular in future versions of the core Content Security Policy.

Application authors SHOULD transmit the directives in this specification as part of a single, complete Content Security Policy, as indicated by that specification.

In some UI Redressing attacks (also known as Clickjacking), a malicious web application presents the user interface of another web application to the user in a manipulated context, e.g. by partially obscuring the genuine user interface with opaque layers on top, thereby tricking the user into clicking a button out of context.

Existing anti-clickjacking measures, including frame-busting code [FRAMEBUSTING] and X-Frame-Options, are fundamentally incompatible with embeddable third-party widgets and insufficient to defend against timing-based attack vectors.

The User Interface Safety directives encompass the policies defined in X-Frame-Options and also provide a new mechanism that allows web applications to enable heuristic input protections for their user interfaces on user agents.

To mitigate UI redressing, for example, a web application can request that a user interface element should be fully visible for a minimum period of time before a user input can be delivered.

The User Interface Safety directive can often be applied to existing applications with few or no changes, but the heuristic hints supplied by the policy may require considerable experimental fine-tuning to achieve an acceptable error rate.

This specification obsoletes X-Frame-Options. Resources may supply an X-Frame-Options header in addition to a Content-Security-Policy header to indicate policy to user agents that do not implement the directives in this specification. A user agent that understands the directives in this document SHOULD ignore the X-Frame-Options header, when present, if User Interface Safety directives are also present in a Content-Security-Policy header. This is to allow resources to only be embedded if the mechanisms described in this specification are enforced, and more restrictive X-Frame-Options policies applied otherwise.
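The precedence rule above can be sketched as a small helper. This is a minimal illustration, assuming response headers are available as a plain object with lower-cased names; the function names are hypothetical, and only the enforced Content-Security-Policy header is consulted, as the rule above states.

```javascript
// Sketch: decide whether a UA implementing these directives should honour
// X-Frame-Options or ignore it in favour of the UI Safety directives.
const UI_SAFETY_DIRECTIVES = new Set(["frame-options", "input-protection"]);

// Returns the lower-cased directive names present in a CSP header value.
function directiveNames(cspValue) {
  return cspValue
    .split(";")
    .map((d) => d.trim().split(/\s+/)[0].toLowerCase())
    .filter((name) => name.length > 0);
}

// "csp": ignore X-Frame-Options because UI Safety directives are present.
// "x-frame-options": fall back to the legacy header.
// "none": neither mechanism applies.
function framingPolicySource(headers) {
  const csp = headers["content-security-policy"] || "";
  if (directiveNames(csp).some((name) => UI_SAFETY_DIRECTIVES.has(name))) {
    return "csp";
  }
  return headers["x-frame-options"] ? "x-frame-options" : "none";
}
```

A resource that sends both `X-Frame-Options: DENY` and a Content-Security-Policy containing frame-options thus stays unembeddable in legacy user agents while being governed by the CSP directive elsewhere.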

Requirements phrased in the imperative as part of algorithms (such as "strip any leading space characters" or "return false and abort these steps") are to be interpreted with the meaning of the key word ("MUST", "SHOULD", "MAY", etc.) used in introducing the algorithm.

A conformant user agent is one that implements all the requirements listed in this specification that are applicable to user agents. Treatment of the input-protection directive values is at the discretion of the user agent.

A conformant server is one that implements all the requirements listed in this specification that are applicable to servers.

Terminology

This section defines several terms used throughout the document.

The term security policy, or simply policy, for the purposes of this specification refers to either:

  1. a set of security preferences for restricting the behavior of content within a given resource, or
  2. a fragment of text that codifies these preferences.

The security policies defined by this document are applied by a user agent on a per-resource representation basis. Specifically, when a user agent receives a policy along with the representation of a given resource, that policy applies to that resource representation only. This document often refers to that resource representation as the protected resource.

A server transmits its security policy for a particular resource as a collection of directives, such as default-src 'self', each of which controls a specific set of privileges for a document rendered by the user agent. More details are provided in the directives section.

A directive consists of a directive name, which indicates the privileges controlled by the directive, and a directive value, which specifies the restrictions the policy imposes on those privileges.

An ancestor is any resource between the protected resource and the top of the window frame tree; for example, if A embeds B which embeds C, both A and B are ancestors of C. If A embeds both B and C, B is not an ancestor of C, but A still is.

The term origin is defined in the Origin specification. [ORIGIN]

The term URI is defined in the URI specification. [URI]

The <iframe>, <object>, <embed>, and <frame> elements are defined in the HTML5 standard. [HTML5]

The <applet> element is defined in the HTML 4.01 standard. [HTML401]

The Augmented Backus-Naur Form (ABNF) notation used in this document is specified in RFC 5234. [ABNF]

The following core rules are included by reference, as defined in Appendix B.1 of [ABNF]: ALPHA (letters), DIGIT (decimal 0-9), WSP (white space), VCHAR (printing characters), SP (space), HTAB (horizontal tab) and CRLF (carriage return followed by line feed).

The OWS rule is used where zero or more linear whitespace octets might appear. OWS SHOULD either not be produced or be produced as a single SP. Multiple OWS octets that occur within field-content SHOULD either be replaced with a single SP or transformed to all SP octets (each octet other than SP replaced with SP) before interpreting the field value or forwarding the message downstream.

OWS            = *( SP / HTAB / obs-fold )
               ; "optional" whitespace
obs-fold       = CRLF ( SP / HTAB )
               ; obsolete line folding
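The whitespace guidance above can be sketched as a single regular-expression pass. This minimal example, with a hypothetical function name, chooses the "replace with a single SP" option and collapses each run of SP, HTAB and obs-fold sequences.

```javascript
// Sketch: collapse every run of OWS inside a field value to a single SP.
// A run may mix SP, HTAB and obs-fold (CRLF followed by SP or HTAB).
function normalizeOws(fieldValue) {
  return fieldValue.replace(/(?:\r\n[ \t]|[ \t])+/g, " ");
}
```

For example, a folded header value `"input-protection\r\n\tui-delay=500"` becomes `"input-protection ui-delay=500"` before the directive is interpreted.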

How to define source-expression, host-source, etc. that may have different definitions depending on version of CSP? Basically, how do we reference versions of CSP that may not yet exist?

Directives

This section describes the content security policy directives introduced in this specification.

frame-options

The frame-options directive indicates whether the user agent should embed the resource using a frame, iframe, object, embed or applet element, or equivalent functionality in non-HTML resources. Resources can use this directive to avoid many UI Redressing attacks by ensuring they are not embedded into other sites. The syntax for the name and value of the directive is described by the following ABNF grammar:

directive-name = "frame-options"
directive-value = ('deny' / 'self' / 1*1<host-source> ['self']) / ('deny' / 'self' / 1*1<host-source>) 'top-only'

Should we use source-expression rather than host-source here? host-source provides compatibility with current granularity of X-Frame-Options, but a source-expression will allow seamless forward-evolution to more granular expressions in CSP 1.1.

Unlike policies defined in Content Security Policy, the frame-options directive is not subject to the default-src directive. If this directive is not explicitly stated in the policy its value is assumed to be "*".
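A parse of the frame-options grammar above, applying the rules in the following paragraphs for 'deny', the single host-source, and 'top-only', might look like this. It is a sketch under assumptions: host-source tokens are kept as opaque strings without validating their syntax, an absent directive is represented as null (meaning "*"), and invalid combinations throw an error, although a user agent might reasonably handle them differently.

```javascript
// Sketch: parse a frame-options directive value into
// { deny, self, hostSource, topOnly }, or null when the directive is absent.
function parseFrameOptions(value) {
  if (value === undefined || value === null) return null; // treated as "*"
  const tokens = value.trim().split(/\s+/).filter(Boolean);

  // 'deny' overrides every other value in the directive.
  if (tokens.includes("'deny'")) {
    return { deny: true, self: false, hostSource: null, topOnly: false };
  }

  const policy = { deny: false, self: false, hostSource: null, topOnly: false };
  for (const token of tokens) {
    if (token === "'self'") policy.self = true;
    else if (token === "'top-only'") policy.topOnly = true;
    else if (token.startsWith("'")) throw new Error(`unknown keyword ${token}`);
    else if (policy.hostSource !== null) throw new Error("at most one host-source");
    else policy.hostSource = token; // opaque: host-source syntax not validated
  }
  if (!policy.self && policy.hostSource === null) {
    throw new Error("frame-options needs 'deny', 'self' or a host-source");
  }
  if (policy.topOnly && policy.self && policy.hostSource !== null) {
    throw new Error("'top-only' cannot combine a host-source with 'self'");
  }
  return policy;
}
```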

If the directive-value contains the keyword-source 'deny', the resource cannot be displayed in an embedded context, regardless of the origin attempting to do so, and all other values in the directive are ignored.

If the directive-value does not contain 'deny' it may contain the keyword-source 'self' alone to indicate that the resource may be embedded only if all ancestors are in the same origin as the protected resource.

If the directive-value does not contain the 'deny' keyword-source, a host-source value indicates an origin that is a valid ancestor for the resource. At most one host-source may be specified, optionally together with the keyword-source 'self'.

If the 'top-only' keyword-source is specified, only the origin of the top-level browsing context is checked, not the full window frame tree of ancestors. This provides compatibility with X-Frame-Options but may introduce vulnerabilities in some cases as discussed in [SECURITY CONSIDERATIONS]. When 'top-only' is specified, a host-source may not be combined with 'self'.
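The ancestor check described above can be sketched as follows. It assumes a policy object of the form { deny, self, hostSource, topOnly } (null when the directive is absent), an array of ancestor origins ordered from the immediate parent to the top-level browsing context, and host-source matching reduced to exact string comparison of serialized origins. Requiring every checked ancestor to match either 'self' or the host-source is this sketch's reading of the rules, not normative text.

```javascript
// Sketch: may the protected resource be displayed given its ancestors?
function frameOptionsAllows(policy, protectedOrigin, ancestorOrigins) {
  if (ancestorOrigins.length === 0) return true; // not embedded at all
  if (policy === null) return true;              // absent directive: "*"
  if (policy.deny) return false;

  // Assumption: host-source matching is exact origin string equality.
  const isAllowed = (origin) =>
    (policy.self && origin === protectedOrigin) ||
    (policy.hostSource !== null && origin === policy.hostSource);

  // 'top-only' checks only the top-level browsing context, like X-Frame-Options;
  // otherwise every ancestor in the window frame tree must be allowed.
  const checked = policy.topOnly
    ? [ancestorOrigins[ancestorOrigins.length - 1]]
    : ancestorOrigins;
  return checked.every(isAllowed);
}
```

With ancestors ["https://b.example", "https://a.example"] (A embeds B, which embeds the protected resource from A), a policy of 'self' refuses the embedding while 'self' 'top-only' accepts it; this is the case discussed under Security Considerations.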

input-protection

The input-protection directive, if present, instructs the user agent to apply the heuristic UI redressing protections described in Section 4 to user input events, such as click, keypress, touch, and drag, before they are delivered to the resource.

If set as part of a Content-Security-Policy header, triggering of the heuristic should cancel delivery of the UI event to the target and cause a violation report to be sent. If set as part of a Content-Security-Policy-Report-Only header, triggering of the heuristic should result in the event being delivered with the unsafe attribute on the UIEvent set to true and cause a violation report to be sent.
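The difference between the two headers can be summarized as a small decision function. This is a sketch only: the function name and result fields are hypothetical, and treating a missing report-uri as "no report" follows Example 3.

```javascript
// Sketch: what happens to a UI event under an input-protection policy.
// headerName: the header that carried the policy.
// reportUri: the policy's report-uri value, or null if none was given.
function inputProtectionOutcome(headerName, heuristicTriggered, reportUri) {
  if (!heuristicTriggered) {
    return { deliver: true, unsafe: false, sendReport: false };
  }
  const reportOnly =
    headerName.toLowerCase() === "content-security-policy-report-only";
  return {
    deliver: reportOnly,            // enforced policies cancel the event
    unsafe: reportOnly,             // unsafe is only meaningful on delivered events
    sendReport: reportUri !== null, // no report-uri, no report
  };
}
```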

The optional directive values allow resource authors to provide hints to the heuristic to improve accuracy.

BWH: Consolidated the previous input-protection and input-protection-hints directives here. input-protection previously just had "block" and "report-only" options, which are covered by the two different types of CSP header already, without introducing new keyword-sources.

Also removed setting of unsafe attribute from the enforced directive because the event should be cancelled. Do we want to raise another type of event or allow a callback to be specified when an action is blocked?

directive-name    = "input-protection"
directive-value   = ["element-id=" name] ["ui-height=" num-val] ["ui-width=" num-val] ["ui-delay=" num-val] ["tolerance=" num-val]
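A tolerant reader for the grammar above might look like this sketch. It assumes hint values are whitespace-separated name=value tokens with decimal integers for the numeric hints. Absent or malformed hints are left null so the user agent can substitute its own defaults (and a user agent MAY ignore them anyway), tolerance values outside 0-99 are dropped, and the element lookup falls back to the body element when no element-id is given; falling back when the id is not found is an assumption of this sketch. The document object passed in only needs getElementById and body.

```javascript
// Sketch: parse hints such as "element-id=submitButton tolerance=15 ui-height=200".
const NUMERIC_HINTS = ["ui-height", "ui-width", "ui-delay", "tolerance"];

function parseInputProtection(value) {
  // Every hint starts out absent (null); the UA substitutes its own defaults.
  const hints = { "element-id": null };
  for (const name of NUMERIC_HINTS) hints[name] = null;

  for (const token of value.trim().split(/\s+/).filter(Boolean)) {
    const eq = token.indexOf("=");
    if (eq <= 0) continue; // not a name=value pair; ignored
    const name = token.slice(0, eq);
    const raw = token.slice(eq + 1);
    if (name === "element-id" && raw.length > 0) {
      hints["element-id"] = raw;
    } else if (NUMERIC_HINTS.includes(name) && /^\d+$/.test(raw)) {
      const n = Number(raw);
      // tolerance is only defined for 0-99; out-of-range values are dropped.
      if (name !== "tolerance" || n <= 99) hints[name] = n;
    }
  }
  return hints;
}

// Picks the element to protect: the element-id target if present,
// otherwise (or, as an assumption, if the id is not found) the body element.
function protectedElement(doc, hints) {
  const id = hints["element-id"];
  return (id !== null && doc.getElementById(id)) || doc.body;
}
```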

If the policy does not contain a value for this directive or any of the optional values are absent, the user agent should apply default values as described in Section XXX. A user agent MAY ignore any or all values in input-protection.

element-id is the id of a user interface element in the DOM. If the policy does not contain an explicit element-id, the user agent should apply protection to the body element of the resource.

ui-height is a numeric value that defines the height, in pixels, of the viewing area to be used for performing the screenshot comparison.

ui-width is a numeric value that defines the width, in pixels, of the viewing area to be used for performing the screenshot comparison.

ui-delay is a numeric value that specifies the delay time, in milliseconds, used in the input protection heuristic.

Delay after display, or delay after mouseover?

tolerance is a numeric value from 0-99 that defines the threshold at which the screenshot comparison procedure of the input protection heuristic triggers a violation. A value of 0 indicates that no difference between the two images is permitted. A value of 99 provides little to no practical protection.
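The screenshot comparison governed by tolerance, and a delay check governed by ui-delay, might look like the sketch below. It assumes both screenshots are RGBA byte arrays of equal size (the layout of ImageData.data from a 2D canvas) and treats tolerance as a percentage of differing pixels, consistent with the 18% default mentioned in the heuristic description. Measuring ui-delay from the moment the element became fully visible is only one answer to the open question in the note above; the function names are hypothetical.

```javascript
// Percentage of pixels that differ between two RGBA buffers of equal size.
// A pixel counts as different if any of its four channels differ.
function differingPixelPercent(unobstructed, userView) {
  if (unobstructed.length !== userView.length || unobstructed.length % 4 !== 0) {
    throw new Error("screenshots must be RGBA buffers of equal size");
  }
  const pixels = unobstructed.length / 4;
  let differing = 0;
  for (let i = 0; i < unobstructed.length; i += 4) {
    if (
      unobstructed[i] !== userView[i] ||
      unobstructed[i + 1] !== userView[i + 1] ||
      unobstructed[i + 2] !== userView[i + 2] ||
      unobstructed[i + 3] !== userView[i + 3]
    ) {
      differing++;
    }
  }
  return pixels === 0 ? 0 : (100 * differing) / pixels;
}

// tolerance 0 permits no difference at all; larger values permit more.
function obstructionViolation(unobstructed, userView, tolerance) {
  return differingPixelPercent(unobstructed, userView) > tolerance;
}

// Assumption: ui-delay counts from when the element became fully visible.
function shownLongEnough(fullyVisibleSinceMs, eventTimeMs, uiDelayMs) {
  return eventTimeMs - fullyVisibleSinceMs >= uiDelayMs;
}
```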

report-uri

Review this section to ensure that privacy violations do not occur - is any of this information not normally available to the DOM of an embedded resource?

The report-uri directive specifies a URI to which the user agent sends reports about policy violations.

The syntax for the name and value of this directive and the algorithm to prepare a report are described by Content Security Policy. [REF - fwd compatible]

The core Content Security Policy specification provides directives to restrict where external content may be loaded from. As such, violation reports include a blocked-uri key/value pair that specifies the attempted resource load that was blocked by the policy.

As this is not applicable to the directives in this document, the following additional steps MUST be added to the algorithm defined in Content Security Policy to prepare a violation report:

In step 1, when preparing the JSON object violation-object, add the following keys and values to the csp-report: [RFC4627]

If the violation is of the frame-options directive, add the following keys and values:

If the violation is of the input-protection directive, add the following keys and values:

What standard defines these attributes?

If the target of a UIEvent which triggers an input-protection violation has an explicitly set id attribute:

Otherwise, if the target element does not have an explicit id attribute:

DOM interface

This specification introduces a new attribute for the UIEvent interface introduced in DOM Level 2.

unsafe attribute for the UIEvent interface

unsafe of type boolean, readonly
This is a non-configurable boolean property of input event objects. The value should be true if a violation occurred. The value should not be set unless triggered by user-initiated actions.

The unsafe attribute allows web applications to monitor and immediately respond to suspect violations in the report-only mode. Applications may also use this interface for capability detection. For example, a web application may monitor user inputs on a payment button element like this:

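document.getElementById('payment-button').addEventListener("click", function(eventObj) {
  // Capability detection: only user agents implementing this specification expose "unsafe".
  if ("unsafe" in eventObj && eventObj.unsafe === true) {
    // The heuristic flagged this click in report-only mode; do not proceed silently.
    return reportUnsafeOrShowDialog();
  }
  makePayment();
});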

Input Protection Heuristic

This section is non-normative. The algorithm described here can be implemented mostly in terms of HTML5 constructs, but requires the ability to monitor and intercept actions in the rendering of a resource and the delivery of events to that resource. User agents may apply equivalent protections using means better optimized for their implementation details, may ignore recommendations where the browsing environment eliminates certain classes of attack (e.g. the cursor sanity check in a touch-only environment), or may implement some features in terms of the underlying operating system or platform rather than directly in the user agent.

Algorithm Description

  1. Listener registration - Register a "global" capturing event listener for mouse button, tapping, keyboard, drag & drop and focus events, which must be guaranteed to run before any other event handler of the same kind and therefore be able to prevent any event from being handled by the content, if needed. CBC: in order to guarantee the "first to process" event listener requirement and reduce registration overhead, ClearClick adds its listener to the Mozilla-specific DocShell object, which is the immediate container of the topmost DOM window for any given tab. A cross-browser approach likely to work is registering the listener on the topmost DOM window itself before any script has a chance to run.
  2. Fast-track bypass - Whenever the listener is called, check whether the event target or its owner document is flagged as "unlocked". If either is, return early. CBC: ClearClick uses an expando property to flag DOM nodes and windows, relying on a feature of Mozilla's chrome-exposed DOM wrappers which prevents content from seeing or tampering with expando properties set by privileged code. Other browsers may require different procedures to safely annotate documents and other DOM nodes. Furthermore, this and most of the remaining steps assume the listener can examine and manipulate any DOM node or window independently of its origin, bypassing the same-origin policy (SOP). This privilege should be granted by the listener having been registered by privileged (browser extension) code.
  3. Parent chain check - Check whether the event target is either a child of a nested document or a plugin content element (EMBED, APPLET or OBJECT). If it is not, or it is an embedded document belonging to a same-site parent chain (i.e. it and all its parents are from the same origin), flag the document as "unlocked" and return. Notice that the original Clickjacking demo by Hansen & Grossman worked despite the Flash content being served same-site: since plugins may follow type-specific origin policies, we never return early at this stage when interacting with plugin content, even if embedded same-site.
  4. Rapid fire check - Check whether the previously observed event was of the same type, targeted a document from a different origin, and happened within the past 800 ms (the quarantine time). If it did, assume a "Rapid fire" attack (e.g. the user has been tricked into repeatedly clicking on the same or a predictable location in quick succession while the document is changed under the mouse pointer): halve the quarantine time and go to step 8. If the next interaction happens with a different document, the quarantine time is reset.
  5. Cursor sanity check - By querying the computed style with the ":hover" pseudo-class on the element (if the target is plugin content) or on the host frame element and its ancestors (if the target is a nested document), check whether the cursor has been hidden or changed to a possibly attacker-provided bitmap: if it has, go to step 7. This provides protection against "Phantom cursor" attacks, also known as "Cursorjacking".
  6. Obstruction check - Using an offscreen HTML5 canvas element, take two reasonably sized screenshots (300x200 on average, but growing or shrinking depending on the document's inherent size, viewport constraints, and the ui-height and ui-width hints of the input-protection directive) of the region centered around the DOM element which is about to receive the event: one from its owner document's "point of view" (unobstructed by definition), the other from the topmost window's. In the plugin content case, ensure the former screenshot contains only the element itself. If the percentage of pixels that differ between the screenshots does not exceed a configurable tolerance rate (default 18%, or as set by the tolerance value of the input-protection directive), return. Otherwise, tentatively assume the DOM element the user is interacting with has been obstructed or obscured by a UI Redressing attempt.

    CBC: the screenshots are taken using the CanvasRenderingContext2D.drawWindow() method, a Mozilla-proprietary extension of the HTML5 Canvas API available to privileged code only, which allows the content of DOM windows to be drawn on a canvas surface exactly as rendered on the screen. The rest of this phase instead relies on cross-browser canvas features such as pixel grabbing and data URL serialization.
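Steps 3 and 4 can be sketched as plain functions over origins and timestamps. This is illustrative only: origins are assumed to be serialized strings, the class and function names are hypothetical, and the sketch restores the initial quarantine time whenever an event is not flagged, which is one reading of the reset rule in step 4.

```javascript
// Step 3 (parent chain check): true if the target may be flagged "unlocked".
// Plugin content is never unlocked; otherwise every ancestor must share the
// target's origin (an empty list means the target is not nested at all).
function canUnlock(targetOrigin, ancestorOrigins, isPluginContent) {
  if (isPluginContent) return false;
  return ancestorOrigins.every((origin) => origin === targetOrigin);
}

// Step 4 (rapid fire check) with an 800 ms initial quarantine time.
class RapidFireDetector {
  constructor(initialQuarantineMs = 800) {
    this.initialQuarantineMs = initialQuarantineMs;
    this.quarantineMs = initialQuarantineMs;
    this.previous = null; // last observed { type, origin, timeMs }
  }

  // Returns true if this event looks like part of a rapid-fire attack.
  check(type, origin, timeMs) {
    const prev = this.previous;
    const rapidFire =
      prev !== null &&
      prev.type === type &&
      prev.origin !== origin &&
      timeMs - prev.timeMs < this.quarantineMs;
    // Halve the quarantine on a hit; otherwise restore the initial value.
    this.quarantineMs = rapidFire ? this.quarantineMs / 2 : this.initialQuarantineMs;
    this.previous = { type, origin, timeMs };
    return rapidFire;
  }
}
```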

Script Interfaces

Should we define 1.1 compatible script interfaces, or only recommend checking the presence of the unsafe attribute?

Examples

Sample Policy Definitions

This section provides some sample use cases and accompanying security policies.

Example 1: A resource wishes to block delivery of UI events to the DOM element with the id of "submitButton" and suggests a 15% tolerance threshold for determining obstruction of the element within a 200 by 200 pixel window:

Content-Security-Policy: input-protection element-id=submitButton tolerance=15
                         ui-height=200 ui-width=200

Example 2: A resource wishes to receive reports when the UI Safety heuristic is triggered for any element in the <body> :

Content-Security-Policy-Report-Only: input-protection;
                                     report-uri https://example.com/csp-report?unique_id=XKSJ9KAAHJDK9928KKSJEQ

Example 3: A resource wants to react to potential clickjacking directly, without sending a report, so it sets a report-only header but does not specify a report-uri. When a UIEvent is sent, the unsafe attribute will still be set when the heuristic is triggered:

Content-Security-Policy-Report-Only: input-protection

Sample Violation Report

This section contains an example violation report the user agent might send to a server when the protected resource violates a sample policy.

In the following example, a document from http://example.org/page.html was rendered with the following CSP policy:

input-protection; report-uri https://example.org/csp-report.cgi?unique_id=12345

A click violated the policy.

{
  "csp-report": {
    "document-uri": "http://example.org/page.html",
    "referrer": "http://evil.example.com/haxor.html",
    "blocked-event-type": "click",
    "blocked-event-client-x": "325",
    "blocked-event-client-y": "122",
    "touch-event": "false",
    "client-width": "600",
    "client-height": "700",
    "blocked-target-id": "makePaymentSubmit",
    "blocked-target-xpath": TODO,
    "ancestor-chain": TODO,
    "violated-directive": "input-protection",
    "original-policy": "input-protection; report-uri https://example.org/csp-report.cgi?unique_id=12345"
  }
}
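A report body with the shape of the sample above could be assembled as follows. This sketch assumes a hypothetical input object (documentUri, referrer, event, viewport, policy) prepared by the user agent; it copies the key names from the sample, serializes values as strings as the sample does, adds blocked-target-id only for targets with an explicit id attribute, and omits the two keys still marked TODO.

```javascript
// Sketch: serialize an input-protection violation report (hypothetical inputs).
function buildInputProtectionReport({ documentUri, referrer, event, viewport, policy }) {
  const report = {
    "document-uri": documentUri,
    "referrer": referrer,
    "blocked-event-type": event.type,
    "blocked-event-client-x": String(event.clientX),
    "blocked-event-client-y": String(event.clientY),
    "touch-event": String(event.isTouch),
    "client-width": String(viewport.width),
    "client-height": String(viewport.height),
    "violated-directive": "input-protection",
    "original-policy": policy,
  };
  // Only targets with an explicitly set id attribute get blocked-target-id.
  if (event.targetId) report["blocked-target-id"] = event.targetId;
  return JSON.stringify({ "csp-report": report });
}
```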

Security Considerations

This document specifies one embedding policy mechanism, the frame-options directive, which can operate in two modes: by default it checks every ancestor of the protected resource, and with the 'top-only' keyword-source it checks only the top-level browsing context.

In 'top-only' mode, the frame-options directive verifies only the top-level browsing context, not all ancestors of a protected resource. This mode provides exact compatibility with X-Frame-Options, but may allow additional vulnerabilities as compared to checking all ancestors. For example, if a resource at origin A allows untrusted content from origin B to be displayed in an iframe (perhaps using the sandbox attribute), that embedded content may itself embed more content from origin A. Because it does not check intermediate ancestors, a frame-options policy value of 'self' 'top-only' would allow this, even though the immediate embedding context of the resource is hostile, and B might be able to attack A.

Both modes of frame-options provide deterministic protections within a single browsing window, but may not provide full protection in environments where multiple browser windows may overlap and be programmatically closed or moved by malicious content. This directive SHOULD be deployed in concert with input-protection to provide additional protection in such environments.

UI Redressing and Clickjacking attacks rely on violating the contextual and temporal integrity of embedded content. Because these attacks target the subjective perception of the user and not well-defined security boundaries, the heuristic protections afforded by the input-protection directive can never be 100% effective for every interface. The directive provides no protection against certain classes of attacks, such as displaying content around an embedded resource that appears to extend a trusted dialog but provides misleading information.

Implementation Considerations

Many UI Redressing and Clickjacking attacks rely on exploiting specific features of user agents, such as repositioning of the browsing window, hiding or creating fake cursors, and script-driven scrolling and content repositioning. Not all attacks apply to all user agents in all contexts. User agents are free to optimize or not implement suggested heuristics when they do not apply, for example:

Some resource owners may specify a restrictive policy forbidding embedding in user agents that only understand X-Frame-Options but be more permissive with user agents that implement UI Safety directives. User agents that are aware of but choose not to implement any of the heuristics in this document MAY still ignore X-Frame-Options when presented in combination with UI Safety directives in a Content-Security-Policy. For example, a browsing environment that deliberately chooses not to implement UI Safety features because they interfere with assistive technologies SHOULD NOT deny users access to resources on this account. User agents taking this stance SHOULD implement the unsafe attribute of the UIEvent interface as this may be interrogated by client applications doing feature detection.

In environments that support multiple, overlapping browser windows, attacks may be mounted by positioning a target window under another, instructing the user to double click, and closing the obstructing window with the first click. [TODO: ref Jackson et al. paper here] In such environments user agent implementers may wish to use a native operating system screenshot facility to calculate the user's view for the obstruction check phase of the heuristic.

Implementation Considerations for Resource Authors

When possible, resource authors SHOULD make use of violation reports and the unsafe attribute to apply additional security measures in the application or during back-end processing. Real-time measures in the application might include requiring completion of a CAPTCHA [REF] or a response to an out-of-band confirmation when the UI Safety heuristic is triggered. Example back-end measures might include increasing a fraud risk score for individual actions that trigger the heuristic, or for actions that target accounts or resources that frequently trigger UI Safety heuristics. To do this effectively, it is likely necessary to encode into the report-uri a unique identifier that can be correlated with the authenticated user and the action they are taking.
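The correlation described above could be implemented on the server roughly as follows. This is a Node.js sketch under stated assumptions: in-memory maps stand in for real storage, the base URI reuses the example.com host from Example 2, and the risk increment of 10 is an arbitrary illustrative value; none of these names come from this specification.

```javascript
const { randomBytes } = require("crypto");

// Hypothetical in-memory stores; a real deployment would use durable storage.
const pendingActions = new Map(); // unique_id -> { userId, action }
const riskScores = new Map();     // userId -> accumulated risk score

// Issues a report-uri carrying an unguessable identifier tied to the
// authenticated user and the action being performed.
function reportUriFor(userId, action, base = "https://example.com/csp-report") {
  const id = randomBytes(16).toString("hex");
  pendingActions.set(id, { userId, action });
  return `${base}?unique_id=${id}`;
}

// Called for each violation report received at the report-uri: resolves the
// identifier and raises the associated user's risk score.
function recordViolation(requestUrl, increment = 10) {
  const id = new URL(requestUrl).searchParams.get("unique_id");
  const entry = id !== null ? pendingActions.get(id) : undefined;
  if (entry === undefined) return null; // unknown or forged identifier
  const score = (riskScores.get(entry.userId) || 0) + increment;
  riskScores.set(entry.userId, score);
  return { userId: entry.userId, action: entry.action, score };
}
```

A payment page would then emit its policy with report-uri set to the value returned by reportUriFor for the current user and action, so each incoming report maps back to a specific transaction.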

IANA Considerations

This document does not define new message headers and uses the existing grammar of the Content-Security-Policy and Content-Security-Policy-Report-Only headers, so no updates to the permanent message header field registry (see [RFC3864]) are required.