W3C

Encrypted Media Extensions

W3C Editor's Draft 19 July 2012

Latest published version:
Not yet published
Latest editor's draft:
http://dvcs.w3.org/hg/html-media/raw-file/tip/encrypted-media/encrypted-media.html
Editors:
David Dorwin, Google, Inc.
Adrian Bateman, Microsoft Corporation
Mark Watson, Netflix, Inc.
Bug/Issue lists:
Bugzilla, Tracker
Discussion list:
public-html-media@w3.org
Test Suite:
None yet

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This document was published by the HTML working group as an Editor's Draft. Please submit comments regarding this document by using the W3C's (public bug database) with the product set to HTML WG and the component set to Encrypted Media Extensions. If you cannot access the bug database, submit comments to public-html-media@w3.org (subscribe, archives) and arrangements will be made to transpose the comments to the bug database. All feedback is welcome.

Publication as an Editor's Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

Abstract

This proposal extends HTMLMediaElement to enable playback of protected content. The proposed API supports use cases ranging from simple clear key decryption to high value video (given an appropriate user agent implementation). License/key exchange is controlled by the application, facilitating the development of robust playback applications supporting a range of content decryption and protection technologies. No "DRM" is added to the HTML5 specification, and only simple clear key decryption is required as a common baseline.

1. Introduction

This section is non-normative.

This proposal allows JavaScript to select content protection mechanisms, control license/key exchange, and implement custom license management algorithms. It supports a wide range of use cases without requiring client-side modifications in each user agent for each use case. This also enables content providers to develop a single application solution for all devices. A generic stack implemented using the proposed APIs is shown below. This is just an example flow and is not intended to show all possible communication or uses.

Figure: A generic stack implemented using the proposed APIs

1.1 Goals

This section is non-normative.

This proposal was designed with the following goals in mind:

1.2. Definitions

Text in this font and color is non-normative.

1.2.1. Content Decryption Module (CDM)

This section is non-normative.

The Content Decryption Module (CDM) is a generic term for a part of or add-on to the user agent that provides functionality for one or more Key Systems. Implementations may or may not separate the implementations of CDMs and may or may not treat them as separate from the user agent. This is transparent to the API and application. A user agent may support one or more CDMs.

1.2.2. Key System

A Key System is a generic term for a decryption mechanism and/or content protection provider. Key System strings provide unique identification of a Key System. They are used by the user agent to select the Content Decryption Modules and identify the source of a key-related event. Simple Decryption Key Systems are supported by all user agents. User agents may also provide additional CDMs with corresponding Key System strings.

Key System strings are always reverse domain names, for example "com.example.somesystem". Within a given system ("somesystem" in the example), subsystems may be defined as determined by the key system provider, for example "com.example.somesystem.1" and "com.example.somesystem.1_5". Key system providers should keep in mind that these strings will be used for comparison and discovery, so they should be easy to compare and their structure should remain reasonably simple.

If a user agent returns "maybe" or "probably" for any subsystem string, it must return "maybe" when a parent system string is passed to canPlayType(). For example, if a user agent returns "maybe" or "probably" for "com.example.somesystem.1_5", it must return "maybe" for "com.example.somesystem".
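The parent/subsystem rule above can be illustrated with a small model. This is only a sketch: the `reported` table and both function names are hypothetical, and it assumes that any dot-delimited prefix counts as a parent system string, which a real user agent would likely restrict.

```javascript
// Hypothetical table: what a user agent reports for exact Key System strings.
const reported = new Map([
  ["com.example.somesystem.1_5", "probably"],
  ["org.w3.clearkey", "maybe"],
]);

// True if `candidate` is a subsystem of `parent`, e.g.
// "com.example.somesystem.1_5" under "com.example.somesystem".
function isSubsystemOf(candidate, parent) {
  return candidate.startsWith(parent + ".");
}

// Result for `keySystem`, applying the rule that the parent of any
// supported subsystem must report "maybe".
function resultForKeySystem(keySystem, table) {
  for (const [name, result] of table) {
    if (result !== "" && isSubsystemOf(name, keySystem)) {
      return "maybe";
    }
  }
  // No supported subsystem: fall back to the exact entry, if any.
  return table.get(keySystem) || "";
}
```

With this table, `resultForKeySystem("com.example.somesystem", reported)` yields "maybe" even though only the "1_5" subsystem is listed.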

1.2.3. Session ID

A session ID is an optional string ID used to associate calls related to a key/license lifetime, starting with the request. It is a local binding between a request and key/license. It does not associate keys or licenses for different streams (e.g. audio and video). If supported by the Key System, it is generated by the user agent/CDM and provided to the application in the keymessage event. (Session IDs need not necessarily be supported by the underlying content protection client or server.)

Each successful call to generateKeyRequest() generates a new Session ID (returned in the keymessage event).

Applications should always provide the session ID from an event in subsequent calls for this key or license. (This is a best practice, even if the current Key System does not support session IDs.) This may mean that the application must associate a server response with the session ID and provide them both to addKey().

If Session IDs are supported, a new one will be created each time generateKeyRequest() is called. The user agent/CDM manages the lifetime of Session IDs. All Session IDs are cleared from the media element when a load occurs, although the CDM may retain them for longer.

NOTE: The key acquisition process (calling generateKeyRequest()/addKey()) may be executed multiple times for different sessions (each identified by a sessionId).
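One way an application might follow the session ID best practice is to keep a small registry keyed by session ID. The sketch below is illustrative only: `createSessionRegistry` and its method names are hypothetical and not part of this proposal, and it assumes the application has the event object from a keymessage handler.

```javascript
// Hypothetical application-side registry that remembers which session
// each outstanding key request belongs to.
function createSessionRegistry() {
  const pending = new Map(); // sessionId -> { keySystem, initData }
  return {
    // Call from a keymessage handler.
    remember(event, initData) {
      pending.set(event.sessionId, { keySystem: event.keySystem, initData });
    },
    // Call when the server responds; returns the argument list for addKey()
    // in the order (keySystem, key, initData, sessionId).
    addKeyArgs(sessionId, serverResponse) {
      const entry = pending.get(sessionId);
      if (!entry) {
        throw new Error("Unknown session: " + sessionId);
      }
      return [entry.keySystem, serverResponse, entry.initData, sessionId];
    },
    // Call once the session is finished (for example after keyadded).
    forget(sessionId) {
      return pending.delete(sessionId);
    },
  };
}
```

The returned array could then be spread into the call, as in `video.addKey(...registry.addKeyArgs(sessionId, response))`.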

1.2.4. Initialization Data

This section is non-normative.

Initialization Data is a generic term for container-specific data that is used by Content Decryption Modules to generate a key request. It should always allow unique identification of the key or keys needed to decrypt the content, possibly after being parsed by a CDM or server.

Key Systems usually require a block of initialization data containing information about the stream to be decrypted before they can construct a key request message. This block could be as simple as a key or content ID to send to a server or as complex as an opaque Key System-specific collection of data. This initialization information may be obtained in some application-specific way or may be stored with the media data. Container formats may provide for storage of such information, possibly for multiple Key Systems in a single media file.

Initialization data found in the media data is provided to the application in the initData attribute of the needkey event. This data has a container-specific format and is assumed to contain one or more generic or Key System-specific sets of initialization information.

Initialization Data - generic or containing information for the selected Key System - must be provided, in the same format, in the first media element method call that specifies a keySystem.

2. Media Element Extensions

We extend the media element to allow decryption key acquisition to be handled in JavaScript. We also extend canPlayType() to provide basic information about the Key Systems supported by the user agent.

Note: For some CDMs, "key" and "key request" correspond to "license" and "license request", respectively.

partial interface HTMLMediaElement {
  // Adds optional 'keySystem' parameter.
  DOMString canPlayType(in DOMString type, in DOMString? keySystem);

  void generateKeyRequest(in DOMString keySystem, in Uint8Array? initData);
  void addKey(in DOMString keySystem, in Uint8Array key, in Uint8Array? initData, in DOMString? sessionId);
  void cancelKeyRequest(in DOMString keySystem, in DOMString? sessionId);
};

partial interface HTMLSourceElement {
             attribute DOMString keySystem;
};

The canPlayType(type, keySystem) method is modified to add an optional second parameter to specify the Key System.

The following list shows some examples of how to use the keySystem parameter in canPlayType() calls.

Returns whether the Some System Key System is supported. Specific containers and codecs may or may not be supported with Some System.
video.canPlayType(null, "com.example.somesystem")
Returns whether version 1.5 of the Some System Key System is supported. Specific containers and codecs may or may not be supported with Some System 1.5.
video.canPlayType(null, "com.example.somesystem.1_5")
Returns whether the Some System Key System is present and supports the container and codec(s) specified by mimeType.
video.canPlayType(mimeType, "com.example.somesystem")
Returns whether the user agent supports Clear Key Simple Decryption of the container and codec(s) specified by mimeType.
video.canPlayType(mimeType, "org.w3.clearkey")
Returns whether the user agent supports the container and codec(s) specified by mimeType but not whether encrypted streams can be decrypted. This is no different from the current specification.
video.canPlayType(mimeType)
video.canPlayType(mimeType, null)
video.canPlayType(mimeType, "")
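An application could build on these calls to pick a Key System from an ordered list of preferences. The following is a minimal sketch; `chooseKeySystem` is a hypothetical helper name, and `video` is assumed to be a media element implementing the two-argument canPlayType() described here.

```javascript
// Returns the first candidate Key System for which canPlayType() reports
// "maybe" or "probably", or null if none of the candidates is usable.
function chooseKeySystem(video, mimeType, candidates) {
  for (const keySystem of candidates) {
    if (video.canPlayType(mimeType, keySystem) !== "") {
      return keySystem;
    }
  }
  return null;
}
```

For example, `chooseKeySystem(video, mimeType, ["com.example.somesystem", "org.w3.clearkey"])` would prefer Some System and fall back to Clear Key.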

In addition to the steps in the current specification, this method must run the following steps:

  1. Check whether the Key System is supported with the specified container and codec type(s) by following the steps for the first matching condition from the following list:

    If keySystem is null
    Continue to the next step.
    If keySystem contains an unrecognized or unsupported Key System
    Return the empty string.
    If the Key System specified by keySystem does not support decrypting the container and/or codec specified in the rest of the type string.
    Return the empty string.
  2. Return "maybe" or "probably" as appropriate per the existing specification of canPlayType().
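The additional steps can be modeled as a pure function layered on the existing canPlayType() result. This is a sketch under stated assumptions, not user agent code: `checkKeySystem` and the `support` table are hypothetical, `type` is assumed to be a non-null string that is compared exactly (no codec parsing), and an omitted or empty keySystem is treated like null, as the example list above suggests.

```javascript
// `baseResult` is what canPlayType(type) returns per the existing
// specification. `support` is a hypothetical table mapping each supported
// Key System to the list of type strings it can decrypt.
function checkKeySystem(baseResult, type, keySystem, support) {
  // Step 1, first condition: no Key System given, continue to step 2.
  if (keySystem === null || keySystem === undefined || keySystem === "") {
    return baseResult;
  }
  // Step 1, second condition: unrecognized or unsupported Key System.
  if (!support.has(keySystem)) {
    return "";
  }
  // Step 1, third condition: Key System cannot decrypt this container/codec.
  if (!support.get(keySystem).includes(type)) {
    return "";
  }
  // Step 2: defer to the existing result ("maybe" or "probably").
  return baseResult;
}
```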

The generateKeyRequest(keySystem, initData) method must run the following steps:

Note: The contents of initData are container-specific Initialization Data.

  1. If the first argument is null, throw a SYNTAX_ERR.

  2. If networkState is NETWORK_EMPTY, throw an INVALID_STATE_ERR.

    In general, applications should wait for an event named needkey or loadstart (per the resource fetch algorithm) before calling this method.

  3. Initialize handler by following the steps for the first matching condition from the following list:

    If keySystem is one of the user agent's supported Key Systems
    Let handler be the content decryption module corresponding to keySystem.
    Otherwise
    Throw a NOT_SUPPORTED_ERR.
  4. Schedule a task to handle the call, providing initData.

    The user agent will asynchronously execute the following steps in the task:

    1. Load handler if necessary.

      If handler fails to load or initialize

      queue a task to fire a simple event named keyerror at the media element and abort the task.

      The event is of type MediaKeyErrorEvent and has:

    2. Let defaultURL be null.

    3. Use handler to generate a key request and follow the steps for the first matching condition from the following list:

      If a request is successfully generated
      1. Let key request be a key request generated by the CDM using initData, if provided.

        Note: handler must not use any data, including media data, not provided via initData.

      2. If initData is not null and contains a default URL for keySystem, let defaultURL be that URL.

      Otherwise

      queue a task to fire a simple event named keyerror at the media element and abort the task.

      The event is of type MediaKeyErrorEvent and has:

    4. Let sessionId be a unique Session ID string. It may be generated by handler.

    5. queue a task to fire a simple event named keymessage at the media element

      The event is of type MediaKeyMessageEvent and has:

      Note: message may be a request for multiple keys, depending on the keySystem and/or initData. This is transparent to the application.
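The synchronous part of generateKeyRequest() (steps 1 to 3) can be sketched as follows. The `element` shape, `domError`, and `selectHandlerForRequest` are hypothetical names used only for illustration; errors are modeled as plain Error objects whose name matches the exception named in each step.

```javascript
// Builds an error whose name matches the exception names used in the steps.
function domError(name) {
  const error = new Error(name);
  error.name = name;
  return error;
}

// Models steps 1-3. `element` is a hypothetical plain object:
//   { networkEmpty: boolean, handlers: Map<keySystem, handler> }
function selectHandlerForRequest(element, keySystem) {
  // Step 1: the first argument must not be null.
  if (keySystem === null) {
    throw domError("SYNTAX_ERR");
  }
  // Step 2: networkState must not be NETWORK_EMPTY.
  if (element.networkEmpty) {
    throw domError("INVALID_STATE_ERR");
  }
  // Step 3: pick the CDM for a supported Key System.
  const handler = element.handlers.get(keySystem);
  if (handler === undefined) {
    throw domError("NOT_SUPPORTED_ERR");
  }
  // Step 4 would schedule the asynchronous task using this handler.
  return handler;
}
```

Because all three checks throw before any task is scheduled, an application can wrap the call in try/catch and expect keyerror events only for failures inside the asynchronous task.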

The addKey(keySystem, key, initData, sessionId) method must run the following steps:

Note: The contents of key are keySystem-specific. It may be a raw key or a license containing a key. The contents may also vary depending on the container, key length, etc.

Note: The contents of initData are container-specific Initialization Data and should be the same format as the same parameter in generateKeyRequest(). It may be null.

  1. If the first or second argument is null, throw a SYNTAX_ERR.

  2. If the second argument is an empty array, throw a TYPE_MISMATCH_ERR.

  3. If networkState is NETWORK_EMPTY, throw an INVALID_STATE_ERR.

    In general, applications should wait for an event named needkey or loadstart (per the resource fetch algorithm) before calling this method.

  4. Initialize handler by following the steps for the first matching condition from the following list:

    If keySystem is one of the user agent's supported Key Systems
    Let handler be the content decryption module corresponding to keySystem.
    Otherwise
    Throw a NOT_SUPPORTED_ERR.
  5. If sessionId is not null and is unrecognized, throw an INVALID_ACCESS_ERR.

  6. Schedule a task to handle the call, providing key, initData, and sessionId.

    The user agent will asynchronously execute the following steps in the task:

    1. Load handler if necessary.

      If handler fails to load or initialize

      queue a task to fire a simple event named keyerror at the media element and abort the task.

      The event is of type MediaKeyErrorEvent and has:

    2. Let key stored be false.

    3. Let next message be null.

    4. Use handler to handle key.

      1. Process key.

      2. If key contains a key or license, store the key.

        1. Let key ID be null.

        2. If sessionId is not null and refers to a session with Initialization Data that contains a key ID, let key ID be that ID.

        3. If key is not null and contains a key ID, let key ID be that ID.

        4. If initData is not null and contains a key ID, let key ID be that ID.

        5. Store the key by following the steps for the first matching condition from the following list:

          If key ID is not null
          1. Clear any key not associated with a key ID.

          2. If a key already exists for key ID, delete that element.

          3. Store the key and/or license in key indexed by key ID. The replacement algorithm is Key System-dependent.

          Otherwise
          1. Clear all stored keys.

          2. Store the key and/or license in key with no associated key ID.

          At most one key may be stored if key IDs are not used.

          Clearing keys avoids needing to handle a mixture of keys with and without IDs in the Encrypted Block Encountered algorithm.

          Note: It is recommended that CDM providers support a standard and reasonably high minimum number of cached keys/licenses (with IDs) per media element as well as a standard replacement algorithm. This enables a reasonable number of key rotation algorithms to be implemented across user agents and may reduce the likelihood of playback interruptions in use cases that involve various streams in the same element (e.g. adaptive streams, various audio and video tracks) using different keys.

        6. Let key stored be true.

      3. If another message needs to be sent to the server, let next message be that message.

    5. If key stored is true and the media element is waiting for a key, queue a task to attempt to resume playback.

      In other words, resume playback if the necessary key is provided.

    6. Fire the appropriate event by following the steps for the first matching condition from the following list:

      If next message is null

      queue a task to fire a simple event named keyadded at the media element

      The event is of type MediaKeyCompleteEvent and has:

      Otherwise

      queue a task to fire a simple event named keymessage at the media element

      The event is of type MediaKeyMessageEvent and has:

      If any of the preceding steps in the task failed, queue a task to fire a simple event named keyerror at the media element.

      The event is of type MediaKeyErrorEvent and has:
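The key ID resolution and storage rules in the addKey() task can be illustrated with a Map-based store. This is a sketch, not CDM code: `resolveKeyId`, `storeKey`, and the use of a null Map key for the single ID-less key are assumptions made for illustration, and the Key System-dependent replacement (eviction) algorithm is not modeled.

```javascript
// Key ID resolution: later sources override earlier ones.
// Each argument is the key ID found in that source, or null if none.
function resolveKeyId(sessionKeyId, keyKeyId, initDataKeyId) {
  let keyId = null;
  if (sessionKeyId !== null) keyId = sessionKeyId;
  if (keyKeyId !== null) keyId = keyKeyId;
  if (initDataKeyId !== null) keyId = initDataKeyId;
  return keyId;
}

// Map slot used for the single key stored without an ID (an assumption).
const NO_ID = null;

// Storage step: `store` is a Map from key ID (or NO_ID) to key.
function storeKey(store, keyId, key) {
  if (keyId !== null) {
    store.delete(NO_ID); // clear any key not associated with a key ID
    store.delete(keyId); // drop an existing entry for this key ID
    store.set(keyId, key);
  } else {
    store.clear(); // at most one key may be stored if key IDs are not used
    store.set(NO_ID, key);
  }
  return store;
}
```

Keeping ID-less and ID-indexed keys mutually exclusive in the store means a lookup never has to choose between the two kinds.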

The key acquisition process may involve the web page handling keymessage events, sending the message to a Key System-specific service, and calling addKey() with the response message. This continues until the keyadded event is fired. During the process, the web page may wish to cancel the acquisition process. For example, if the page cannot contact the license service because of network issues, it may wish to fall back to an alternative key system. The page calls cancelKeyRequest() to cancel the key acquisition and return the media element to a state where generateKeyRequest() may be called again.
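From the application's side, this exchange might be wired up roughly as follows. The sketch assumes a media element implementing the proposed methods; `wireKeyExchange`, `fallbackURL`, and the `postToServer(url, message, onResponse, onFailure)` transport helper are hypothetical and would be supplied by the application.

```javascript
// Connects needkey/keymessage events to a license service.
// `postToServer` is a hypothetical application-supplied transport helper.
function wireKeyExchange(video, keySystem, fallbackURL, postToServer) {
  video.addEventListener("needkey", (event) => {
    // Start key acquisition with the Initialization Data from the media.
    video.generateKeyRequest(keySystem, event.initData);
  });

  video.addEventListener("keymessage", (event) => {
    // Prefer the URL carried by the media data, if any.
    const url = event.defaultURL || fallbackURL;
    postToServer(
      url,
      event.message,
      (response) => {
        // Pass back the session ID from the event with the response.
        video.addKey(event.keySystem, response, null, event.sessionId);
      },
      () => {
        // The service could not be reached: cancel so that
        // generateKeyRequest() may be called again later.
        video.cancelKeyRequest(event.keySystem, event.sessionId);
      }
    );
  });
}
```

If addKey() produces a further keymessage event, the same handler sends it on, so the loop continues until keyadded is fired.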

The cancelKeyRequest(keySystem, sessionId) method must run the following steps:

  1. If the first argument is null, throw a SYNTAX_ERR.

  2. If sessionId is not null and is unrecognized or not mapped to the keySystem, throw an INVALID_ACCESS_ERR.

  3. If a keyadded event has already been fired for this sessionId, throw an INVALID_STATE_ERR.
  4. Clear any internal state associated with the sessionId (or, if this is null, with the keySystem for this media element). This sessionId will now be unrecognized.
  5. TBD

The keySystem attribute of HTMLSourceElement specifies the Key System to be used with the media resource. The resource selection algorithm is modified to check the keySystem attribute after the existing step 5 of the Otherwise branch of step 6:

  1. ⌛ If candidate has a keySystem attribute whose value represents a Key System that the user agent knows it cannot use with type, then end the synchronous section, and jump down to the failed step below.

A media element is said to have a selected Key System when one of the following has occurred:

2.1. Error Codes

MediaError is extended, and a new error type is added.

partial interface MediaError {
  const unsigned short MEDIA_ERR_ENCRYPTED = 5;
};

interface MediaKeyError {
  const unsigned short MEDIA_KEYERR_UNKNOWN = 1;
  const unsigned short MEDIA_KEYERR_CLIENT = 2;
  const unsigned short MEDIA_KEYERR_SERVICE = 3;
  const unsigned short MEDIA_KEYERR_OUTPUT = 4;
  const unsigned short MEDIA_KEYERR_HARDWARECHANGE = 5;
  const unsigned short MEDIA_KEYERR_DOMAIN = 6;
};

The code attribute of a MediaError may additionally return the following:

MEDIA_ERR_ENCRYPTED (numeric value 5)
The stream could not be played because it is encrypted and one of the following:
  1. No key was provided and no needkey handler was provided
  2. The provided key could not be successfully applied
  3. The user agent does not support decryption of this media data

A MediaKeyError may be one of the following:

MEDIA_KEYERR_UNKNOWN (numeric value 1)
An unspecified error occurred. This value is used for errors that don't match any of the following codes.
MEDIA_KEYERR_CLIENT (numeric value 2)
The Key System could not be installed or updated.
MEDIA_KEYERR_SERVICE (numeric value 3)
The message passed into addKey indicated an error from the license service.
MEDIA_KEYERR_OUTPUT (numeric value 4)
There is no available output device with the required characteristics for the content protection system.
MEDIA_KEYERR_HARDWARECHANGE (numeric value 5)
A hardware configuration change caused a content protection error.
MEDIA_KEYERR_DOMAIN (numeric value 6)
An error occurred in a multi-device domain licensing configuration. The most common error is a failure to join the domain.
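An application might map these codes to messages when handling a keyerror event. The sketch below copies the numeric values and descriptions from the definitions above; `describeKeyError` and the constant names are hypothetical, and how the numeric code is read from the event object is left to the caller.

```javascript
// Numeric values copied from the MediaKeyError interface above.
const MEDIA_KEYERR = Object.freeze({
  UNKNOWN: 1,
  CLIENT: 2,
  SERVICE: 3,
  OUTPUT: 4,
  HARDWARECHANGE: 5,
  DOMAIN: 6,
});

// Descriptions taken from the list above.
const KEY_ERROR_TEXT = new Map([
  [MEDIA_KEYERR.UNKNOWN, "An unspecified error occurred."],
  [MEDIA_KEYERR.CLIENT, "The Key System could not be installed or updated."],
  [MEDIA_KEYERR.SERVICE, "The message passed into addKey indicated an error from the license service."],
  [MEDIA_KEYERR.OUTPUT, "There is no available output device with the required characteristics for the content protection system."],
  [MEDIA_KEYERR.HARDWARECHANGE, "A hardware configuration change caused a content protection error."],
  [MEDIA_KEYERR.DOMAIN, "An error occurred in a multi-device domain licensing configuration."],
]);

// Returns a readable message; unknown codes fall back to the UNKNOWN text.
// A systemCode of 0 means no Key System-specific status is available.
function describeKeyError(code, systemCode) {
  const text = KEY_ERROR_TEXT.get(code) || KEY_ERROR_TEXT.get(MEDIA_KEYERR.UNKNOWN);
  return systemCode ? text + " (system code " + systemCode + ")" : text;
}
```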

3. Events

3.1. Event Definitions

[Constructor(DOMString type, optional MediaKeyNeededEventInit eventInitDict)]
interface MediaKeyNeededEvent : Event {
  readonly attribute DOMString? keySystem;
  readonly attribute DOMString? sessionId;
  readonly attribute Uint8Array? initData;
};

dictionary MediaKeyNeededEventInit : EventInit {
  DOMString? keySystem;
  DOMString? sessionId;
  Uint8Array? initData;
};
[Constructor(DOMString type, optional MediaKeyMessageEventInit eventInitDict)]
interface MediaKeyMessageEvent : Event {
  readonly attribute DOMString keySystem;
  readonly attribute DOMString? sessionId;
  readonly attribute Uint8Array message;
  readonly attribute DOMString? defaultURL;
};

dictionary MediaKeyMessageEventInit : EventInit {
  DOMString keySystem;
  DOMString? sessionId;
  Uint8Array message;
  DOMString? defaultURL;
};
[Constructor(DOMString type, optional MediaKeyCompleteEventInit eventInitDict)]
interface MediaKeyCompleteEvent : Event {
  readonly attribute DOMString keySystem;
  readonly attribute DOMString? sessionId;
};

dictionary MediaKeyCompleteEventInit : EventInit {
  DOMString keySystem;
  DOMString? sessionId;
};
[Constructor(DOMString type, optional MediaKeyErrorEventInit eventInitDict)]
interface MediaKeyErrorEvent : Event {
  readonly attribute DOMString keySystem;
  readonly attribute DOMString? sessionId;
  readonly attribute MediaKeyError errorCode;
  readonly attribute unsigned short systemCode;
};

dictionary MediaKeyErrorEventInit : EventInit {
  DOMString keySystem;
  DOMString? sessionId;
  MediaKeyError errorCode;
  unsigned short systemCode;
};
event . keySystem

Returns the name of the Key System that generated the event.

event . sessionId

Returns the Session ID the event is related to, if applicable.

event . initData

Returns the Initialization Data related to the event.

event . message

Returns the message (i.e. key request) to send.

event . defaultURL

Returns the default key exchange URL.

event . errorCode

Returns the MediaKeyError for the error that occurred.

event . systemCode

Returns a Key System-dependent status code for the error that occurred.

The keySystem attribute is an identifier for the Key System that generated the event. It may be null in the needkey event if the media element does not have a selected Key System.

The sessionId attribute is the Session ID for the key or license that this event refers to. It may be null.

The initData attribute contains Initialization Data specific to the event.

The message attribute contains a message from the CDM. Messages are Key System-specific. In most cases, it should be sent to a key server.

The defaultURL is the default URL to send the key request to as provided by the media data. It may be null.

The errorCode attribute contains the MediaKeyError code for the error that occurred.

The systemCode attribute contains a Key System-dependent status code for the error that occurred. This allows a more granular status to be returned than the more general errorCode. It should be 0 if there is no associated status code or such status codes are not supported by the Key System.

If a response (i.e. a license) is necessary, applications should use one of the new methods to provide the response.

3.2. Event Summary

Event name | Interface | Dispatched when... | Preconditions
keyadded | MediaKeyCompleteEvent | A key has been added as the result of an addKey() call. |
keyerror | MediaKeyErrorEvent | An error occurs in one of the new methods or CDM. |
keymessage | MediaKeyMessageEvent | A message has been generated (and likely needs to be sent to a key server). For example, a key request has been generated as the result of a generateKeyRequest() call or another message must be sent in response to an addKey() call. |
needkey | MediaKeyNeededEvent | The user agent needs a key or license to begin or continue playback. It may have encountered media data that may/does require decryption to load or play OR need a new key/license to continue playback. | readyState is equal to HAVE_METADATA or greater. It is possible that the element is playing or has played.

4. Key Release

4.1. Introduction

This section is non-normative.

The above sections provide for delivery of key/license information to a Content Decryption Module. This section provides for management of the entire key/license lifecycle, that is, secure proof of key release. Use cases for such proof include any service where it is necessary for the service to know, reliably, which granted keys/licenses are still available for use by the user and which have been deleted. Examples include a service with restrictions on the number of concurrent streams available to a user or a service where content is available on a rental basis, for use offline.

Secure proof of key release must necessarily involve the CDM due to the relative ease with which scripts may be modified. The CDM must provide a message asserting, in a CDM-specific form, that a specific key or license has been destroyed. Such messages must be cached in the CDM until acknowledgement of their delivery to the service has been received. This acknowledgement must also be in the form of a CDM-specific message.

The mechanism for secure proof of key release operates outside the scope of any media element. This is because proof-of-release messages may be cached in CDMs after the associated media elements have been destroyed. Proof-of-key-release messages are subject to the same-origin policy: they shall only be delivered to scripts with the same origin as the script which created the media element that provided the key/license.

4.2. Key Release Manager

The following interface is defined for management of key release messages:

    [Constructor()]
    interface KeyReleaseManager : EventTarget {
        void getKeyReleases(in DOMString keySystem);
        void addKeyReleaseCommit(in DOMString keySystem, in DOMString sessionId, in Uint8Array message);
    };

The getKeyReleases(keySystem) method must run the following steps:

  1. If the first argument is null, throw a SYNTAX_ERR.

  2. Initialize handler by following the steps for the first matching condition from the following list:

    If keySystem is one of the user agent's supported Key Systems
    Let handler be the content decryption module corresponding to keySystem.
    Otherwise
    Throw a NOT_SUPPORTED_ERR.
  3. Schedule a task to handle the call.

    The user agent will asynchronously execute the following steps in the task:

    1. Load handler if necessary.

      If handler fails to load or initialize

      queue a task to fire a simple event named keyerror at the key release manager and abort the task.

      The event is of type MediaKeyErrorEvent and has:

    2. Use handler to generate one or more key release messages, if supported. handler should follow the steps for the first matching condition from the following list:

      If generating a key release message is not supported
      Let key release messages be null
      Otherwise
      Let key release messages be a set of key release messages generated by the CDM for the current origin.
    3. For each key release message in key release messages, queue a task to fire a simple event named keyrelease at the key release manager.

      The event is of type MediaKeyMessageEvent and has:

      • keySystem = keySystem
        sessionId = the sessionId originally associated with the provision of the key
        message = key release message
        defaultURL = value of the default URL, if stored by the CDM.

The addKeyReleaseCommit(keySystem, sessionId, message) method must run the following steps:

  1. If the first argument is null, throw a SYNTAX_ERR.

  2. Initialize handler by following the steps for the first matching condition from the following list:

    If keySystem is one of the user agent's supported Key Systems
    Let handler be the content decryption module corresponding to keySystem.
    Otherwise
    Throw a NOT_SUPPORTED_ERR.
  3. Schedule a task to handle the call, providing sessionId and message.

    The user agent will asynchronously execute the following steps in the task:

    1. Load handler if necessary.

      If handler fails to load or initialize

      queue a task to fire a simple event named keyerror at the key release manager and abort the task.

      The event is of type MediaKeyErrorEvent and has:

    2. Use handler to commit the message. handler should follow the steps for the first matching condition from the following list:

      If committing a key release message is supported and the message is valid:

      queue a task to fire a simple event named keyreleasecommitted at the key release manager.

      The event is of type MediaKeyCompleteEvent and has:

      Otherwise

      queue a task to fire a simple event named keyerror at the key release manager.

      The event is of type MediaKeyErrorEvent and has:
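An application could drive the key release exchange along the following lines. This is a sketch only: `reportKeyReleases` and the `deliver(url, message, onAck)` helper are hypothetical, and `manager` is assumed to be a KeyReleaseManager as defined above whose keyrelease events carry the attributes listed for getKeyReleases().

```javascript
// Fetches pending key release messages and commits each acknowledgement.
// `deliver` is a hypothetical application helper that sends `message` to the
// service at `url` and calls `onAck` with the service's acknowledgement.
function reportKeyReleases(manager, keySystem, deliver) {
  manager.addEventListener("keyrelease", (event) => {
    deliver(event.defaultURL, event.message, (ack) => {
      // Hand the CDM-specific acknowledgement back so the CDM can
      // stop caching this release message.
      manager.addKeyReleaseCommit(event.keySystem, event.sessionId, ack);
    });
  });
  // Register the listener first, then ask for the cached messages.
  manager.getKeyReleases(keySystem);
}
```

A listener for keyreleasecommitted and keyerror on the same manager would let the application confirm or retry each commit.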

5. Algorithms

5.1. Encrypted Block Encountered

The following steps are run when the media element encounters a block (i.e. frame) of encrypted media data during the resource fetch algorithm:

  1. Let key system be null.

  2. Let handler be null.

  3. Let block initData be null.

  4. Let block key be null.

  5. If the block (or its parent entity) has Initialization Data, let block initData be that initialization data.

  6. Select the key system and handler by following the steps for the first matching condition from the following list:

    If the media element has a selected Key System
    Run the following steps:
    1. Let key system be the selected Key System.

    2. Let handler be the content decryption module corresponding to key system.

    Otherwise
    Jump to the Key Presence step below.
  7. Load handler if necessary.

    If handler fails to load or initialize

    queue a task to fire a simple event named keyerror at the media element and abort the task.

    The event is of type MediaKeyErrorEvent and has:

  8. Use handler to select the key:

    1. Let block key ID be null.

    2. If block initData is not null and contains a key ID, let block key ID be that ID.

    3. Select the key by following the steps for the first matching condition from the following list:

      If block key ID is not null

      Select the key by using handler to follow the steps for the first matching condition from the following list:

      If handler has a key cached for block key ID
      Let block key be the matching cached key.
      If handler has a key cached with no ID (there can be one at most)
      Let block key be the single cached key.
      Otherwise (handler has no keys cached OR has one or more keys cached, none of which have a matching key ID)
      Jump to the Key Presence step below.
      Otherwise

      Select the key by using handler to follow the steps for the first matching condition from the following list:

      If handler has a single key cached (with or without a key ID)
      Let block key be the single cached key.
      If handler has more than one key cached (all would have IDs)
      Abort media element's resource fetch algorithm and run the steps to report a MEDIA_ERR_ENCRYPTED error.
      Otherwise
      Jump to the Key Presence step below.
  9. Key Presence: Handle the presence of a key by following the steps for the first matching condition from the following list:

    If handler is not null and block key is not null.
    Use handler to Decrypt the block using block key by following the steps for the first matching condition from the following list:
    If decryption fails
    Abort media element's resource fetch algorithm and run the steps to report a MEDIA_ERR_ENCRYPTED error.
    Otherwise
    Continue.

    Note: Not all decryption problems (e.g. using the wrong key) will result in a decryption failure. In such cases, no error is fired here but one may be fired during decode.

    If there is an event handler for needkey
    queue a task to fire a simple event named needkey at the media element.

    The event is of type MediaKeyNeededEvent and has:

    The media element is said to be potentially playing unless playback stops because the stream cannot be decrypted, in which case the media element is said to be waiting for a key.

    Otherwise
    Abort media element's resource fetch algorithm and run the steps to report a MEDIA_ERR_ENCRYPTED error.

For frame-based encryption, this may be implemented as follows when the media element attempts to decode a frame as part of the resource fetch algorithm:

  1. Let encrypted be false.

  2. Detect whether the frame is encrypted.

    If the frame is encrypted
    Run the steps above.
    Otherwise
    Continue.
  3. Decode the frame.

  4. Provide the frame for rendering.
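The key selection in step 8 of the Encrypted Block Encountered algorithm can be sketched in script as follows. This is only an illustration, not how a user agent or CDM is required to store keys: the function name selectBlockKey, the array-of-entries cache, the use of strings for key IDs, and the result labels are all assumptions made for the example.

```javascript
// Hypothetical model of the handler's key cache: an array of { id, key }
// entries, where id is a string key ID or null for a key added without an ID.
function selectBlockKey(cachedKeys, blockKeyId) {
  var i;
  if (blockKeyId !== null) {
    // First condition: a key cached for this exact key ID.
    for (i = 0; i < cachedKeys.length; i++) {
      if (cachedKeys[i].id === blockKeyId)
        return { result: "key", key: cachedKeys[i].key };
    }
    // Second condition: a key cached with no ID (at most one exists).
    for (i = 0; i < cachedKeys.length; i++) {
      if (cachedKeys[i].id === null)
        return { result: "key", key: cachedKeys[i].key };
    }
    // Otherwise: no usable key, so continue at the Key Presence step.
    return { result: "key-presence" };
  }
  // The block carries no key ID.
  if (cachedKeys.length === 1)
    return { result: "key", key: cachedKeys[0].key };
  if (cachedKeys.length > 1)
    return { result: "error" };  // corresponds to MEDIA_ERR_ENCRYPTED
  return { result: "key-presence" };
}
```

A result of "key-presence" means block key is still null when the Key Presence step runs, so that step either fires needkey or reports the error.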

The following paragraph is added to Playing the media resource.

5.2. Potentially Encrypted Stream Encountered

The following steps are run when the media element encounters a source that may contain encrypted blocks or streams during the resource fetch algorithm:

  1. Let key system be null.

  2. Let handler be null.

  3. Let initData be null.

  4. If Initialization Data was encountered, let initData be that initialization data.

  5. Select the key system and handler by following the steps for the first matching condition from the following list:

    If the media element has a selected Key System
    Run the following steps:
    1. Let key system be the selected Key System.

    2. Let handler be the content decryption module corresponding to key system.

    Otherwise
    Jump to the Need Key step below.
  6. Load handler if necessary.

    If handler fails to load or initialize

    queue a task to fire a simple event named keyerror at the media element and abort the task.

    The event is of type MediaKeyErrorEvent and has:

  7. Use handler to determine whether the key is known:

    1. Let key ID be null.

    2. If a key ID for the source is known at this time, let key ID be that ID.

    3. If initData is not null and contains a key ID, let key ID be that ID.

    4. Determine whether the key is already known by following the steps for the first matching condition from the following list:

      If key ID is not null

      Determine whether the key is known by following the steps for the first matching condition from the following list:

      If there is a key cached for key ID
      Jump to the Continue Normal Flow step below.
      Otherwise
      Jump to the Need Key step below.
      Otherwise

      Determine whether the key is known by following the steps for the first matching condition from the following list:

      If there is a single key cached (with or without a key ID)
      Jump to the Continue Normal Flow step below.
      Otherwise
      Jump to the Need Key step below.
  8. Need Key: queue a task to fire a simple event named needkey at the media element.

    The event is of type MediaKeyNeededEvent and has:

    Firing this event allows the application to begin the process of acquiring the key before it is needed.

    Note that readyState is not changed and no algorithms are aborted. This algorithm is merely informative.

  9. Continue Normal Flow: Continue with the existing media element's resource fetch algorithm.
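The decision made in steps 5 and 7 above, whether to jump to Need Key or to Continue Normal Flow, can be sketched as a single predicate. This is a simplified illustration: the function name needsKey and the array-of-entries cache are assumptions, key IDs are modeled as strings, and the handler loading in step 6 is left out.

```javascript
// Returns true when the algorithm would jump to the Need Key step and false
// when it would jump to Continue Normal Flow.
// cachedKeys is a hypothetical array of { id, key } entries.
function needsKey(selectedKeySystem, cachedKeys, keyId) {
  // Step 5: without a selected Key System there is no handler to consult.
  if (selectedKeySystem === null)
    return true;

  if (keyId !== null) {
    // Step 7.4, key ID known: only a key cached for that ID counts.
    for (var i = 0; i < cachedKeys.length; i++) {
      if (cachedKeys[i].id === keyId)
        return false;
    }
    return true;
  }

  // Step 7.4, key ID unknown: a single cached key is assumed to be the one.
  return cachedKeys.length !== 1;
}
```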

5.3. Addition to Media Element Load Algorithm

The following step is added to the existing media element load algorithm:

6. Simple Decryption

All user agents must support the simple decryption capabilities described in this section regardless of whether they support a more advanced CDM. This ensures that there is a common baseline level of protection that is guaranteed to be supported in all user agents, including those that are entirely open source. Thus, content providers that need only basic protection can build simple applications that will work on all platforms without needing to work with any content protection providers.

6.1. Clear Key

The "org.w3.clearkey" Key System indicates a plain-text clear (unencrypted) key will be used to decrypt the source. No additional client-side content protection is required. Use of this Key System is described below.

The keySystem parameter and keySystem attributes are always "org.w3.clearkey" with the exception of events before the Key System has been selected. All events except needkey have a valid sessionId string, which is numerical.

The initData attribute of the needkey event and the initData parameters of generateKeyRequest() and addKey() are the same container-specific Initialization Data format and values. If supported, these values should provide some type of identification of the content or key that could be used to look up the key (since there is no defined logic for parsing it). For containers that support a simple key ID, it should be a binary array containing the raw key ID. For other containers, it may be some other opaque blob or null.

generateKeyRequest() may optionally be called. The resulting MediaKeyMessageEvent has:

To provide a key using this Key System, pass the following to addKey():

7. Examples

This section and its subsections are non-normative.

This section contains example solutions for various use cases using the proposed extensions. These are not the only solutions to these use cases. Video elements are used in the examples, but the same would apply to all media elements. In some cases, such as using synchronous XHR, the examples are simplified to keep the focus on the extensions.

7.1. Source and Key Known at Page Load (Clear Key Encryption)

In this simple example, the source file and clear-text key are hard-coded in the page.

This example is very simple because it does not care when the key has been added and does not associate that event with the addKey() call. It also does not handle errors.

<script>
  function load() {
    var video = document.getElementById("video");
    var key = new Uint8Array([ 0xaa, 0xbb, 0xcc, ... ]);
    video.addKey("org.w3.clearkey", key, null);
  }
</script>

<body onload="load()">
  <video src="foo.webm" autoplay id="video"></video>
</body>

7.2. Source Known but Key Not Known at Page Load

In this case, the Initialization Data is contained in the media data. If this were not the case, handleKeyNeeded() could obtain and provide it instead of getting it from the event.

If any asynchronous operation is required to get the key in handleKeyNeeded(), it could be called a second time if the stream is detected as potentially encrypted before an encrypted block/frame is encountered. In this case, applications may want to handle subsequent calls specially to avoid redundant license requests. This is not shown in the examples below.
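One way an application might avoid such redundant requests is to remember which Initialization Data values already have a request in flight. The sketch below is only an illustration: createRequestTracker, shouldRequest, and done are hypothetical names, and it assumes initData is either null or a byte array such as a Uint8Array.

```javascript
// Hypothetical helper: tracks Initialization Data values that already have a
// key request in flight so that a repeated needkey event can be ignored.
function createRequestTracker() {
  var pending = Object.create(null);

  // Builds a string map key from the bytes of initData (null maps to "").
  function toMapKey(initData) {
    if (!initData)
      return "";
    var parts = [];
    for (var i = 0; i < initData.length; i++)
      parts.push(initData[i].toString(16));
    return parts.join(",");
  }

  return {
    // Returns true the first time a value is seen and false while a request
    // for the same value is still pending.
    shouldRequest: function (initData) {
      var mapKey = toMapKey(initData);
      if (pending[mapKey])
        return false;
      pending[mapKey] = true;
      return true;
    },
    // Call when the request completes or fails so that a retry is possible.
    done: function (initData) {
      delete pending[toMapKey(initData)];
    }
  };
}
```

A handleKeyNeeded() implementation could then return early when shouldRequest(event.initData) is false, and call done() once the key has been added or the request has failed.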

7.2.1. Clear Key Encryption

This solution uses the Clear Key Simple Decryption.

As with the previous example, this one is very simple because it does not care when the key has been added or handle errors.

<script>
  function handleKeyNeeded(event) {
    if (event.keySystem && event.keySystem != "org.w3.clearkey")
      throw "Unhandled keySystem in event";
    var initData = event.initData;
    var video = event.target;

    var xmlhttp = new XMLHttpRequest();
    xmlhttp.open("POST", "http://.../getkey", false);
    xmlhttp.send(initData);
    var key = new Uint8Array(xmlhttp.response);
    video.addKey("org.w3.clearkey", key, initData, event.sessionId);
  }
</script>

<video src="foo.webm" autoplay onneedkey="handleKeyNeeded(event)"></video>

7.2.2. Other Content Decryption Module

This solution uses more advanced decryption from a fictitious content decryption module called Some System.

<script>
  function handleKeyNeeded(event) {
    if (event.keySystem && event.keySystem != "com.example.somesystem.1_0")
      throw "Unhandled keySystem in event";
    var initData = event.initData;
    var video = event.target;

    video.generateKeyRequest("com.example.somesystem.1_0", initData);
  }

  function licenseRequestReady(event) {
    if (event.keySystem != "com.example.somesystem.1_0")
      throw "Unhandled keySystem in event";
    var request = event.message;
    if (!request)
      throw "Could not create license request";

    var video = event.target;

    var xmlhttp = new XMLHttpRequest();
    xmlhttp.open("POST", "http://.../getkey", false);
    xmlhttp.send(request);
    var license = new Uint8Array(xmlhttp.response);
    video.addKey("com.example.somesystem.1_0", license, null, event.sessionId);
  }
</script>

<video src="foo.webm" autoplay onneedkey="handleKeyNeeded(event)" onkeymessage="licenseRequestReady(event)"></video>

7.3. Selecting a Supported Key System

Below is an example of detecting supported Key Systems using canPlayType() and selecting one.

<script>
  var keySystem;
  var licenseUrl;

  function selectKeySystem(video) {
    if (video.canPlayType("video/webm; codecs='vp8, vorbis'", "com.example.somesystem") != "") {
      licenseUrl = "https://license.example.com/getkey"; // OR "https://example.<My Video Site domain>"
      if (video.canPlayType("video/webm; codecs='vp8, vorbis'", "com.example.somesystem.2_0") != "") {
        keySystem = "com.example.somesystem.2_0";
      } else if (video.canPlayType("video/webm; codecs='vp8, vorbis'", "com.example.somesystem.1_5") != "") {
        keySystem = "com.example.somesystem.1_5";
      }
    } else if (video.canPlayType("video/webm; codecs='vp8, vorbis'", "foobar") != "") {
      licenseUrl = "https://license.foobar.com/request";
      keySystem = "foobar";
    } else {
      throw "Key System not supported";
    }
  }

  function handleKeyNeeded(event) {
    var video = event.target;
    var targetKeySystem = event.keySystem;
    if (targetKeySystem == null) {
      selectKeySystem(video);  // Defined above.
      targetKeySystem = keySystem;
    }
    var initData = event.initData;

    video.generateKeyRequest(targetKeySystem, initData);
  }

  function licenseRequestReady(event) {
    if (event.keySystem != keySystem)
      throw "Message from unexpected Key System";
    var request = event.message;
    if (!request)
      throw "Could not create license request";

    var video = event.target;
    var xmlhttp = new XMLHttpRequest();
    xmlhttp.open("POST", licenseUrl, false);
    xmlhttp.send(request);
    var license = new Uint8Array(xmlhttp.response);
    video.addKey(keySystem, license, null, event.sessionId);
  }
</script>

<video src="foo.webm" autoplay onneedkey="handleKeyNeeded(event)" onkeymessage="licenseRequestReady(event)"></video>
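The nested checks in selectKeySystem() can also be written as a loop over an ordered list of candidates. The following is a sketch under assumptions rather than a recommended pattern: pickKeySystem and the { keySystem, licenseUrl } entry shape are hypothetical, and it relies on the two-parameter canPlayType() proposed in this document.

```javascript
// candidates is an ordered array of { keySystem, licenseUrl } entries, most
// preferred first. Returns the first entry that the media element reports it
// might support, or null if none is supported.
function pickKeySystem(video, mimeType, candidates) {
  for (var i = 0; i < candidates.length; i++) {
    if (video.canPlayType(mimeType, candidates[i].keySystem) != "")
      return candidates[i];
  }
  return null;
}
```

An application using the Key Systems from the example above would list "com.example.somesystem.2_0", "com.example.somesystem.1_5", and "foobar" in that order, each paired with its license URL, and treat a null result as "Key System not supported".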

7.4. Using All Events

This is a more complete example showing all events being used along with asynchronous XHR.

Note that handleKeyMessage could be called multiple times, including in response to the addKey() call if multiple round trips are required and for any other reason the Key System might need to send a message.

<script>
  var keySystem;
  var licenseUrl;

  function handleMessageResponse() {
    // "this" is the XMLHttpRequest; wait until the response is complete.
    if (this.readyState != 4)
      return;
    var license = new Uint8Array(this.response);
    var video = document.getElementById("video");
    video.addKey(keySystem, license, null, this.sessionId);
  }

  function sendMessage(message, sessionId) {
    var xmlhttp = new XMLHttpRequest();
    xmlhttp.sessionId = sessionId;
    xmlhttp.onreadystatechange = handleMessageResponse;
    xmlhttp.open("POST", licenseUrl, true);
    xmlhttp.responseType = "arraybuffer";  // Receive the license as binary data.
    xmlhttp.send(message);
  }

  function handleKeyNeeded(event) {
    var video = event.target;
    var targetKeySystem = event.keySystem;
    if (targetKeySystem == null) {
      selectKeySystem(video);  // See previous example for implementation.
      targetKeySystem = keySystem;
    }
    var initData = event.initData;

    video.generateKeyRequest(targetKeySystem, initData);
  }

  function handleKeyMessage(event) {
    if (event.keySystem != keySystem)
      throw "Message from unexpected Key System";
    var message = event.message;
    if (!message)
      throw "Invalid key message";
  
    sendMessage(message, event.sessionId);
  }

  function handleKeyComplete(event) {
    // Do some bookkeeping with event.sessionId if necessary.
  }

  function handleKeyError(event) {
    // Report event.errorCode and do some bookkeeping with event.sessionId if necessary.
  }
</script>

<video src="foo.webm" autoplay id="video" onneedkey="handleKeyNeeded(event)" onkeymessage="handleKeyMessage(event)" onkeyadded="handleKeyComplete(event)" onkeyerror="handleKeyError(event)"></video>

8. FAQ

This section and its subsections are non-normative.

8.1. Use Cases

What use cases does this support?

Everything from user-generated content to be shared with family (user is not an adversary) to online radio to feature-length movies.

Is this proposal compatible with adaptive streaming?

Yes, this proposal is compatible with both "Type 1" and "Type 3" adaptive streaming modes as defined by the W3C Web & TV Interest Group.

Is key rotation supported?

Yes.

Can I encrypt captions / <track> elements?

No, this proposal only supports decrypting audio and video that are part of the media data.

Can I let the user agent select the appropriate CDM using <source> elements?

Yes, using the keySystem attribute of the HTMLSourceElement. When used with the type attribute, this will select the first <source> element (container, codec, and Key System) that the user agent might support. The selected CDM will not be reported to the application until an event is fired.

Is a heartbeat supported?

Yes.

Heartbeat is a mode of operation where the Content Decryption Module must receive an explicit heartbeat message from its server on a regular basis; otherwise, decryption is blocked. This enables use cases requiring strict online control of access to the content. Heartbeat must be supported by the CDM and is implemented in this model by supplying an expiration time or valid duration in the license provided to the CDM. Before expiry of this license, the CDM must trigger a new message exchange to obtain an updated license.

8.2. Use

Can I send a token for the signed-in user with the license request?

Yes. The application can add this to the license request (sent via XMLHttpRequest in the examples) or send it to the CDM via generateKeyRequest() to be included in the license request.

How do I resume playback after receiving the needkey event in the Encrypted Block Encountered algorithm?

Assuming there are no other issues, playback will resume when the needed key is provided by addKey() and processed.

Can an application use multiple content protection providers / Key Systems?

Yes, this will likely be necessary to support all or a majority of user agents. An application could also use different Key Systems on a single user agent for different purposes.

How do I add support for a CDM to my application?

We envision CDM providers creating JavaScript libraries that application developers can include. canPlayType() can then be used to select from supported libraries.

How do I determine if the UA supports specific capabilities for a given provider?

This is vendor-/Key System-specific.

Obtaining this information could take time and is open-ended, so it is not appropriate for canPlayType(). There is also no way for canPlayType() to attest to capabilities anyway. Some basic Key System feature detection may be available via canPlayType().

How should an application handle a needkey event with a null keySystem attribute?

This is a very common scenario because it happens when the user agent encounters encrypted media and does not have an appropriate key. If the application does not already know which Key System to use, it should use canPlayType() to select an appropriate one. When the keySystem attribute is null, the initData attribute is always independent of the Key System.

What is a license URL (licenseUrl) in the examples?

This is the URL for a server capable of providing the key for the stream, usually using the Initialization Data and often after verifying the requesting user. The URL is application- and/or Key System-specific and may be a content provider or a Key System provider depending on the solution.

This is too complex and hard to use.

That's not a question, but we'll try to address it anyway. As shown in the examples, the basic use cases are reasonably simple and only require a little setup to get the key and provide it to the user agent. We believe most small content sites can add basic protection to their applications with minimal effort.

The more complex cases, such as fast time to first frame and various license management algorithms, require more complex code, but professional-strength content protection is complex and that is to be expected. Professional-strength content protection requires server components and working with one or more content protection vendors, so this isn’t really any more complex. In fact, if you implement a few solutions, they will work on any browser-based platform, avoiding the need for per-platform solutions on both the server and client. The fixed set of interfaces may even lead to more consistent patterns and behavior across various solutions. It is generally the large content providers that have more complex requirements, and we believe they will have the appropriate resources to implement applications that meet their requirements.

Providers of content decryption modules will need to provide detailed specifications for actions and events to guide content providers in designing the algorithms in their applications. They can also provide JavaScript libraries for their solutions that can be integrated into any application. An application would then basically select a solution and delegate a lot of the work to the appropriate library.

8.3. API

How is the decryption algorithm specified?

This is container specific. A container may standardize on a specific algorithm (e.g. AES-128) and/or allow it to be specified. The user agent must know and/or detect the appropriate algorithm to use with the key provided by this API.

What are the advantages of doing license/key exchange in the application?

Advantages include:

Why does canPlayType() need to be modified? Why doesn't it provide more information?

The modifications allow applications to detect whether the user agent is capable of supporting the application's encrypted content (at any level of protection) and to allow the application to branch to the appropriate code and/or select a CDM library.

At the same time, we do not want to put too much burden on canPlayType() and it must remain a synchronous method that can be processed from static data. See the related question.

Why does canPlayType() need a second parameter? Why not just add Key System to the type parameter string (or MIME type)?

This could have gone either way, and the behavior of both existing user agents and those that implement these extensions would be the same. (Existing user agents ignore it in both cases.) The main reason for using a separate parameter is that the Key System is not part of the MIME type (see the related question), and the type parameter is generally used interchangeably with the MIME type. Separating the Key System from the MIME type should avoid confusion.

The downside is that the same type string cannot be used for both canPlayType() and the source element's type attribute. Instead, the Key System is passed as a second parameter to canPlayType() and as a separate attribute to the source element.

Will I be informed if a call to one of the new methods fails?

Errors that occur during the synchronous portion of the algorithms will be thrown. For asynchronous portions (i.e. when a task is scheduled), a MediaKeyErrorEvent will be fired.
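An application that wants a single place to handle both kinds of failure might route them to one reporting function, as in the sketch below. The helper names watchKeyErrors and tryGenerateKeyRequest and the reportError callback are hypothetical; only generateKeyRequest() and the keyerror event come from this proposal.

```javascript
// Registers for asynchronous failures. Call once per media element.
function watchKeyErrors(video, reportError) {
  video.addEventListener("keyerror", function (event) {
    reportError("asynchronous", event);
  }, false);
}

// Wraps generateKeyRequest() so that synchronous failures, which are thrown,
// reach the same reporter. Returns true if the call did not throw.
function tryGenerateKeyRequest(video, keySystem, initData, reportError) {
  try {
    video.generateKeyRequest(keySystem, initData);
    return true;
  } catch (e) {
    reportError("synchronous", e);
    return false;
  }
}
```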

Why isn't the Key System part of the MIME type (like codecs)?

In many cases (especially the direction the content providers and standards are moving), the stream is not specific to any one Key System or provider. Multiple Key Systems could be used to decrypt the same generic stream. Thus, the Key System is not information about the file and should not be part of the MIME type.

One could argue that the encryption algorithm (e.g. AES-128) and configuration should be in the MIME type. That is not required for this proposal, so it is not addressed here.

Why do we need another event?

While many use cases could be implemented without an additional event (by requiring the app to provide all the information up front), some use cases may be better handled by an event.

Why does the event need multiple attributes?

The keySystem attribute ensures that the application knows which CDM caused the event so it can know how to handle the event. While the application could probably know or discover this in other ways, this makes it simple for the application.

Why do we need a new MediaError code?

Without a new error code (MEDIA_ERR_ENCRYPTED), it is not possible for user agents to clearly indicate to an application that playback failed because the content was encrypted and user agents will likely need to fire a MEDIA_ERR_DECODE or MEDIA_ERR_SRC_NOT_SUPPORTED, which would be confusing.

Will adding a new error code to MediaError break existing applications?

Applications that are not aware of the new error code (MEDIA_ERR_ENCRYPTED) may not correctly handle it, but they should still be able to detect that an error has occurred because a) an error event is fired and b) media . error is not null.
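As an illustration, an error handler can name the codes it knows and fall back to a generic message for any other code. The sketch below assumes the error constants are readable from the MediaError object itself; describeMediaError and the message strings are hypothetical.

```javascript
// Maps a MediaError (the value of the media element's error attribute) to a
// message. Codes the application does not know fall through to the default
// branch, so an error is still reported.
function describeMediaError(error) {
  if (error === null)
    return "no error";
  switch (error.code) {
    case error.MEDIA_ERR_ABORTED:
      return "fetching was aborted";
    case error.MEDIA_ERR_NETWORK:
      return "a network error occurred";
    case error.MEDIA_ERR_DECODE:
      return "the media could not be decoded";
    case error.MEDIA_ERR_SRC_NOT_SUPPORTED:
      return "the source is not supported";
    case error.MEDIA_ERR_ENCRYPTED:
      return "the media is encrypted and could not be decrypted";
    default:
      return "unknown media error (code " + error.code + ")";
  }
}
```

On a user agent that does not define MEDIA_ERR_ENCRYPTED, that constant is undefined and never matches a numeric code, so the remaining cases behave as before.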

Why do we need a new error type (MediaKeyError) and event (MediaKeyErrorEvent)?

While key/license exchange errors are fatal to the exchange session, most are not fatal to playback. This is especially true if the media element already has a key for the current (and future) frames or, for example, the exchange was for a different stream in an adaptive streaming scenario. The separation allows the media element to continue playback while the application attempts to resolve the exchange problem or until the requested key/license is actually needed.

What happens if a response to the needkey event from encountering a potentially encrypted stream is not received before encountering an encrypted block?

The Encrypted Block Encountered algorithm will proceed as normal. If no appropriate key has been provided, a second needkey event will be fired and decoding will stop.

The same needkey event with the same attributes is fired for both Encrypted Block Encountered and Potentially Encrypted Stream Encountered. How can an application distinguish between the two?

The same event was used intentionally to reduce the complexity of applications. Ideally, they would not need to know.

What if a different [supported] Key System is passed to one of the new methods in subsequent calls to the same HTMLMediaElement?

(Expanding on the question, this relates to the new methods, including generateKeyRequest() and addKey(), that modify state and does not apply to canPlayType(), which is explicitly intended to be called with multiple Key System strings. For example, what if generateKeyRequest() is called with one Key System then addKey() is called with another; or if addKey() is called twice with two different Key Systems.)

If a load occurs between calls with different Key Systems, then there is no problem.

Otherwise, the calls will be treated separately. generateKeyRequest() will start a new session with a new Session ID. addKey() will behave as normal unless the sessionId parameter is not null and is unrecognized for the specified keySystem parameter.

What if a key/license for the same Initialization Data (i.e. key ID) is provided more than once to addKey()?

Replace it, updating the ordering to reflect that this key ID was most recently added. In other words, simply replacing the existing key data is not sufficient. The exact algorithm is covered in addKey().
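The difference between replacing in place and replacing with reordering can be illustrated with a small model. This is not the addKey() algorithm itself: storeKey and the array-based cache, ordered from least to most recently added, are assumptions made for the example.

```javascript
// Hypothetical cache: an array of { id, key } entries ordered from least to
// most recently added.
function storeKey(cachedKeys, id, key) {
  for (var i = 0; i < cachedKeys.length; i++) {
    if (cachedKeys[i].id === id) {
      // Remove the old entry instead of overwriting it in place, so that the
      // replacement below ends up at the most-recent position.
      cachedKeys.splice(i, 1);
      break;
    }
  }
  cachedKeys.push({ id: id, key: key });
}
```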

8.4. Source, Containers, and Streams

What containers and codecs are supported?

Containers and codecs are not specified. A user agent may support decryption of whichever container and codec combination(s) it wishes.

If a user agent supports decryption of a container/codec combination (as reported by canPlayType()), it must also support Simple Decryption of that combination.

What if a container/codec does not support indicating the stream or a frame/block is encrypted?

The application must use addKey() to indicate the stream is encrypted and provide the key before decoding starts.

Must the container provide Initialization Data or a content key ID?

This is ideal, but the API would also support the application sending the Initialization Data or ID directly to the server or providing it to the CDM via generateKeyRequest().

What if a container/codec does not support key IDs or bit(s)?

The application will need to use some other mechanism to select the appropriate key for the content. The user agent will only be able to use one key at a time. Key rotation will be much more complex or impossible.

Can I use different keys for each stream (adaptive streaming)?

Yes, though you may want to consider the complexity and performance drawbacks. For the best user experience, you will want to provide keys for the streams to the user agent before the switch.

What elements of the source are encrypted?

This depends on the container/codec being used. This proposal should support all cases, including entirely encrypted streams, individual frames encrypted separately, groups of frames encrypted, and portions of frames encrypted. If not all blocks or frames are encrypted, the user agent should be able to easily detect this, either based on an indication in the container or the block/frame.

Must all blocks/frames in a stream be encrypted?

No, subject to container/codec limitations.

What cipher and parameters should be used for decryption?

The cipher and parameters should be implicit in or specified by the container. If some are optional, the application must know what is supported by the CDM.

What cipher and parameters should be used for Simple Decryption? Which must the user agent support?

As in the above question, these are either implicit in or specified by the container. User agents must support any default or baseline ciphers and parameters in the container specification. Practically, user agents should support all ciphers and parameters commonly used with the container.

8.5. Content and Key Protection

Can I ensure the content key is protected without working with a content protection provider?

No. Protecting the content key would require that the browser's media stack have some secret that cannot easily be obtained. This is the type of thing DRM solutions provide. Establishing a standard mechanism to support this is beyond the scope of HTML5 standards and should be deferred to specific user agent solutions. In addition, it is not something that fully open source browsers could natively support.

Content protected using this proposal without a content protection provider is still more secure and a higher barrier than providing an unencrypted file over HTTP or HTTPS. We would also argue that it is no less secure than encrypted HLS. For long streams, key rotation can provide additional protection.

It is also possible to extend the proposed specification in the future to support a more robust basic case without changing the API.

Can a user agent support multiple content protection providers?

Yes. The application will query the user agent's capabilities and select the Key System to use.

Can a user agent protect the compressed content?

Yes, this proposal naturally supports such protection.

Can a user agent protect the rendering path or protect the uncompressed content after decoding?

Yes, a user agent could use platform-specific capabilities to protect the rendering path.

9. Revision History

Version Comment
0.1a Corrects minor mistakes in 0.1.
0.1 Initial Proposal