This document lists the design goals and requirements that HTML5 video, audio and media interfaces should support for Web and TV applications. It also includes links to several concrete proposals that are intended to meet the requirements developed by the Web and TV Interest Group participants.
The Media Pipeline Task Force (MPTF) is a subset of the Web and TV Interest Group. The goal of the MPTF is to discuss requirements placed on the HTML5 video, audio and media interfaces by media formats that are used for Web and TV applications. The MPTF also proposes APIs that meet these requirements. The requirements and use cases in this document are the result of discussion within the MPTF.

This document proposes additions to the HTML5 specification so that user agents can make bandwidth measurements and control adaptive streaming using a set of APIs in an interoperable way. This proposal extends the capability of HTML5's <video> and <audio> elements to allow JavaScript to generate media streams for playback. Allowing JavaScript to generate streams facilitates a variety of use cases such as adaptive streaming and time shifting of live streams. The Task Force expects that the requirements and use cases listed in this document will next be reviewed and discussed by the HTML Working Group for inclusion in the HTML specification.

Introduction

A majority of Internet traffic is now streaming video.

However, there are currently no standards or common conventions to provide commercial quality IP streaming video across different platforms and between unrelated companies.

Conformance

As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.

The key words MUST, MUST NOT, SHALL, SHOULD and SHOULD NOT in this specification are to be interpreted as described in RFC 2119 [[RFC2119]].

This specification only applies to one class of product: W3C Technical Reports. A number of specifications may be created to address the requirements enumerated in this document. In some cases the union of multiple parts of different specifications may be needed to address a single requirement. Nevertheless, this document speaks only of conforming specifications.

Conforming specifications are ones that address one or more requirements listed in this document. Conforming specifications should attempt to address SHOULD level requirements unless there is a technically valid reason not to do so.

Terminology

Adaptive Bit Rate
Adaptive bit rate media is characterized by short independent parallel media stream segments that can be individually selected and rendered according to some selective criteria. Typically, the parallel segments are differentiated by a feature such as required bandwidth, image resolution, etc.
Common Time Base
A common time base is a time reference that can be unambiguously interpreted for synchronization purposes.
Trick Play
Trick play refers to common media playback controls such as play, stop, pause, rewind and fast forward.
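
As a non-normative illustration of the Adaptive Bit Rate definition above, the following sketch selects one of several parallel renditions according to measured bandwidth. The Rendition shape, the selectRendition function and the 0.8 safety factor are assumptions made for this example; they are not part of any proposal referenced by this document.

```typescript
// Hypothetical description of one parallel rendition of the same content.
interface Rendition {
  id: string;
  bandwidthBps: number; // bandwidth required to play this rendition
  height: number;       // image height in pixels
}

// Pick the highest-bandwidth rendition that fits within the measured
// throughput scaled by a safety factor. Falls back to the lowest-bandwidth
// rendition when nothing fits.
function selectRendition(
  renditions: Rendition[],
  measuredBps: number,
  safetyFactor: number = 0.8,
): Rendition {
  if (renditions.length === 0) {
    throw new Error("at least one rendition is required");
  }
  const sorted = renditions.slice().sort((a, b) => a.bandwidthBps - b.bandwidthBps);
  const budget = measuredBps * safetyFactor;
  let chosen: Rendition = sorted[0]!;
  for (const r of sorted) {
    if (r.bandwidthBps <= budget) {
      chosen = r;
    }
  }
  return chosen;
}

// Example ladder (values are illustrative only).
const ladder: Rendition[] = [
  { id: "low", bandwidthBps: 500000, height: 360 },
  { id: "mid", bandwidthBps: 1500000, height: 720 },
  { id: "high", bandwidthBps: 4000000, height: 1080 },
];
```

With this ladder, a measured throughput of 2,000,000 bits per second gives a budget of 1,600,000 and therefore selects the "mid" rendition. A real ABR system would typically also consider buffer level and other selection criteria.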

MPTF Requirements for Adaptive Bit Rate Streaming

This section lists the requirements that conforming specification(s) would need to adopt in order to ensure a common interface and interpretation for the playback and control of adaptive bit rate media. These requirements are the result of an interactive process of feedback and discussion within the Media Pipeline Task Force of the Web and TV Interest Group.

General

Standards Compatibility

Compatibility with widely deployed standards

One of the primary purposes for standardizing the way the media elements use adaptive bitrate streaming is to enable different existing and future adaptive bitrate streaming methods to work consistently with HTML5 media tags. Therefore, media tags must work with the widely deployed adaptive bitrate methods that are available now.

Media Tags

The <video> and <audio> tags should be used to specify video and audio in HTML.

In the past, the <object> tag has been used to add non-standard functionality to HTML pages. In order to provide more consistent functionality, the <video> and <audio> elements were added to HTML. This allows for consistent handling of streaming media across different browsers, regardless of the codec used to encode it. In order to maintain this consistency, any ABR solution must define how the video and audio elements can be used for playback of adaptive delivery format media.

Common Time Reference

A common time reference must be unambiguously defined for combining tracks with different time references and for "continuous" tracks. Overlapping track segments must also be handled. (DASH may provide a reasonable model.)

Frequently, it is necessary to synchronize several different streaming content sources. For example, audio tracks must be synchronized with streaming video or the experience of watching the video becomes unpleasant. Synchronization is also important for advertising, closed captions and other streaming media features. Since different streaming media sources have different time references, a strategy for synchronizing these different references must be adopted.
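
One possible strategy, shown here only as a non-normative sketch, is to describe each track's own clock by a timescale and an origin, and to convert track timestamps to seconds on a shared timeline. The TrackClock shape and both function names are hypothetical, and the clock rates in the example are illustrative assumptions.

```typescript
// Hypothetical description of a track's own time reference.
interface TrackClock {
  timescale: number;      // ticks per second in the track's own clock
  originTicks: number;    // track timestamp that maps to commonStartSec
  commonStartSec: number; // position of that origin on the common timeline
}

// Convert a track-local timestamp to seconds on the common timeline.
function toCommonSeconds(clock: TrackClock, trackTicks: number): number {
  return clock.commonStartSec + (trackTicks - clock.originTicks) / clock.timescale;
}

// Convert a common-timeline position back to the track's own ticks.
function toTrackTicks(clock: TrackClock, commonSec: number): number {
  return Math.round(clock.originTicks + (commonSec - clock.commonStartSec) * clock.timescale);
}

// Two tracks with different time references (illustrative values):
// the video clock starts at tick 900000, the audio clock at tick 0.
const videoClock: TrackClock = { timescale: 90000, originTicks: 900000, commonStartSec: 0 };
const audioClock: TrackClock = { timescale: 48000, originTicks: 0, commonStartSec: 0 };
```

Under these assumptions, video tick 1350000 and audio tick 240000 both map to 5 seconds on the common timeline, so the two samples can be rendered together.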

Splicing

It should be possible to seamlessly splice content with a discontinuous timeline (such as advertising) into the presentation.

In addition to a common time reference, mapping to that common time reference must be seamless enough to enable continuous playback from sources spliced relative to different time bases.
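
The following non-normative sketch illustrates the mapping described above: segments taken from sources with unrelated time bases are laid end to end on a single presentation timeline, and any presentation time can be resolved back to a source and a source-relative time. All type and function names are hypothetical.

```typescript
// Hypothetical description of a piece of content in its own time base.
interface SourceSegment {
  label: string;
  sourceStartSec: number; // start position in the source's own time base
  durationSec: number;
}

// The same segment after it has been placed on the presentation timeline.
interface PlacedSegment extends SourceSegment {
  presentationStartSec: number;
}

// Lay segments end to end so the presentation timeline has no gaps,
// even though the source time bases are discontinuous.
function buildTimeline(segments: SourceSegment[]): PlacedSegment[] {
  let cursor = 0;
  return segments.map((s) => {
    const placed: PlacedSegment = {
      label: s.label,
      sourceStartSec: s.sourceStartSec,
      durationSec: s.durationSec,
      presentationStartSec: cursor,
    };
    cursor += s.durationSec;
    return placed;
  });
}

// Resolve a presentation time to the segment and source time that render it.
function locate(
  timeline: PlacedSegment[],
  presentationSec: number,
): { label: string; sourceSec: number } | undefined {
  for (const seg of timeline) {
    const end = seg.presentationStartSec + seg.durationSec;
    if (presentationSec >= seg.presentationStartSec && presentationSec < end) {
      return {
        label: seg.label,
        sourceSec: seg.sourceStartSec + (presentationSec - seg.presentationStartSec),
      };
    }
  }
  return undefined; // outside the presentation
}

// A 30 second advertisement spliced into a program (illustrative values).
const timeline = buildTimeline([
  { label: "program-1", sourceStartSec: 3600, durationSec: 600 },
  { label: "ad", sourceStartSec: 0, durationSec: 30 },
  { label: "program-2", sourceStartSec: 4200, durationSec: 600 },
]);
```

In this example, presentation time 615 resolves to 15 seconds into the advertisement, and presentation time 640 resolves to source time 4210 in the second program segment.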

Trick Play

Search and trick-play must be unambiguously defined in the context of this common time reference (e.g. anchor point and offset).

A common time reference is also important in the context of trick play. Pausing, advancing or rewinding media content must be done accurately and within a common reference. This is necessary in order to advance or rewind to exact locations or to adjust the playback location by precise increments.
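
As a non-normative sketch of the "anchor point and offset" idea, a seek request can be expressed as an anchor on the common timeline plus a signed offset in seconds, and resolved to an absolute position that is clamped to the playable range. The Anchor values, the SeekRequest shape and the resolveSeek function are assumptions made for this example.

```typescript
// Hypothetical anchor points on the common timeline.
type Anchor = "start" | "current" | "end";

interface SeekRequest {
  anchor: Anchor;
  offsetSec: number; // signed offset from the anchor, in seconds
}

// Resolve an anchor-plus-offset request to an absolute position in seconds,
// clamped to the range [0, durationSec].
function resolveSeek(req: SeekRequest, currentSec: number, durationSec: number): number {
  let base: number;
  if (req.anchor === "start") {
    base = 0;
  } else if (req.anchor === "end") {
    base = durationSec;
  } else {
    base = currentSec;
  }
  const target = base + req.offsetSec;
  return Math.min(Math.max(target, 0), durationSec);
}
```

For content of 600 seconds with playback at 100 seconds, a request of { anchor: "current", offsetSec: 30 } resolves to 130, and { anchor: "end", offsetSec: -10 } resolves to 590.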

Playability

The ability of the user agent to play a piece of content must be determined quickly and with reasonable accuracy (e.g. using CanPlayType(codec, level, profile) or other means).

With media content available from sources around the world, it is important to quickly determine whether various content sources can be rendered. Therefore this determination must be made with a minimum of overhead.
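
The HTML5 media elements already expose canPlayType(type), which takes a MIME type string (optionally with a codecs parameter) and returns "probably", "maybe" or the empty string. The non-normative sketch below builds such a type string and picks the first playable candidate. The PlayabilityProbe interface, the helper functions and the stub probe are hypothetical; the stub stands in for a real media element so that the example does not depend on a browser.

```typescript
// Minimal view of the part of a media element used here. A real
// HTMLMediaElement satisfies this interface through canPlayType().
interface PlayabilityProbe {
  canPlayType(type: string): string; // "probably", "maybe" or ""
}

// Build a MIME type string with an optional codecs parameter.
function mimeWithCodecs(container: string, codecs: string[]): string {
  return codecs.length > 0 ? `${container}; codecs="${codecs.join(", ")}"` : container;
}

// Return the first candidate reported as "probably" playable, otherwise the
// first "maybe" candidate, otherwise undefined.
function firstPlayable(probe: PlayabilityProbe, candidates: string[]): string | undefined {
  let maybe: string | undefined = undefined;
  for (const type of candidates) {
    const answer = probe.canPlayType(type);
    if (answer === "probably") {
      return type;
    }
    if (answer === "maybe" && maybe === undefined) {
      maybe = type;
    }
  }
  return maybe;
}

// Stand-in probe with made-up answers, for environments without a DOM.
const stubProbe: PlayabilityProbe = {
  canPlayType: (type: string): string => {
    if (type.indexOf("video/mp4") === 0) return "probably";
    if (type.indexOf("video/webm") === 0) return "maybe";
    return "";
  },
};
```

In a browser, the probe could be an element obtained with document.createElement("video"). Because the check only inspects a type string, it can be made without fetching any media data, which keeps the overhead low.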

No ABR Method Preference

The standard interface to support adaptive bit rate streaming must not advantage one specific ABR method over another.

HTML aspires to be a level playing field. This philosophy enables innovation to flourish and allows superior solutions to become quickly implemented and adopted. ABR media systems are and should continue to be innovative solutions within this spirit of openness. Support for different ABR systems should not require any proprietary modification of the user agent.

Open Source Browsers

The standard interface to support adaptive bit rate streaming must work with "open source" browsers.

The ability for a browser vendor to implement playback of ABR media in accordance with the requirements in this document must be supported.

Common Parameters

Any parameters required for use of the ABR system must be identified and specifiable.

While specific implementations may include vendor-specific parameters for special features, the parameters required for basic playback should be publicly specified.

Common Errors

Specific errors relevant to ABR media must be identified and reportable.

While specific implementations may include vendor-specific error codes, the error codes required for basic operation and diagnosis should be publicly specified. However, which particular ABR systems are supported is an implementation decision.

(Others)

Use Cases

This section is a non-exhaustive list of use cases that would be enabled by one (or more) specifications implementing the requirements listed above. Each use case is written according to the following template:

Ux. <TITLE>
Use case title
Description
  • High level description/overview of the goals of the use case
  • Schematic illustration (devices involved, work flows, etc.) (Optional)
Motivation
  • Explanation of the benefit to the ecosystem
  • Why existing standards cannot be used to accomplish this use case
Dependencies
Other use cases, proposals or other ongoing standardization activities which this use case is dependent on or related to.
Requirements
List of requirements implied by this Use Case.

U1. Play Adaptive Bit Rate Content

Description

A user can play adaptive bit rate content identified in media tags regardless of the particular adaptive bit rate method used to format the content. Support for the playable content formats must be provided by the browser or extensible features of the browser.

Possible implementation:

  • The user selects a content item for playback.
  • The content plays.
Motivation

There is no standard interface for adaptive bit rate content. This leads to the implementation of multiple incompatible playback systems and interfaces. What should be standardized is:

  • an interface to specify the playback of adaptive bit rate content.
Dependencies

In order to play adaptive bit rate content, the application interface must be provided.

Requirements
Low Level High Level
Compatibility with existing standards
Support for media tags
Support for a common time reference
Support for particular ABR media type
Specify the ABR parameters

U2. Trick Mode with Adaptive Bit Rate Content

Description

A user can use trick-play modes (pause, rewind, fast-forward) with adaptive bit rate content regardless of the particular adaptive bit rate method used to format the content.

Possible implementation:

  • The user selects a content item for playback.
  • The user clicks a trick-play button (pause, rewind, fast-forward) in the user interface.
  • The playback of the content is changed according to the trick-play feature selected.
Motivation

Playback of media should be consistent regardless of the particular format of the content. Trick-play modes available to non-adaptive media formats should also be available to adaptive bit rate media. What should be standardized is:

  • trick-play modes should work consistently across all streaming media formats.
Dependencies

None.

Requirements
Low Level High Level
Compatibility with existing standards
Support for media tags
Support for a common time reference
Support for trick-play modes
Specify the ABR parameters

U3. Search Adaptive Bit Rate Content

Description

A user can search adaptive bit rate content identified in media tags to position playback at a specific point in time regardless of the particular adaptive bit rate method used to format the content.

Possible implementation:

  • The user selects a content item for playback.
  • The user selects a particular point in time in the playback of the video (e.g. 30 seconds ahead, 75% through the video, etc.).
  • The playback pointer is positioned at the specified location in time.
  • The content plays beginning at the new position.
Motivation

Playback of media should be consistent regardless of the particular format of the content. Search modes available to non-adaptive media formats should also be available to adaptive bit rate media. What should be standardized is:

  • an interface to specify searching of adaptive bit rate content.
Dependencies

None.

Requirements
Low Level High Level
Compatibility with existing standards
Support for media tags
Support for a common time reference
Support for particular ABR media type
Specify the ABR parameters

U4. Merge, Splice and Append Adaptive Bit Rate Content

Description

A user can merge, splice and append adaptive bit rate content identified in media tags regardless of the particular adaptive bit rate method used to format the content.

Possible implementation:

  • The user selects two content items and identifies a location in time where they are to be merged (both items continue playing at the merge point), spliced (one item stops playing and the other item continues) or appended (one item plays to its end, then the other item plays from its beginning).
  • The content plays as specified.
Motivation

There is no standard interface for merging or splicing content. Content can typically be appended by queueing up the next segment, but there is no guarantee that the common time reference will be preserved. What should be standardized is:

  • an interface to specify the means to merge, splice and append adaptive bit rate content;
  • a specification on how to preserve a common time base when merging, splicing or appending adaptive bit rate content.
Dependencies

None.

Requirements
Low Level High Level
Compatibility with existing standards
Support for media tags
Support for a common time reference
Support for particular ABR media type
Specify the ABR parameters

U5. Continuous Adaptive Bit Rate Content

Description

A user can play a continuous stream of adaptive bit rate content identified in media tags regardless of the particular adaptive bit rate method used to format the content. Continuous content could be a stream that is continuously encoded from a live source (e.g. it has no specific finite length) or a play list that is continually getting content appended and has a common time base.

Possible implementation:

  • The user selects a "channel" for playback.
  • The channel plays continuously until the user selects a different source.
Motivation

There is no standard interface for continuous playback of adaptive bit rate content. What should be standardized is:

  • an interface to specify the playback of continuous adaptive bit rate content;
  • a specification for maintaining a common time base for continuous content.
Dependencies

None.

Requirements
Low Level High Level
Compatibility with existing standards
Support for media tags
Support for a common time reference
Support for particular ABR media type
Specify the ABR parameters

U6. Timed Tracks and Adaptive Bit Rate Content

Description

A user can add timed tracks to adaptive bit rate content at specific points in time regardless of the particular adaptive bit rate method used to format the content.

Possible implementation:

  • The user selects a content item, timed text track to be merged and a specific location in time.
  • The content item and timed text track are merged as specified.
  • The merged content plays.
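
The merge step above can be sketched, non-normatively, as shifting each cue of the timed text track by the chosen location on the content's timeline and dropping or trimming cues that fall past the end of the content. The Cue shape and the placeCues function are hypothetical and are not taken from any timed text format.

```typescript
// Hypothetical timed text cue, with times relative to the start of its track.
interface Cue {
  startSec: number;
  endSec: number;
  text: string;
}

// Shift cues so that the track begins at insertAtSec on the content timeline.
// Cues that start at or after the end of the content are dropped; cues that
// run past the end are trimmed.
function placeCues(cues: Cue[], insertAtSec: number, contentDurationSec: number): Cue[] {
  const placed: Cue[] = [];
  for (const cue of cues) {
    const start = cue.startSec + insertAtSec;
    const end = Math.min(cue.endSec + insertAtSec, contentDurationSec);
    if (start < contentDurationSec && end > start) {
      placed.push({ startSec: start, endSec: end, text: cue.text });
    }
  }
  return placed;
}

// Illustrative cues merged 10 seconds into 17 seconds of content.
const placedCues = placeCues(
  [
    { startSec: 0, endSec: 2, text: "first" },
    { startSec: 5, endSec: 8, text: "second" },
    { startSec: 20, endSec: 22, text: "third" },
  ],
  10,
  17,
);
```

Here the first cue is placed at 10 to 12 seconds, the second is trimmed to 15 to 17 seconds, and the third is dropped because it would start after the content ends.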
Motivation

There is no standard interface for merging timed text tracks with adaptive bit rate content. What should be standardized is:

  • an interface to specify the merging of timed text tracks with adaptive bit rate content.
Dependencies

None.

Requirements
Low Level High Level
Compatibility with existing standards
Support for media tags
Support for a common time reference
Support for particular ABR media type
Specify the ABR parameters

Other Issues or Use Cases

Byte Range, Events, Events at start of request

(Others?)

Security

In the context of adaptive bit rate media, security is primarily concerned with ensuring that authorized users are able to play the media and unauthorized users are not. This may involve verifying that the content has been legally obtained. It may also mean that personally produced video is only viewable by friends and family. If viewing is intended to be restricted, a content protection system must be in place. Adaptive bit rate video should be treated the same as any video element in this regard. A content protection system for media elements has been proposed to the W3C HTML WG and is being reviewed. (Put reference here)

Next Steps

Proposals

Adaptive Bitrate calls for HTML5 <video> tag (WIP) (Duncan Rowden)

This proposal refers to work done in WHATWG regarding three implementation methods and their associated APIs.

Mediasource specification (Aaron Colwell, Kilroy Hughes, Mark Watson)

This proposal was jointly developed by Microsoft, Google and Netflix. It is comprehensive and is intended to meet the requirements described in this document.

Acknowledgements

Many thanks to the members of the Media Pipeline Task Force of the W3C Web & TV Interest Group who collaborated to create this requirements document and reviewed the proposals to be submitted to the HTML WG for inclusion in the HTML specification.