W3C

Media Source Extensions

W3C Editor's Draft 28 November 2012

This version:
http://dvcs.w3.org/hg/html-media/raw-file/tip/media-source/media-source.html
Latest published version:
http://www.w3.org/TR//
Latest editor's draft:
http://dvcs.w3.org/hg/html-media/raw-file/tip/media-source/media-source.html
Editors:
Aaron Colwell, Google Inc.
Adrian Bateman, Microsoft Corporation
Mark Watson, Netflix Inc.

Abstract

This proposal extends HTMLMediaElement to allow JavaScript to generate media streams for playback. Allowing JavaScript to generate streams facilitates a variety of use cases like adaptive streaming and time shifting live streams.

Status of This Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This document was published by the HTML Working Group as an Editor's Draft. If you wish to make comments regarding this document, please send them to public-html-media@w3.org. All feedback is welcome.

Publication as an Editor's Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.


1. Introduction

This proposal allows JavaScript to dynamically construct media streams for <audio> and <video>. It defines objects that allow JavaScript to pass media segments to an HTMLMediaElement. A buffering model is also included to describe how the user agent should act when different media segments are appended at different times. Byte stream specifications for WebM & ISO Base Media File Format are given to specify the expected format of media segments used with these extensions.

1.1 Goals

This proposal was designed with the following goals in mind:

1.2 Definitions

Initialization Segment

A sequence of bytes that contains all of the initialization information required to decode a sequence of media segments. This includes codec initialization data, Track ID mappings for multiplexed segments, and timestamp offsets (e.g. edit lists).

Container specific examples of initialization segments:

ISO Base Media File Format
A moov box.
WebM
The concatenation of the EBML Header, Segment Header, Info element, and Tracks element.
Media Segment

A sequence of bytes that contains packetized & timestamped media data for a portion of the presentation timeline. Media segments are always associated with the most recently appended initialization segment.

Container specific examples of media segments:

ISO Base Media File Format
A moof box followed by one or more mdat boxes.
WebM
A Cluster element
Source Buffer

A hypothetical buffer that contains a distinct sequence of initialization segments & media segments. When media segments are passed to append() they update the state of this buffer. The source buffer only allows a single media segment to cover a specific point in the presentation timeline of each track. If a media segment gets appended that contains media data overlapping (in presentation time) with media data from an existing segment, then the new media data will override the old media data. Since media segments depend on initialization segments the source buffer is also responsible for maintaining these associations. During playback, the media element pulls segment data out of the source buffers, demultiplexes it if necessary, and enqueues it into track buffers so it will get decoded and displayed. buffered describes the time ranges that are covered by media segments in the source buffer.

Active Source Buffers

The set of source buffers that are providing the selected video track, the enabled audio tracks, and the "showing" or "hidden" text tracks. This is a subset of all the source buffers associated with a specific MediaSource object. See Changes to selected/enabled track state for details.

Track Buffer

A hypothetical buffer that represents initialization and media data for a single AudioTrack, VideoTrack, or TextTrack that has been queued for playback. This buffer may not exist in actual implementations, but it is intended to represent media data that will be decoded no matter what media segments are appended to update the source buffer. This distinction is important when considering appends that happen close to the current playback position. See Source Buffer to Track Buffer transfer for details.

Random Access Point

A position in a media segment where decoding and continuous playback can begin without relying on any previous data in the segment. For video this tends to be the location of I-frames. In the case of audio, most audio frames can be treated as a random access point. Since video tracks tend to have a sparser distribution of random access points, the locations of these points are usually considered the random access points for multiplexed streams.

Presentation Start Time

The presentation start time is the earliest time point in the presentation and specifies the initial playback position and earliest possible position. All presentations created using this specification have a presentation start time of 0. Appending media segments with negative timestamps will cause playback to terminate with a MediaError.MEDIA_ERR_DECODE error unless timestampOffset is used to make the timestamps greater than or equal to 0.

MediaSource object URL

A MediaSource object URL is a unique Blob URI created by createObjectURL(). It is used to attach a MediaSource object to an HTMLMediaElement.

These URLs are the same as what the File API specification calls a Blob URI, except that anything in the definition of that feature that refers to File and Blob objects is hereby extended to also apply to MediaSource objects.

Track ID

A Track ID is a byte stream format specific identifier that marks sections of the byte stream as being part of a specific track. The Track ID in a track description identifies which sections of a media segment belong to that track.

Track Description

A byte stream format specific structure that provides the Track ID, codec configuration, and other metadata for a single track. Each track description inside a single initialization segment must have a unique Track ID.

Coded Frame

A unit of compressed media data that has a presentation timestamp and decode timestamp. The presentation timestamp indicates when the frame should be rendered. The decode timestamp indicates when the frame needs to be decoded. If frames can be decoded out of order, then the decode timestamp must be present in the byte stream. If frames cannot be decoded out of order and a decode timestamp is not present in the byte stream, then the decode timestamp is equal to the presentation timestamp.
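The decode timestamp rule above can be illustrated with a short TypeScript sketch. The CodedFrame shape and the decodeTimestamp() helper are hypothetical names chosen for this example; they are not part of this specification or of any browser API.

```typescript
// Hypothetical representation of a parsed coded frame (not a spec type).
interface CodedFrame {
  pts: number;  // presentation timestamp, in seconds
  dts?: number; // decode timestamp, only if the byte stream carries one
}

// Returns the decode timestamp to use for a frame.
// outOfOrderDecode: true if the track allows frames to be decoded out of order.
function decodeTimestamp(frame: CodedFrame, outOfOrderDecode: boolean): number {
  if (frame.dts !== undefined) {
    return frame.dts; // an explicit DTS from the byte stream is used as-is
  }
  if (outOfOrderDecode) {
    // Out-of-order decoding requires an explicit DTS in the byte stream.
    throw new Error("decode timestamp missing from byte stream");
  }
  return frame.pts; // in-order decoding: DTS defaults to PTS
}

// Usage: an in-order audio frame without a DTS decodes at its PTS.
const audioFrameDts = decodeTimestamp({ pts: 1.5 }, false);
```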

Parent Media Source
The parent media source of a SourceBuffer object is the MediaSource object that created it.

2. Source Buffer Model

The subsections below outline the buffering model for this proposal. They describe how to add and remove source buffers from the presentation, as well as the various rules and behaviors associated with appending data to an individual source buffer. At the highest level, the web application simply creates source buffers and appends a sequence of initialization segments and media segments to update the buffer's state. The media element pulls media data out of the source buffers, plays it, and fires events just like it would if a normal URL were passed to the src attribute. The web application is expected to monitor media element events to determine when it needs to append more media segments.

2.1 Creating Source Buffers

SourceBuffer objects can be created once a MediaSource object enters the "open" state. The application calls addSourceBuffer() with a type string that indicates the format of the data it intends to append to the new SourceBuffer. If the user agent supports the format and has sufficient resources, a new SourceBuffer object is created, added to sourceBuffers, and returned by the method. If the user agent doesn't support the specified format or can't support another SourceBuffer then it will throw an appropriate exception to signal why the request couldn't be satisfied.

2.2 Removing Source Buffers

Removing a SourceBuffer with removeSourceBuffer() releases all resources associated with the object. This includes destroying all the segment data, track buffers, and decoders. The media element will also remove the appropriate tracks from audioTracks, videoTracks, & textTracks and fire the necessary change events. Playback may become degraded or stop if the currently selected VideoTrack or all of the enabled AudioTracks are removed.

2.3 Basic appending model

Updating the state of a source buffer requires appending at least one initialization segment and one or more media segments via append(). The following list outlines some of the basic rules for appending segments.

2.4 Initialization Segment constraints

To simplify the implementation and facilitate interoperability, a few constraints are placed on the initialization segments that are appended to a specific SourceBuffer:

2.5 Media Segment constraints

To simplify the implementation and facilitate interoperability, a few constraints are placed on the media segments that are appended to a specific SourceBuffer:

2.6 Appending the first Initialization Segment

Once a new SourceBuffer has been created, it expects an initialization segment to be appended first. This first segment indicates the number and type of streams contained in the media segments that follow. This allows the media element to configure the necessary decoders and output devices. This first segment can also cause an HTMLMediaElement.readyState transition to HAVE_METADATA if this is the first SourceBuffer, or if it is the first track of a specific type (i.e. the first audio track, first video track, or first text track). If neither of these conditions holds, then the tracks for this new SourceBuffer will just appear as disabled tracks and won't affect the current HTMLMediaElement.readyState until they are selected. The media element will also add the appropriate tracks to the audioTracks, videoTracks, & textTracks collections and fire the necessary change events. The description for append() contains all the details.

2.7 Appending a Media Segment to an unbuffered region

If a media segment is appended to a time range that is not covered by existing segments in the source buffer, then its data is copied directly into the source buffer. Addition of this data may trigger HTMLMediaElement.readyState transitions depending on what other data is buffered and whether the media element has determined if it can start playback. The buffered attribute always reflects the current TimeRanges buffered in the SourceBuffer.
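As a rough illustration of how buffered can be kept up to date when data lands in an unbuffered region, the sketch below models the buffered ranges as a sorted list of disjoint [start, end) intervals and merges each newly appended range into it. The BufferedRange type and addBufferedRange() function are hypothetical; a real user agent exposes this information through the TimeRanges object, and treating touching ranges as contiguous is an assumption of this sketch.

```typescript
// A buffered time range in seconds, treated as half-open [start, end).
interface BufferedRange {
  start: number;
  end: number;
}

// Merges the range covered by a newly appended media segment into a sorted
// list of disjoint ranges. Ranges that overlap or touch are combined.
function addBufferedRange(ranges: BufferedRange[], added: BufferedRange): BufferedRange[] {
  const result: BufferedRange[] = [];
  let merged: BufferedRange = { start: added.start, end: added.end };
  let inserted = false;
  for (const r of ranges) {
    if (r.end < merged.start) {
      result.push(r); // entirely before the new data
    } else if (r.start > merged.end) {
      if (!inserted) {
        result.push(merged);
        inserted = true;
      }
      result.push(r); // entirely after the new data
    } else {
      // Overlapping or touching: grow the merged range to cover both.
      merged = { start: Math.min(r.start, merged.start), end: Math.max(r.end, merged.end) };
    }
  }
  if (!inserted) result.push(merged);
  return result;
}

// Usage: filling the hole between 5 s and 10 s yields one range, 0 to 15 s.
const bufferedAfterAppend = addBufferedRange(
  [{ start: 0, end: 5 }, { start: 10, end: 15 }],
  { start: 5, end: 10 }
);
```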

2.8 Appending a Media Segment over a buffered region

There are several ways that media segments can overlap segments in the source buffer. Behavior for the different overlap situations is described below. If more than one overlap applies, then the start overlap gets resolved first, followed by any complete overlaps, and finally the end overlap. If a segment contains multiple tracks, then the overlap is resolved independently for each track.

2.8.1 Complete Overlap

The figure above shows how the source buffer gets updated when a new media segment completely overlaps a segment in the buffer. In this case, the new segment completely replaces the old segment.

2.8.2 Start Overlap

The figure above shows how the source buffer gets updated when the beginning of a new media segment overlaps a segment in the buffer. In this case the new segment replaces all the old media data in the overlapping region. Since media segments are constrained to starting with random access points, this provides a seamless transition between segments.

When an audio frame in the source buffer overlaps with the start of the new media segment special behavior is required. At a minimum implementations must support dropping the old audio frame that overlaps the start of the new segment and insert silence for the small gap that is created. Higher quality implementations may support crossfading or crosslapping between the overlapping audio frames. No matter which strategy is implemented, no gaps are created in the ranges reported by buffered and playback must never stall at the overlap.
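The minimum required behavior (drop the overlapping old audio frame and fill the gap with silence) might be modeled as follows. This is a hypothetical sketch: AudioFrameRecord and resolveAudioStartOverlap() are illustrative names, times are integer milliseconds to avoid rounding noise, and crossfading is not modeled.

```typescript
// Hypothetical audio frame record; times are in integer milliseconds.
interface AudioFrameRecord {
  start: number;
  duration: number;
}

// Minimum behavior at a start overlap: old audio frames that end at or before
// the new segment's start are kept, a frame that straddles the start is dropped,
// and the gap it leaves is reported as silence to be inserted.
// Old frames starting at or after newStart are left to the general overlap rules.
function resolveAudioStartOverlap(
  oldFrames: AudioFrameRecord[],
  newStart: number
): { kept: AudioFrameRecord[]; silenceMs: number } {
  const kept: AudioFrameRecord[] = [];
  let silenceMs = 0;
  for (const f of oldFrames) {
    const end = f.start + f.duration;
    if (end <= newStart) {
      kept.push(f);
    } else if (f.start < newStart) {
      silenceMs = newStart - f.start; // straddling frame dropped; fill with silence
    }
  }
  return { kept, silenceMs };
}

// Usage: 20 ms frames at 0, 20 and 40 ms, new segment starting at 50 ms.
// The frame at 40 ms is dropped and 10 ms of silence fills 40..50 ms.
const audioSplice = resolveAudioStartOverlap(
  [{ start: 0, duration: 20 }, { start: 20, duration: 20 }, { start: 40, duration: 20 }],
  50
);
```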

2.8.3 End Overlap

The figure above shows how the source buffer gets updated when the end of a new media segment overlaps a segment in the buffer. In this case, the media element tries to keep as much of the old segment as possible. The amount saved depends on how close the nearest random access point in the old segment is to the end of the new segment. In the case of audio, if the gap is smaller than the size of an audio frame, then the media element should insert silence for this gap and not reflect it in buffered.

An implementation may keep old segment data before the end of the new segment to avoid creating a gap if it wishes. Doing this, though, can significantly increase implementation complexity and could cause delays at the splice point. The key property that must be preserved is that the entirety of the new segment gets added to the source buffer; it is up to the implementation how much of the old segment data is retained. The web application can use buffered to determine how much of the old segment was preserved.

2.8.4 Middle Overlap

The figure above shows how the source buffer gets updated when the new media segment is in the middle of the old segment. This condition is handled by first resolving the start overlap and then resolving the end overlap.
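All four overlap cases can be captured by one hypothetical append routine for a single track, assuming frames are sorted by presentation timestamp and the new segment begins with a random access point. The TrackFrame type and appendMediaSegment() are illustrative only. Old frames before the new segment are kept (start overlap), old frames inside its time range are replaced (complete overlap), and old frames after it are kept only from the first old random access point onward (end overlap); a middle overlap is simply the start and end rules applied together. This models the simple strategy that may leave a gap in buffered, not the optional gap-avoiding behavior described for end overlaps.

```typescript
interface TrackFrame {
  pts: number;      // presentation timestamp, in seconds
  duration: number; // frame duration, in seconds
  isRap: boolean;   // true if decoding can start at this frame
  tag: string;      // label used only to tell old and new data apart
}

// Returns the track's frames after appending `segment` (single track, sorted by pts).
function appendMediaSegment(buffer: TrackFrame[], segment: TrackFrame[]): TrackFrame[] {
  if (segment.length === 0) return buffer.slice();
  const newStart = segment[0].pts;
  const last = segment[segment.length - 1];
  const newEnd = last.pts + last.duration;

  // Start overlap: old frames that begin before the new segment are kept.
  const kept: TrackFrame[] = buffer.filter(f => f.pts < newStart);

  // Complete overlap: old frames with pts in [newStart, newEnd) are skipped.
  // End overlap: old frames at or after newEnd are kept only from the first
  // old random access point onward, since earlier ones may depend on
  // frames that were just replaced.
  const tail: TrackFrame[] = [];
  let reachedRap = false;
  for (const f of buffer) {
    if (f.pts < newEnd) continue;
    if (f.isRap) reachedRap = true;
    if (reachedRap) tail.push(f);
  }
  return kept.concat(segment, tail);
}

// Usage: old data has 1 s frames at 0..9 with random access points at 0 and 5;
// a new segment covers 2..4 (a middle overlap).
const oldTrack: TrackFrame[] = [];
for (let i = 0; i < 10; i++) {
  oldTrack.push({ pts: i, duration: 1, isRap: i % 5 === 0, tag: "old" });
}
const updatedTrack = appendMediaSegment(oldTrack, [
  { pts: 2, duration: 1, isRap: true, tag: "new" },
  { pts: 3, duration: 1, isRap: false, tag: "new" },
]);
// Result pts: 0, 1 (old), 2, 3 (new), 5..9 (old). Old frame 4 is dropped because
// decoding it would need the replaced old frames 2 and 3, leaving a gap in buffered.
```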

2.9 Source Buffer to Track Buffer transfer

The source buffer represents the media that the web application would like the media element to play. The track buffer contains the data that will actually get decoded and rendered. In most cases the track buffer will simply contain a subset of the source buffer near the current playback position. These two buffers start to diverge, however, when media segments that overlap or are very close to the current playback position are appended. Depending on the contents of the new media segment, it may not be possible to switch to the new data immediately because there isn't a random access point close enough to the current playback position. The quality of the implementation determines how much data is considered "in the track buffer". It should transfer data to the track buffer as late as possible whilst maintaining seamless playback. Some implementations may be able to instantiate multiple decoders or decode the new data significantly faster than real-time to achieve a seamless splice immediately. Other implementations may wait until the next random access point before switching to the newly appended data. Notice that this difference in behavior is only observable when appending close to the current playback position. The track buffer represents a media subsegment, like a group of pictures or something with similar decode dependencies, that the media element commits to playing. This commitment may be influenced by a variety of things like limited decoding resources, hardware decode buffers, a jitter buffer, or the desire to limit implementation complexity.

Here is an example to help clarify the role of the track buffer. Say the current playback position has a timestamp of 8 and the media element pulled frames with timestamps 9 & 10 into the track buffer. The web application then appends a higher quality media segment that starts with a random access point at timestamp 9. The source buffer will get updated with the higher quality data, but the media element won't be able to switch to this higher quality data until the next random access point at timestamp 20. This is because a frame for timestamp 9 is already in the track buffer. As you can see, the track buffer represents the "point of no return" for decoding. If a seek occurs, the media element may choose to use the higher quality data, since a seek might imply flushing the track buffer and the user expects a break in playback.
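The decision in this example can be reduced to a hypothetical helper that picks the first random access point in the new data strictly after the last frame already committed to the track buffer. earliestSwitchPoint() is an illustrative name, and the sketch assumes the higher quality segment has random access points at 9 and 20.

```typescript
// Returns the earliest random access point in the newly appended data that
// comes after the last frame already committed to the track buffer, or
// undefined if no such point has been appended yet.
function earliestSwitchPoint(lastCommittedPts: number, newRaps: number[]): number | undefined {
  let best: number | undefined = undefined;
  for (const t of newRaps) {
    if (t > lastCommittedPts && (best === undefined || t < best)) {
      best = t;
    }
  }
  return best;
}

// The example above: frames 9 and 10 are committed, so the RAP at 9 is
// unusable and the switch happens at 20.
const switchAt = earliestSwitchPoint(10, [9, 20]);
```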

2.10 Media Segment Eviction

When a new media segment is appended, memory constraints may cause previously appended segments to get evicted from the source buffer. The eviction algorithm is implementation dependent, but segments that aren't likely to be needed soon are the most likely to get evicted. The buffered attribute allows the web application to monitor what time ranges are currently buffered in the source buffer.
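Since the eviction algorithm is implementation dependent, the following is only one hypothetical policy, not one required by this proposal: evict already played segments, oldest first, until enough bytes are freed. BufferedSegment and chooseEvictions() are illustrative names, and segment sizes are assumed to be known.

```typescript
// Hypothetical bookkeeping for one buffered media segment.
interface BufferedSegment {
  start: number; // seconds
  end: number;   // seconds
  bytes: number; // memory held by the segment
}

// Picks segments to evict: only segments that end at or before the current
// playback position are candidates, oldest first, until bytesNeeded is freed.
// May return fewer bytes than requested if not enough played data exists.
function chooseEvictions(
  segments: BufferedSegment[],
  currentTime: number,
  bytesNeeded: number
): BufferedSegment[] {
  const played = segments
    .filter(s => s.end <= currentTime)
    .sort((a, b) => a.start - b.start); // filter() returned a copy, so sorting is safe
  const evicted: BufferedSegment[] = [];
  let freed = 0;
  for (const s of played) {
    if (freed >= bytesNeeded) break;
    evicted.push(s);
    freed += s.bytes;
  }
  return evicted;
}

// Usage: playback is at 25 s and 150 bytes are needed, so the segments
// covering 0-10 s and 10-20 s are evicted.
const plannedEvictions = chooseEvictions(
  [
    { start: 0, end: 10, bytes: 100 },
    { start: 10, end: 20, bytes: 100 },
    { start: 20, end: 30, bytes: 100 },
  ],
  25,
  150
);
```

A real user agent would likely also weigh data far ahead of the playback position; this sketch leaves that out for brevity.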

2.11 Applying Timestamp Offsets

For some use cases like ad-insertion or seamless playlists, the web application may want to insert a media segment in the presentation timeline at a location that is different from what the internal timestamps indicate. This can be accomplished by using the timestampOffset attribute on the SourceBuffer object. The value of timestampOffset is added to all timestamps inside a media segment before the contents of that segment are added to the source buffer. The timestampOffset applies to an entire media segment. An exception is thrown if the application tries to update the attribute when only part of a media segment has been appended. Both positive and negative offsets can be assigned to timestampOffset. If an offset causes a media segment timestamp to get converted to a time before the presentation start time, playback will terminate with a MediaError.MEDIA_ERR_DECODE error.

Here is a simple example to clarify how timestampOffset can be used. Say I have two sounds I want to play in sequence. The first sound is 5 seconds long and the second one is 10 seconds. Both sound files have timestamps that start at 0. First append the initialization segment and all media segments for the first sound. Now set timestampOffset to 5 seconds. Finally append the initialization segment and media segments for the second sound. This will result in a 15 second presentation that plays the two sounds in sequence.
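The two-sound example can be checked with a hypothetical helper that adds timestampOffset to every timestamp and rejects anything before the presentation start time of 0. applyTimestampOffset() is an illustrative name, and each sound is simplified to a list of 1-second frame start times.

```typescript
const PRESENTATION_START_TIME = 0; // all presentations start at 0

// Adds `offset` (seconds) to every timestamp of a media segment. A result
// before the presentation start time stands in for the MEDIA_ERR_DECODE case.
function applyTimestampOffset(timestamps: number[], offset: number): number[] {
  return timestamps.map(t => {
    const shifted = t + offset;
    if (shifted < PRESENTATION_START_TIME) {
      throw new Error("MEDIA_ERR_DECODE: timestamp before presentation start time");
    }
    return shifted;
  });
}

// First sound: 5 s long, appended with timestampOffset = 0.
const firstSound = applyTimestampOffset([0, 1, 2, 3, 4], 0);
// Second sound: 10 s long, appended after setting timestampOffset = 5.
// Its frames now start at 5..14, so the presentation ends at 15 s.
const secondSound = applyTimestampOffset([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 5);
```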

3. MediaSource Object

The MediaSource object represents a source of media data for an HTMLMediaElement. It keeps track of the readyState for this source as well as a list of SourceBuffer objects that can be used to add media data to the presentation. MediaSource objects are created by the web application and then attached to an HTMLMediaElement. The application uses the SourceBuffer objects in sourceBuffers to add media data to this source. The HTMLMediaElement fetches this media data from the MediaSource object when it is needed during playback.

enum ReadyState {
    "closed",
    "open",
    "ended"
};
Enumeration description
closed Indicates the source is not currently attached to a media element.
open The source has been opened by a media element and is ready for data to be appended to the SourceBuffer objects in sourceBuffers.
ended The source is still attached to a media element, but endOfStream() has been called. Appending data to SourceBuffer objects in this state is not allowed.
enum EndOfStreamError {
    "network",
    "decode"
};
Enumeration description
network

Terminates playback and signals that a network error has occurred.

Note

If the JavaScript code fetching media data encounters a network error, it should use this status code to terminate playback.

decode

Terminates playback and signals that a decoding error has occurred.

Note

If the JavaScript code fetching media data has problems parsing the data, it should use this status code to terminate playback.

[Constructor]
interface MediaSource : EventTarget {
    readonly attribute SourceBufferList    sourceBuffers;
    readonly attribute SourceBufferList    activeSourceBuffers;
             attribute unrestricted double duration;
    SourceBuffer addSourceBuffer (DOMString type);
    void         removeSourceBuffer (SourceBuffer sourceBuffer);
    readonly attribute ReadyState          readyState;
    void         endOfStream (optional EndOfStreamError error);
    static boolean isTypeSupported (DOMString type);
};

3.1 Attributes

activeSourceBuffers of type SourceBufferList, readonly
Contains the subset of sourceBuffers that represents the active source buffers.
duration of type unrestricted double

Allows the web application to set the presentation duration. The duration is initially set to NaN when the MediaSource object is created.

On getting, run the following steps:

  1. If the readyState attribute is "closed" then return NaN and abort these steps.
  2. Return the current value of the attribute.

On setting, run the following steps:

  1. If the value being set is negative or NaN then throw an INVALID_ACCESS_ERR exception and abort these steps.
  2. If the readyState attribute is not "open" then throw an INVALID_STATE_ERR exception and abort these steps.
  3. Run the duration change algorithm with new duration set to the value being set.
    Note

    append() and endOfStream() can update the duration under certain circumstances.
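The getter and setter steps for duration can be modeled with a small TypeScript class. DurationModel is hypothetical; exceptions are represented by Error objects carrying the exception name, and the duration change algorithm is stubbed out as a plain assignment because its full steps are not covered here.

```typescript
class DurationModel {
  readyState: "closed" | "open" | "ended" = "closed";
  private value = NaN; // duration is NaN when the MediaSource is created

  get duration(): number {
    // Getter step 1: a closed source always reports NaN.
    if (this.readyState === "closed") return NaN;
    return this.value; // getter step 2
  }

  set duration(newDuration: number) {
    // Setter step 1: negative or NaN values are rejected first.
    if (newDuration < 0 || isNaN(newDuration)) throw new Error("INVALID_ACCESS_ERR");
    // Setter step 2: only an "open" source accepts a new duration.
    if (this.readyState !== "open") throw new Error("INVALID_STATE_ERR");
    this.runDurationChange(newDuration); // setter step 3
  }

  // Hypothetical stub for the duration change algorithm.
  private runDurationChange(newDuration: number): void {
    this.value = newDuration;
  }
}

// Usage: the setter succeeds only once readyState is "open".
const durationDemo = new DurationModel();
durationDemo.readyState = "open";
durationDemo.duration = 30;
```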

readyState of type ReadyState, readonly

Indicates the current state of the MediaSource object. When the MediaSource is created readyState must be set to "closed".

sourceBuffers of type SourceBufferList, readonly
Contains the list of SourceBuffer objects associated with this MediaSource. When readyState equals "closed" this list will be empty. Once readyState transitions to "open" SourceBuffer objects can be added to this list by using addSourceBuffer().

3.2 Methods

addSourceBuffer

Adds a new SourceBuffer to sourceBuffers.

When this method is invoked, the user agent must run the following steps:

  1. If type is null or an empty string then throw an INVALID_ACCESS_ERR exception and abort these steps.
  2. If type contains a MIME type that is not supported or contains a MIME type that is not supported with the types specified for the other SourceBuffer objects in sourceBuffers, then throw a NOT_SUPPORTED_ERR exception and abort these steps.
  3. If the user agent can't handle any more SourceBuffer objects then throw a QUOTA_EXCEEDED_ERR exception and abort these steps.
  4. If the readyState attribute is not in the "open" state then throw an INVALID_STATE_ERR exception and abort these steps.
  5. Create a new SourceBuffer object and associated resources.
  6. Add the new object to sourceBuffers and queue a task to fire a simple event named addsourcebuffer at sourceBuffers.
  7. Return the new object.
Parameter  Type       Nullable  Optional
type       DOMString  no        no
Return type: SourceBuffer
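The ordering of the addSourceBuffer() checks can be sketched with a hypothetical model. AddSourceBufferModel, ModelSourceBuffer, and the TypeCheck callback (which stands in for the support and compatibility checks of step 2) are illustrative names; the resource limit of step 3 is assumed to be a fixed maximum number of SourceBuffer objects, and queuing a task is represented by pushing the event name onto a list.

```typescript
// Minimal stand-in for a SourceBuffer; only records its type.
interface ModelSourceBuffer {
  type: string;
}

// Hypothetical check for step 2: is `type` supported, and compatible with
// the types already used by the other SourceBuffer objects?
type TypeCheck = (type: string, existingTypes: string[]) => boolean;

class AddSourceBufferModel {
  readyState: "closed" | "open" | "ended" = "closed";
  readonly sourceBuffers: ModelSourceBuffer[] = [];
  readonly queuedEvents: string[] = [];
  private readonly isSupported: TypeCheck;
  private readonly maxSourceBuffers: number;

  constructor(isSupported: TypeCheck, maxSourceBuffers: number) {
    this.isSupported = isSupported;
    this.maxSourceBuffers = maxSourceBuffers;
  }

  addSourceBuffer(type: string | null): ModelSourceBuffer {
    // Step 1: reject a null or empty type.
    if (type === null || type === "") throw new Error("INVALID_ACCESS_ERR");
    // Step 2: unsupported type, or incompatible with existing SourceBuffers.
    const existing = this.sourceBuffers.map(sb => sb.type);
    if (!this.isSupported(type, existing)) throw new Error("NOT_SUPPORTED_ERR");
    // Step 3: out of resources (modeled as a fixed limit).
    if (this.sourceBuffers.length >= this.maxSourceBuffers) throw new Error("QUOTA_EXCEEDED_ERR");
    // Step 4: only allowed while the MediaSource is "open".
    if (this.readyState !== "open") throw new Error("INVALID_STATE_ERR");
    // Steps 5-7: create, add, queue addsourcebuffer, and return.
    const sb: ModelSourceBuffer = { type };
    this.sourceBuffers.push(sb);
    this.queuedEvents.push("addsourcebuffer");
    return sb;
  }
}
```

Because step 4 comes last, a supported type passed while the source is still "closed" fails with INVALID_STATE_ERR, whereas an unsupported type fails with NOT_SUPPORTED_ERR regardless of readyState.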
endOfStream

Signals the end of the stream.

When this method is invoked, the user agent must run the following steps:

  1. If the readyState attribute is not in the "open" state then throw an INVALID_STATE_ERR exception and abort these steps.
  2. Change the readyState attribute value to "ended".
  3. Queue a task to fire a simple event named sourceended at the MediaSource.
  4. If error is not set, null, or an empty string
    1. Run the duration change algorithm with new duration set to the highest end timestamp across all SourceBuffer objects in sourceBuffers.
      Note

      This allows the duration to properly reflect the end of the appended media segments. For example, if the duration was explicitly set to 10 seconds and only media segments for 0 to 5 seconds were appended before endOfStream() was called, then the duration will get updated to 5 seconds.

    2. Notify the media element that it now has all of the media data. Playback should continue until all the media passed in via append() has been played.
    If error is set to "network"
      If the HTMLMediaElement.readyState attribute equals HAVE_NOTHING
        Run the "If the media data cannot be fetched at all, due to network errors, causing the user agent to give up trying to fetch the resource" steps of the resource fetch algorithm.
      If the HTMLMediaElement.readyState attribute is greater than HAVE_NOTHING
        Run the "If the connection is interrupted after some media data has been received, causing the user agent to give up trying to fetch the resource" steps of the resource fetch algorithm.
    If error is set to "decode"
      If the HTMLMediaElement.readyState attribute equals HAVE_NOTHING
        Run the "If the media data can be fetched but is found by inspection to be in an unsupported format, or can otherwise not be rendered at all" steps of the resource fetch algorithm.
      If the HTMLMediaElement.readyState attribute is greater than HAVE_NOTHING
        Run the "If the media data is corrupted" steps of the resource fetch algorithm.
    Otherwise
      Throw an INVALID_ACCESS_ERR exception.
Parameter  Type              Nullable  Optional
error      EndOfStreamError  no        yes
Return type: void
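The endOfStream() steps can be sketched in the same hypothetical style. EndOfStreamModel is illustrative only: elementHasNothing stands in for HTMLMediaElement.readyState equal to HAVE_NOTHING, fetchAlgorithmBranch merely records which resource fetch algorithm branch would run, and notifying the media element is not modeled. As in the steps above, readyState changes to "ended" before an invalid error value is rejected.

```typescript
// Minimal stand-in for a SourceBuffer: the highest end timestamp of the
// media segments appended to it, in seconds.
interface EosSourceBuffer {
  highestEndTimestamp: number;
}

class EndOfStreamModel {
  readyState: "closed" | "open" | "ended" = "open";
  duration = NaN;
  elementHasNothing = false; // stands in for readyState === HAVE_NOTHING
  readonly sourceBuffers: EosSourceBuffer[] = [];
  readonly queuedEvents: string[] = [];
  fetchAlgorithmBranch = ""; // which resource fetch branch would run

  endOfStream(error?: string | null): void {
    if (this.readyState !== "open") throw new Error("INVALID_STATE_ERR"); // step 1
    this.readyState = "ended";                                             // step 2
    this.queuedEvents.push("sourceended");                                 // step 3
    if (error === undefined || error === null || error === "") {
      // Duration change to the highest end timestamp across all SourceBuffers.
      // Starting from 0 when there are no SourceBuffers is an assumption.
      let highest = 0;
      for (const sb of this.sourceBuffers) highest = Math.max(highest, sb.highestEndTimestamp);
      this.duration = highest;
    } else if (error === "network") {
      this.fetchAlgorithmBranch = this.elementHasNothing ? "cannot be fetched at all" : "connection is interrupted";
    } else if (error === "decode") {
      this.fetchAlgorithmBranch = this.elementHasNothing ? "unsupported format" : "media data is corrupted";
    } else {
      throw new Error("INVALID_ACCESS_ERR");
    }
  }
}

// Usage mirroring the note above: duration set to 10 s, data only up to 5 s.
const eos = new EndOfStreamModel();
eos.duration = 10;
eos.sourceBuffers.push({ highestEndTimestamp: 5 }, { highestEndTimestamp: 4.5 });
eos.endOfStream(); // duration becomes 5
```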
isTypeSupported, static

Check to see whether the MediaSource is capable of creating SourceBuffer objects for the specified MIME type.

When this method is invoked, the user agent must run the following steps:

  1. If type is an empty string, then return false.
  2. If type does not contain a valid MIME type string, then return false.
  3. If type contains a media type or media subtype that the MediaSource does not support, then return false.
  4. If type contains a codec that the MediaSource does not support, then return false.
  5. If the MediaSource does not support the specified combination of media type, media subtype, and codecs then return false.
  6. Return true.
Note

If true is returned from this method, it only indicates that the MediaSource implementation is capable of creating SourceBuffer objects for the specified MIME type. An addSourceBuffer() call may still fail if sufficient resources are not available to support the addition of a new SourceBuffer.

Note

This method returning true implies that HTMLMediaElement.canPlayType() will return "maybe" or "probably" since it does not make sense for a MediaSource to support a type the HTMLMediaElement knows it cannot play.

Parameter  Type       Nullable  Optional
type       DOMString  no        no
Return type: boolean
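A rough approximation of the isTypeSupported() steps, assuming a hypothetical capability table, is shown below. The table contents, the simplified MIME token pattern, and isTypeSupportedSketch() are assumptions for illustration; steps 4 and 5 are collapsed into a single per-container codec check, and a real user agent decides support internally.

```typescript
// Hypothetical capability table: container MIME type -> supported codecs.
const SUPPORTED_CODECS_BY_TYPE: { [mimeType: string]: string[] } = {
  "video/webm": ["vp8", "vorbis"],
  "audio/webm": ["vorbis"],
  "video/mp4": ["avc1.42e01e", "mp4a.40.2"],
};

// Simplified token check for "type/subtype" (not a full MIME parser).
const MIME_PATTERN = /^[a-z0-9!#$&^_.+-]+\/[a-z0-9!#$&^_.+-]+$/;

function isTypeSupportedSketch(type: string): boolean {
  if (type === "") return false; // step 1
  const parts = type.split(";");
  const mimeType = parts[0].trim().toLowerCase();
  if (!MIME_PATTERN.test(mimeType)) return false; // step 2
  if (!Object.prototype.hasOwnProperty.call(SUPPORTED_CODECS_BY_TYPE, mimeType)) return false; // step 3
  const allowed = SUPPORTED_CODECS_BY_TYPE[mimeType];

  // Collect the codecs listed in an optional codecs="..." parameter.
  let codecs: string[] = [];
  for (let i = 1; i < parts.length; i++) {
    const param = parts[i].trim();
    if (param === "") continue; // tolerate a trailing semicolon
    const eq = param.indexOf("=");
    if (eq < 0) return false; // malformed parameter: not a valid MIME type string
    if (param.substring(0, eq).trim().toLowerCase() === "codecs") {
      const value = param.substring(eq + 1).trim().replace(/^"|"$/g, "");
      codecs = value.split(",").map(c => c.trim().toLowerCase());
    }
  }

  // Steps 4 and 5 (collapsed): every listed codec must work in this container.
  for (const codec of codecs) {
    if (allowed.indexOf(codec) < 0) return false;
  }
  return true; // step 6
}
```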
removeSourceBuffer

Removes a SourceBuffer from sourceBuffers.

When this method is invoked, the user agent must run the following steps:

  1. If sourceBuffer is null then throw an INVALID_ACCESS_ERR exception and abort these steps.
  2. If sourceBuffer specifies an object that is not in sourceBuffers then throw a NOT_FOUND_ERR exception and abort these steps.
  3. Remove track information from audioTracks, videoTracks, and textTracks for all tracks associated with sourceBuffer and queue a task to fire a simple event named change at the modified lists.
  4. If sourceBuffer is in activeSourceBuffers, then remove it from activeSourceBuffers and queue a task to fire a simple event named removesourcebuffer at activeSourceBuffers.
  5. Remove sourceBuffer from sourceBuffers and queue a task to fire a simple event named removesourcebuffer at sourceBuffers.
  6. Destroy all resources for sourceBuffer.
Parameter     Type          Nullable  Optional
sourceBuffer  SourceBuffer  no        no
Return type: void

3.3 Event Summary

Event name Interface Dispatched when...
sourceopen Event When readyState transitions from "closed" to "open" or from "ended" to "open".
sourceended Event When readyState transitions from "open" to "ended".
sourceclose Event When readyState transitions from "open" to "closed" or "ended" to "closed".
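The event summary amounts to a small transition table. The hypothetical helper below returns the name of the event fired for a given readyState transition; MsState and eventForTransition() are illustrative names.

```typescript
type MsState = "closed" | "open" | "ended";

// Maps a readyState transition to the event listed in the summary above,
// or undefined for a transition the summary does not list.
function eventForTransition(from: MsState, to: MsState): string | undefined {
  if (to === "open" && (from === "closed" || from === "ended")) return "sourceopen";
  if (to === "ended" && from === "open") return "sourceended";
  if (to === "closed" && (from === "open" || from === "ended")) return "sourceclose";
  return undefined;
}
```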

3.4 Algorithms

3.4.1 Attaching to a media element

A MediaSource object can be attached to a media element by assigning a MediaSource object URL to the media element src attribute or the src attribute of a <source> inside a media element. A MediaSource object URL is created by passing a MediaSource object to createObjectURL().

If the resource fetch algorithm absolute URL matches the MediaSource object URL, run the following steps right before the "Perform a potentially CORS-enabled fetch" step in the resource fetch algorithm.

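    If readyState is NOT set to "closed"
    Run the "If the media data cannot be fetched at all, due to network errors, causing the user agent to give up trying to fetch the resource" steps of the resource fetch algorithm.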
    Otherwise
    1. Set the readyState attribute to "open".
    2. Queue a task to fire a simple event named sourceopen at the MediaSource.
    3. Allow the resource fetch algorithm to progress based on data passed in via append().

3.4.2 Detaching from a media element

The following steps are run in any case where the media element is going to transition to NETWORK_EMPTY and queue a task to fire a simple event named emptied at the media element. These steps should be run right before the transition.

  1. Set the readyState attribute to "closed".
  2. Set the duration attribute to NaN.
  3. Remove all the SourceBuffer objects from activeSourceBuffers.
  4. Queue a task to fire a simple event named removesourcebuffer at activeSourceBuffers.
  5. Remove all the SourceBuffer objects from sourceBuffers.
  6. Queue a task to fire a simple event named removesourcebuffer at sourceBuffers.
  7. Queue a task to fire a simple event named sourceclose at the MediaSource.

3.4.3 Seeking

Run the following steps as part of the "Wait until the user agent has established whether or not the media data for the new playback position is available, and, if it is, until it has decoded enough data to play back that position" step of the seek algorithm:

  1. The media element looks for media segments containing the new playback position in each SourceBuffer object in activeSourceBuffers.
  2. If one or more of the objects in activeSourceBuffers is missing media segments for the new playback position
    1. Set the HTMLMediaElement.readyState attribute to HAVE_METADATA.
    2. The media element waits for the necessary media segments to be passed to append(). The web application can use buffered to determine what the media element needs to resume playback.
    Otherwise
    Continue
  3. The media element resets all decoders and initializes each one with data from the appropriate initialization segment.
  4. The media element feeds data from the media segments into the decoders until the new playback position is reached.
  5. Resume the seek algorithm at the "Await a stable state" step.
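Step 2 of the seeking steps can be sketched as a check over the buffered ranges of each active source buffer. buffersMissingPosition() and TimeSpan are hypothetical, and ranges are assumed to be half-open [start, end) intervals in seconds. A non-empty result corresponds to setting HTMLMediaElement.readyState to HAVE_METADATA and waiting for the listed buffers to receive media segments.

```typescript
// A buffered time range, modeled as half-open [start, end) in seconds.
interface TimeSpan {
  start: number;
  end: number;
}

// Returns the indices of active source buffers whose buffered ranges do not
// cover `position`. An empty result means the seek can continue with step 3.
function buffersMissingPosition(activeBuffered: TimeSpan[][], position: number): number[] {
  const missing: number[] = [];
  activeBuffered.forEach((ranges, index) => {
    const covered = ranges.some(r => r.start <= position && position < r.end);
    if (!covered) missing.push(index);
  });
  return missing;
}

// Usage: audio is buffered through 30 s, video only through 12 s.
// Seeking to 20 s reports index 1, so the video buffer needs more segments.
const missingAfterSeek = buffersMissingPosition(
  [[{ start: 0, end: 30 }], [{ start: 0, end: 12 }]],
  20
);
```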

3.4.4 SourceBuffer Monitoring

The following steps are periodically run during playback to make sure that all of the SourceBuffer objects in activeSourceBuffers have enough data to ensure uninterrupted playback. Appending new segments and changes to activeSourceBuffers also cause these steps to run because they affect the conditions that trigger state transitions. The web application can monitor changes in HTMLMediaElement.readyState to drive media segment appending.

If buffered for all objects in activeSourceBuffers do not contain TimeRanges for the current playback position:
  1. Set the HTMLMediaElement.readyState attribute to HAVE_METADATA.
  2. If this is the first transition to HAVE_METADATA, then queue a task to fire a simple event named loadedmetadata at the media element.
  3. Abort these steps.
If buffered for all objects in activeSourceBuffers contain TimeRanges that include the current playback position and enough data to ensure uninterrupted playback:
  1. Set the HTMLMediaElement.readyState attribute to HAVE_ENOUGH_DATA.
  2. Queue a task to fire a simple event named canplaythrough at the media element.
  3. Playback may resume at this point if it was previously suspended by a transition to HAVE_CURRENT_DATA.
  4. Abort these steps.
If buffered for at least one object in activeSourceBuffers contains a TimeRange that includes the current playback position but not enough data to ensure uninterrupted playback:
  1. Set the HTMLMediaElement.readyState attribute to HAVE_FUTURE_DATA.
  2. If the previous value of HTMLMediaElement.readyState was less than HAVE_FUTURE_DATA, then queue a task to fire a simple event named canplay at the media element.
  3. Playback may resume at this point if it was previously suspended by a transition to HAVE_CURRENT_DATA.
  4. Abort these steps.
If buffered for at least one object in activeSourceBuffers contains a TimeRange that ends at the current playback position and does not have a range covering the time immediately after the current position:
  1. Set the HTMLMediaElement.readyState attribute to HAVE_CURRENT_DATA.
  2. If this is the first transition to HAVE_CURRENT_DATA, then queue a task to fire a simple event named loadeddata at the media element.
  3. Playback is suspended at this point since the media element doesn't have enough data to advance the timeline.
  4. Abort these steps.
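The branch selection above can be sketched as a pure function. This is a minimal, non-normative JavaScript sketch: the function name monitorReadyState, the representation of each buffered attribute as an array of [start, end] pairs in seconds, and the lookahead threshold used to approximate "enough data to ensure uninterrupted playback" are all assumptions (the steps above do not define a threshold). It only returns the new readyState name; firing events and suspending playback are left to the caller.

```javascript
// Hypothetical model: activeBuffered[i] is the buffered ranges ([start, end] pairs)
// of the i-th object in activeSourceBuffers. `lookahead` (seconds) is an ASSUMED
// stand-in for "enough data to ensure uninterrupted playback".
function monitorReadyState(activeBuffered, currentTime, lookahead) {
  const covers = (ranges, pred) => ranges.some(([start, end]) => pred(start, end));
  const t = currentTime;

  // No active SourceBuffer has a range containing the current playback position.
  if (activeBuffered.every(r => !covers(r, (s, e) => s <= t && t <= e))) {
    return "HAVE_METADATA";
  }
  // Every active SourceBuffer has the current position plus the lookahead buffered.
  if (activeBuffered.every(r => covers(r, (s, e) => s <= t && e >= t + lookahead))) {
    return "HAVE_ENOUGH_DATA";
  }
  // At least one SourceBuffer has data past the current position, but not enough.
  if (activeBuffered.some(r => covers(r, (s, e) => s <= t && t < e))) {
    return "HAVE_FUTURE_DATA";
  }
  // At least one SourceBuffer has a range that ends exactly at the current position.
  if (activeBuffered.some(r => covers(r, (s, e) => e === t))) {
    return "HAVE_CURRENT_DATA";
  }
  return null; // Defensive: no branch matched, so readyState stays unchanged.
}
```

For example, with one SourceBuffer buffered over [0, 2] and the playback position at 2, only the last branch matches and playback would be suspended in HAVE_CURRENT_DATA.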

3.4.5 Changes to selected/enabled track state

During playback, activeSourceBuffers needs to be updated if the selected video track, the enabled audio tracks, or a text track mode changes. When one or more of these changes occur, the following steps need to be followed.

If the selected video track changes:
  1. If the SourceBuffer associated with the previously selected video track is not associated with any other enabled tracks, run the following steps:
    1. Remove the SourceBuffer from activeSourceBuffers.
    2. Queue a task to fire a simple event named removesourcebuffer at activeSourceBuffers.
  2. If the SourceBuffer associated with the newly selected video track is not already in activeSourceBuffers, run the following steps:
    1. Add the SourceBuffer to activeSourceBuffers.
    2. Queue a task to fire a simple event named addsourcebuffer at activeSourceBuffers.
If an audio track becomes disabled and the SourceBuffer associated with this track is not associated with any other enabled or selected track:
  1. Remove the SourceBuffer associated with the audio track from activeSourceBuffers.
  2. Queue a task to fire a simple event named removesourcebuffer at activeSourceBuffers.
If an audio track becomes enabled and the SourceBuffer associated with this track is not already in activeSourceBuffers:
  1. Add the SourceBuffer associated with the audio track to activeSourceBuffers.
  2. Queue a task to fire a simple event named addsourcebuffer at activeSourceBuffers.
If a text track mode becomes "disabled" and the SourceBuffer associated with this track is not associated with any other enabled or selected track:
  1. Remove the SourceBuffer associated with the text track from activeSourceBuffers.
  2. Queue a task to fire a simple event named removesourcebuffer at activeSourceBuffers.
If a text track mode becomes "showing" or "hidden" and the SourceBuffer associated with this track is not already in activeSourceBuffers:
  1. Add the SourceBuffer associated with the text track to activeSourceBuffers.
  2. Queue a task to fire a simple event named addsourcebuffer at activeSourceBuffers.
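All of the cases above share one pattern: activating a track adds its SourceBuffer if it is missing, and deactivating a track removes its SourceBuffer only when no other active track still uses it. A minimal sketch of that pattern follows; the track shape { active, sourceBuffer }, the function name onTrackActivationChange, and the use of a plain array and event-name list are hypothetical modeling choices, not interfaces defined here. A change of selected video track can be modeled as two calls: one for the previously selected track and one for the newly selected track.

```javascript
// Hypothetical model: `active` means selected (video), enabled (audio), or mode
// "showing"/"hidden" (text). `track.active` must already hold its NEW value.
// `queuedEvents` collects the names of events that would be queued.
function onTrackActivationChange(track, allTracks, activeSourceBuffers, queuedEvents) {
  const sb = track.sourceBuffer;
  const index = activeSourceBuffers.indexOf(sb);

  if (track.active) {
    // Newly selected/enabled track: add its SourceBuffer if not already active.
    if (index === -1) {
      activeSourceBuffers.push(sb);
      queuedEvents.push("addsourcebuffer");
    }
    return;
  }

  // Deselected/disabled track: keep the SourceBuffer if another active track uses it.
  const stillUsed = allTracks.some(t => t !== track && t.active && t.sourceBuffer === sb);
  if (!stillUsed && index !== -1) {
    activeSourceBuffers.splice(index, 1);
    queuedEvents.push("removesourcebuffer");
  }
}
```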

3.4.6 Duration change

Follow these steps when duration needs to change to a new duration.

  1. If the current value of duration is equal to new duration, then abort these steps.
  2. Set old duration to the current value of duration.
  3. Update duration to new duration.
  4. If the new duration is less than old duration, then call remove(new duration, old duration) on all objects in sourceBuffers.
    Note

    This preserves audio frames that start before and end after the duration. The user agent must end playback at duration even if the audio frame extends beyond this time.

  5. Update the media controller duration to new duration and run the HTMLMediaElement duration change algorithm.
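The duration change algorithm can be sketched as follows. This is a non-normative illustration: the state object { duration, sourceBuffers }, the function name runDurationChange, and the notifyMediaElement callback (standing in for step 5) are assumptions. Since NaN is never equal to itself in JavaScript, step 1 treats two NaN values as equal explicitly.

```javascript
// Hypothetical state: { duration, sourceBuffers }, where each source buffer exposes
// remove(start, end). `notifyMediaElement` stands in for updating the media
// controller duration and running the HTMLMediaElement duration change algorithm.
function runDurationChange(state, newDuration, notifyMediaElement) {
  const same = state.duration === newDuration ||
    (Number.isNaN(state.duration) && Number.isNaN(newDuration));
  if (same) return false;                        // Step 1: nothing to do.

  const oldDuration = state.duration;            // Step 2
  state.duration = newDuration;                  // Step 3

  // Step 4: shrinking the presentation removes media past the new duration.
  // Comparisons with NaN are false, so the initial NaN -> value change removes nothing.
  if (newDuration < oldDuration) {
    for (const sb of state.sourceBuffers) sb.remove(newDuration, oldDuration);
  }

  notifyMediaElement(newDuration);               // Step 5
  return true;
}
```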

4. SourceBuffer Object

interface SourceBuffer : EventTarget {
    readonly attribute TimeRanges buffered;
             attribute double     timestampOffset;
    void append (Uint8Array data);
    void abort ();
    void remove (double start, double end);
};

4.1 Attributes

buffered of type TimeRanges, readonly

Indicates what TimeRanges are buffered in the SourceBuffer.

When the attribute is read the following steps must occur:

  1. If this object has been removed from the sourceBuffers attribute of the parent media source then throw an INVALID_STATE_ERR exception and abort these steps.
  2. Return a new static normalized TimeRanges object for the media segments buffered.
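A normalized TimeRanges object, as defined by HTML, holds ranges that are ordered, non-empty, and neither overlapping nor touching. The following is a minimal sketch of that normalization, assuming ranges are plain [start, end] arrays in seconds; the function name normalizeRanges is hypothetical.

```javascript
// Sketch: produce ordered, non-empty ranges with overlapping or touching ranges
// merged, as a normalized TimeRanges object requires.
function normalizeRanges(ranges) {
  const sorted = ranges
    .filter(([start, end]) => end > start)       // drop empty ranges
    .map(([start, end]) => [start, end])         // copy so the input is untouched
    .sort((a, b) => a[0] - b[0]);
  const out = [];
  for (const [start, end] of sorted) {
    const last = out[out.length - 1];
    if (last && start <= last[1]) {
      last[1] = Math.max(last[1], end);          // overlapping or touching: merge
    } else {
      out.push([start, end]);
    }
  }
  return out;
}
```

For example, normalizeRanges([[5, 7], [0, 2], [2, 3], [6, 9]]) yields [[0, 3], [5, 9]].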
timestampOffset of type double

Controls the offset applied to timestamps inside subsequent media segments that are appended to this SourceBuffer. The timestampOffset is initially set to 0, which indicates that no offset is being applied.

On getting, the initial value or the last value that was successfully set is returned.

On setting, run the following steps:

  1. If this object has been removed from the sourceBuffers attribute of the parent media source, then throw an INVALID_STATE_ERR exception and abort these steps.
  2. If the readyState attribute of the parent media source is not in the "open" state, then throw an INVALID_STATE_ERR exception and abort these steps.
  3. If this object is waiting for the end of a media segment to be appended, then throw an INVALID_STATE_ERR exception and abort these steps.
  4. Update the attribute to the new value.
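The getter and setter rules can be sketched with a JavaScript accessor pair. In this non-normative sketch, the class name SourceBufferSketch and the fields removed, parent, and waitingForSegmentEnd are hypothetical stand-ins for the state the steps above consult, and plain Error objects carrying the exception name stand in for DOM exceptions.

```javascript
// Hypothetical model of the timestampOffset accessor; not a real SourceBuffer.
class SourceBufferSketch {
  constructor(parent) {
    this.parent = parent;                // { readyState: "open" | "ended" | "closed" }
    this.removed = false;                // removed from parent's sourceBuffers?
    this.waitingForSegmentEnd = false;   // part-way through a media segment?
    this._timestampOffset = 0;           // initial value: no offset applied
  }

  get timestampOffset() {
    return this._timestampOffset;        // initial or last successfully set value
  }

  set timestampOffset(value) {
    if (this.removed) throw new Error("INVALID_STATE_ERR");                      // Step 1
    if (this.parent.readyState !== "open") throw new Error("INVALID_STATE_ERR"); // Step 2
    if (this.waitingForSegmentEnd) throw new Error("INVALID_STATE_ERR");         // Step 3
    this._timestampOffset = value;                                               // Step 4
  }
}
```

Because a failed assignment throws before step 4, the getter keeps returning the last value that was successfully set.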

4.2 Methods

abort

Aborts the current segment and resets the segment parser.

When this method is invoked, the user agent must run the following steps:

  1. If this object has been removed from the sourceBuffers attribute of the parent media source then throw an INVALID_STATE_ERR exception and abort these steps.
  2. If the readyState attribute of the parent media source is not in the "open" state then throw an INVALID_STATE_ERR exception and abort these steps.
  3. The media element aborts parsing the current segment.
  4. If the append state equals PARSING_MEDIA_SEGMENT and the input buffer contains some complete coded frames, then run the coded frame processing algorithm as if the media segment only contained these frames.
  5. Remove all bytes from the input buffer.
  6. Set append state to WAITING_FOR_SEGMENT.
No parameters.
Return type: void
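Steps 3 to 6 of abort() can be sketched on a hypothetical parser state. In this sketch, inputBuffer holds unparsed bytes, completeFrames holds coded frames already fully parsed from the current media segment, and processFrames stands in for the coded frame processing algorithm; all three names are assumptions, and the guard steps 1 and 2 are omitted.

```javascript
// Hypothetical parser state: { appendState, inputBuffer: Uint8Array, completeFrames: [] }.
function abortSegment(state, processFrames) {
  // Step 4: keep any complete coded frames from a partially received media segment.
  if (state.appendState === "PARSING_MEDIA_SEGMENT" && state.completeFrames.length > 0) {
    processFrames(state.completeFrames);
  }
  // Step 5: discard all unparsed bytes (and the frames derived from them).
  state.completeFrames = [];
  state.inputBuffer = new Uint8Array(0);
  // Step 6: the next append must start with a new segment.
  state.appendState = "WAITING_FOR_SEGMENT";
}
```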
append

Appends segment data to the source buffer.

When this method is invoked, the user agent must run the following steps:

  1. If data is null then throw an INVALID_ACCESS_ERR exception and abort these steps.
  2. If this object has been removed from the sourceBuffers attribute of the parent media source then throw an INVALID_STATE_ERR exception and abort these steps.
  3. If the readyState attribute of the parent media source is in the "closed" state then throw an INVALID_STATE_ERR exception and abort these steps.
  4. If the readyState attribute of the parent media source is in the "ended" state then run the following steps:

    1. Set the readyState attribute of the parent media source to "open".
    2. Queue a task to fire a simple event named sourceopen at the parent media source.
  5. If data.byteLength is 0, then abort these steps.
  6. Add data to the end of the input buffer
  7. Run the segment parser loop.
Parameter  Type        Nullable  Optional  Description
data       Uint8Array
Return type: void
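The append() steps can be sketched as follows. This is a non-normative sketch: the object sb with fields removed, parent ({ readyState, queueEvent(name) }), inputBuffer, and the method runSegmentParserLoop() are hypothetical stand-ins, and plain Error objects carrying the exception name stand in for DOM exceptions.

```javascript
// Sketch of append() steps 1-7 against a hypothetical SourceBuffer model.
function appendSketch(sb, data) {
  if (data == null) throw new Error("INVALID_ACCESS_ERR");                     // Step 1
  if (sb.removed) throw new Error("INVALID_STATE_ERR");                        // Step 2
  if (sb.parent.readyState === "closed") throw new Error("INVALID_STATE_ERR"); // Step 3
  if (sb.parent.readyState === "ended") {                                      // Step 4
    sb.parent.readyState = "open";
    sb.parent.queueEvent("sourceopen");
  }
  if (data.byteLength === 0) return;                                           // Step 5

  // Step 6: concatenate the new bytes onto the end of the input buffer.
  const combined = new Uint8Array(sb.inputBuffer.byteLength + data.byteLength);
  combined.set(sb.inputBuffer, 0);
  combined.set(data, sb.inputBuffer.byteLength);
  sb.inputBuffer = combined;

  sb.runSegmentParserLoop();                                                   // Step 7
}
```

Because step 4 runs before the zero-length check in step 5, even an empty append moves an "ended" media source back to "open".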
remove

Removes media for a specific time range.

When this method is invoked, the user agent must run the following steps:

  1. If start is negative or greater than duration, then throw an INVALID_ACCESS_ERR exception and abort these steps.
  2. If end is less than or equal to start, then throw an INVALID_ACCESS_ERR exception and abort these steps.
  3. If this object has been removed from the sourceBuffers attribute of the parent media source then throw an INVALID_STATE_ERR exception and abort these steps.
  4. If the readyState attribute of the parent media source is not in the "open" state then throw an INVALID_STATE_ERR exception and abort these steps.
  5. For each track in this source buffer, run the following steps:

    1. Let remove end timestamp be the current value of duration.
    2. If this track has a random access point timestamp that is greater than or equal to end, then update remove end timestamp to that timestamp.

      Note

      Random access point timestamps can be different across tracks because the dependencies between coded frames within a track are usually different from the dependencies in another track.

    3. Remove all media data for this track that has a starting timestamp greater than or equal to start and less than the remove end timestamp.
    4. If this object is in activeSourceBuffers, the current playback position is greater than or equal to start and less than the remove end timestamp, and HTMLMediaElement.readyState is greater than HAVE_METADATA, then set the HTMLMediaElement.readyState attribute to HAVE_METADATA and stall playback.

      Note

      This transition occurs because media data for the current position has been removed. Playback cannot progress until media for the current playback position is appended or the selected/enabled tracks change.

Parameter  Type    Nullable  Optional  Description
start      double
end        double
Return type: void
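Step 5 of remove() can be sketched for a single track. In this non-normative sketch, coded frames are modeled as { pts, isRandomAccessPoint } objects and the function name removeFromTrack is hypothetical. Choosing the earliest random access point at or after end is this sketch's reading of step 5.2; it extends the removal so that frames which depend on removed data are also removed.

```javascript
// Sketch of remove() step 5 for one track. Returns the frames that remain and the
// computed remove end timestamp.
function removeFromTrack(frames, start, end, duration) {
  // Step 5.1: default the remove end timestamp to the presentation duration.
  let removeEnd = duration;
  // Step 5.2: stop at the earliest random access point at or after `end`.
  const raps = frames.filter(f => f.isRandomAccessPoint && f.pts >= end).map(f => f.pts);
  if (raps.length > 0) removeEnd = Math.min(...raps);
  // Step 5.3: drop frames whose starting timestamp lies in [start, removeEnd).
  const kept = frames.filter(f => !(f.pts >= start && f.pts < removeEnd));
  return { kept, removeEnd };
}
```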

4.3 Algorithms

4.3.1 Segment Parser Loop

All SourceBuffer objects have an internal append state variable that keeps track of the high-level segment parsing state. It is initially set to WAITING_FOR_SEGMENT and can transition to the following states as data is appended.

Append state name Description
WAITING_FOR_SEGMENT Waiting for the start of an initialization segment or media segment to be appended.
PARSING_INIT_SEGMENT Currently parsing an initialization segment.
PARSING_MEDIA_SEGMENT Currently parsing a media segment.

The input buffer is a byte buffer that is used to hold unparsed bytes across append() calls. The buffer is empty when the SourceBuffer object is created.

While the input buffer is not empty, run the following steps in a loop:

  1. If the input buffer starts with bytes that violate the byte stream format specifications, then call endOfStream("decode"), and abort this algorithm.
  2. Remove any bytes that the byte stream format specifications say should be ignored from the start of the input buffer.
  3. If the append state equals WAITING_FOR_SEGMENT, then run the following steps:

    1. If the beginning of the input buffer indicates the start of an initialization segment, set the append state to PARSING_INIT_SEGMENT.
    2. If the beginning of the input buffer indicates the start of a media segment, set append state to PARSING_MEDIA_SEGMENT.
    3. Return to the top of the loop.
  4. If the append state equals PARSING_INIT_SEGMENT, then run the following steps:

    1. If the input buffer does not contain a complete initialization segment yet, then exit the loop.
    2. Run the initialization segment received algorithm.
    3. Remove the initialization segment bytes from the beginning of the input buffer.
    4. Set append state to WAITING_FOR_SEGMENT.
    5. Return to the top of the loop.
  5. If the append state equals PARSING_MEDIA_SEGMENT, then run the following steps:

    1. If the input buffer does not contain a complete media segment header yet, then exit the loop.

      Note

      Implementations may choose to implement this state as an incremental parser so that it is not necessary to have the entire media segment before running the coded frame processing algorithm.

    2. Run the coded frame processing algorithm.
    3. Remove the media segment bytes from the beginning of the input buffer.
    4. Set append state to WAITING_FOR_SEGMENT.

      Note

      Incremental parsers should only do this transition after the entire media segment has been received.

    5. Return to the top of the loop.
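The loop above can be exercised against a toy byte stream. The format in this sketch is invented purely for illustration and is NOT WebM or ISO BMFF: a 0x00 byte is padding to be ignored, and a segment is a type byte (0x01 for initialization, 0x02 for media), a one-byte payload length N, then N payload bytes. The handlers object supplies hypothetical hooks for endOfStream(), the initialization segment received algorithm, and the coded frame processing algorithm; this sketch waits for complete media segments rather than parsing incrementally.

```javascript
// Sketch of the segment parser loop over a made-up format (not a real container).
function segmentParserLoop(sb, handlers) {
  while (sb.inputBuffer.length > 0) {
    // Step 1: any leading byte other than padding or a known type is invalid.
    if (![0x00, 0x01, 0x02].includes(sb.inputBuffer[0])) {
      handlers.endOfStream("decode");
      return;
    }
    // Step 2: strip ignorable padding bytes from the start of the buffer.
    let skip = 0;
    while (skip < sb.inputBuffer.length && sb.inputBuffer[skip] === 0x00) skip++;
    sb.inputBuffer = sb.inputBuffer.subarray(skip);
    if (sb.inputBuffer.length === 0) break;

    const buf = sb.inputBuffer;
    if (sb.appendState === "WAITING_FOR_SEGMENT") {                // Step 3
      if (buf[0] === 0x01) sb.appendState = "PARSING_INIT_SEGMENT";
      else if (buf[0] === 0x02) sb.appendState = "PARSING_MEDIA_SEGMENT";
      continue;                                                    // top of the loop
    }
    // Steps 4-5: exit until the whole segment is present, then hand it off.
    const size = buf.length >= 2 ? 2 + buf[1] : Infinity;
    if (buf.length < size) break;
    const payload = buf.subarray(2, size);
    if (sb.appendState === "PARSING_INIT_SEGMENT") handlers.initSegmentReceived(payload);
    else handlers.codedFrameProcessing(payload);
    sb.inputBuffer = buf.subarray(size);                           // consume the segment
    sb.appendState = "WAITING_FOR_SEGMENT";
  }
}
```

Calling the function again after more bytes are appended resumes parsing in whatever append state the previous call left behind.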

4.3.2 Initialization Segment Received

The following steps are run when the segment parser loop successfully parses a complete initialization segment:

  1. Update the duration attribute if it currently equals NaN:
    If the initialization segment contains a duration:
      Run the duration change algorithm with new duration set to the duration in the initialization segment.
    Otherwise:
      Run the duration change algorithm with new duration set to positive Infinity.
  2. Handle state transitions:
    If the HTMLMediaElement.readyState attribute is HAVE_NOTHING:
      1. Set the HTMLMediaElement.readyState attribute to HAVE_METADATA.
      2. Queue a task to fire a simple event named loadedmetadata at the media element.
    If the HTMLMediaElement.readyState attribute is greater than HAVE_CURRENT_DATA and the initialization segment contains the first video or first audio track in the presentation:
      Set the HTMLMediaElement.readyState attribute to HAVE_METADATA.
    Otherwise:
      Continue
  3. Update audioTracks:
    If the initialization segment contains the first audio track:
      1. Add an AudioTrack and mark it as enabled.
      2. Add this SourceBuffer to activeSourceBuffers.
    If the initialization segment contains audio tracks beyond those already in the presentation:
      Add a disabled AudioTrack for each audio track in the initialization segment.
  4. Update videoTracks:
    If the initialization segment contains the first video track:
      1. Add a VideoTrack and mark it as selected.
      2. Add this SourceBuffer to activeSourceBuffers.
    If the initialization segment contains video tracks beyond those already in the presentation:
      Add a disabled VideoTrack for each video track in the initialization segment.
  5. Update textTracks:
    1. Add a TextTrack for each text track in the initialization segment.
    2. If the text track mode is "showing" or "hidden", then add this SourceBuffer to activeSourceBuffers.
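The duration update and state transition steps of this algorithm can be sketched as follows; the track list updates are omitted. The env object ({ duration, readyState, runDurationChange(d), queueEvent(name) }), the initialization segment shape ({ duration?, addsFirstAudioOrVideoTrack }), and the function name initSegmentReceived are all hypothetical. readyState values use the HTMLMediaElement numeric constants.

```javascript
// HTMLMediaElement readyState constants used by this sketch.
const HAVE_NOTHING = 0, HAVE_METADATA = 1, HAVE_CURRENT_DATA = 2;

function initSegmentReceived(env, initSegment) {
  // Duration update: only runs while the duration is still unknown (NaN).
  if (Number.isNaN(env.duration)) {
    const d = initSegment.duration !== undefined ? initSegment.duration : Infinity;
    env.runDurationChange(d);
  }
  // State transitions.
  if (env.readyState === HAVE_NOTHING) {
    env.readyState = HAVE_METADATA;
    env.queueEvent("loadedmetadata");
  } else if (env.readyState > HAVE_CURRENT_DATA && initSegment.addsFirstAudioOrVideoTrack) {
    // A first audio or video track arrives mid-playback: drop back to HAVE_METADATA.
    env.readyState = HAVE_METADATA;
  }
  // Otherwise: continue without changing readyState.
}
```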

4.3.3 Coded Frame Processing

When complete coded frames have been parsed by the segment parser loop, the following steps are run:

  1. For each coded frame in the media segment run the following steps:

    1. Let presentation timestamp be a double precision floating point representation of the coded frame's presentation timestamp.
    2. Let decode timestamp be a double precision floating point representation of the coded frame's decode timestamp.
    3. If timestampOffset is not 0, then run the following steps:

      1. Add timestampOffset to the presentation timestamp.
      2. Add timestampOffset to the decode timestamp.
      3. If the presentation timestamp or decode timestamp is less than the presentation start time, then call endOfStream("decode"), and abort these steps.
    4. Add the coded frame with the presentation timestamp and decode timestamp to the source buffer.
  2. If the HTMLMediaElement.readyState attribute is HAVE_METADATA and the new coded frames cause all objects in activeSourceBuffers to have media data for the current playback position, then run the following steps:

    1. Set the HTMLMediaElement.readyState attribute to HAVE_CURRENT_DATA.
    2. If this is the first transition to HAVE_CURRENT_DATA, then queue a task to fire a simple event named loadeddata at the media element.
  3. If the HTMLMediaElement.readyState attribute is HAVE_CURRENT_DATA and the new coded frames cause all objects in activeSourceBuffers to have media data beyond the current playback position, then run the following steps:

    1. Set the HTMLMediaElement.readyState attribute to HAVE_FUTURE_DATA.
    2. Queue a task to fire a simple event named canplay at the media element.
  4. If the HTMLMediaElement.readyState attribute is HAVE_FUTURE_DATA and the new coded frames cause all objects in activeSourceBuffers to have enough data to start playback, then run the following steps:

    1. Set the HTMLMediaElement.readyState attribute to HAVE_ENOUGH_DATA.
    2. Queue a task to fire a simple event named canplaythrough at the media element.
  5. If the media segment contains data beyond the current duration, then run the duration change algorithm with new duration set to the maximum of the current duration and the highest end timestamp reported by HTMLMediaElement.buffered.
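Step 1 of this algorithm, which applies timestampOffset, can be sketched as follows. Frames are modeled as { pts, dts } objects in seconds, and the function name processCodedFrames, the sourceBufferFrames array, and the endOfStream callback are hypothetical. The presentation start time is fixed at 0, as noted in the revision history. As in the steps above, frames added before a failing frame stay in the source buffer.

```javascript
// Presentation start time is always 0 in this draft.
const PRESENTATION_START_TIME = 0;

function processCodedFrames(frames, timestampOffset, sourceBufferFrames, endOfStream) {
  for (const frame of frames) {
    let pts = frame.pts;                 // Step 1.1
    let dts = frame.dts;                 // Step 1.2
    if (timestampOffset !== 0) {         // Step 1.3
      pts += timestampOffset;
      dts += timestampOffset;
      if (pts < PRESENTATION_START_TIME || dts < PRESENTATION_START_TIME) {
        endOfStream("decode");
        return;
      }
    }
    sourceBufferFrames.push({ pts, dts }); // Step 1.4
  }
}
```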

5. SourceBufferList Object

SourceBufferList is a simple container object for SourceBuffer objects. It provides read-only array access and fires events when the list is modified.

interface SourceBufferList : EventTarget {
    readonly attribute unsigned long length;
    getter SourceBuffer (unsigned long index);
};

5.1 Attributes

length of type unsigned long, readonly

Indicates the number of SourceBuffer objects in the list.

5.2 Methods

SourceBuffer

Allows the SourceBuffer objects in the list to be accessed with an array operator (i.e. []).

When this method is invoked, the user agent must run the following steps:

  1. If index is greater than or equal to the length attribute then return undefined and abort these steps.
  2. Return the index'th SourceBuffer object in the list.
Parameter  Type           Nullable  Optional  Description
index      unsigned long
Return type: SourceBuffer

5.3 Event Summary

Event name Interface Dispatched when...
addsourcebuffer Event When a SourceBuffer is added to the list.
removesourcebuffer Event When a SourceBuffer is removed from the list.

6. URL Object

partial interface URL {
    static DOMString createObjectURL (MediaSource mediaSource);
};

6.1 Methods

createObjectURL, static

Creates URLs for MediaSource objects.

When this method is invoked, the user agent must run the following steps:

  1. If mediaSource is null, then return null.
  2. Return a unique MediaSource object URL that can be used to dereference the mediaSource argument, and run the rest of the algorithm asynchronously.
  3. Provide a stable state.
  4. Revoke the MediaSource object URL by calling revokeObjectURL() on it.
Note

This algorithm is intended to mirror the behavior of the File API createObjectURL() method with autoRevoke set to true.

Parameter    Type         Nullable  Optional  Description
mediaSource  MediaSource
Return type: DOMString

7. HTMLMediaElement attributes

This section specifies what existing attributes on the HTMLMediaElement should return when a MediaSource is attached to the element.

The HTMLMediaElement.seekable attribute returns a new static normalized TimeRanges object created based on the following steps:

If duration equals NaN
Return an empty TimeRanges object.
If duration equals positive Infinity
Return a single range with a start time of 0 and an end time equal to the highest end time reported by the HTMLMediaElement.buffered attribute.
Otherwise
Return a single range with a start time of 0 and an end time equal to duration.
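The seekable rules can be sketched as a small function. Here, bufferedRanges is a hypothetical array of [start, end] pairs standing in for HTMLMediaElement.buffered, and the function name seekableRanges is an assumption. The rules above do not say what to return for an infinite duration when nothing is buffered yet; returning an empty list in that case is this sketch's own assumption.

```javascript
// Sketch of HTMLMediaElement.seekable for an attached MediaSource.
function seekableRanges(duration, bufferedRanges) {
  if (Number.isNaN(duration)) return [];
  if (duration === Infinity) {
    if (bufferedRanges.length === 0) return [];   // ASSUMPTION: nothing buffered yet
    const highestEnd = Math.max(...bufferedRanges.map(([, end]) => end));
    return [[0, highestEnd]];
  }
  return [[0, duration]];
}
```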

The HTMLMediaElement.buffered attribute returns a new static normalized TimeRanges object created based on the following steps:

  1. Let active ranges be the ranges returned by buffered for each SourceBuffer object in activeSourceBuffers.
  2. Let intersection range be the intersection of the active ranges.
  3. If the readyState attribute of the MediaSource attached to the media element is "ended", then run the following steps:

    1. Let highest end time be the largest end time in the active ranges.
    2. Let highest intersection end time be the highest end time in the intersection range.
    3. If the highest intersection end time is less than the highest end time, then update the intersection range so that the highest intersection end time equals the highest end time.
  4. Return the intersection range.
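The buffered computation can be sketched as an intersection of sorted range lists. In this non-normative sketch, each entry of activeRanges is one active SourceBuffer's normalized list of [start, end] pairs, ended mirrors the MediaSource readyState being "ended", and the function names are hypothetical. When the intersection is empty, the steps above do not define a highest intersection end time; this sketch leaves the result empty in that case.

```javascript
// Intersect two sorted, normalized range lists with a two-pointer sweep.
function intersectTwo(a, b) {
  const out = [];
  let i = 0, j = 0;
  while (i < a.length && j < b.length) {
    const start = Math.max(a[i][0], b[j][0]);
    const end = Math.min(a[i][1], b[j][1]);
    if (start < end) out.push([start, end]);
    if (a[i][1] < b[j][1]) i++; else j++;         // advance the range that ends first
  }
  return out;
}

function mediaElementBuffered(activeRanges, ended) {
  if (activeRanges.length === 0) return [];
  // Copy the input so that extending the last range never mutates it.
  const copies = activeRanges.map(r => r.map(([s, e]) => [s, e]));
  const intersection = copies.reduce(intersectTwo);               // Steps 1-2
  if (ended && intersection.length > 0) {                          // Step 3
    const highestEnd = Math.max(...activeRanges.flat().map(([, end]) => end));
    const last = intersection[intersection.length - 1];
    if (last[1] < highestEnd) last[1] = highestEnd;
  }
  return intersection;                                             // Step 4
}
```

For example, an audio buffer covering [0, 10] and a video buffer covering [0, 6] and [8, 12] intersect to [0, 6] and [8, 10]; once the stream has ended, the last range is extended to 12.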

8. Byte Stream Formats

The bytes provided through append() for a SourceBuffer form a logical byte stream. The format of this byte stream depends on the media container format in use and is defined in a byte stream format specification. Byte stream format specifications based on WebM and the ISO Base Media File Format are provided below. If these formats are supported then the byte stream formats described below must be supported.

This section provides general requirements for all byte stream formats:

Byte stream specifications must at a minimum define constraints which ensure that the above requirements hold. Additional constraints may be defined, for example to simplify implementation.

8.1 WebM Byte Streams

This section defines segment formats for implementations that choose to support WebM.

8.1.1 Initialization Segments

A WebM initialization segment must contain a subset of the elements at the start of a typical WebM file.

The following rules apply to WebM initialization segments:

  1. The initialization segment must start with an EBML Header element, followed by a Segment header.
  2. The size value in the Segment header must signal an "unknown size" or contain a value large enough to include the Segment Information and Tracks elements that follow.
  3. A Segment Information element and a Tracks element must appear, in that order, after the Segment header and before any further EBML Header or Cluster elements.
  4. Any elements other than an EBML Header or a Cluster that occur before, in between, or after the Segment Information and Tracks elements are ignored.
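Rule 1 can be checked with a small EBML reader. This sketch relies on the standard EBML element IDs for the EBML Header (0x1A45DFA3) and Segment (0x18538067) and on EBML variable-length integer encoding. It checks only rule 1, assumes the bytes passed in are complete, and treats sizes above 2^53 approximately; the function names are hypothetical.

```javascript
const EBML_HEADER_ID = 0x1a45dfa3;
const SEGMENT_ID = 0x18538067;

// Read an EBML variable-length integer; the leading zero bits give its length.
function readVint(bytes, pos) {
  const first = bytes[pos];
  let length = 1, mask = 0x80;
  while (length <= 8 && !(first & mask)) { mask >>= 1; length++; }
  if (length > 8) throw new Error("invalid vint");
  let value = first & (mask - 1);                     // strip the length marker bit
  for (let i = 1; i < length; i++) value = value * 256 + bytes[pos + i];
  return { length, value };
}

// Element IDs keep their marker bits; sizes do not. All-ones size means "unknown".
function readElementHeader(bytes, pos) {
  const idLength = readVint(bytes, pos).length;
  let id = 0;
  for (let i = 0; i < idLength; i++) id = id * 256 + bytes[pos + i];
  const size = readVint(bytes, pos + idLength);
  const unknownSize = size.value === 2 ** (7 * size.length) - 1;
  return { id, headerLength: idLength + size.length, size: size.value, unknownSize };
}

// Rule 1: an EBML Header element followed by a Segment header.
function startsLikeWebMInitSegment(bytes) {
  const ebml = readElementHeader(bytes, 0);
  if (ebml.id !== EBML_HEADER_ID || ebml.unknownSize) return false;
  const segmentPos = ebml.headerLength + ebml.size;
  if (segmentPos >= bytes.length) return false;
  return readElementHeader(bytes, segmentPos).id === SEGMENT_ID;
}
```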

8.1.2 Media Segments

A WebM media segment is a single Cluster element.

The following rules apply to WebM media segments:

  1. The Timecode element in the Cluster contains a presentation timestamp in TimecodeScale units.
  2. The TimecodeScale in the WebM initialization segment most recently appended applies to all timestamps in the Cluster.
  3. The Cluster header may contain an "unknown" size value. If it does, then the end of the cluster is reached when another Cluster header or an element header that indicates the start of a WebM initialization segment is encountered.
  4. Block & SimpleBlock elements must be in time increasing order consistent with the WebM spec.
  5. If the most recent WebM initialization segment describes multiple tracks, then blocks from all the tracks must be interleaved in time increasing order. At least one block from each audio and video track must be present.
  6. Cues or Chapters elements may follow a Cluster element. These elements must be accepted and ignored by the user agent.

8.1.3 Random Access Points

A SimpleBlock element with its Keyframe flag set signals the location of a random access point for that track. Media segments containing multiple tracks are only considered a random access point if the first SimpleBlock for each track has its Keyframe flag set. The order of the multiplexed blocks must conform to the WebM Muxer Guidelines.
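The random access point rule can be sketched by reading the fixed header at the start of each SimpleBlock's data: an EBML variable-length track number, a signed 16-bit big-endian timecode relative to the Cluster Timecode, and a flags byte whose 0x80 bit is the Keyframe flag. The function names are hypothetical, and this sketch approximates "each track" by the tracks whose blocks appear in the segment (rule 5 above requires every audio and video track to appear).

```javascript
// Parse the header of SimpleBlock data (the element payload, without its ID/size).
function parseSimpleBlockHeader(data) {
  // Track number: EBML variable-length integer with the marker bit stripped.
  const first = data[0];
  let length = 1, mask = 0x80;
  while (length <= 8 && !(first & mask)) { mask >>= 1; length++; }
  if (length > 8) throw new Error("invalid track number");
  let trackNumber = first & (mask - 1);
  for (let i = 1; i < length; i++) trackNumber = trackNumber * 256 + data[i];
  // DataView reads are big-endian by default.
  const view = new DataView(data.buffer, data.byteOffset, data.byteLength);
  const relativeTimecode = view.getInt16(length);
  const flags = data[length + 2];
  return { trackNumber, relativeTimecode, keyframe: (flags & 0x80) !== 0 };
}

// A media segment is a random access point only if the first SimpleBlock seen
// for every track has its Keyframe flag set.
function isRandomAccessPoint(simpleBlocks) {
  const firstIsKey = new Map();
  for (const block of simpleBlocks) {
    const h = parseSimpleBlockHeader(block);
    if (!firstIsKey.has(h.trackNumber)) firstIsKey.set(h.trackNumber, h.keyframe);
  }
  return firstIsKey.size > 0 && [...firstIsKey.values()].every(Boolean);
}
```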

8.2 ISO Base Media File Format Byte Streams

This section defines segment formats for implementations that choose to support the ISO Base Media File Format ISO/IEC 14496-12 (ISO BMFF).

8.2.1 Initialization Segments

An ISO BMFF initialization segment must contain a single Movie Box (moov). The tracks in the Movie Box must not contain any samples (i.e. the entry_count in the stts, stsc and stco boxes must be set to zero). A Movie Extends (mvex) box must be contained in the Movie Box to indicate that Movie Fragments are to be expected.

The initialization segment may contain Edit Boxes (edts) which provide a mapping of composition times for each track to the global presentation time.

8.2.2 Media Segments

An ISO BMFF media segment must contain a single Movie Fragment Box (moof) followed by one or more Media Data Boxes (mdat).

The following rules apply to ISO BMFF media segments:

  1. The Movie Fragment Box must contain at least one Track Fragment Box (traf).
  2. The Movie Fragment Box must use movie-fragment relative addressing and the flag default-base-is-moof must be set; absolute byte-offsets must not be used.
  3. External data references must not be used.
  4. If the Movie Fragment contains multiple tracks, the duration by which each track extends should be as close to equal as practical.
  5. Each Track Fragment Box must contain a Track Fragment Decode Time Box (tfdt).
  6. The first sample in each Track Fragment Run Box (trun) must indicate that the sample is a random access point.
  7. The Media Data Boxes must contain all the samples referenced by the Track Fragment Run Boxes (trun) of the Movie Fragment Box.
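The top-level layout of a media segment (one moof followed by one or more mdat) can be checked by walking box headers. Each ISO BMFF box starts with a 32-bit big-endian size and a four-character type; a size of 1 means a 64-bit largesize follows, and a size of 0 means the box extends to the end of the data. This sketch checks only the top-level box order, not the other rules above, and its function names are hypothetical.

```javascript
// List the types of the top-level boxes in `bytes` (a Uint8Array).
function topLevelBoxTypes(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const types = [];
  let pos = 0;
  while (pos + 8 <= bytes.byteLength) {
    let size = view.getUint32(pos);
    const type = String.fromCharCode(bytes[pos + 4], bytes[pos + 5], bytes[pos + 6], bytes[pos + 7]);
    if (size === 1) {
      // 64-bit "largesize" follows the type field.
      if (pos + 16 > bytes.byteLength) throw new Error("truncated box header");
      size = Number(view.getBigUint64(pos + 8));
    } else if (size === 0) {
      size = bytes.byteLength - pos;              // box extends to the end of the data
    }
    if (size < 8 || pos + size > bytes.byteLength) throw new Error("truncated or invalid box");
    types.push(type);
    pos += size;
  }
  return types;
}

// One Movie Fragment Box followed by one or more Media Data Boxes.
function hasMediaSegmentLayout(bytes) {
  const types = topLevelBoxTypes(bytes);
  return types.length >= 2 && types[0] === "moof" && types.slice(1).every(t => t === "mdat");
}
```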

8.2.3 Random Access Points

A random access point as defined in this specification corresponds to a Stream Access Point of type 1 or 2 as defined in Annex I of ISO/IEC 14496-12.

9. Examples

Example use of the Media Source Extensions

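<script>
  // GetInitializationSegment(), HaveMoreMediaSegments(), GetNextMediaSegment() and
  // SeekToMediaSegmentAt() are application-provided helpers that fetch and track
  // media segments; they are not defined by this specification.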
  function onSourceOpen(videoTag, e) {
    var mediaSource = e.target;
    var sourceBuffer = mediaSource.addSourceBuffer('video/webm; codecs="vorbis,vp8"');

    videoTag.addEventListener('seeking', onSeeking.bind(videoTag, mediaSource));
    videoTag.addEventListener('progress', onProgress.bind(videoTag, mediaSource));

    var initSegment = GetInitializationSegment();

    if (initSegment == null) {
      // Error fetching the initialization segment. Signal end of stream with an error.
      mediaSource.endOfStream("network");
      return;
    }

    // Append the initialization segment.
    sourceBuffer.append(initSegment);

    // Append some initial media data.
    appendNextMediaSegment(mediaSource);
  }

  function appendNextMediaSegment(mediaSource) {
    if (mediaSource.readyState == "ended")
      return;

    // If we have run out of stream data, then signal end of stream.
    if (!HaveMoreMediaSegments()) {
      mediaSource.endOfStream();
      return;
    }

    var mediaSegment = GetNextMediaSegment();

    if (!mediaSegment) {
      // Error fetching the next media segment.
      mediaSource.endOfStream("network");
      return;
    }

    mediaSource.sourceBuffers[0].append(mediaSegment);
  }

  function onSeeking(mediaSource, e) {
    var video = e.target;

    // Abort current segment append.
    mediaSource.sourceBuffers[0].abort();

    // Notify the media segment loading code to start fetching data at the
    // new playback position.
    SeekToMediaSegmentAt(video.currentTime);

    // Append media segments from the new playback position.
    appendNextMediaSegment(mediaSource);
    appendNextMediaSegment(mediaSource);
  }

  function onProgress(mediaSource, e) {
    appendNextMediaSegment(mediaSource);
  }
</script>

<video id="v" autoplay> </video>

<script>
  var video = document.getElementById('v');
  var mediaSource = new MediaSource();
  mediaSource.addEventListener('sourceopen', onSourceOpen.bind(this, video));
  video.src = window.URL.createObjectURL(mediaSource);
</script>
          

10. Revision History

Version Comment
28 November 2012
  • Added transition to HAVE_METADATA when current playback position is removed.
  • Added remove() calls to duration change algorithm.
  • Added MediaSource.isTypeSupported() method.
  • Removed text stating that initialization segments are optional.
09 November 2012 Converted document to ReSpec.
18 October 2012 Refactored SourceBuffer.append() & added SourceBuffer.remove().
8 October 2012
  • Defined what HTMLMediaElement.seekable and HTMLMediaElement.buffered should return.
  • Updated seeking algorithm to run inside Step 10 of the HTMLMediaElement seeking algorithm.
  • Removed transition from "ended" to "open" in the seeking algorithm.
  • Clarified all the event targets.
1 October 2012 Fixed various addsourcebuffer & removesourcebuffer bugs and allow append() in ended state.
13 September 2012 Updated endOfStream() behavior to change based on the value of HTMLMediaElement.readyState.
24 August 2012
  • Added early abort to duration change algorithm.
  • Added createObjectURL() IDL & algorithm.
  • Added Track ID & Track description definitions.
  • Rewrote start overlap for audio frames text.
  • Removed rendering silence requirement from section 2.5.
22 August 2012
  • Clarified WebM byte stream requirements.
  • Clarified SourceBuffer.buffered return value.
  • Clarified addsourcebuffer & removesourcebuffer event targets.
  • Clarified when media source attaches to the HTMLMediaElement.
  • Introduced duration change algorithm and updated relevant algorithms to use it.
17 August 2012 Minor editorial fixes.
09 August 2012 Change presentation start time to always be 0 instead of using format specific rules about the first media segment appended.
30 July 2012 Added SourceBuffer.timestampOffset and MediaSource.duration.
17 July 2012 Replaced SourceBufferList.remove() with MediaSource.removeSourceBuffer().
02 July 2012 Converted to the object-oriented API
26 June 2012 Converted to Editor's draft.
0.5 Minor updates before proposing to W3C HTML-WG.
0.4 Major revision. Adding source IDs, defining buffer model, and clarifying byte stream formats.
0.3 Minor text updates.
0.2 Updates to reflect initial WebKit implementation.
0.1 Initial Proposal