W3C

Proposal: Media Capture and Streams Settings API v5

Editor's Draft 30 November 2012

This is an out-of-date proposal; a newer version (v6) is available.

Author:
Travis Leithead, Microsoft

Abstract

This proposal describes additions and suggested changes to the Media Capture and Streams specification in order to support the goal of device settings retrieval and modification. This proposal incorporates feedback from the W3C TPAC 2012 event and builds on four prior proposals with the same goal [v4] [v3] [v2] [v1].


1. Remove LocalMediaStream interface

In this proposal, the derived LocalMediaStream interface is removed. Rather than returning a LocalMediaStream instance in the NavigatorUserMediaSuccessCallback, a vanilla MediaStream object is returned. The primary difference is in the tracks contained in that MediaStream object.

1.1 Rationale

The LocalMediaStream object currently extends MediaStream by adding a single method "stop()". In my prior proposals, this object was radically altered in order to facilitate several goals:
Provide a predictable home for developers to find and modify device settings
A previous proposal went out of its way to strongly associate LocalMediaStream objects with devices. This seemed like a good design because local device configuration is always on the local media stream. This made for a stable, dependable API surface for all local media stream instances (no guesswork).
Prevent track-list mutations
A previous proposal also removed the track lists on local media streams (resulting in some dramatic inheritance changes). Mutable tracks lists on LocalMediaStream objects seemed like the wrong design considering the current thinking that a getUserMedia request would only ever produce a LocalMediaStream with at most one audio or video track.

Some feedback even suggested re-considering the "at most one video/audio track per request to getUserMedia" limitation.

While thinking about these goals and the feedback, I began to consider a few things:

Device-centric tracks
With tracks supplemented with device characteristics (duck-typing), the LocalMediaStream's stop() API was a convenience feature for stopping all tracks backed by a device on the LocalMediaStream object. With device-centric tracks, a stop() API should be present on the tracks themselves.
Mutable track lists
Mutable track lists were not a desirable feature while I was locked into considering the LocalMediaStream as strongly associated with device control. What is actually necessary is that there be something immutable associated with devices--that "something" doesn't necessarily need to be a LocalMediaStream or any MediaStream-like object at all! Once I unlocked that line of thinking, I began to experiment with the notion of a device list, which ultimately brought back a use case for having mutable track lists on MediaStream objects. (It did not bring back a need for LocalMediaStream objects themselves, though.)
Work flow for access to additional device streams
It is now understood that to request additional streams from different devices (e.g., the second camera on a dual-camera mobile phone), one must invoke getUserMedia a second time. In my prior proposal, this would result in a separate LocalMediaStream instance. At this point there are two LocalMediaStream objects, each with its own device. While this was nice for consistency of process, it was a challenge when using the objects with a MediaStream consumer like the <video> tag.

To illustrate this challenge, consider how my prior proposal required a re-hookup of the MediaStream to a video tag consumer:

  1. First request to getUserMedia
  2. LocalMediaStream (1) obtained from success callback
  3. createObjectURL and preview in a video tag
  4. Second call to getUserMedia
  5. LocalMediaStream (2) obtained from success callback
  6. createObjectURL and preview in same video tag

Note that this process has to bind a completely new LocalMediaStream to the video tag a second time (if re-using the same video tag) only because the second LocalMediaStream object was different from the first.

It is much more efficient for developer code to simply add/remove tracks to a MediaStream that are relevant, without needing to change the consumer of the MediaStream.
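Note

Example (non-normative sketch): the track-swapping work flow described above, written against this proposal's API shape. The removeTrack method and the second getUserMedia binding step are assumptions based on the mutable track lists and synchronous getUserMedia discussed elsewhere in this proposal.

var MS = new MediaStream();
var frontTrack = new VideoStreamTrack();
MS.addTrack(frontTrack);
// Bind the sink once...
document.querySelector("video").src = URL.createObjectURL(MS);
navigator.getUserMedia(MS); // connects frontTrack to a camera source

// Later: switch to a second camera without re-binding the sink.
var rearTrack = new VideoStreamTrack();
MS.addTrack(rearTrack);
navigator.getUserMedia(MS);  // requests a source for the new track
MS.removeTrack(frontTrack);  // the <video> element keeps playing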

Usage of getUserMedia for permission rather than for additional device access
The getUserMedia method is the gateway for permission to media. This proposal does not suggest changing that concept. It does suggest, however, that more information can be made available for discovery of additional devices within the approved "category" or "type" of media, and provide a way to obtain those additional devices without needing to go through the "permissions" route (i.e., getUserMedia).
Importance of restricting control to LocalMediaStream
Upon reflection on the feedback around the prior proposal, restricting control of the devices associated with tracks to only the LocalMediaStream no longer seemed vital, insofar as device-level access via the track is not directly available through a PeerConnection to a remote browser.

2. Media Stream Tracks

With changes to getUserMedia to support a synchronous API, this proposal enables developer code to directly create Media Stream Tracks. It also introduces the concept of the "new" readyState for tracks, a state which signals that the specified track is not connected to a source.

All tracks now have a source attribute, which is used to access a source object. The source object can be used to read additional settings about the source (content) of a track and to alter the source (content) of the track. This proposal describes local media device sources (cameras and microphones), and a skeleton description for a remote media device source (tracks that originate from a peer connection). Media device source objects are described in the next section.

The new track hierarchy (figure omitted here) is somewhat simplified due to the exclusion of source objects.

2.1 Updating MediaStreamTrack

This section defines MediaStreamTrack in order to add the new "new" state and associated event handlers. The definition is otherwise identical to the current definition except that the defined constants are replaced by strings (using an enumerated type).

2.1.1 MediaStreamTrack interface

interface MediaStreamTrack : EventTarget {
             attribute DOMString           id;
    readonly attribute DOMString           kind;
    readonly attribute DOMString           label;
             attribute boolean             enabled;
    readonly attribute TrackReadyStateEnum readyState;
             attribute EventHandler        onstart;
             attribute EventHandler        onmute;
             attribute EventHandler        onunmute;
             attribute EventHandler        onended;
};
Attributes
id of type DOMString
Provides a mechanism for developers to identify this track and to reference it by getTrackById. (This is a preliminary definition, but is expected in the latest editor's draft soon.)
kind of type DOMString, readonly
See kind definition in the current editor's draft.
label of type DOMString, readonly
See label definition in the current editor's draft.
enabled of type boolean
See enabled definition in the current editor's draft.
readyState of type TrackReadyStateEnum, readonly
The track's current state. Tracks start off in the "new" state after being instantiated.

State transitions are as follows:

  • new -> live The user has approved access to this track and a media device source is now attached and streaming data.
  • new -> ended The user rejected this track (did not approve its use).
  • live -> muted The source is temporarily suspended (cannot provide streaming data).
  • live -> ended The stream has ended (for various reasons).
  • muted -> live The stream has resumed.
  • muted -> ended The stream was suspended and will no longer be able to provide any further data.
onstart of type EventHandler
Event handler for the start event. The start event is fired when this track transitions from the "new" state to the "live" state.
Issue 1

Issue: When working with multiple "new" tracks, I found that I wanted a more centralized place to be notified when getUserMedia activates all the tracks in a media stream. Perhaps there's a convenience handler somewhere else, for example on the MediaStream? There are some work flows to consider here before landing a final design...
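Note

Example (non-normative sketch): one interim pattern is simply to count start events across the tracks the application created (audioTrack and videoTrack as in the section 6 examples; allTracksLive is a hypothetical application callback).

var remaining = 2; // one audio track and one video track
function trackStarted() {
   if (--remaining === 0)
      allTracksLive(); // hypothetical application callback
}
audioTrack.onstart = trackStarted;
videoTrack.onstart = trackStarted;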

onmute of type EventHandler
See onmute definition in the current editor's draft.
onunmute of type EventHandler
See onunmute definition in the current editor's draft.
onended of type EventHandler
See onended definition in the current editor's draft.

2.1.2 TrackReadyStateEnum enumeration

enum TrackReadyStateEnum {
    "new",
    "live",
    "muted",
    "ended"
};
Enumeration description
new: The track type is new and has not been initialized (connected to a source of any kind). This state implies that the track's label will be the empty string.
live: See the definition of the LIVE constant in the current editor's draft.
muted: See the definition of the MUTED constant in the current editor's draft.
ended: See the definition of the ENDED constant in the current editor's draft.

2.2 Creating Derived Tracks

MediaStreamTrack objects cannot be instantiated directly. To create an instance of a MediaStreamTrack, one of its derived track types may be instantiated directly. These derived types are defined in this section. Each of these track types has general IDL attributes specific to all tracks of the given type as well as a mechanism to obtain the device object that is providing the source for this track.

Note

Note: I'm intentionally keeping these interfaces as sparse as possible. Features of the video/audio tracks that are settings (generally mutable) have been moved to the track's device source instead.

It's important to note that the camera's green light doesn't come on when a new track is created; nor does the user get prompted to enable the camera/microphone. Those actions only happen after the developer has requested that a media stream containing "new" tracks be bound to a source via getUserMedia. Until that point tracks are inert.
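Note

Example (non-normative sketch): constructing a track has no observable side effects until getUserMedia is called.

var track = new VideoStreamTrack();
console.log(track.readyState); // "new" -- no prompt, no green light
console.log(track.source);     // null while the track is "new"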

2.2.1 VideoStreamTrack interface

VideoStreamTrack objects are of kind "video".

Note

Example: VideoStreamTrack objects are instantiated in JavaScript using the new operator:
new VideoStreamTrack();

Issue 2

Issue: It's been suggested that these track constructors can take optional constraints. Such constraints could be stored until activated by getUserMedia to help the user select what characteristics to use when pre-configuring a device source and attaching it to the track. These are not added in this proposal, but could be added later on.

[Constructor]
interface VideoStreamTrack : MediaStreamTrack {
    readonly attribute VideoFacingEnum    facing;
    readonly attribute VideoStreamSource? source;
};
Attributes
facing of type VideoFacingEnum, readonly
From the user's perspective, this attribute describes whether this camera is pointed toward the user ("user") or away from the user ("environment"). If this information cannot be reliably obtained, for example from a USB external camera, or if the VideoStreamTrack's readyState is "new", the value "unknown" is returned.
source of type VideoStreamSource, readonly, nullable
Returns the VideoStreamSource object providing the source for this track (if available). A VideoStreamSource may be a camera, a peer connection, or a local image or video file. Some VideoStreamTrack sources may not expose a VideoStreamSource object, in which case this property must return null. When a VideoStreamTrack is first created, and while it remains in the "new" state, the source attribute must return null.
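Note

Example (non-normative sketch): reading facing once the track has started (it reports "unknown" while the track is still "new").

videoTrack.onstart = function () {
   if (videoTrack.facing === "user")
      console.log("Previewing the self-view camera");
   else if (videoTrack.facing === "environment")
      console.log("Previewing the rear-facing camera");
   else
      console.log("Camera direction unknown (e.g., external USB camera)");
};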

2.2.2 AudioStreamTrack interface

AudioStreamTrack objects are of kind "audio".

Note

Example: AudioStreamTrack objects are instantiated in JavaScript using the new operator:
new AudioStreamTrack();

[Constructor]
interface AudioStreamTrack : MediaStreamTrack {
    readonly attribute unsigned long      level;
    readonly attribute AudioStreamSource? source;
};
Attributes
level of type unsigned long, readonly
The current level of audio that the microphone is picking up at this moment (if this track's source is a local microphone), or the current level of audio flowing through the track (generally) otherwise. Will return 0 if this track is in the "new" state. The relative strength (amplitude) of the level is proportional to the gain of the audio source device (e.g., to increase the pick-up of the microphone, increase the gain setting).
source of type AudioStreamSource, readonly, nullable
Returns the AudioStreamSource object providing the source for this track (if available). An AudioStreamSource may be provided by a microphone, a peer connection, or a local audio file. Some AudioStreamTrack sources may not expose an AudioStreamSource object, in which case this property must return null. When an AudioStreamTrack is first created, and while it remains in the "new" state, the source attribute must return null.
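Note

Example (non-normative sketch): driving a simple volume meter from level. The 0-100 scale assumed here mirrors the gain range defined in section 3.1.7; the actual scale of level is not specified by this proposal.

var meter = document.querySelector("progress");
meter.max = 100; // assumed scale
(function poll() {
   meter.value = audioTrack.level; // 0 while the track is still "new"
   requestAnimationFrame(poll);
})();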

3. Media Stream Sources

VideoStreamSource and AudioStreamSource objects are instantiated by the user agent to represent a source that is providing the media for a MediaStreamTrack. The association of a source object with a media track occurs asynchronously after permission for use of the track has been requested by getUserMedia. When the user agent has attached the source of a MediaStreamTrack, the source object can be accessed via that track's source attribute.

Note

Note: Some MediaStreamTracks may not provide a source object; for example, if the source is coming from an encrypted media source, or a local file source.

Issue 3

Issue: Need to define whether source objects are singletons. For example, if one track adds an expando property onto a source object, will another track that has that same source see the expando on its source object?

3.1 Local Video and Audio Sources

VideoStreamSource and AudioStreamSource objects are created by the user agent to represent a camera or microphone device/source whose source attributes can be inspected and/or changed. At the moment these are limited to local cameras, local microphones, and peer connection sources, but additional sources can be defined later (such as local file system sources for images or audio files).

3.1.1 VideoStreamSource interface

interface VideoStreamSource : EventTarget {
    readonly attribute unsigned long          width;
    readonly attribute unsigned long          height;
    readonly attribute float                  frameRate;
    readonly attribute VideoRotationEnum      rotation;
    readonly attribute VideoMirrorEnum        mirror;
    readonly attribute float                  zoom;
    readonly attribute VideoFocusModeEnum     focusMode;
    readonly attribute VideoFillLightModeEnum fillLightMode;
    void                 stop ();
    static unsigned long getNumDevices ();
};
Attributes
width of type unsigned long, readonly
The "natural" width (in pixels) of the source of the video flowing through the track. For cameras implementing this interface, this value represents the current setting of the camera's sensor (in terms of number of pixels). This value is independent of the camera's rotation (if the camera's rotation setting is changed, it does not impact this value). For example, consider a camera setting with width of 1024 pixels and height of 768 pixels. If the camera's rotation setting is changed by 90 degrees, the width is still reported as 1024 pixels. However, a <video> element sink used to preview this track would report a width of 768 pixels (the effective width with rotation factored in).
height of type unsigned long, readonly
The "natural" height (in pixels) of the video provided by this source. See the "width" attribute for additional info.
frameRate of type float, readonly
The expected frames per second rate of video provided by this source.
rotation of type VideoRotationEnum, readonly
The current rotation value in use by the camera. If not supported, the property must be initialized to "0".
mirror of type VideoMirrorEnum, readonly
The current image mirroring behavior being applied. If not supported, the property must be initialized to "none".
zoom of type float, readonly
The current zoom scale value in use by the camera. If not supported this property will always return 1.0.
focusMode of type VideoFocusModeEnum, readonly
The camera's current focusMode state. The initial/default value is "auto".
fillLightMode of type VideoFillLightModeEnum, readonly
The camera's current fill light/flash mode.
Methods
stop
Stops this source, which will cause the related track to enter the ended state. Same behavior as the old LocalMediaStream's stop() API, but it only affects this track's source.
No parameters.
Return type: void
getNumDevices, static
Returns the number of video sources that are currently available in this UA. As a static method, this information can be queried without instantiating any VideoStreamTrack or VideoStreamSource objects and without calling getUserMedia.
Issue 4

Issue: This information deliberately adds to the fingerprinting surface of the UA. However, this information could also be obtained via other round-about techniques using getUserMedia. This editor deems it worthwhile to provide this data directly, as it seems important for determining whether multiple devices of this type are available.

Issue 5

Issue: The ability to be notified when new devices become available has been dropped from this proposal (it was available in v4 via the DeviceList object).

No parameters.
Return type: unsigned long
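Note

Example (non-normative sketch): because rotation does not alter the reported natural width/height, the sink-visible dimensions can be derived as follows.

function effectiveDimensions(videoSource) {
   // Swap width/height when a 90- or 270-degree rotation is applied.
   var rotated = (videoSource.rotation === "90" ||
                  videoSource.rotation === "270");
   return {
      width:  rotated ? videoSource.height : videoSource.width,
      height: rotated ? videoSource.width  : videoSource.height
   };
}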

3.1.2 VideoFacingEnum enumeration

enum VideoFacingEnum {
    "unknown",
    "user",
    "environment"
};
Enumeration description
unknown: The relative directionality of the camera cannot be determined by the user agent based on the hardware.
user: The camera is facing toward the user (a self-view camera).
environment: The camera is facing away from the user (viewing the environment).

3.1.3 VideoRotationEnum enumeration

enum VideoRotationEnum {
    "0",
    "90",
    "180",
    "270"
};
Enumeration description
0: No effective rotation applied (default value if no rotation is supported by the device software).
90: A rotation of 90 degrees counter-clockwise (270 degrees in a clockwise rotation).
180: A rotation of 180 degrees.
270: A rotation of 270 degrees counter-clockwise (90 degrees in a clockwise rotation).

3.1.4 VideoMirrorEnum enumeration

enum VideoMirrorEnum {
    "none",
    "horizontal",
    "vertical"
};
Enumeration description
none: No effective mirroring is being applied (default value if no mirroring is supported by the device software).
horizontal: The image is mirrored along the camera's width value. This setting does not consider the camera's current rotation, so if a 90 degree rotation was also applied to this source, then the "horizontal" mirroring would appear to be a vertical mirroring in a given sink.
vertical: The image is mirrored along the camera's height value. This setting does not consider the camera's current rotation, so if a 90 degree rotation was also applied to this source, then the "vertical" mirroring would appear to be a horizontal mirroring in a given sink.

3.1.5 VideoFocusModeEnum enumeration

enum VideoFocusModeEnum {
    "notavailable",
    "auto",
    "manual"
};
Enumeration description
notavailable: This camera does not have an option to change focus modes.
auto: The camera auto-focuses.
manual: The camera must be manually focused.

3.1.6 VideoFillLightModeEnum enumeration

enum VideoFillLightModeEnum {
    "notavailable",
    "auto",
    "off",
    "flash",
    "on"
};
Enumeration description
notavailable: This video device does not have an option to change fill light modes (e.g., the camera does not have a flash).
auto: The video device's fill light will be enabled when required (typically low light conditions); otherwise it will be off. Note that auto does not guarantee that a flash will fire when takePicture is called. Use flash to guarantee firing of the flash for the takePicture API. auto is the initial value.
off: The video device's fill light and/or flash will not be used.
flash: If the video device is a camera supporting high-resolution photo-mode, this setting will always cause the flash to fire for the takePicture API. Otherwise, if the video device does not support this mode, this value is equivalent to auto.
on: The video device's fill light will be turned on (and remain on) until this setting is changed again or the underlying track object has ended.

3.1.7 AudioStreamSource interface

interface AudioStreamSource : EventTarget {
    readonly attribute unsigned long gain;
    void                 stop ();
    static unsigned long getNumDevices ();
};
Attributes
gain of type unsigned long, readonly
The sensitivity of the microphone. This value must be a whole number between 0 and 100 inclusive. The gain value establishes the maximum threshold of the microphone's sensitivity. When set to 0, the microphone is essentially off (it will not be able to pick up any sound). A value of 100 means the microphone is configured for its maximum gain/sensitivity. When the source is first initialized for a track, the gain should be set to 50, the initial value.
Methods
stop
Stops this source, causing the related track to enter the ended state. Same behavior as the old LocalMediaStream's stop() API, but it only affects this track's source.
No parameters.
Return type: void
getNumDevices, static
Returns the number of potential audio sources that are available in this UA. As a static method, this information can be queried without instantiating any AudioStreamTrack or AudioStreamSource objects and without calling getUserMedia.
Issue 6

Issue: This information deliberately adds to the fingerprinting surface of the UA. However, this information can also be obtained by other round-about techniques using getUserMedia, and is important for determining whether multiple devices of this type are available.

Issue 7

Issue: The ability to be notified when new devices become available has been dropped from this proposal (it was available in v4 via the DeviceList object).

No parameters.
Return type: unsigned long
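Note

Example (non-normative sketch): halving the microphone's sensitivity using the set API defined in section 4.2 (the request is non-mandatory, so the implementation rounds to the nearest supported value).

var micSource = audioTrack.source;
if (micSource)
   micSource.set({ gain: Math.round(micSource.gain / 2) });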

3.2 Camera sources with "high-resolution picture" modes

The PictureStreamSource derived interface is created by the user agent if the camera source providing the VideoStreamSource supports an optional "high-resolution picture mode" with picture settings that are separate from those of its basic video source (which is usually considered its preview mode).

The PictureStreamSource object presents a set of capabilities and controls for taking high-resolution pictures. The unique settings of this object are only applied at the time when the takePicture API is invoked.

3.2.1 PictureStreamSource interface

interface PictureStreamSource : VideoStreamSource {
    readonly attribute unsigned long photoWidth;
    readonly attribute unsigned long photoHeight;
    void takePicture ();
             attribute EventHandler  onpicture;
             attribute EventHandler  onpictureerror;
};
Attributes
photoWidth of type unsigned long, readonly
The width (in pixels) of the configured high-resolution photo-mode's sensor.
photoHeight of type unsigned long, readonly
The height (in pixels) of the configured high-resolution photo-mode's sensor.
onpicture of type EventHandler
Register/unregister for "picture" events. The handler should expect to get a BlobEvent object as its first parameter.
Note

The BlobEvent returns a picture (as a Blob) in a compressed format (for example: PNG/JPEG) rather than a raw ImageData object due to the expected large, uncompressed size of the resulting pictures.

Issue 9

This Event type (BlobEvent) should be the same thing used in the recording proposal.

onpictureerror of type EventHandler
In the event of an error taking the picture, a "pictureerror" event will be dispatched instead of a "picture" event. The "pictureerror" is a simple event of type Event.
Methods
takePicture
Temporarily (asynchronously) switches the camera into "high resolution picture mode", applies the settings that are unique to this object to the stream (switches the width/height to those of the photoWidth/photoHeight), and records/encodes an image (using a user-agent determined format) into a Blob object. Finally, queues a task to fire a "picture" event containing the resulting picture Blob instance.
Issue 8

Issue: We could consider providing a hint or setting for the desired picture format?

No parameters.
Return type: void

3.2.2 BlobEvent interface

[Constructor(DOMString type, optional BlobEventInit blobInitDict)]
interface BlobEvent : Event {
    readonly attribute Blob data;
};
Attributes
data of type Blob, readonly
Returns a Blob object whose type attribute indicates the encoding of the blob data. An implementation must return a Blob in a format that is capable of being viewed in an HTML <img> tag.

3.2.3 BlobEventInit dictionary

dictionary BlobEventInit : EventInit {
    Blob data;
};
Dictionary BlobEventInit Members
data of type Blob
A Blob object containing the data to deliver via this event.

3.3 Remote Media Sources

When MediaStreams are transmitted over the network by way of a peer connection, the tracks that are created on the remote side of the MediaStream will have a remote media source attached as the track's source. This source object allows remote-consumers of the MediaStream's tracks to request specific changes to the tracks. These change requests will be serviced by the RTCPeerConnection source object which is streaming the media over the network.

3.3.1 VideoStreamRemoteSource interface

interface VideoStreamRemoteSource : EventTarget {
    readonly attribute unsigned long width;
    readonly attribute unsigned long height;
    readonly attribute float         frameRate;
    readonly attribute float         bitRate;
};
Attributes
width of type unsigned long, readonly
The current video transmission width.
height of type unsigned long, readonly
The current video transmission height.
frameRate of type float, readonly
The current video frames-per-second.
bitRate of type float, readonly
The current video bitRate.

3.3.2 AudioStreamRemoteSource interface

interface AudioStreamRemoteSource : EventTarget {
    readonly attribute float bitRate;
};
Attributes
bitRate of type float, readonly
The current audio bit rate.

3.4 Other Settings (out-of-scope in this proposal)

The following settings have been proposed, but are not included in this version to keep the initial set of settings scoped to those that:

  1. cannot be easily computed in post-processing
  2. are not redundant with other settings
  3. are settings found in nearly all devices (common)
  4. can be easily tested for conformance

Each setting also includes a brief explanatory rationale for why it's not included:

  1. horizontalAspectRatio - easily calculated based on width/height in the dimension values
  2. verticalAspectRatio - see horizontalAspectRatio explanation
  3. orientation - can be easily calculated based on the width/height values and the current rotation
  4. apertureSize - while more common on digital cameras, not particularly common on webcams (the major use-case for this feature)
  5. shutterSpeed - see apertureSize explanation
  6. denoise - may require specifying the processing algorithm or related image-processing filter required to implement.
  7. effects - sounds like a v2 or independent feature (depending on the effect).
  8. faceDetection - sounds like a v2 feature. Can also be done using post-processing techniques (though perhaps not as fast...)
  9. antiShake - sounds like a v2 feature.
  10. geoTagging - this can be independently associated with a recorded picture/video/audio clip using the Geolocation API. Automatically hooking up Geolocation to Media Capture sounds like an exercise for v2 given the possible complications.
  11. highDynamicRange - not sure how this can be specified, or if this is just a v2 feature.
  12. skintoneEnhancement - not a particularly common setting.
  13. shutterSound - Can be accomplished by syncing custom audio playback via the <audio> tag if desired. By default, there will be no sound issued.
  14. redEyeReduction - photo-specific setting. (Could be considered if photo-specific settings are introduced.)
  15. meteringMode - photo-specific setting. (Could be considered if photo-specific settings are introduced.)
  16. iso - photo-specific setting. while more common on digital cameras, not particularly common on webcams (major use-case for this feature)
  17. sceneMode - while more common on digital cameras, not particularly common on webcams (major use-case for this feature)
  18. antiFlicker - not a particularly common setting.
  19. zeroShutterLag - this seems more like a hope than a setting. I'd rather just have implementations make the shutter snap as quickly as possible after takePicture, rather than requiring an opt-in/opt-out for this setting.

The following settings may be included by working group decision:

  1. exposure
  2. exposureCompensation (is this the same as exposure?)
  3. autoExposureMode
  4. brightness
  5. contrast
  6. saturation
  7. sharpness
  8. evShift
  9. whiteBalance

4. Changing Stream Source Settings

This proposal simplifies the application of settings over the previous proposal and unifies the setting names with the constraint names and syntax. This unification allows developers to use the same syntax when defining constraints as well as with the settings APIs.

The settings for each track (if available) are all conveniently located on the source object for the respective track. Each setting is defined by a readonly attribute (for example, the "width" attribute), which serves both as feature detection for the given setting and as the current value of the setting at any point in time. The constraint name for each setting is the same as the name of the readonly attribute. For example, "photoWidth" is both the name of the setting and the name of the constraint. All of the constraints defined in this proposal are listed later.

Reading a setting's current value is as simple as reading the readonly attribute of the same name. Each setting also has a range of appropriate values (its capabilities), either enumerated values or a range continuum--these are the same ranges/enumerated values that may be used when expressing constraints for the given setting. Retrieving the capabilities of a given setting is done via a getRange API on each source object. Similarly, requesting a change to a setting is done via a set API on each source object. Finally, for symmetry, a get method is also defined which reports the current value of any setting.

As noted in prior proposals, camera/microphone settings must be applied asynchronously to ensure that web applications can remain responsive for all device types that may not respond quickly to setting changes. This is especially true for settings communicated over a peer connection.

4.1 Expectations around changing settings

Browsers provide a media pipeline from sources to sinks. In a browser, sinks are the <img>, <video> and <audio> tags. Traditional sources include streamed content, files and web resources. The media produced by these sources typically does not change over time - these sources can be considered to be static.

The sinks that display these sources to the user (the actual tags themselves) have a variety of controls for manipulating the source content. For example, an <img> tag scales down a huge source image of 1600x1200 pixels to fit in a rectangle defined with width="400" and height="300".

The getUserMedia API adds dynamic sources such as microphones and cameras - the characteristics of these sources can change in response to application needs. These sources can be considered to be dynamic in nature. A <video> element that displays media from a dynamic source can either perform scaling or it can feed back information along the media pipeline and have the source produce content more suitable for display.

Note

Note: This sort of feedback loop is obviously just enabling an "optimization", but it's a non-trivial gain. This optimization can save battery, allow for less network congestion, etc...

This proposal assumes that MediaStream sinks (such as <video>, <audio>, and even RTCPeerConnection) will continue to have mechanisms to further transform the source stream beyond that which the settings described in this proposal offer. (The sink transformation options, including those of RTCPeerConnection are outside the scope of this proposal.)

The act of changing a setting on a stream's source will, by definition, affect all down-level sinks that are using that source. Many sinks may be able to take these changes in stride, such as the <video> element or RTCPeerConnection. Others like the Recorder API may fail as a result of a source change.

The RTCPeerConnection is an interesting object because it acts simultaneously as both a sink and a source for over-the-network streams. As a sink, it has source transformational capabilities (e.g., lowering bit-rates, scaling-up or down resolutions, adjusting frame-rates), and as a source it may have its own settings changed by a track source that it provides (in this proposal, such sources are the VideoStreamRemoteSource and AudioStreamRemoteSource objects).

To illustrate how changes to a given source impact various sinks, consider the following example. This example only uses width and height, but the same principles apply to any of the settings exposed in this proposal. In the first figure a home client has obtained a video source from its local video camera. The source device's width and height are 800 pixels by 600 pixels, respectively. Three MediaStream objects on the home client contain tracks that use this same source. The three media streams are connected to three different sinks, a <video> element (A), another <video> element (B), and a peer connection (C). The peer connection is streaming the source video to an away client. On the away client there are two media streams with tracks that use the peer connection as a source. These two media streams are connected to two <video> element sinks (Y and Z).

Note that in the current state, all of the sinks on the home client must apply a transformation to the original source's dimensions. A is scaling the video up (resulting in loss of quality), B is scaling the video down, and C is also scaling the video up slightly for sending over the network. On the away client, sink Y is scaling the video way down, while sink Z is not applying any scaling.

Using the settings APIs defined in the next section, the home client's video source is changed to a higher resolution (1920 by 1200 pixels).

Note that the source change immediately affects all of the sinks on the home client, but does not impact any of the sinks (or sources) on the away client. With the increase in the home client source video's dimensions, sink A no longer has to perform any scaling, while sink B must scale down even further than before. Sink C (the peer connection) must now scale down the video in order to keep the transmission constant to the away client.

While not shown, an equally valid settings change request could be made of the away client video source (the peer connection on the away client's side). This would not only impact sink Y and Z in the same manner as before, but would also cause re-negotiation with the peer connection on the home client in order to alter the transformation that it is applying to the home client's video source. Such a change would not change anything related to sink A or B or the home client's video source.

Note

Note: This proposal does not define a mechanism by which a change to the away client's video source could automatically trigger a change to the home client's video source. Implementations may choose to make such source-to-sink optimizations as long as they only do so within the constraints established by the application, as the next example describes.

It is fairly obvious that changes to a given source will impact sink consumers. However, in some situations changes to a given sink may also be cause for implementations to adjust the characteristics of a source's stream. This is illustrated in the following figures. In the first figure below, the home client's video source is sending a video stream sized at 1920 by 1200 pixels. The video source is also unconstrained, such that the exact source dimensions are flexible as far as the application is concerned. Two MediaStream objects contain tracks that use this same source, and those MediaStreams are connected to two different <video> element sinks, A and B. Sink A has been sized to width="1920" and height="1200" and is displaying the source's video without any transformations. Sink B has been sized smaller and, as a result, is scaling the video down to fit its rectangle of 320 pixels across by 200 pixels down.

When the application changes sink A to a smaller dimension (from 1920 to 1024 pixels wide and from 1200 to 768 pixels tall), the browser's media pipeline may recognize that none of its sinks require the higher source resolution, and needless work is being done both on the part of the source and on sink A. In such a case and without any other constraints forcing the source to continue producing the higher resolution video, the media pipeline may change the source resolution:

In the above figure, the home client's video source resolution was changed to the max(sinkA, sinkB) in order to optimize playback. While not shown above, the same behavior could apply to peer connections and other sinks.

4.2 StreamSourceSettings mix-in interface

VideoStreamSource implements StreamSourceSettings;
AudioStreamSource implements StreamSourceSettings;
VideoStreamRemoteSource implements StreamSourceSettings;
AudioStreamRemoteSource implements StreamSourceSettings;
[NoInterfaceObject]
interface StreamSourceSettings {
    (MediaSettingsRange or MediaSettingsList) getRange (DOMString settingName);
    any                                       get (DOMString settingName);
    void                                      set (MediaTrackConstraint setting, optional boolean isMandatory = false);
};

4.2.1 Methods

getRange

Each setting has an appropriate range of values. These may be either value ranges (a continuum of values) or enumerated values but not both. Value ranges include a min and max value, while enumerated values are provided as a list of values. Both types of setting ranges include an "initial" value, which is the value that is expected to be the source device's default value when it is acquired.

MediaSettingsRange objects are returned when a setting is not an enumerated type. This specification will indicate what the range of values must be for each setting. Given that implementations of various hardware may not exactly map to the same range, an implementation should make a reasonable attempt to translate and scale the hardware's setting onto the mapping provided by this specification. If this is not possible due to a hardware setting supporting (for example) fewer levels of granularity, then the implementation should make the device settings min value reflect the min value reported in this specification, and the same for the max value. Then for values in between the min and max, the implementation may round to the nearest supported value and report that value in the setting.

Note

For example, if the setting is fluxCapacitance, and has a specified range from -10 (min) to 10 (max) in this specification, but the implementation's fluxCapacitance hardware setting only supports values of "off" "medium" and "full", then -10 should be mapped to "off", 10 should map to "full", and 0 should map to "medium". A request to change the value to 3 should be rounded down to the closest supported setting (0).

MediaSettingsList objects should order their enumerated values from minimum to maximum where it makes sense, or in the order defined by the enumerated type where applicable.

The following table lists each setting name, its dictionary return type, and notes:

width (MediaSettingsRange): The range should span the video source's pre-set width values, with min being the smallest width and max the largest width. The min/max/initial values are of type unsigned long.
photoWidth (MediaSettingsRange): The range should span the video source's high-resolution photo-mode pre-set width values, with min being the smallest width and max the largest width. The min/max/initial values are of type unsigned long.
height (MediaSettingsRange): The range should span the video source's pre-set height values, with min being the smallest height and max the largest height. The min/max/initial values are of type unsigned long.
photoHeight (MediaSettingsRange): The range should span the video source's high-resolution photo-mode pre-set height values, with min being the smallest height and max the largest height. The min/max/initial values are of type unsigned long.
frameRate (MediaSettingsRange): The supported range of frame rates on the device. The min/max/initial values are of type float.
rotation (MediaSettingsList): The available video rotation options on the source device. The initial/values array is of type VideoRotationEnum (DOMString).
mirror (MediaSettingsList): The available video mirror options on the source device. The initial/values array is of type VideoMirrorEnum (DOMString).
zoom (MediaSettingsRange): The supported zoom range on the device. The min/max/initial values are of type float. The initial value is 1. The float value is a scale factor; for example, 0.5 is zoomed out by double, while 2.0 is zoomed in by double. Requests should be rounded to the nearest supported zoom factor by the implementation (when zoom is supported).
focusMode (MediaSettingsList): The available focus mode options on the source device. The initial/values array is of type VideoFocusModeEnum (DOMString).
fillLightMode (MediaSettingsList): The available fill light mode options on the source device. The initial/values array is of type VideoFillLightModeEnum (DOMString).
gain (MediaSettingsRange): The supported gain range on the device. The min/max/initial values are of type unsigned long. The initial value is 50.
bitRate (MediaSettingsRange): The supported bit rate range on the device. The min/max/initial values are of type float.
Parameter: settingName of type DOMString. The name of the setting for which the range of expected values should be returned.
get
Returns the current value of a given setting. This is equivalent to reading the IDL attribute of the same name on the source object.
Parameter: settingName of type DOMString. The name of the setting whose current value should be returned.
Return type: any
set

The set API is the mechanism for asynchronously requesting that the source device change the value of a given setting. The API mirrors the syntax used for applying constraints. Generally, the set API will be used to apply specific values to a setting (such as setting the fillLightMode setting to a specific value); however, ranges can also be applied using the same min/max syntax used in constraints (i.e., setting width to a range between 800 and 1200 pixels).

The set API queues requests until the conclusion of the micro-task after which all of the settings requests will be evaluated according to the constraint algorithm, and requests that can be honored will be applied to the source device. Any requests specified using the mandatory parameter that could not be applied must generate a settingserror event. All other non-mandatory requests that could not be applied do not cause any notification to be generated.

For all of the given settings that were changed as a result of a sequence of calls to the set API during a micro-task, one single settingschanged event will be generated containing the names of the settings that changed.

Note

Example: To change the video source's dimensions to any aspect ratio where the height is 768 pixels and the width is at least 300 pixels, would require two calls to set:
set({ width: { min: 300}}, true);
set({ height: 768}, true);

In each case where the setting/constraint does not take an enumerated value, the implementation should attempt to match the value onto the nearest supported value of the source device unless the mandatory flag is provided. In the case of mandatory requests, if the setting cannot be exactly supported as requested, then the setting must fail and generate a settingserror event. Regarding width/height values--if an implementation is able to scale the source video to match the requested mandatory constraints, this need not cause a settingserror (but the result may be weirdly proportioned video).

Parameter: setting of type MediaTrackConstraint. A JavaScript object (dictionary) consisting of a single property whose name is the setting to change, and whose value is either a primitive value (float/DOMString/etc.) or another dictionary consisting of a min and/or max property and associated values.
Parameter: isMandatory of type boolean, optional, defaulting to false. A flag indicating whether this settings change request should be considered mandatory. If true and the settings change fails for some reason, a settingserror event will be raised; otherwise, only a settingschanged event will be dispatched for the settings that were successfully changed.
Return type: void

4.2.2 MediaSettingsRange dictionary

dictionary MediaSettingsRange {
    any max;
    any min;
    any initial;
};
Dictionary MediaSettingsRange Members
max of type any
The maximum value of this setting.

The type of this value is specific to the setting. Each setting will describe a specific type. That type must be returned for this attribute.

min of type any
The minimum value of this setting.

The type of this value is specific to the setting. Each setting will describe a specific type. That type must be returned for this attribute.

initial of type any
The initial value of this setting. When the object associated with this setting is first made available to the application, the current value of the setting should be set to the initial value. For example, in a browsing scenario, if one web site changes this setting and a subsequent web site gets access to this same setting, the setting should have been reset back to its initial value.

The type of this value is specific to the setting. Each setting will describe a specific type. That type must be returned for this attribute.

4.2.3 MediaSettingsList dictionary

dictionary MediaSettingsList {
    sequence<any> values;
    any           initial;
};
Dictionary MediaSettingsList Members
values of type sequence<any>
An array of the values of the enumerated type for this setting. Items should be sorted from min (at index 0) to max where applicable, or in the order listed in the enumerated type otherwise.

The type of this value is specific to the setting. Each setting will describe a specific type. That type must be returned for this attribute.

initial of type any
The initial value of this setting. When the object associated with this setting is first made available to the application, the current value of the setting should be set to the initial value. For example, in a browsing scenario, if one web site changes this setting and a subsequent web site gets access to this same setting, the setting should have been reset back to its initial value.

The type of this value is specific to the setting. Each setting will describe a specific type. That type must be returned for this attribute.
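Note

Example (non-normative sketch): the two dictionary shapes returned by getRange can be distinguished by feature-detecting the values property.

var caps = videoTrack.source.getRange("fillLightMode");
if (caps.values !== undefined)
   console.log("Choices: " + caps.values.join(", "));     // MediaSettingsList
else
   console.log("Range: " + caps.min + " to " + caps.max); // MediaSettingsRange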

4.3 Tracking the result of constraint application

4.3.1 MediaSettingsEventHandlers mix-in interface

AudioStreamSource implements MediaSettingsEventHandlers;
VideoStreamSource implements MediaSettingsEventHandlers;
AudioStreamRemoteSource implements MediaSettingsEventHandlers;
VideoStreamRemoteSource implements MediaSettingsEventHandlers;
[NoInterfaceObject]
interface MediaSettingsEventHandlers {
             attribute EventHandler onsettingserror;
             attribute EventHandler onsettingschanged;
};
Attributes
onsettingserror of type EventHandler
Register/unregister for "settingserror" events. The handler should expect to get a MediaSettingsEvent object as its first parameter. The event is fired asynchronously after settings change requests (using the set API have been made with at least one such request using the mandatory flag. The MediaSettingsEvent reports the name of the settings that could not be applied. The "settingschanged" event fires before the "settingserror" event (if any).
onsettingschanged of type EventHandler
Register/unregister for "settingschanged" events. The handler should expect to get a MediaSettingsEvent object as its first parameter. The event is fired asynchronously after the settings change requests are made and the settings have actually changed. The "settingschanged" event fires before the "settingserror" event (if any).

4.3.2 MediaSettingsEvent interface

[Constructor(DOMString type, optional MediaSettingsEventInit eventInitDict)]
interface MediaSettingsEvent : Event {
    readonly attribute DOMString[] settings;
};
Attributes
settings of type array of DOMString, readonly
A list of settings that failed or succeeded (depending on the event type).

4.3.3 MediaSettingsEventInit dictionary

dictionary MediaSettingsEventInit : EventInit {
    sequence<DOMString> settings;
};
Dictionary MediaSettingsEventInit Members
settings of type sequence<DOMString>
List of settings to populate into the MediaSettingsEvent object's settings readonly attribute.

5. Constraints Defined in this Proposal

This proposal defines several constraints for use with video and audio devices.

5.1 Video Constraints

The following constraints are applicable to video devices.

5.1.1 VideoConstraints dictionary

dictionary VideoConstraints : MediaTrackConstraintSet {
    (unsigned long or MinMaxULongSubConstraint) width;
    (unsigned long or MinMaxULongSubConstraint) height;
    (unsigned long or MinMaxULongSubConstraint) photoWidth;
    (unsigned long or MinMaxULongSubConstraint) photoHeight;
    VideoRotationEnum                           rotation;
    VideoMirrorEnum                             mirror;
    (float or MinMaxFloatSubConstraint)         zoom;
    VideoFocusModeEnum                          focusMode;
    VideoFillLightModeEnum                      fillLightMode;
    (float or MinMaxFloatSubConstraint)         frameRate;
    (float or MinMaxFloatSubConstraint)         bitRate;
};
Dictionary VideoConstraints Members
width of type (unsigned long or MinMaxULongSubConstraint)
A device that supports the desired width or width range.
height of type (unsigned long or MinMaxULongSubConstraint)
A device that supports the desired height or height range.
photoWidth of type (unsigned long or MinMaxULongSubConstraint)
A device that supports the desired width or width range for high-resolution photo-modes.
photoHeight of type (unsigned long or MinMaxULongSubConstraint)
A device that supports the desired height or height range for high-resolution photo-modes.
rotation of type VideoRotationEnum
A device that supports the desired rotation.
mirror of type VideoMirrorEnum
A device that supports the desired mirroring.
zoom of type (float or MinMaxFloatSubConstraint)
A device that supports the desired zoom setting.
focusMode of type VideoFocusModeEnum
A device that supports the desired focus mode.
fillLightMode of type VideoFillLightModeEnum
A device that supports the desired fill light (flash) mode.
frameRate of type (float or MinMaxFloatSubConstraint)
A device that supports the desired frames per second.
bitRate of type (float or MinMaxFloatSubConstraint)
A device that supports the desired bit rate.

5.2 Audio Constraints

The following constraints are applicable to audio devices.

5.2.1 AudioConstraints dictionary

dictionary AudioConstraints : MediaTrackConstraintSet {
    (unsigned long or MinMaxULongSubConstraint) gain;
};
Dictionary AudioConstraints Members
gain of type (unsigned long or MinMaxULongSubConstraint)
A device that supports the desired gain or gain range.

5.3 Common sub-constraint structures

5.3.1 MinMaxULongSubConstraint dictionary

dictionary MinMaxULongSubConstraint {
    unsigned long max;
    unsigned long min;
};
Dictionary MinMaxULongSubConstraint Members
max of type unsigned long
min of type unsigned long

5.3.2 MinMaxFloatSubConstraint dictionary

dictionary MinMaxFloatSubConstraint {
    float max;
    float min;
};
Dictionary MinMaxFloatSubConstraint Members
max of type float
min of type float
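Note

Example (non-normative sketch): a VideoConstraints-style dictionary mixing an exact value with min/max sub-constraints. Note that the set API (section 4.2) takes a single setting per call, so a dictionary like this would be supplied where a full constraint set is accepted.

var constraints = {
   width:  { min: 800, max: 1920 }, // MinMaxULongSubConstraint
   height: 768,                     // exact unsigned long value
   focusMode: "auto"                // VideoFocusModeEnum value
};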

6. Example usage scenarios

The following JavaScript examples demonstrate how the Settings APIs defined in this proposal could be used.

6.1 Getting access to a video and/or audio device (if available)

var audioTrack = (AudioStreamSource.getNumDevices() > 0) ? new AudioStreamTrack() : null;
if (audioTrack)
   audioTrack.onstart = mediaStarted;
var videoTrack = (VideoStreamSource.getNumDevices() > 0) ? new VideoStreamTrack() : null;
if (videoTrack)
   videoTrack.onstart = mediaStarted;
var MS = new MediaStream();
// Guard against missing devices; addTrack(null) would be an error.
if (audioTrack)
   MS.addTrack(audioTrack);
if (videoTrack)
   MS.addTrack(videoTrack);
navigator.getUserMedia(MS);

function mediaStarted() {
   // One of the video/audio devices started.
}

6.2 Previewing the local video/audio in HTML5 video tag -- scenario is unchanged

function mediaStarted() {
   // objectURL technique
   document.querySelector("video").src = URL.createObjectURL(MS, { autoRevoke: true }); // autoRevoke is the default
   // direct-assign technique
   document.querySelector("video").srcObject = MS; // Proposed API at this time
}

6.3 Applying resolution constraints

function mediaStarted() {
   var videoDevice = videoTrack.source;
   var maxWidth = videoDevice.getRange("width").max;
   var maxHeight = videoDevice.getRange("height").max;
   // Check for 1080p+ support
   if ((maxWidth >= 1920) && (maxHeight >= 1080)) {
      // See if I need to change the current settings
      // (either dimension below 1080p means a change is needed)...
      if ((videoDevice.width < 1920) || (videoDevice.height < 1080)) {
         videoDevice.set({ width: maxWidth}, true);
         videoDevice.set({ height: maxHeight}, true);
         videoDevice.onsettingserror = failureToComply;
      }
   }
   else
      failureToComply();
}

function failureToComply(e) {
   if (e)
      console.error("Devices failed to change " + e.settings); // 'width' and/or 'height'
   else
      console.error("Device doesn't support at least 1080p");
}

6.4 Changing zoom in response to user input:

function mediaStarted() {
   setupRange( videoTrack.source );
}

function setupRange(videoDevice) {
   var zoomCaps = videoDevice.getRange("zoom");
   // Check to see if the device supports zooming...
   if (zoomCaps.min != zoomCaps.max) {
      // Set HTML5 range control to min/max values of zoom
      var zoomControl = document.querySelector("input[type=range]");
      zoomControl.min = zoomCaps.min;
      zoomControl.max = zoomCaps.max;
      zoomControl.value = videoDevice.zoom;
      zoomControl.onchange = applySettingChanges;
   }
}

function applySettingChanges(e) {
   videoTrack.source.set({ zoom: parseFloat(e.target.value)}, true);
}

6.5 Adding the local media tracks into a new media stream:

function mediaStarted() {
   return new MediaStream( [ videoTrack, audioTrack ]);
}

6.6 Take a picture, show the picture in an image tag:

function mediaStarted() {
   var videoDevice = videoTrack.source;
   // Check if this device supports a picture mode...
   if (videoDevice.takePicture) {
       videoDevice.onpicture = showPicture;
       // Turn on flash only for the snapshot...if available
       if (videoDevice.fillLightMode != "notavailable")
          videoDevice.set({ fillLightMode: "flash"}, true);
       else
          console.info("Flash not available");
       videoDevice.takePicture();
   }
}

function showPicture(e) {
   var img = document.querySelector("img");
   img.src = URL.createObjectURL(e.data);
}

6.7 Show a newly available device

Note

A device becomes newly available whenever an existing device that was being used by another application (with exclusive access) is relinquished and becomes available for this application to use. Of course, plugging in a new device also causes a device to become available.

This scenario is not currently possible with this proposal.

6.8 Show all available video devices:

var totalVideoDevices = VideoStreamSource.getNumDevices();
var videoTracksList = [];
for (var i = 0; i < totalVideoDevices; i++) {
   var videoTrack = new VideoStreamTrack();
   videoTracksList.push(videoTrack);
   var mediaStream = new MediaStream([ videoTrack ]);
   // Create a video element and add it to the UI
   var videoTag = document.createElement('video');
   videoTag.srcObject = mediaStream;
   document.body.appendChild(videoTag);
   // Request to have the track connected to a source device (these queue up in the for-loop)
   navigator.getUserMedia(mediaStream);
}

7. Changes

This section documents the changes from the prior proposal:

  1. Separated out the Track-type hierarchy from V4 into Track types and Track sources. The sources largely don't use inheritance.
  2. Dropped the device list concept. Instead, simplified this to the ability to find out whether multiple devices are available (via the static getNumDevices method).
  3. Made Video and AudioStreamTracks constructable (an idea for synchronous getUserMedia).
  4. PictureDeviceTrack renamed to PictureStreamSource and dropped the semantics of it being a track; it's now just a special type of device source that shares settings with its video source through inheritance.
  5. PictureEvent renamed to the more generic (and re-usable) BlobEvent. (I considered MessageEvent, but that has too many other unrelated properties, and ProgressEvents didn't have a place to expose the data.)
  6. Cleaned up the settings that were previously on Video and AudioStreamTrack. Moved them all to the device sources instead. The only non-settings that remain are AudioStreamTrack's level and VideoStreamTrack's facing values as these have no corresponding settings to change.
  7. Added a few new settings: gain to AudioStreamSource; mirror, photoWidth, and photoHeight to VideoStreamSource; bitRate to Audio/VideoStreamRemoteSource. Dropped dimension.
  8. The rotation setting was changed to an enumerated type.

8. Acknowledgements

I'd like to specially thank Anant Narayanan of Mozilla for collaborating on the new settings design, and EKR for his 2c. Also, thanks to Martin Thomson (Microsoft) for his comments and review.