Copyright © 2012 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
This proposal describes additions and suggested changes to the Media Capture and Streams specification in order to support the goal of device settings retrieval and modification. This proposal incorporates feedback from the W3C TPAC 2012 event and builds on four prior proposals with the same goal [v4] [v3] [v2] [v1].
LocalMediaStream interface

In this proposal, the derived LocalMediaStream interface is removed. Rather than returning a LocalMediaStream instance in the NavigatorUserMediaSuccessCallback, a vanilla MediaStream object is returned. The primary difference is in the tracks contained in that MediaStream object.
Some feedback even suggested re-considering the "at most one video/audio track per request to getUserMedia" restriction.
While thinking about these goals and the feedback, I began to consider a few things:
To illustrate this challenge, consider how my prior proposal required a re-hookup of the MediaStream to a video tag consumer:
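A minimal sketch of that re-hookup pattern (the constraint values and callbacks here are illustrative; the prior proposal's exact API is not reproduced):

// Get an initial LocalMediaStream and preview it.
navigator.getUserMedia({ video: true }, function (localStream) {
    var video = document.querySelector("video");
    video.src = URL.createObjectURL(localStream);

    // Under the prior proposal, changing a device setting meant requesting
    // an entirely new LocalMediaStream with different constraints...
    navigator.getUserMedia({ video: true /* new constraints here */ }, function (newStream) {
        // ...and re-binding the video tag to that second stream.
        video.src = URL.createObjectURL(newStream);
    });
});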
Note that this process has to bind a completely new LocalMediaStream to the video tag a second time (if re-using the same video tag) only because the second LocalMediaStream object was different than the first.
It is much more efficient for developer code to simply add/remove tracks to a MediaStream that are relevant, without needing to change the consumer of the MediaStream.
With changes to getUserMedia to support a synchronous API, this proposal enables developer code to directly create MediaStreamTrack objects. It also introduces the concept of the "new" readyState for tracks, a state which signals that the specified track is not connected to a source.
All tracks now have a source attribute, which is used to access a source object. The source object can be used to read additional settings about the source (content) of a track and to alter the source (content) of the track.
This proposal describes local media device sources (cameras and microphones), and a skeleton description for a remote media device source (tracks that originate from a peer connection). Media device source objects are described in the next section.
Below is the new track hierarchy. It is somewhat simplified due to the exclusion of source objects:
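A sketch of that hierarchy, as implied by the track interfaces defined below (source objects omitted):

MediaStreamTrack
  ├─ VideoStreamTrack
  └─ AudioStreamTrack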
This section defines MediaStreamTrack in order to add the new "new" state and associated event handlers. The definition is otherwise identical to the current definition except that the defined constants are replaced by strings (using an enumerated type).
MediaStreamTrack interface

interface MediaStreamTrack : EventTarget {
    attribute DOMString id;
    readonly attribute DOMString kind;
    readonly attribute DOMString label;
    attribute boolean enabled;
    readonly attribute TrackReadyStateEnum readyState;
    attribute EventHandler onstart;
    attribute EventHandler onmute;
    attribute EventHandler onunmute;
    attribute EventHandler onended;
};
id of type DOMString
    Supports the MediaStream getTrackById method. (This is a preliminary definition, but is expected in the latest editor's draft soon.)
kind of type DOMString, readonly
label of type DOMString, readonly
enabled of type boolean
readyState of type TrackReadyStateEnum, readonly
    Tracks begin in the "new" state after being instantiated.
State transitions are as follows:
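In summary (inferred from the event handler definitions that follow):

"new" → "live" (fires the start event)
"live" → "muted" (fires mute); "muted" → "live" (fires unmute)
"new", "live", or "muted" → "ended" (fires ended)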
onstart of type EventHandler
    The handler for the start event. The start event is fired when this track transitions from the "new" state to the "live" state.
Issue: When working with multiple "new" tracks, I found that I wanted a more centralized place to be notified when getUserMedia would activate all the tracks in a media stream. Perhaps there's a convenience handler somewhere else, for example on the MediaStream? There are some workflows to consider here before landing a final design...
onmute of type EventHandler
onunmute of type EventHandler
onended of type EventHandler

enum TrackReadyStateEnum {
"new",
"live",
"muted",
"ended"
};
Value | Description |
---|---|
new | The track is new and has not been initialized (connected to a source of any kind). This state implies that the track's label will be the empty string. |
live | See the definition of the LIVE constant in the current editor's draft. |
muted | See the definition of the MUTED constant in the current editor's draft. |
ended | See the definition of the ENDED constant in the current editor's draft. |
MediaStreamTrack objects cannot be instantiated directly. To create an instance of a MediaStreamTrack, one of its derived track types may be instantiated directly. These derived types are defined in this section. Each of these track types has general IDL attributes specific to all tracks of the given type as well as a mechanism to obtain the device object that is providing the source for this track.
Note: I'm intentionally keeping these interfaces as sparse as possible. Features of the video/audio tracks that are settings (generally mutable) have been moved to the track's device source instead.
It's important to note that the camera's green light doesn't come on when a new track is created; nor does the user get prompted to enable the camera/microphone. Those actions only happen after the developer has requested that a media stream containing "new" tracks be bound to a source via getUserMedia. Until that point, tracks are inert.
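A short sketch of this activation model, using the synchronous form of getUserMedia that this proposal assumes:

// Creating a track neither prompts the user nor turns on the camera.
var track = new VideoStreamTrack();   // track.readyState === "new"
var stream = new MediaStream();
stream.addTrack(track);
// Only this call triggers the permission prompt and (if granted) the device:
navigator.getUserMedia(stream);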
VideoStreamTrack interface

VideoStreamTrack objects are of kind "video".
Example: VideoStreamTrack objects are instantiated in JavaScript using the new operator:
new VideoStreamTrack();
Issue: It's been suggested that these track constructors can take optional constraints. Such constraints could be stored until activated by getUserMedia to help the user select what characteristics to use when pre-configuring a device source and attaching it to the track. These are not added in this proposal, but could be added later on.
[Constructor]
interface VideoStreamTrack : MediaStreamTrack {
    readonly attribute VideoFacingEnum facing;
    readonly attribute VideoStreamSource? source;
};
facing of type VideoFacingEnum, readonly
    While the track's readyState is "new", the value "unknown" is returned.
source of type VideoStreamSource, readonly, nullable
    When a VideoStreamTrack is first created, and while it remains in the "new" state, the source attribute must return null.
AudioStreamTrack interface

AudioStreamTrack objects are of kind "audio".
Example: AudioStreamTrack objects are instantiated in JavaScript using the new operator:
new AudioStreamTrack();
[Constructor]
interface AudioStreamTrack : MediaStreamTrack {
    readonly attribute unsigned long level;
    readonly attribute AudioStreamSource? source;
};
level of type unsigned long, readonly
    Reports the current audio level; no meaningful value is available while the track is in the "new" state. The relative strength (amplitude) of the level is proportional to the gain of the audio source device (e.g., to increase the pick-up of the microphone, increase the gain setting).
source of type AudioStreamSource, readonly, nullable
    When an AudioStreamTrack is first created, and while it remains in the "new" state, the source attribute must return null.
VideoStreamSource and AudioStreamSource objects are instantiated by the user agent to represent a source that is providing the media for a MediaStreamTrack. The association of a source object with a media track occurs asynchronously after permission for use of the track has been requested by getUserMedia. When the user agent has attached the source of a MediaStreamTrack, the source object can be accessed via that track's source attribute.
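For example, a sketch of reading a source's settings once its track has started (the onstart timing follows from the start event definition above):

var track = new VideoStreamTrack();
track.onstart = function () {
    // The track is now "live", so its source has been attached.
    var source = track.source;   // a VideoStreamSource
    console.log(source.width + "x" + source.height);
};
var stream = new MediaStream();
stream.addTrack(track);
navigator.getUserMedia(stream);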
Note: Some MediaStreamTracks may not provide a source object; for example, if the source is coming from an encrypted media source, or a local file source.
Issue: Need to define whether source objects are singletons. For example, if one track adds an expando property onto a source object, will another track that has that same source see the expando on its source object?
VideoStreamSource and AudioStreamSource objects are created by the user agent to represent a camera or microphone device/source for which the source attributes can be inspected and/or changed. At the moment these are limited to local cameras, local microphones, and peer connection sources, but additional sources can be defined later (such as local file system sources for images or audio files).
VideoStreamSource interface

interface VideoStreamSource : EventTarget {
    readonly attribute unsigned long width;
    readonly attribute unsigned long height;
    readonly attribute float frameRate;
    readonly attribute VideoRotationEnum rotation;
    readonly attribute VideoMirrorEnum mirror;
    readonly attribute float zoom;
    readonly attribute VideoFocusModeEnum focusMode;
    readonly attribute VideoFillLightModeEnum fillLightMode;
    void stop ();
    static unsigned long getNumDevices ();
};
width of type unsigned long, readonly
    The current width of the video source, with any rotation factored in. For example, with a 1024x768 source and a 90-degree rotation applied, a <video> element sink used to preview this track would report a width of 768 pixels (the effective width with rotation factored in).
height of type unsigned long, readonly
frameRate of type float, readonly
rotation of type VideoRotationEnum, readonly
mirror of type VideoMirrorEnum, readonly
zoom of type float, readonly
focusMode of type VideoFocusModeEnum, readonly
fillLightMode of type VideoFillLightModeEnum, readonly
stop
    Causes this source's tracks to enter the "ended" state. Same behavior as the old LocalMediaStream's stop API, but only affects this track source.
    Return type: void
getNumDevices, static
    Returns the number of video source devices that are available for use via getUserMedia.
Issue: This information deliberately adds to the fingerprinting surface of the UA. However, this information could also be obtained via other round-about techniques using getUserMedia. This editor deems it worthwhile to directly provide this data as it seems important for determining whether multiple devices of this type are available.
Issue: The ability to be notified when new devices become available has been dropped from this proposal (it was available in v4 via the DeviceList object).
Return type: unsigned long
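A sketch of the intended use (the element id is illustrative):

// Feature-detect multiple cameras before exposing a camera-switch UI.
if (VideoStreamSource.getNumDevices() > 1) {
    document.querySelector("#switchCamera").hidden = false;
}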
enum VideoFacingEnum {
"unknown",
"user",
"environment"
};
Value | Description |
---|---|
unknown | The relative directionality of the camera cannot be determined by the user agent based on the hardware. |
user | The camera is facing toward the user (a self-view camera). |
environment | The camera is facing away from the user (viewing the environment). |
enum VideoRotationEnum {
"0",
"90",
"180",
"270"
};
Value | Description |
---|---|
0 | No effective rotation applied (default value if no rotation is supported by the device software). |
90 | A rotation of 90 degrees counter-clockwise (270 degrees in a clockwise rotation). |
180 | A rotation of 180 degrees. |
270 | A rotation of 270 degrees counter-clockwise (90 degrees in a clockwise rotation). |
enum VideoMirrorEnum {
"none",
"horizontal",
"vertical"
};
Value | Description |
---|---|
none | No effective mirroring is being applied (default value if no mirroring is supported by the device software). |
horizontal | The image is mirrored along the camera's width value. This setting does not consider the camera's current rotation, so if a 90 degree rotation was also applied to this source, then the "horizontal" mirroring would appear to be a vertical mirroring in a given sink. |
vertical | The image is mirrored along the camera's height value. This setting does not consider the camera's current rotation, so if a 90 degree rotation was also applied to this source, then the "vertical" mirroring would appear to be a horizontal mirroring in a given sink. |
enum VideoFocusModeEnum {
"notavailable",
"auto",
"manual"
};
Value | Description |
---|---|
notavailable | This camera does not have an option to change focus modes. |
auto | The camera auto-focuses. |
manual | The camera must be manually focused. |
enum VideoFillLightModeEnum {
"notavailable",
"auto",
"off",
"flash",
"on"
};
Value | Description |
---|---|
notavailable | This video device does not have an option to change fill light modes (e.g., the camera does not have a flash). |
auto | The video device's fill light will be enabled when required (typically low light conditions). Otherwise it will be off. Note that auto does not guarantee that a flash will fire when takePicture is called. Use flash to guarantee firing of the flash for the takePicture API. auto is the initial value. |
off | The video device's fill light and/or flash will not be used. |
flash | If the video device is a camera supporting high-resolution photo-mode, this setting will always cause the flash to fire for the takePicture API. Otherwise, if the video device does not support this mode, this value is equivalent to auto. |
on | The video device's fill light will be turned on (and remain on) until this setting is changed again, or the underlying track object has ended. |
AudioStreamSource interface

interface AudioStreamSource : EventTarget {
    readonly attribute unsigned long gain;
    void stop ();
    static unsigned long getNumDevices ();
};
gain of type unsigned long, readonly
stop
    Causes this source's tracks to enter the "ended" state. Same behavior as the old LocalMediaStream's stop API, but only affects this track source.
    Return type: void
getNumDevices, static
    Returns the number of audio source devices that are available for use via getUserMedia.
Issue: This information deliberately adds to the fingerprinting surface of the UA. However, this information can also be obtained by other round-about techniques using getUserMedia, and is important for determining whether multiple devices of this type are available.
Issue: The ability to be notified when new devices become available has been dropped from this proposal (it was available in v4 via the DeviceList object).
Return type: unsigned long
The PictureStreamSource derived interface is created by the user agent if the camera source providing the VideoStreamSource supports an optional "high-resolution picture mode" with picture settings that are separate from those of its basic video source (which is usually considered its preview mode).
The PictureStreamSource object presents a set of capabilities and controls for taking high-resolution pictures. The unique settings of this object are only applied at the time when the takePicture API is invoked.
PictureStreamSource interface

interface PictureStreamSource : VideoStreamSource {
    readonly attribute unsigned long photoWidth;
    readonly attribute unsigned long photoHeight;
    void takePicture ();
    attribute EventHandler onpicture;
    attribute EventHandler onpictureerror;
};
photoWidth of type unsigned long, readonly
photoHeight of type unsigned long, readonly
onpicture of type EventHandler
    The BlobEvent returns a picture (as a Blob) in a compressed format (for example: PNG/JPEG) rather than a raw ImageData object due to the expected large, uncompressed size of the resulting pictures. This event type (BlobEvent) should be the same one used in the recording proposal.
onpictureerror of type EventHandler
takePicture
    Initiates an asynchronous picture capture; the result is delivered via the onpicture handler.
    Issue: We could consider providing a hint or setting for the desired picture format?
    Return type: void

BlobEvent interface

[Constructor(DOMString type, optional BlobEventInit blobInitDict)]
interface BlobEvent : Event {
    readonly attribute Blob data;
};
data of type Blob, readonly
    The captured picture data, which can be displayed, for example, by assigning an object URL to an <img> tag.
dictionary BlobEventInit : EventInit {
    Blob data;
};
BlobEventInit Members

data of type Blob

When MediaStreams are transmitted over the network by way of a peer connection, the tracks that are created on the remote side of the MediaStream will have a remote media source attached as the track's source. This source object allows remote-consumers of the MediaStream's tracks to request specific changes to the tracks. These change requests will be serviced by the RTCPeerConnection source object which is streaming the media over the network.
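For instance, a sketch of a receiving client requesting a change from the sender (remoteTrack is assumed to be a video track received over an RTCPeerConnection):

var remoteSource = remoteTrack.source;   // a VideoStreamRemoteSource
// Ask the sending side for a lower frame rate; the request is serviced
// by the RTCPeerConnection that is streaming the media.
remoteSource.set({ frameRate: 15 });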
VideoStreamRemoteSource interface

interface VideoStreamRemoteSource : EventTarget {
    readonly attribute unsigned long width;
    readonly attribute unsigned long height;
    readonly attribute float frameRate;
    readonly attribute float bitRate;
};
width of type unsigned long, readonly
height of type unsigned long, readonly
frameRate of type float, readonly
bitRate of type float, readonly

AudioStreamRemoteSource interface

interface AudioStreamRemoteSource : EventTarget {
    readonly attribute float bitRate;
};

bitRate of type float, readonly

The following settings have been proposed, but are not included in this version to keep the initial set of settings scoped to those that:
Each setting also includes a brief explanatory rationale for why it's not included:
horizontalAspectRatio - easily calculated based on width/height in the dimension values
verticalAspectRatio - see horizontalAspectRatio explanation
orientation - can be easily calculated based on the width/height values and the current rotation
aperatureSize - while more common on digital cameras, not particularly common on webcams (the major use-case for this feature)
shutterSpeed - see aperatureSize explanation
denoise - may require specification of the algorithm processing or related image processing filter required to implement
effects - sounds like a v2 or independent feature (depending on the effect)
faceDetection - sounds like a v2 feature; can also be done using post-processing techniques (though perhaps not as fast...)
antiShake - sounds like a v2 feature
geoTagging - this can be independently associated with a recorded picture/video/audio clip using the Geolocation API; automatically hooking up Geolocation to Media Capture sounds like an exercise for v2 given the possible complications
highDynamicRange - not sure how this can be specified, or if this is just a v2 feature
skintoneEnhancement - not a particularly common setting
shutterSound - can be accomplished by syncing custom audio playback via the <audio> tag if desired; by default, there will be no sound issued
redEyeReduction - photo-specific setting (could be considered if photo-specific settings are introduced)
meteringMode - photo-specific setting (could be considered if photo-specific settings are introduced)
iso - photo-specific setting; while more common on digital cameras, not particularly common on webcams
sceneMode - while more common on digital cameras, not particularly common on webcams
antiFlicker - not a particularly common setting
zeroShutterLag - this seems more like a hope than a setting; I'd rather just have implementations make the shutter snap as quickly as possible after takePicture, rather than requiring an opt-in/opt-out for this setting
The following settings may be included by working group decision:
This proposal simplifies the application of settings over the previous proposal and unifies the setting names with the constraint names and syntax. This unification allows developers to use the same syntax when defining constraints as well as with the settings APIs.
The settings for each track (if available) are all conveniently located on the source object for the respective track. Each setting is defined by a readonly attribute (for example, the "width" attribute) which serves both as feature-detection for the given setting and as the current value of the setting at any point in time. The constraint name for each setting is the same as the name of the readonly attribute. For example, "photoWidth" is both the name of the setting as well as the name of the constraint. All of the constraints defined in this proposal are listed later.
Reading a setting's current value is as simple as reading the readonly attribute of the same name. Each setting also has a range of appropriate values (its capabilities), either enumerated values or a range continuum--these are the same ranges/enumerated values that may be used when expressing constraints for the given setting. Retrieving the capabilities of a given setting is done via a getRange API on each source object. Similarly, requesting a change to a setting is done via a set API on each source object. Finally, for symmetry, a get method is also defined which reports the current value of any setting.
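Putting those three APIs together, a brief sketch (videoTrack is assumed to be an activated video track, as in the examples later in this proposal):

var source = videoTrack.source;
// The readonly attribute doubles as feature-detection:
if (source.frameRate !== undefined) {
    var current = source.get("frameRate");    // same value as source.frameRate
    var caps = source.getRange("frameRate");  // a MediaSettingsRange: min/max/initial
    // Request a (non-mandatory) change to the fastest supported rate:
    source.set({ frameRate: caps.max });
}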
As noted in prior proposals, camera/microphone settings must be applied asynchronously to ensure that web applications can remain responsive for all device types that may not respond quickly to setting changes. This is especially true for settings communications over a peer connection.
Browsers provide a media pipeline from sources to sinks. In a browser, sinks are the <img>, <video> and <audio> tags. Traditional sources include streamed content, files and web resources. The media produced by these sources typically does not change over time - these sources can be considered to be static.
The sinks that display these sources to the user (the actual tags themselves) have a variety of controls for manipulating the source content. For example, an <img> tag scales down a huge source image of 1600x1200 pixels to fit in a rectangle defined with width="400" and height="300".
The getUserMedia API adds dynamic sources such as microphones and cameras - the characteristics of these sources can change in response to application needs. These sources can be considered to be dynamic in nature. A <video> element that displays media from a dynamic source can either perform scaling or it can feed back information along the media pipeline and have the source produce content more suitable for display.
Note: This sort of feedback loop is obviously just enabling an "optimization", but it's a non-trivial gain. This optimization can save battery, allow for less network congestion, etc...
This proposal assumes that MediaStream sinks (such as <video>, <audio>, and even RTCPeerConnection) will continue to have mechanisms to further transform the source stream beyond that which the settings described in this proposal offer. (The sink transformation options, including those of RTCPeerConnection, are outside the scope of this proposal.)
The act of changing a setting on a stream's source will, by definition, affect all down-level sinks that are using that source. Many sinks may be able to take these changes in stride, such as the <video> element or RTCPeerConnection. Others, like the Recorder API, may fail as a result of a source change.
The RTCPeerConnection is an interesting object because it acts simultaneously as both a sink and a source for over-the-network streams. As a sink, it has source transformational capabilities (e.g., lowering bit-rates, scaling resolutions up or down, adjusting frame-rates), and as a source it may have its own settings changed by a track source that it provides (in this proposal, such sources are the VideoStreamRemoteSource and AudioStreamRemoteSource objects).
To illustrate how changes to a given source impact various sinks, consider the following example. This example only uses width and height, but the same principles apply to any of the settings exposed in this proposal. In the first figure, a home client has obtained a video source from its local video camera. The source device's width and height are 800 pixels by 600 pixels, respectively. Three MediaStream objects on the home client contain tracks that use this same source. The three media streams are connected to three different sinks: a <video> element (A), another <video> element (B), and a peer connection (C). The peer connection is streaming the source video to an away client. On the away client there are two media streams with tracks that use the peer connection as a source. These two media streams are connected to two <video> element sinks (Y and Z).
Note that in the current state, all of the sinks on the home client must apply a transformation to the original source's dimensions. A is scaling the video up (resulting in loss of quality), B is scaling the video down, and C is also scaling the video up slightly for sending over the network. On the away client, sink Y is scaling the video way down, while sink Z is not applying any scaling.
Using the settings APIs defined in the next section, the home client's video source is changed to a higher resolution (1920 by 1200 pixels).
Note that the source change immediately affects all of the sinks on the home client, but does not impact any of the sinks (or sources) on the away client. With the increase in the home client source video's dimensions, sink A no longer has to perform any scaling, while sink B must scale down even further than before. Sink C (the peer connection) must now scale down the video in order to keep the transmission constant to the away client.
While not shown, an equally valid settings change request could be made of the away client video source (the peer connection on the away client's side). This would not only impact sink Y and Z in the same manner as before, but would also cause re-negotiation with the peer connection on the home client in order to alter the transformation that it is applying to the home client's video source. Such a change would not change anything related to sink A or B or the home client's video source.
Note: This proposal does not define a mechanism by which a change to the away client's video source could automatically trigger a change to the home client's video source. Implementations may choose to make such source-to-sink optimizations as long as they only do so within the constraints established by the application, as the next example describes.
It is fairly obvious that changes to a given source will impact sink consumers. However, in some situations changes to a given sink may also be cause for implementations to adjust the characteristics of a source's stream. This is illustrated in the following figures. In the first figure below, the home client's video source is sending a video stream sized at 1920 by 1200 pixels. The video source is also unconstrained, such that the exact source dimensions are flexible as far as the application is concerned. Two MediaStream objects contain tracks that use this same source, and those MediaStreams are connected to two different <video> element sinks A and B. Sink A has been sized to width="1920" and height="1200" and is displaying the source's video without any transformations. Sink B has been sized smaller and, as a result, is scaling the video down to fit its rectangle of 320 pixels across by 200 pixels down.
When the application changes sink A to a smaller dimension (from 1920 to 1024 pixels wide and from 1200 to 768 pixels tall), the browser's media pipeline may recognize that none of its sinks require the higher source resolution, and needless work is being done both on the part of the source and on sink A. In such a case and without any other constraints forcing the source to continue producing the higher resolution video, the media pipeline may change the source resolution:
In the above figure, the home client's video source resolution was changed to the max(sinkA, sinkB) in order to optimize playback. While not shown above, the same behavior could apply to peer connections and other sinks.
StreamSourceSettings mix-in interface

VideoStreamSource implements StreamSourceSettings;
AudioStreamSource implements StreamSourceSettings;
VideoStreamRemoteSource implements StreamSourceSettings;
AudioStreamRemoteSource implements StreamSourceSettings;
[NoInterfaceObject]
interface StreamSourceSettings {
    (MediaSettingsRange or MediaSettingsList) getRange (DOMString settingName);
    any get (DOMString settingName);
    void set (MediaTrackConstraint setting, optional boolean isMandatory = false);
};
getRange
Each setting has an appropriate range of values. These may be either value ranges (a continuum of values) or enumerated values but not both. Value ranges include a min and max value, while enumerated values are provided as a list of values. Both types of setting ranges include an "initial" value, which is the value that is expected to be the source device's default value when it is acquired.
MediaSettingsRange objects are returned when a setting is not an enumerated type. This specification will indicate what the range of values must be for each setting. Given that implementations of various hardware may not exactly map to the same range, an implementation should make a reasonable attempt to translate and scale the hardware's setting onto the mapping provided by this specification. If this is not possible due to a hardware setting supporting (for example) fewer levels of granularity, then the implementation should make the device settings min value reflect the min value reported in this specification, and the same for the max value. Then for values in between the min and max, the implementation may round to the nearest supported value and report that value in the setting.
For example, if the setting is fluxCapacitance, and has a specified range from -10 (min) to 10 (max) in this specification, but the implementation's fluxCapacitance hardware setting only supports values of "off" "medium" and "full", then -10 should be mapped to "off", 10 should map to "full", and 0 should map to "medium". A request to change the value to 3 should be rounded down to the closest supported setting (0).
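A sketch of that mapping logic for the hypothetical fluxCapacitance setting:

// Spec range: -10 (min) to 10 (max); hardware supports "off"/"medium"/"full".
function toHardwareValue(requested) {
    var supported = [-10, 0, 10];   // corresponding to "off", "medium", "full"
    var nearest = supported.reduce(function (a, b) {
        return Math.abs(b - requested) < Math.abs(a - requested) ? b : a;
    });
    return { "-10": "off", "0": "medium", "10": "full" }[String(nearest)];
}
// toHardwareValue(3) === "medium" -- 3 rounds to the nearest supported value (0)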
MediaSettingsList objects should order their enumerated values from minimum to maximum where it makes sense, or in the order defined by the enumerated type where applicable.
Setting name | Dictionary return type | Notes |
---|---|---|
width | MediaSettingsRange | The range should span the video source's pre-set width values, with min being the smallest width and max the largest width. The type of the min/max/initial values is unsigned long. |
photoWidth | MediaSettingsRange | The range should span the video source's high-resolution photo-mode pre-set width values, with min being the smallest width and max the largest width. The type of the min/max/initial values is unsigned long. |
height | MediaSettingsRange | The range should span the video source's pre-set height values, with min being the smallest height and max the largest height. The type of the min/max/initial values is unsigned long. |
photoHeight | MediaSettingsRange | The range should span the video source's high-resolution photo-mode pre-set height values, with min being the smallest height and max the largest height. The type of the min/max/initial values is unsigned long. |
frameRate | MediaSettingsRange | The supported range of frame rates on the device. The type of the min/max/initial values is float. |
rotation | MediaSettingsList | The available video rotation options on the source device. The type of the initial/values array is VideoRotationEnum (DOMString). |
mirror | MediaSettingsList | The available video mirror options on the source device. The type of the initial/values array is VideoMirrorEnum (DOMString). |
zoom | MediaSettingsRange | The supported zoom range on the device. The type of the min/max/initial values is float. The initial value is 1. The float value is a scale factor; for example, 0.5 is zoomed out by 2x, while 2.0 is zoomed in by 2x. Requests should be rounded to the nearest supported zoom factor by the implementation (when zoom is supported). |
focusMode | MediaSettingsList | The available focus mode options on the source device. The type of the initial/values array is VideoFocusModeEnum (DOMString). |
fillLightMode | MediaSettingsList | The available fill light mode options on the source device. The type of the initial/values array is VideoFillLightModeEnum (DOMString). |
gain | MediaSettingsRange | The supported gain range on the device. The type of the min/max/initial values is unsigned long. The initial value is 50. |
bitRate | MediaSettingsRange | The supported bit rate range on the device. The type of the min/max/initial values is float. |
Parameter | Type | Nullable | Optional | Description |
---|---|---|---|---|
settingName | DOMString | ? | ? | The name of the setting for which the range of expected values should be returned |
Return type: (MediaSettingsRange or MediaSettingsList)
get
Parameter | Type | Nullable | Optional | Description |
---|---|---|---|---|
settingName | DOMString | ? | ? | The name of the setting for which the current value of that setting should be returned |
Return type: any
set

The set API is the mechanism for asynchronously requesting that the source device change the value of a given setting. The API mirrors the syntax used for applying constraints. Generally, the set API will be used to apply specific values to a setting (such as setting the fillLightMode setting to a specific value); however, ranges can also be applied using the same min/max syntax used in constraints (i.e., setting width to a range between 800 and 1200 pixels).
The set API queues requests until the conclusion of the micro-task, after which all of the settings requests will be evaluated according to the constraint algorithm, and requests that can be honored will be applied to the source device. Any requests specified using the mandatory parameter that could not be applied must generate a settingserror event. All other non-mandatory requests that could not be applied do not cause any notification to be generated.

For all of the given settings that were changed as a result of a sequence of calls to the set API during a micro-task, one single settingschanged event will be generated containing the names of the settings that changed.
Example: To change the video source's dimensions to any aspect ratio where the height is 768 pixels and the width is at least 300 pixels requires two calls to set:

videoDevice.set({ width: { min: 300 } }, true);
videoDevice.set({ height: 768 }, true);
In each case where the setting/constraint does not take an enumerated value, the implementation should attempt to match the value onto the nearest supported value of the source device unless the mandatory flag is provided. In the case of mandatory requests, if the setting cannot be exactly supported as requested, then the setting must fail and generate a settingserror event. Regarding width/height values--if an implementation is able to scale the source video to match the requested mandatory constraints, this need not cause a settingserror (but the result may be weirdly proportioned video).
Parameter | Type | Nullable | Optional | Description |
---|---|---|---|---|
setting | MediaTrackConstraint | ? | ? | A JavaScript object (dictionary) consisting of a single property which is the setting name to change, and whose value is either a primitive value (float/DOMString/etc.) or another dictionary consisting of a min and/or max property and associated values. |
isMandatory = false | boolean | ? | ? | A flag indicating whether this settings change request should be considered mandatory. If a value of true is provided, then should the settings change fail for some reason, a settingserror event will be raised. Otherwise, only a settingschanged event will be dispatched for the settings that were successfully changed. The default, if this flag is not provided, is false. |
Return type: void
dictionary MediaSettingsRange {
    any max;
    any min;
    any initial;
};
MediaSettingsRange Members

max of type any
    The type of this value is specific to the setting. Each setting will describe a specific type. That type must be returned for this attribute.
min of type any
    The type of this value is specific to the setting. Each setting will describe a specific type. That type must be returned for this attribute.
initial of type any
    The type of this value is specific to the setting. Each setting will describe a specific type. That type must be returned for this attribute.
dictionary MediaSettingsList {
    sequence<any> values;
    any initial;
};

MediaSettingsList Members

values of type sequence<any>
    The type of this value is specific to the setting. Each setting will describe a specific type. That type must be returned for this attribute.
initial of type any
    The type of this value is specific to the setting. Each setting will describe a specific type. That type must be returned for this attribute.
MediaSettingsEventHandlers mix-in interface

AudioStreamSource implements MediaSettingsEventHandlers;
VideoStreamSource implements MediaSettingsEventHandlers;
AudioStreamRemoteSource implements MediaSettingsEventHandlers;
VideoStreamRemoteSource implements MediaSettingsEventHandlers;
[NoInterfaceObject]
interface MediaSettingsEventHandlers {
    attribute EventHandler onsettingserror;
    attribute EventHandler onsettingschanged;
};
onsettingserror of type EventHandler
    Fired when requests via the set API have been made with at least one such request using the mandatory flag, and one or more of those requests could not be applied. The MediaSettingsEvent reports the names of the settings that could not be applied. The "settingschanged" event fires before the "settingserror" event (if any).
onsettingschanged of type EventHandler
    Fired once per micro-task, reporting (via the MediaSettingsEvent) the names of the settings that were successfully changed.

MediaSettingsEvent interface

[Constructor(DOMString type, optional MediaSettingsEventInit eventInitDict)]
interface MediaSettingsEvent : Event {
    readonly attribute DOMString[] settings;
};
settings of type array of DOMString, readonly

MediaSettingsEventInit dictionary

dictionary MediaSettingsEventInit : EventInit {
    sequence<DOMString> settings;
};

MediaSettingsEventInit Members

settings of type sequence<DOMString>

This proposal defines several constraints for use with video and audio devices.
The following constraints are applicable to video devices:

dictionary VideoConstraints : MediaTrackConstraintSet {
    (unsigned long or MinMaxULongSubConstraint) width;
    (unsigned long or MinMaxULongSubConstraint) height;
    (unsigned long or MinMaxULongSubConstraint) photoWidth;
    (unsigned long or MinMaxULongSubConstraint) photoHeight;
    VideoRotationEnum rotation;
    VideoMirrorEnum mirror;
    (float or MinMaxFloatSubConstraint) zoom;
    VideoFocusModeEnum focusMode;
    VideoFillLightModeEnum fillLightMode;
    (float or MinMaxFloatSubConstraint) frameRate;
    (float or MinMaxFloatSubConstraint) bitRate;
};
VideoConstraints Members

width of type (unsigned long or MinMaxULongSubConstraint)
height of type (unsigned long or MinMaxULongSubConstraint)
photoWidth of type (unsigned long or MinMaxULongSubConstraint)
photoHeight of type (unsigned long or MinMaxULongSubConstraint)
rotation of type VideoRotationEnum
mirror of type VideoMirrorEnum
zoom of type (float or MinMaxFloatSubConstraint)
focusMode of type VideoFocusModeEnum
fillLightMode of type VideoFillLightModeEnum
frameRate of type (float or MinMaxFloatSubConstraint)
bitRate of type (float or MinMaxFloatSubConstraint)

The following constraints are applicable to audio devices:
dictionary AudioConstraints : MediaTrackConstraintSet {
    (unsigned long or MinMaxULongSubConstraint) gain;
};
AudioConstraints Members

gain of type (unsigned long or MinMaxULongSubConstraint)

MinMaxULongSubConstraint dictionary

dictionary MinMaxULongSubConstraint {
    unsigned long max;
    unsigned long min;
};

MinMaxULongSubConstraint Members

max of type unsigned long
min of type unsigned long

MinMaxFloatSubConstraint dictionary

dictionary MinMaxFloatSubConstraint {
    float max;
    float min;
};

MinMaxFloatSubConstraint Members

max of type float
min of type float

The following JavaScript examples demonstrate how the Settings APIs defined in this proposal could be used.
var audioTrack = (AudioStreamSource.getNumDevices() > 0) ? new AudioStreamTrack() : null;
if (audioTrack) audioTrack.onstart = mediaStarted;
var videoTrack = (VideoStreamSource.getNumDevices() > 0) ? new VideoStreamTrack() : null;
if (videoTrack) videoTrack.onstart = mediaStarted;

var MS = new MediaStream();
// Guard against missing devices (addTrack expects a MediaStreamTrack, not null).
if (audioTrack) MS.addTrack(audioTrack);
if (videoTrack) MS.addTrack(videoTrack);
navigator.getUserMedia(MS);

function mediaStarted() {
    // One of the video/audio devices started.
}
function mediaStarted() {
    // objectURL technique
    document.querySelector("video").src = URL.createObjectURL(MS, { autoRevoke: true }); // autoRevoke is the default
    // direct-assign technique
    document.querySelector("video").srcObject = MS; // Proposed API at this time
}
function mediaStarted() {
    var videoDevice = videoTrack.source;
    var maxWidth = videoDevice.getRange("width").max;
    var maxHeight = videoDevice.getRange("height").max;
    // Check for 1080p+ support
    if ((maxWidth >= 1920) && (maxHeight >= 1080)) {
        // See if I need to change the current settings...
        if ((videoDevice.width < 1920) && (videoDevice.height < 1080)) {
            videoDevice.set({ width: maxWidth }, true);
            videoDevice.set({ height: maxHeight }, true);
            videoDevice.onsettingserror = failureToComply;
        }
    } else {
        failureToComply();
    }
}

function failureToComply(e) {
    if (e) console.error("Devices failed to change " + e.settings); // 'width' and/or 'height'
    else console.error("Device doesn't support at least 1080p");
}
function mediaStarted() {
    setupRange(videoTrack.source);
}

function setupRange(videoDevice) {
    var zoomCaps = videoDevice.getRange("zoom");
    // Check to see if the device supports zooming...
    if (zoomCaps.min != zoomCaps.max) {
        // Set HTML5 range control to min/max values of zoom
        var zoomControl = document.querySelector("input[type=range]");
        zoomControl.min = zoomCaps.min;
        zoomControl.max = zoomCaps.max;
        zoomControl.value = videoDevice.zoom;
        zoomControl.onchange = applySettingChanges;
    }
}

function applySettingChanges(e) {
    videoTrack.source.set({ zoom: parseFloat(e.target.value) }, true);
}
function mediaStarted() {
    return new MediaStream([ videoTrack, audioTrack ]);
}
function mediaStarted() {
    var videoDevice = videoTrack.source;
    // Check if this device supports a picture mode...
    if (videoDevice.takePicture) {
        videoDevice.onpicture = showPicture;
        // Turn on flash only for the snapshot...if available
        if (videoDevice.fillLightMode != "notavailable")
            videoDevice.set({ fillLightMode: "flash" }, true);
        else
            console.info("Flash not available");
        videoDevice.takePicture();
    }
}

function showPicture(e) {
    var img = document.querySelector("img");
    img.src = URL.createObjectURL(e.data);
}
A device becomes newly available whenever an existing device that was being used by another application (with exclusive access) is relinquished and becomes available for this application to use. Of course, plugging in a new device also causes a device to become available.
This scenario is not currently possible with this proposal.
var totalVideoDevices = VideoStreamSource.getNumDevices();
var videoTracksList = [];
for (var i = 0; i < totalVideoDevices; i++) {
    var mediaStream = new MediaStream(new VideoStreamTrack());
    // Create a video element and add it to the UI
    var videoTag = document.createElement('video');
    videoTag.srcObject = mediaStream;
    document.body.appendChild(videoTag);
    // Request to have the track connected to a source device (queue these up in the for-loop)
    navigator.getUserMedia(mediaStream);
}
This section documents the changes from the prior proposal:

- Dropped device lists in favor of a simple device count (the static getNumDevices method).
- Tracks can now be created directly via constructors and bound to a source device later (via getUserMedia).
- PictureDeviceTrack renamed to PictureStreamSource and dropped the semantics of it being a track; it's now just a special type of device source that shares settings with its video source through inheritance.
- PictureEvent renamed to the more generic (and re-usable) BlobEvent. (I considered MessageEvent, but that has too many other unrelated properties, and ProgressEvents didn't have a place to expose the data.)
- Dropped constraints for AudioStreamTrack's level and VideoStreamTrack's facing values, as these have no corresponding settings to change.
- Added gain to AudioStreamSource; mirror, photoWidth, and photoHeight to VideoStreamSource; bitRate to Audio/VideoStreamRemoteSource. Dropped dimension.
I'd like to specially thank Anant Narayanan of Mozilla for collaborating on the new settings design, and EKR for his 2c. Also, thanks to Martin Thomson (Microsoft) for his comments and review.