LocalMediaStream interface
In this proposal, the derived LocalMediaStream interface is removed. Rather than returning a LocalMediaStream instance in the NavigatorUserMediaSuccessCallback, a vanilla MediaStream object is returned. The primary difference is in the tracks contained in that MediaStream object.
Some feedback even suggested re-considering the "at most one video/audio track per request to getUserMedia" restriction.
While thinking about these goals and the feedback, I began to consider a few things:
To illustrate this challenge, consider how my prior proposal required a re-hookup of the MediaStream to a video tag consumer.
Note that this process has to bind a completely new LocalMediaStream to the video tag a second time (if re-using the same video tag) only because the second LocalMediaStream object is different from the first.
It is much more efficient for developer code to simply add/remove tracks to a MediaStream that are relevant, without needing to change the consumer of the MediaStream.
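Under this proposal, a device change becomes a track swap on the same stream. The following sketch illustrates the idea; the add()/remove() method names on the track list are assumptions for illustration, not confirmed API:

```javascript
// Sketch only: swapping tracks on the *same* MediaStream means the
// <video> consumer bound to that stream never needs to be re-hooked.
function swapVideoTrack(mediaStream, oldTrack, newTrack) {
  mediaStream.videoTracks.remove(oldTrack); // hypothetical track-list method
  mediaStream.videoTracks.add(newTrack);    // hypothetical track-list method
  // A <video> tag whose srcObject === mediaStream keeps playing;
  // no new object URL or srcObject re-assignment is required.
  return mediaStream; // same object identity as before
}
```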
MediaStreamTrack (derived) types
This proposal consolidates settings directly into the tracks that are provided by devices. However, in order to do this efficiently and in a future-extensible manner, the highly-generic MediaStreamTrack is now extended for specific characteristics of the devices it embodies, resulting in a hierarchy:
MediaStreamTrack objects that are of kind "video" and that are located in a MediaStream's videoTracks list will be instances of a VideoStreamTrack. The VideoStreamTrack provides basic (read-only) properties pertinent to all sources of video.
There is no takePicture API on a VideoStreamTrack because a simple frame-grab can be accomplished using a combination of the <video> and <canvas> APIs (takePicture is intended for use with a camera's high-resolution picture mode, not for arbitrary video frame capture).
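Such a frame-grab might look like the following sketch; the function name and the PNG output format are illustrative choices, not part of this proposal:

```javascript
// Sketch: grab the current frame of a playing <video> element by
// painting it onto a <canvas> and encoding the result as a PNG.
function grabFrame(video, canvas) {
  canvas.width = video.videoWidth;   // match the source resolution
  canvas.height = video.videoHeight;
  var ctx = canvas.getContext("2d");
  ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
  return canvas.toDataURL("image/png"); // data: URL of the captured frame
}
// Usage (in a page):
//   grabFrame(document.querySelector("video"),
//             document.createElement("canvas"));
```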
I'm intentionally keeping this interface as sparse as possible. Features of the video that can be calculated, like aspect ratio, are not provided.
VideoStreamTrack interface
MediaStreamTrack objects that are of kind "audio" and that are located in a MediaStream's audioTracks list will be instances of an AudioStreamTrack. The AudioStreamTrack provides basic (read-only) properties pertinent to all sources of audio.
AudioStreamTrack interface
VideoDeviceTracks are created by the user agent to represent a camera device that provides local video.
VideoDeviceTrack interface
ENDED state. Same behavior as the old LocalMediaStream's stop API, but only affects this device track.
VideoFacingEnum enumeration
The PictureDeviceTrack interface is created by the user agent if the camera device providing the VideoDeviceTrack supports an optional "high-resolution picture mode" with picture settings different (better) from those of its basic video constraints.
This track is initially available from a VideoDeviceTrack via the pictureTrack property. This track type is not present in the video device list (MediaDeviceList). Likewise, it cannot be stopped directly, and its VideoStreamTrack inherited attributes reflect the values of its "owning" VideoDeviceTrack.
The PictureDeviceTrack is essentially a specialized VideoStreamTrack (this track type is of kind "video"). It may be explicitly added to a videoTracks list (MediaStreamTrackList) in order to output its track video to a <video> tag, but its preview video stream reflects the owning VideoDeviceTrack's settings, rather than the settings directly available on this object. Rather, the settings of this object are only applied at the time when the takePicture API is invoked.
PictureDeviceTrack interface
Could consider providing a hint or setting for the desired picture format.
Is an "error" event necessary here too?
In the previous proposal, the PictureEvent returned a Canvas ImageData object; however, it makes sense to return a compressed format (PNG/JPEG) instead, especially given that picture snapshots will be very high resolution and ImageData objects are essentially raw images.
PictureEvent interface
AudioDeviceTracks are created by the user agent to represent a microphone device that provides local audio.
AudioDeviceTrack interface
ENDED state. Same behavior as the old LocalMediaStream's stop API, but only for this device track.
As noted in prior proposals, camera/microphone settings must be applied asynchronously to ensure that web applications can remain responsive for all device types that may not respond quickly to setting changes.
My prior proposals used a monolithic dictionary of settings for inspection and application. This proposal takes a different approach, considering the feedback for more-direct access to settings, expected patterns for settings adjustment (which is generally one setting at a time, as initiated by a web application UI), difficulties in understanding what values were read-only vs. writable, and the current already-defined constraint application engine.
Settings are organized into two groups: value ranges (a continuum of values) and enumerated values. Value ranges include a min and max value, while enumerated values are provided in an array with an associated length. Both groups of settings include an "initial" value, which is the value that is expected to be the device's default value when it is acquired.
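As an illustration only, the two setting-group shapes might look like the following plain-object sketch; the concrete values are hypothetical, and only min, max, initial, and the indexed items come from the description above:

```javascript
// A value-range setting group (MediaSettingsRange-like shape):
var zooms = { min: 1.0, max: 3.0, initial: 1.0 };

// An enumerated-values setting group (MediaSettingsList-like shape),
// indexed items with an associated length:
var dimensions = {
  0: { width: 640, height: 480 },
  1: { width: 1280, height: 720 },
  length: 2,
  initial: { width: 640, height: 480 } // the device's expected default
};
```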
The key to changing settings in either setting group is the request() API. This is the mechanism for asynchronously requesting that the device change the value of the setting that the setting group is applicable to. The mechanics for applying setting change requests follow exactly the model used when applying constraints at getUserMedia invocation. Each time a request() is made, the user agent begins building up an [internally-represented] constraint structure which is associated with the device making the request (and only that device). For example, if a "width" setting change request is made, the user agent creates a constraint structure equivalent to the following getUserMedia constraint (except that this constraint only applies to the specific device--not all video devices):
{ video: { optional: [ { width: value } ] } }
If this is the only request during this script-execution task, then when control returns to the user agent, this constraint will be committed (i.e., like an indexedDB transaction) and the constraint application logic will evaluate the request making changes to the current device if applicable.
If there is another request during the same script-execution task, it is appended to the optional list. Since order is important in the optional constraints list, the first requested setting has priority over the next.
The request() API also has a flag used to signal to the UA that the requested setting change should be mandatory. In this case, the constraint is added to the mandatory set, and replaces an existing setting in that set if the names collide (last setting wins). My expectation is that if a mandatory constraint cannot be satisfied, then the UA must end that stream as a result of the failure.
Unlike constraints built using dictionaries for getUserMedia, the constraint structures produced by calls to the request() API will always be individual proposed values, rather than min/max ranges. This is because min/max information is already available within the relevant settings, and can be included in calculations before making the call to request(). Therefore, I didn't feel it was necessary to clutter the API surface with that feature.
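A rough sketch of that per-task accumulation follows; the constraintBatch/commit names are hypothetical helpers, and only the resulting constraint shape comes from the text above:

```javascript
// Accumulates request() calls made during one script-execution task
// into a getUserMedia-style constraint structure for one device.
function constraintBatch() {
  var mandatory = {}; // mandatory requests: on a name collision, last wins
  var optional = [];  // optional requests: earlier requests have priority
  return {
    request: function (name, value, isMandatory) {
      if (isMandatory) {
        mandatory[name] = value;
      } else {
        var c = {};
        c[name] = value;
        optional.push(c); // appended in request order
      }
    },
    // "Committed" when control returns to the user agent at end of task.
    commit: function () {
      return { video: { mandatory: mandatory, optional: optional } };
    }
  };
}
```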
MediaSettingsRange objects should be used when the setting can generally assume a value along a continuum of values. This specification should indicate what the range of values must be for each setting. Given that implementations of various hardware may not exactly map to the same range, an implementation should make a reasonable attempt to translate and scale the hardware's setting onto the mapping provided by this specification. If this is not possible due to a hardware setting supporting (for example) fewer levels of granularity, then the implementation should make the device setting's min value reflect the min value reported in this specification, and likewise for the max value. Then, for values in between the min and max, the implementation may round to the nearest supported value and report that value in the setting.
For example, if the setting is fluxCapacitance and has a specified range from -10 (min) to 10 (max) in this specification, but the implementation's fluxCapacitance hardware setting only supports values of "off", "medium", and "full", then -10 should be mapped to "off", 10 should map to "full", and 0 should map to "medium". A request to change the value to 3 should be rounded to the nearest supported value (0, i.e., "medium").
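One way an implementation might perform that mapping is sketched below; the function name and the assumption that hardware levels are evenly spaced across the spec range are mine:

```javascript
// Map a value from the spec-defined [specMin, specMax] continuum onto
// the nearest of N evenly spaced hardware levels.
function mapToHardwareLevel(value, specMin, specMax, levels) {
  var step = (specMax - specMin) / (levels.length - 1);
  var index = Math.round((value - specMin) / step); // nearest level
  // Clamp out-of-range requests to the first/last supported level.
  return levels[Math.min(Math.max(index, 0), levels.length - 1)];
}
```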
MediaSettingsList objects should order their enumerated values from minimum to maximum where it makes sense, or in the order defined by the enumerated type where applicable.
MediaSettingsRange interface
The type of this value is specific to the setting. Each setting will describe a specific type. That type must be returned for this attribute.
The type of this value is specific to the setting. Each setting will describe a specific type. That type must be returned for this attribute.
The type of this value is specific to the setting. Each setting will describe a specific type. That type must be returned for this attribute.
The mandatory parameter defaults to false.
The value parameter type of this method is specific to the setting. Each setting will describe a specific type. That type must be provided for this parameter. If the type does not align, then the implementation should throw a TypeError exception.
MediaSettingsList interface
The indexed enumerated item of this setting. Items should be sorted from min (at index 0) to max where applicable, or in the order listed in the enumerated type otherwise.
The type of this value is specific to the setting. Each setting will describe a specific type. That type must be returned for this attribute.
The type of this value is specific to the setting. Each setting will describe a specific type. That type must be returned for this attribute.
The mandatory parameter defaults to false.
The value parameter type of this method is specific to the setting. Each setting will describe a specific type. That type must be provided for this parameter. If the type does not align, then the implementation should throw a TypeError exception.
Settings (read/writable) are defined as separate properties from their read-only counterparts. This allows for a variety of benefits:
These are pluralized for compactness and easy identification as a "setting". The more verbose "widthSettings", "horizontalAspectRatioSettings", "orientationSettings", etc., were considered (and may still be considered).
Settings considered but not included in this proposal:
width - I've used "dimension" for the setting instead, since resolutions of the camera are nearly always in step-wise pairs of width/height combinations. These are thus an enumerated type rather than a range continuum of possible width/height (independent) pairs.
height - see width explanation
horizontalAspectRatio - easily calculated based on width/height in the dimension values
verticalAspectRatio - see horizontalAspectRatio explanation
orientation - can be easily calculated based on the width/height values and the current rotation
apertureSize - while more common on digital cameras, not particularly common on webcams (the major use-case for this feature)
shutterSpeed - see apertureSize explanation
denoise - may require specification of the algorithm processing or related image processing filter required to implement
effects - sounds like a v2 or independent feature (depending on the effect)
faceDetection - sounds like a v2 feature. Can also be done using post-processing techniques (though perhaps not as fast...)
antiShake - sounds like a v2 feature
geoTagging - this can be independently associated with a recorded picture/video/audio clip using the Geolocation API. Automatically hooking up Geolocation to Media Capture sounds like an exercise for v2 given the possible complications
highDynamicRange - not sure how this can be specified, or if this is just a v2 feature
skintoneEnhancement - not a particularly common setting
shutterSound - can be accomplished by syncing custom audio playback via the <audio> tag if desired. By default, there will be no sound issued
redEyeReduction - photo-specific setting (could be considered if photo-specific settings are introduced)
meteringMode - photo-specific setting (could be considered if photo-specific settings are introduced)
iso - photo-specific setting; while more common on digital cameras, not particularly common on webcams (the major use-case for this feature)
sceneMode - while more common on digital cameras, not particularly common on webcams (the major use-case for this feature)
antiFlicker - not a particularly common setting
zeroShutterLag - this seems more like a hope than a setting. I'd rather just have implementations make the shutter snap as quickly as possible after takePicture, rather than requiring an opt-in/opt-out for this setting
Some of the above settings are available as constraints, and so are included in the proposed set of constraints in the last section.
PictureAndVideoSettings mix-in interface
VideoDeviceTrack implements PictureAndVideoSettings;
PictureDeviceTrack implements PictureAndVideoSettings;
Rotation makes me think I could set this to 45 degrees or some such. Maybe there's a better setting name for this. I only want to support right-angles.
In the case that a camera device supports both optical and digital zoom, does it make sense to have just one property? I expect this to be the "digitalZoom" version, which is more common on devices.
fillLight seemed more appropriate a term to use for both cameras and photo settings.
VideoDimensionDict dictionary
The following enums had many more values in the prior proposal, but in the interest of testing, I've scoped the initial list to those that seem most easily testable.
VideoFocusModeEnum enumeration
VideoFillLightModeEnum enumeration
VideoDeviceTrack partial interface
I wonder if this should just be a MediaSettingsList with the common values of 15, 30, and 60. Are there really any other values coming from hardware?
My previous proposal included a "bassTone" and "trebleTone" setting value, but on reflection, those settings are more relevant to playback than to microphone device settings. Those settings have been removed.
AudioDeviceTrack partial interface
MediaConstraintResultEventHandlers mix-in interface
AudioDeviceTrack implements MediaConstraintResultEventHandlers;
VideoDeviceTrack implements MediaConstraintResultEventHandlers;
PictureDeviceTrack implements MediaConstraintResultEventHandlers;
MediaDeviceList implements MediaConstraintResultEventHandlers;
ConstraintErrorEvent interface
ConstraintErrorEventInit dictionary
One common problem with all my previous proposals, and with the existing model for using getUserMedia to request access to additional devices, is the problem of discovery of multiple devices. As I understand it, the existing recommendation relies on "guessing" by making a second (or third, etc.) request to getUserMedia for access to additional devices. This model has two primary advantages:
First, it ensures privacy by making sure that each device request could be approved by the user. I say "could" because there is no current requirement that the user agent be involved, especially when re-requesting a device type that was already approved, for example, a second "video" device. I surmise that a request for a different class of device ("audio", when exclusively "video" was previously approved) would be cause for an implementation to ask the user for approval.
Second, it ensures privacy by not leaking any information about additional devices until the code has successfully requested a device.
Unfortunately, this model does not provide a means for discovery of additional devices. Such a discovery mechanism could be trivially added to this proposal in the form of a device-specific "totalDevices" property, but there's an opportunity for considering a solution that both streamlines the usability of multiple devices while maintaining the privacy benefits of the current model.
The device list is such a proposal. The device list offers the following benefits:
A device list is merely a list of all AudioDeviceTrack or VideoDeviceTrack objects that are available to the application. Device lists are device-type specific, so there is one device list for all AudioDeviceTrack objects and one device list for all VideoDeviceTrack objects. There is only one instance of each of these lists at any time, and the lists are LIVE (meaning the user agent keeps them up-to-date at all times). Device track objects are added to the list as soon as they are available to the application (e.g., as soon as they are plugged in). A device track object in the device list will have a readyState set to either LIVE or MUTED. Device tracks are removed from the list when they are unplugged, or otherwise disassociated from their device source such that their readyState changes to ENDED.
Every non-ended device track object will belong to a device list. Of course, the same device track object may also belong to zero or more MediaStreamTrackList objects. The device list provides the one-stop list for all devices of that type, regardless of which MediaStreams (if any) the device track objects also belong to.
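The live-list bookkeeping described above can be sketched as follows; this is a hypothetical helper illustrating user-agent behavior, not part of the proposed IDL:

```javascript
// Tracks enter the list as soon as they are available (LIVE or MUTED)
// and leave it when their readyState transitions to ENDED.
function LiveDeviceList() {
  this.tracks = [];
}
// Called by the UA when a device becomes available (e.g., plugged in).
LiveDeviceList.prototype.deviceAvailable = function (track) {
  if (track.readyState !== "ENDED") this.tracks.push(track);
};
// Called by the UA when a track's readyState changes (e.g., unplugged).
LiveDeviceList.prototype.readyStateChanged = function (track) {
  if (track.readyState === "ENDED") {
    this.tracks = this.tracks.filter(function (t) { return t !== track; });
  }
};
```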
MediaDeviceList interface
MUTED and LIVE.
readyState is in the LIVE state.
No devices (or their settings) are modified by this API. This API only tests the provided constraints against all the devices' capabilities and reports a matching device via the "constraintsuccess" event, or no matches via the "constrainterror" event.
ENDED state. Note that before dispatching this event, the device in question is removed from the device list.
DeviceEvent interface
The actual object referenced by the device attribute will be a derived device track object such as an AudioDeviceTrack, VideoDeviceTrack, or PictureDeviceTrack.
DeviceEventInit dictionary
Device lists are only accessible from an existing device track object. In other words, the device list itself can only be accessed from one of the devices contained within it (this is an inside-to-outside reference). To help orient the traversal of the list, each device track object includes a (dynamically updated) device index property. If a given device track transitions to the ENDED state, then the deviceIndex property returns null to signal that this device is not in the device list any longer.
DeviceListAccess mix-in interface
AudioDeviceTrack implements DeviceListAccess;
VideoDeviceTrack implements DeviceListAccess;
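The deviceIndex rule can be restated as a one-line computation; the helper name below is hypothetical and exists only to illustrate the semantics:

```javascript
// deviceIndex is the track's current position in its device list,
// or null once the track has ENDED and been removed from the list.
function deviceIndexOf(track, deviceList) {
  var i = deviceList.indexOf(track);
  return i === -1 ? null : i;
}
```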
This proposal defines several constraints for use with video and audio devices.
These constraints are applied against the device's range or set of enumerated possible settings, but do not result in a setting change on the device. To change actual settings, use the request() API on each setting.
The following constraints are applicable to video devices:
VideoConstraints dictionary
The following constraints are applicable to audio devices:
AudioConstraints dictionary
MinMaxULongSubConstraint dictionary
MinMaxFloatSubConstraint dictionary
VideoOrientationEnum enumeration
As provided in the 3rd version of this proposal, the following JavaScript examples demonstrate how the Settings APIs defined in this proposal could be used.
navigator.getUserMedia({audio: true, video: true}, gotMedia, failedToGetMedia);

function gotMedia(mediastream) {
    // The received mediastream is using its initial settings (it's clean)
}
function gotMedia(mediastream) {
    // objectURL technique
    document.querySelector("video").src = URL.createObjectURL(mediastream, { autoRevoke: true }); // autoRevoke is the default

    // direct-assign technique
    document.querySelector("video").srcObject = mediastream; // Proposed API at this time
}
function gotMedia(mediastream) {
    var videoDevice = mediastream.videoTracks[0];
    var maxDimensions = videoDevice.dimensions[videoDevice.dimensions.length - 1];
    // Check for 1080p+ support
    if ((maxDimensions.width >= 1920) && (maxDimensions.height >= 1080)) {
        // See if I need to change the current settings...
        if ((videoDevice.width < 1920) && (videoDevice.height < 1080)) {
            videoDevice.dimensions.request(maxDimensions, true);
            videoDevice.onconstrainterror = failureToComply;
        }
    }
    else
        failureToComply();
}

function failureToComply(e) {
    if (e)
        console.error("Device failed to change " + e.mandatoryConstraints[0]); // 'dimension'
    else
        console.error("Device doesn't support at least 1080p");
}
function gotMedia(mediastream) {
    setupRange(mediastream.videoTracks[0]);
}

function setupRange(videoDevice) {
    // Check to see if the device supports zooming...
    if (videoDevice.zooms) {
        // Set HTML5 range control to min/max values of zoom
        var zoomControl = document.querySelector("input[type=range]");
        zoomControl.min = videoDevice.zooms.min;
        zoomControl.max = videoDevice.zooms.max;
        zoomControl.value = videoDevice.zoom;
        zoomControl.zoomController = videoDevice.zooms; // Store the setting
        zoomControl.onchange = applySettingChanges;
    }
}

function applySettingChanges(e) {
    e.target.zoomController.request(parseFloat(e.target.value), true);
}
function gotMedia(mediastream) {
    return new MediaStream([ mediastream.videoTracks[0], mediastream.audioTracks[0] ]);
}
function gotMedia(mediastream) {
    var videoDevice = mediastream.videoTracks[0];
    // Check if this device supports a picture mode...
    var pictureDevice = videoDevice.pictureTrack;
    if (pictureDevice) {
        pictureDevice.onpicture = showPicture;
        // Turn on flash only for the snapshot...if available
        if (pictureDevice.fillLightModes) {
            // If there's an object here, then the flash is supported
            pictureDevice.fillLightModes.request("on", true);
        }
        else
            console.info("Flash not available");
        pictureDevice.takePicture();
    }
}

function showPicture(e) {
    var img = document.querySelector("img");
    img.src = URL.createObjectURL(e.data);
}
A device becomes newly available whenever an existing device that was being used by another application (with exclusive access) is relinquished and becomes available for this application to use. Of course, plugging in a new device also causes a device to become available.
function gotMedia(mediastream) {
    mediastream.videoTracks[0].devices.addEventListener("deviceadded", enableAndShowNewDevice);
}

function enableAndShowNewDevice(e) {
    // Show the new video device as soon as it's available
    // New device is muted when it first becomes available
    e.device.enabled = true;
    var mStream = new MediaStream(e.device);
    document.querySelector("video").srcObject = mStream; // Using the proposed direct-assignment API
}
function gotMedia(mediastream) {
    var deviceList = mediastream.videoTracks[0].devices;
    for (var i = 0; i < deviceList.length; i++) {
        var videoDevice = deviceList[i];
        videoDevice.enabled = true;
        // Create a video element and add it to the UI
        var videoTag = document.createElement('video');
        videoTag.srcObject = new MediaStream([videoDevice]);
        document.body.appendChild(videoTag);
    }
}