iterating on the scenario and requirements for the video chat use case, based on Joe's review
--- a/reqs/Overview.html Fri Jul 13 13:29:18 2012 +0100
+++ b/reqs/Overview.html Thu Aug 02 14:49:58 2012 +0100
@@ -119,25 +119,29 @@
<p>This section will introduce a number of scenarios involving the use of Web Audio processing or synthesis technologies, and discuss implementation and architectural considerations.</p>
<section>
- <h3>Scenario 1: Video Chat</h3>
- <p>Two or more users have loaded a video communication web application into their browsers, provided by the same service provider, and logged into the service it provides. When one online user selects a peer online user, a 1-1 video communication session between the browsers of the two peers is initiated. If there are more than two participants, and if the participants are using adequate hardware, binaural processing is used to position remote participants.</p>
+ <h3>Scenario 1: Video Chat Application</h3>
+ <p>Three people have joined a three-way conversation through a web application. Each of them sees the other two participants in split windows and hears their voices in sync with the video.</p>
+ <p>The application provides a simple interface to control the incoming audio and video of the other participants: at any time, the user can mute the incoming streams, control the overall sound volume, or mute themselves while continuing to send a live video stream through the application.</p>
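The mute and volume controls described above map naturally onto one gain stage per incoming stream. The sketch below is illustrative only: the helper name and the linear slider-to-gain mapping are assumptions, not part of any specification.

```javascript
// Convert a 0-100 volume slider position and a mute flag into the gain
// value to assign to a per-stream gain stage (hypothetical helper).
function sliderToGain(position, muted = false) {
  if (muted) return 0;                         // muting silences the stream
  const clamped = Math.min(100, Math.max(0, position));
  return clamped / 100;                        // linear mapping; a real UI might use a log curve
}

// In a browser, one GainNode per incoming stream would consume this value, e.g.:
//   incomingGain.gain.value = sliderToGain(volumeSlider.value, muteCheckbox.checked);
```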
+
+ <p>Advanced controls are also available. In the "Audio" option panel, the user can adapt the incoming sound to their taste through a graphic equalizer interface, and can apply a number of voice-enhancement filters, a feature which can be useful for people with hearing difficulties, in imperfect listening environments, or to compensate for poor transmission conditions.</p>
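A graphic equalizer of the kind described could be realised as a chain of peaking filters, one per band. The sketch below is hypothetical: the helper name and the octave band spacing starting at 31.25 Hz are illustrative assumptions, not requirements.

```javascript
// Compute octave-spaced centre frequencies for an N-band graphic equalizer
// (hypothetical helper; base frequency is an assumption, not from any spec).
function eqBandFrequencies(bands, base = 31.25) {
  const freqs = [];
  for (let i = 0; i < bands; i++) {
    freqs.push(base * 2 ** i);                 // each band one octave above the last
  }
  return freqs;
}

// In a browser, each frequency would become one peaking filter in a chain, e.g.:
//   const filters = eqBandFrequencies(10).map(f => {
//     const filter = audioCtx.createBiquadFilter();
//     filter.type = 'peaking';
//     filter.frequency.value = f;
//     return filter;
//   });
//   filters.reduce((a, b) => a.connect(b));   // connect the bands in series
```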
- <p>In one version of the service, an option allows users to distort (pitch, speed, other effects) their voice for fun. Such a feature could also be used to protect one participants' privacy in some applications.</p>
-
- <p>During the session, each user can also pause sending of media (audio, video, or both) and mute incoming media. An interface gives each user control over the incoming sound volume from each participant - with an option to have the software do it automatically. Another interface offers user-triggered settings (EQ, filtering) for voice enhancement, a feature which can be useful between people with hearing difficulties, in imperfect listening environments, or to compensate for poor transmission environments.</p>
-
+ <p>Another option allows the user to change the spatialization of their interlocutors' voices; the default is a binaural mix matching the layout of the split windows on the screen, but the interface makes it possible to reverse the left-right balance, or to make the other participants sound closer or farther away.</p>
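The default mix matching the on-screen window layout could be sketched as a mapping from window position to a stereo pan value. The helper below is hypothetical, assuming the remote participants' windows are laid out left to right; the reversal flag corresponds to the left-right balance option described above.

```javascript
// Map a participant's split-window index (0..count-1) to a pan value in
// [-1, 1] matching the on-screen layout; `reversed` flips left and right.
// (Hypothetical helper; names are illustrative, not from any spec.)
function panForWindow(index, count, reversed = false) {
  if (count < 2) return 0;                     // a single remote voice stays centred
  const pan = (index / (count - 1)) * 2 - 1;   // first window hard left, last hard right
  return reversed ? -pan : pan;
}

// In a browser, the value would drive one panner per remote stream, e.g.:
//   const panner = audioCtx.createStereoPanner();
//   panner.pan.value = panForWindow(i, participantCount);
//   remoteSource.connect(panner).connect(audioCtx.destination);
```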
+
+ <p>The makers of the chat application also offer a "fun" version which allows users to distort (pitch, speed, other effects) their voice. They are considering adding the option to the default software, as such a feature could also be used to protect a participant's privacy in some contexts.</p>
+
<h4>Notes and Implementation Considerations</h4>
<ol>
- <li><p>This scenario is a good example of the need for audio capture (from line in, internal microphone or other inputs). We expect this to be provided by <a href="http://www.w3.org/TR/html-media-capture/" title="HTML Media Capture">HTML Media Capture</a>.</p></li>
- <li><p>This scenario is heavily inspired from <a href="http://tools.ietf.org/html/draft-ietf-rtcweb-use-cases-and-requirements-06#section-4.2.1">the first scenario in WebRTC's Use Cases and Requirements document</a>. Most of the technology described by this scenario should be covered by the <a href="http://www.w3.org/TR/webrtc/" title="WebRTC 1.0: Real-time Communication Between Browsers">Web Real-Time Communication API</a>. The scenario illustrates, however, a technical requirement for processing of the audio signal at both ends (capture of the user's voice and output of its correspondents' conversation).</p></li>
<li><p>The processing capabilities needed by this scenario include:</p>
<ul>
- <li><em>Controlling the gain</em> (mute, pause and volume) of several audio sources</li>
+ <li>Mixing and spatialization of several sound sources</li>
+ <li><em>Controlling the gain</em> (mute and volume control) of several audio sources</li>
<li><em>Filtering</em> (EQ, voice enhancement)</li>
- <li>Pitch, speed distortion</li>
+ <li>Modifying the pitch and speed of sound sources</li>
</ul>
</li>
+ <li><p>This scenario is also a good example of the need for audio capture (from line in, internal microphone or other inputs). We expect this to be provided by <a href="http://www.w3.org/TR/html-media-capture/" title="HTML Media Capture">HTML Media Capture</a>.</p></li>
+ <li><p>The <a href="http://tools.ietf.org/html/draft-ietf-rtcweb-use-cases-and-requirements-06#section-4.2.1">first scenario in WebRTC's Use Cases and Requirements document</a> has been a strong inspiration for this scenario. Most of the technology described above should be covered by the <a href="http://www.w3.org/TR/webrtc/" title="WebRTC 1.0: Real-time Communication Between Browsers">Web Real-Time Communication API</a>. The scenario illustrates, however, the need to integrate audio processing with the handling of RTC streams, with a technical requirement for processing of the audio signal at both ends (capture of the user's voice and output of their correspondents' voices).</p></li>
</ol>
</section>