--- a/reqs/Overview.html Thu Aug 30 08:49:36 2012 +0100
+++ b/reqs/Overview.html Thu Aug 30 10:50:00 2012 +0100
@@ -146,8 +146,8 @@
<section>
<h3>3D game with music and convincing sound effects</h3>
- <p>A user is playing a 3D first-person adventure game on their mobile device. The game is built entirely using open web technologies, but includes rich, convincing sound.</p>
- <p>As soon as the user starts the game, a musical background starts, loops seamlessly, and transitions smoothly from one music track to another as the player enters a house.</p>
+ <p>A commuter is playing a 3D first-person adventure game on their mobile device. The game is built entirely using open web technologies, and includes rich, convincing sound piped through the commuter's stereo headphones.</p>
+ <p>As soon as the game starts, a musical background starts, loops seamlessly, and transitions smoothly from one music track to another as the player enters a house.</p>
- <p>While walking in a corridor, the player can hear the muffled sound of a ticking grandfather's clock. Following the direction of the sound and entering a large hall, the sound of the clock becomes clear, reverberating in the large hall. At any time, the sound of the clock spatialized in real-time based on the position of the player's character in the room (relative to the clock) and the current camera angle.</p>
+ <p>While walking in a corridor, the player can hear the muffled sound of a ticking grandfather clock. As the player follows the direction of the sound and enters a large hall, the sound of the clock becomes clear, reverberating in the large space. At any time, the sound of the clock is spatialized in real time based on the position of the player's character in the room (relative to the clock) and the current camera angle.</p>
- <p>As the soundscape changes, bringing a more somber, scary atmosphere to the scene, the player equips a firearm. Suddenly, a giant snake springs from behind a corner, its hissing becoming a little louder as the snake turns its head towards the player. The weapon fires at the touch of a key, and the player can hear the sound of bullets in near-perfect synchronization with the firing, as well as the sound of bullets ricocheting against walls. The sounds are played immediately after the player presses the key, but the action and video frame rate can remain smooth even when a lot of sounds (bullets being fired, echoing and ricocheting, sound of the impacts, etc) are played at the same time. The snake is now dead, and many flies gather around it, and around the player's character, buzzing and zooming in the virtual space of the room.</p>
+ <p>As the soundscape changes, bringing a more somber, scary atmosphere to the scene, the player equips a firearm. Suddenly, a giant snake springs from behind a corner, its hissing becoming a little louder as the snake turns its head towards the player. The weapon fires at the touch of a key, and the player can hear the sound of bullets in near-perfect synchronization with the firing, as well as the sound of bullets ricocheting against walls. The sounds are played immediately after the player presses the key, but the action and video frame rate can remain smooth even when a lot of sounds (bullets being fired, echoing and ricocheting, the sound of the impacts, etc.) are played at the same time. Once the snake is dead, many flies gather around it and around the player's character, buzzing and zooming in the virtual space of the room.</p>
@@ -155,8 +155,8 @@
<ol>
- <li><p>Developing the soundscape for a game as the one described above can benefit from a <em>modular, node based approach</em> to audio processing. In our scenario, some of the processing needs to happen for a number of sources at the same time (e.g room effects) while others (e.g mixing and spatialization) need to happen on a per-source basis. A graph-based API makes it very easy to envision, construct and control the necessary processing architecture, in ways that would be possible with other kinds of APIs, but more difficult to implement. The fundamental <code>AudioNode</code> construct in the Web Audio API supports this approach.</p></li>
+ <li><p>Developing the soundscape for a game such as the one described above can benefit from a <em>modular, node-based approach</em> to audio processing. In our scenario, some of the processing needs to happen for a number of sources at the same time (e.g. room effects) while other processing (e.g. mixing and spatialization) needs to happen on a per-source basis. A graph-based API makes it very easy to envision, construct and control the necessary processing architecture, in ways that would be possible with other kinds of APIs but would be more difficult to implement. The fundamental <code>AudioNode</code> construct in the Web Audio API supports this approach.</p></li>
- <li><p>While a single looping music background can be created today with the <a href="http://www.w3.org/TR/html5/the-audio-element.html#the-audio-element" title="4.8.7 The audio element — HTML5">HTML5 <audio> element</a>, the ability to transition smoothly from one musical background to another requires additional capabilities that are found in the Web Audio API including <em>sample-accurate playback scheduling</em> and <em>automated cross-fading of multiple sources</em>. Related API features include <code>AudioBufferSourceNode.noteOn()</code> and <code>AudioParam.setValueAtTime()</code>.</p></li>
- <li><p>The scenario illustrates many aspects of the creation of a credible soundscape. The game character is evolving in a virtual three-dimensional environment and the soundscape is at all times spatialized: a <em>panning model</em> can be used to spatialize sound sources in the game (<code>AudioPanningNode</code>); <em>obstruction / occlusion</em> modelling is used to muffle the sound of the clock going through walls, and the sound of flies buzzing around would need <em>Doppler Shift</em> simulation to sound believable (also supported by <code>AudioPanningNode</code>). The listener's position is part of this 3D model as well (<code>AudioListener</code>).</p></li>
+ <li><p>While a single looping music background can be created today with the <a href="http://www.w3.org/TR/html5/the-audio-element.html#the-audio-element" title="4.8.7 The audio element — HTML5">HTML5 <audio> element</a>, the ability to transition smoothly from one musical background to another requires additional capabilities that are found in the Web Audio API, including <em>sample-accurate playback scheduling</em> and <em>automated cross-fading of multiple sources</em>. Related API features include <code>AudioBufferSourceNode.start()</code> and <code>AudioParam.setValueAtTime()</code>; a minimal sketch of such a cross-fade appears after this list.</p></li>
+ <li><p>The scenario illustrates many aspects of the creation of a credible soundscape. The game character moves through a virtual three-dimensional environment and the soundscape is spatialized at all times: a <em>panning model</em> can be used to spatialize sound sources in the game (<code>AudioPannerNode</code>); <em>obstruction / occlusion</em> modeling is used to muffle the sound of the clock as it passes through walls, and the sound of flies buzzing around would need <em>Doppler shift</em> simulation to sound believable (also supported by <code>AudioPannerNode</code>). The listener's position is part of this 3D model as well (<code>AudioListener</code>).</p></li>
<li><p>As the soundscape changes from small room to large hall, the game benefits from the <em>simulation of acoustic spaces</em>, possibly through the use of a <em>convolution engine</em> for high quality room effects as supported by <code>ConvolverNode</code> in the Web Audio API.</p></li>
- <li><p>Many sounds in the scenario are triggered by events in the game, and would need to be played with low latency. The sound of the bullets as they are fired and ricochet against the walls, in particular, illustrate a requirement for <em>basic polyphony</em> and <em>high-performance playback and processing of many sounds</em>. These are supported by the general ability of the Web Audio API to include many sound-generating nodes with independent scheduling and high-throughput native algorithms.</p></li>
+ <li><p>Many sounds in the scenario are triggered by events in the game and would need to be played with low latency. The sound of the bullets as they are fired and ricochet against the walls, in particular, illustrates a requirement for <em>basic polyphony</em> and <em>high-performance playback and processing of many sounds</em>. These are supported by the general ability of the Web Audio API to include many sound-generating nodes with independent scheduling and high-throughput native algorithms.</p></li>
</ol>
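+ <p>A minimal sketch of the cross-fade mentioned in the second item, assuming the two music tracks have already been decoded into <code>AudioBuffer</code> objects (<code>outdoorMusic</code> and <code>houseMusic</code> are hypothetical names) and that <code>ctx</code> is an <code>AudioContext</code>, using the current method names:</p>
+ <pre><code>
+ // Sketch only: two looping backgrounds, each behind its own GainNode,
+ // cross-faded with AudioParam automation when the player enters the house.
+ function playLoop(buffer, initialGain) {
+   var source = ctx.createBufferSource();
+   source.buffer = buffer;
+   source.loop = true;                        // seamless looping background
+   var gain = ctx.createGain();
+   gain.gain.value = initialGain;
+   source.connect(gain);
+   gain.connect(ctx.destination);
+   source.start(0);                           // sample-accurate start
+   return gain;
+ }
+
+ var outdoorGain = playLoop(outdoorMusic, 1);
+ var houseGain   = playLoop(houseMusic, 0);
+
+ // Called when the player walks into the house: a two-second automated cross-fade.
+ function enterHouse() {
+   var now = ctx.currentTime;
+   outdoorGain.gain.setValueAtTime(1, now);
+   outdoorGain.gain.linearRampToValueAtTime(0, now + 2);
+   houseGain.gain.setValueAtTime(0, now);
+   houseGain.gain.linearRampToValueAtTime(1, now + 2);
+ }
+ </code></pre>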
@@ -168,7 +168,7 @@
<h3>Online music production tool</h3>
- <p>A user creates a musical composition from audio media clips using a web-based Digital Audio Workstation (DAW) application.</p>
+ <p>A music enthusiast creates a musical composition from audio media clips using a web-based Digital Audio Workstation (DAW) application.</p>
<p>Audio "clips" are arranged on a timeline representing multiple tracks of audio. Each track's volume, panning, and effects
- may be controlled separately. Individual tracks may be muted or soloed to preview various combination of tracks at a given moment.
+ may be controlled separately. Individual tracks may be muted or soloed to preview various combinations of tracks at a given moment.
@@ -193,7 +193,7 @@
<li><p>Building such an application may only be reasonably possible if the technology enables the control of audio with acceptable performance, in particular for <em>real-time processing</em> and control of audio parameters and <em>sample accurate scheduling of sound playback</em>. Because performance is such a key aspect of this scenario, it should probably be possible to control the buffer size of the underlying Audio API: this would allow users with slower machines to pick a larger buffer setting that does not cause clicks and pops in the audio stream.</p></li>
- <li><p>The ability to visualise the samples and their processing benefits from <em>real-time time-domain and frequency analysis</em>, as supplied by the Web Audio API's <code>RealtimeAnalyzerNode</code>.</p></li>
+ <li><p>The ability to visualize the samples and their processing benefits from <em>real-time time-domain and frequency analysis</em>, as supplied by the Web Audio API's <code>RealtimeAnalyserNode</code>.</p></li>
<li><p> Clips must be able to be loaded into memory for fast playback. The Web Audio API's <code>AudioBuffer</code> and <code>AudioBufferSourceNode</code> interfaces address this requirement.</p></li>
- <li><p>The ability to schedule both audio clip playback and effects parameter value changes in advance is essential to support automated mixdown</p></li>
- <li><p> To export an audio file, the audio rendering pipeline must be able to yield buffers of sample frames directly, rather than being forced to an audio device destination. Built-in codecs to translate these buffers to standard audio file output formats are also desirable.</p></li>
+ <li><p>The ability to schedule both audio clip playback and effects parameter value changes in advance is essential to support automated mixdown.</p></li>
+ <li><p> To export an audio file, the audio rendering pipeline must be able to yield buffers of sample frames directly, rather than being forced to render to an audio device destination. Built-in codecs to translate these buffers to standard audio file output formats are also desirable.</p></li>
@@ -216,15 +216,15 @@
<li>a remote microphone for a remote guest</li>
</ul>
- <p>A simple mixer lets the broadcaster control the volume, pan and effects processing for each local or remote audio source, blending them into a single stereo output mix that is broadcast as the show's content. Indicators track the level of each active source. This mixer also incorporates some automatic features to make the broadcaster's life easier, including ducking of prerecorded audio sources when any local or remote microphone source is active. Muting(unmuting) of sources causes an automatic fast volume fade-out(in) to avoid audio transients. The broadcaster can hear a live monitor mix through headphones, with an adjustable level for monitoring their local microphone.</p>
+ <p>A simple mixer lets the broadcaster control the volume, pan, and effects processing for each local or remote audio source, blending them into a single stereo output mix that is broadcast as the show's content. Indicators track the level of each active source. This mixer also incorporates some automatic features to make the broadcaster's life easier, including ducking of prerecorded audio sources when any local or remote microphone source is active. Muting (or un-muting) a source causes an automatic fast volume fade-out (or fade-in) to avoid audio transients. The broadcaster can hear a live monitor mix through headphones, with an adjustable level for monitoring their local microphone.</p>
<p>The application is aware of when prerecorded audio is playing in the mix, and each audio track's descriptive metadata is shown to the audience in synchronization with what they are hearing.</p>
<p>The guest interface supports a single live audio source from a choice of any local microphone.</p>
- <p>The audience interface delivers the channel's broadcast mix, but also offers basic volume and EQ control plus the ability to pause/rewind/resume the live stream. Optionally, the user can slow down the content of the audio without changing its pitch, for example to aid in understanding a foreign language.</p>
+ <p>The audience interface delivers the channel's broadcast mix, but also offers basic volume and EQ control plus the ability to pause/rewind/resume the live stream. Optionally, the listener can slow down the content of the audio without changing its pitch, for example to aid in understanding a foreign language.</p>
- <p>An advanced feature would give the audience control over the mix itself. The mix of tracks and sources created by the broadcaster would be a default, but the listener would have the abolity to create a different mix. For instance, in the case of a radio play with a mix of voices, sound effects and music, the listener could be offered an interface to control the relative volume of the voices to effects and music, or create a binaural mix tailored specifically to their taste. Such a feature would provide valuable personalisation of the radio experience, as well as significant accessibility enhancements.</p>
+ <p>An advanced feature would give the audience control over the mix itself. The mix of tracks and sources created by the broadcaster would be a default, but the listener would have the ability to create a different mix. For instance, in the case of a radio play with a mix of voices, sound effects and music, the listener could be offered an interface to control the volume of the voices relative to the effects and music, or create a binaural mix tailored specifically to their taste. Such a feature would provide valuable personalization of the radio experience, as well as significant accessibility enhancements.</p>
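+ <p>The automatic ducking and mute fades described above could be driven entirely by <code>AudioParam</code> automation. The following is only an illustrative sketch, assuming each source is routed through its own <code>GainNode</code> and that <code>ctx</code>, <code>trackGain</code> and the microphone-activity detection are supplied by the application:</p>
+ <pre><code>
+ var FADE = 0.05;   // 50 ms fade on mute/un-mute to avoid audible transients
+ var DUCK = 0.25;   // prerecorded sources drop to 25% while a microphone is live
+
+ function setMuted(trackGain, muted) {
+   var now = ctx.currentTime;
+   // Cancel pending automation, then ramp quickly to the new level.
+   trackGain.gain.cancelScheduledValues(now);
+   trackGain.gain.setValueAtTime(trackGain.gain.value, now);
+   trackGain.gain.linearRampToValueAtTime(muted ? 0 : 1, now + FADE);
+ }
+
+ function setDucked(trackGain, micActive) {
+   // setTargetAtTime gives a smooth exponential approach to the target level.
+   trackGain.gain.setTargetAtTime(micActive ? DUCK : 1, ctx.currentTime, 0.1);
+ }
+ </code></pre>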
<h4>Notes and Implementation Considerations</h4>
<ol>
@@ -252,8 +252,8 @@
<h3>Music Creation Environment with Sampled Instruments</h3>
- <p>A user is employing a web-based application to create and edit a musical composition with live synthesized playback. The user interface for composing can take a number of forms including conventional Western notation and a piano-roll style display. The document can be sonically rendered on demand as a piece of music, <em>i.e.</em> a series of precisely timed, pitched and modulated audio events (notes).
- <p>The user occasionally stops editing and wishes to hear playback of some or all of the score they are working on to take stock of their work. At this point the program performs sequenced playback of some portion of the document. Some simple effects such as instrument panning and room reverb are also applied for a more realistic and satisfying effect.</p>
+ <p>A composer is employing a web-based application to create and edit a musical composition with live synthesized playback. The user interface for composing can take a number of forms including conventional Western notation and a piano-roll style display. The document can be sonically rendered on demand as a piece of music, <em>i.e.</em> a series of precisely timed, pitched and modulated audio events (notes).</p>
+ <p>The musician occasionally stops editing and wishes to hear playback of some or all of the score they are working on to take stock of their work. At this point the program performs sequenced playback of some portion of the document. Some simple effects such as instrument panning and room reverb are also applied for a more realistic and satisfying effect.</p>
<p>Compositions in this editor employ a set of instrument samples, i.e. a pre-existing library of recorded audio snippets. Any given snippet is a brief audio recording of a note played on an instrument with some specific and known combination of pitch, dynamics and articulation. The combinations in the library are necessarily limited in number to avoid bandwidth and storage overhead. During playback, the editor must simulate the sound of each instrument playing its part in the composition. This is done by transforming the available pre-recorded samples from their original pitch, duration and volume to match the characteristics prescribed by each note in the composed music. These per-note transformations must also be scheduled to be played at the times prescribed by the composition.</p>
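+ <p>A per-note transformation of this kind might be sketched as follows; this is only an illustration, assuming <code>ctx</code> is an <code>AudioContext</code>, <code>sampleBuffer</code> is a pre-loaded <code>AudioBuffer</code> shared by many notes, and the loop points, pitch offset, velocity and duration are supplied by the score:</p>
+ <pre><code>
+ // Sketch: play one note by transforming a pre-loaded instrument sample.
+ function playNote(sampleBuffer, when, semitones, velocity, duration) {
+   var note = ctx.createBufferSource();
+   note.buffer = sampleBuffer;                      // one buffer shared by many notes
+   note.playbackRate.value = Math.pow(2, semitones / 12);  // pitch transformation
+   note.loop = true;                                // sustain for an arbitrary duration
+   note.loopStart = 0.2;                            // loop points depend on the sample
+   note.loopEnd = 0.4;
+
+   var envelope = ctx.createGain();                 // simple gain envelope
+   envelope.gain.setValueAtTime(velocity, when);    // velocity assumed greater than 0
+   envelope.gain.exponentialRampToValueAtTime(0.001, when + duration);  // release
+   note.connect(envelope);
+   envelope.connect(ctx.destination);
+
+   note.start(when);                                // scheduled ahead of time
+   note.stop(when + duration);
+ }
+ </code></pre>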
<p>During playback a moving cursor indicates the exact point in the music that is being heard at each moment.</p>
- <p>At some point the user exports an MP3 or WAV file from the program for some other purpose. This file contains the same audio rendition of the score that is played interactively when the user requested it earlier.</p>
+ <p>At some point the user exports an MP3 or WAV file from the program for some other purpose. This file contains the same audio rendition of the score that was played interactively when the user requested it earlier.</p>
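+ <p>The export step could plausibly be handled with an offline rendering context, as in the sketch below (this assumes the promise-returning <code>startRendering()</code> of more recent drafts; <code>buildCompositionGraph</code>, <code>encodeToWav</code> and <code>saveAs</code> are hypothetical application helpers):</p>
+ <pre><code>
+ // Sketch: render 60 seconds of the composition off-line at 44.1 kHz, stereo.
+ var offline = new OfflineAudioContext(2, 44100 * 60, 44100);
+ buildCompositionGraph(offline);                    // schedule every note into this context
+ offline.startRendering().then(function (renderedBuffer) {
+   var wavBlob = encodeToWav(renderedBuffer);       // hypothetical WAV encoder
+   saveAs(wavBlob, 'composition.wav');              // hypothetical download helper
+ });
+ </code></pre>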
@@ -261,7 +261,7 @@
<h4>Notes and Implementation Considerations</h4>
<ol>
<li><p> Instrument samples must be able to be loaded into memory for fast processing during music rendering. These pre-loaded audio snippets must have a one-to-many relationship with objects in the Web Audio API representing specific notes, to avoid duplicating the same sample in memory for each note in a composition that is rendered with it. The API's <code>AudioBuffer</code> and <code>AudioBufferSourceNode</code> interfaces address this requirement.</p></li>
- <li><p>It must be possible to schedule large numbers of individual events over a long period of time, each of which is a transformation of some original audio sample, without degrading real-time browser performance. A graph-based approach such as that in the Web Audio API makes the construction of any given transformation practical, by supporting simple recipes for creating subgraphs built around a sample's pre-loaded <code>AudioBuffer</code>. These subgraphs can be constructed and scheduled to be played in the future. In one approach to supporting longer compositions, the construction and scheduling of future events can be kept "topped up" via periodic timer callbacks, to avoid the overhead of creating huge graphs all at once.</p></li>
+ <li><p>It must be possible to schedule large numbers of individual events over a long period of time, each of which is a transformation of some original audio sample, without degrading real-time browser performance. A graph-based approach such as that in the Web Audio API makes the construction of any given transformation practical, by supporting simple recipes for creating sub-graphs built around a sample's pre-loaded <code>AudioBuffer</code>. These sub-graphs can be constructed and scheduled to be played in the future. In one approach to supporting longer compositions, the construction and scheduling of future events can be kept "topped up" via periodic timer callbacks, to avoid the overhead of creating huge graphs all at once.</p></li>
<li><p>A given sample must be able to be arbitrarily transformed in pitch and volume to match a note in the music. <code>AudioBufferSourceNode</code>'s <code>playbackRate</code> attribute provides the pitch-change capability, while <code>AudioGainNode</code> allows the volume to be adjusted.</p></li>
<li><p>A given sample must be able to be arbitrarily transformed in duration (without changing its pitch) to match a note in the music. <code>AudioBufferSourceNode</code>'s looping parameters provide sample-accurate start and end loop points, allowing a note of arbitrary duration to be generated even though the original recording may be brief.</p></li>
<li><p>Looped samples by definition do not have a clean ending. To avoid an abrupt glitchy cutoff at the end of a note, a gain and/or filter envelope must be applied. Such envelopes normally follow an exponential trajectory during key time intervals in the life cycle of a note. The <code>AudioParam</code> features of the Web Audio API in conjunction with <code>AudioGainNode</code> and <code>BiquadFilterNode</code> support this requirement.</p></li>
@@ -282,14 +282,14 @@
- <p>Once the correct match is reached, The DJ would be able to start playing the track in the main audio output, either immediately or by slowly changing the volume controls for each track. She uses a cross fader to let the new song blend into the old one, and eventually goes completely across so only the new song is playing. This gives the illusion that the song never ended.</p>
+ <p>Once the correct match is reached, the DJ would be able to start playing the track in the main audio output, either immediately or by slowly changing the volume controls for each track. She uses a crossfader to let the new song blend into the old one, and eventually goes completely across so only the new song is playing. This gives the illusion that the song never ended.</p>
- <p>At the other end, fans listening to the set would be able to watch a video of the DJ mixing, accompanied by a graphic visualization of the music, picked from a variety of choices: spectrum analysis, level-meter view or a number of 2D or 3D abstract visualisations displayed either next to or overlaid on the DJ video.</p>
+ <p>At the other end, fans listening to the set would be able to watch a video of the DJ mixing, accompanied by a graphic visualization of the music, picked from a variety of choices: spectrum analysis, level-meter view or a number of 2D or 3D abstract visualizations displayed either next to or overlaid on the DJ video.</p>
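+ <p>The visualizations could be fed by an analyser node tapping the broadcast mix, roughly as sketched below (this uses the <code>AnalyserNode</code> name from more recent drafts; <code>ctx</code>, <code>mixBus</code> and <code>drawSpectrum</code> are hypothetical application objects):</p>
+ <pre><code>
+ // Sketch: tap the DJ's output mix and poll the analyser for spectrum data.
+ var analyser = ctx.createAnalyser();
+ analyser.fftSize = 2048;
+ mixBus.connect(analyser);                          // mixBus: the GainNode carrying the mix
+ var bins = new Uint8Array(analyser.frequencyBinCount);
+
+ function draw() {
+   analyser.getByteFrequencyData(bins);             // current spectrum, one byte per bin
+   drawSpectrum(bins);                              // hypothetical canvas or WebGL drawing
+   requestAnimationFrame(draw);
+ }
+ requestAnimationFrame(draw);
+ </code></pre>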
<h4>Notes and Implementation Considerations</h4>
<ol>
<li>As in many other scenarios in this document, it is expected that APIs such as the <a href="http://www.w3.org/TR/webrtc/" title="WebRTC 1.0: Real-time Communication Between Browsers">Web Real-Time Communication API</a> will be used for the streaming of audio and video across a number of clients.</li>
<li>
- <p>One of the specific requirements illustrated by this scenario is the ability to have two different outputs for the sound: one for the headphones, and one for the music stream sent to all the clients. With the typical web-friendly hardware, this would be difficult or impossible to implement by considering both as audio destinations, since they seldom have or allow two sound outputs to be used at the same time. And indeed, in the current Web Audio API draft, a given <code>AudioContext</code> can only use one <code>AudioDestinationNode</code> as destination.</p>
+ <p>One of the specific requirements illustrated by this scenario is the ability to have two different outputs for the sound: one for the headphones, and one for the music stream sent to all the clients. With typical web-friendly hardware, this would be difficult or impossible to implement by considering both as audio destinations, since such devices seldom have, or allow the use of, two sound outputs at the same time. And indeed, in the current Web Audio API draft, a given <code>AudioContext</code> can use only one <code>AudioDestinationNode</code> as destination.</p>
- <p>However, if we consider that the headphones are the audio output, and that the streaming DJ set is not a typical audio destination but an outgoing <code>MediaStream</code> passed on to the WebRTC API, it should be possible to implement this scenario, sending output to both headphones and the stream and gradually sending sound from one to the other without affecting theexact state of playback and processing of a source. With the Web Audio API, this can be achieved by using the <code>createMediaStreamDestination()</code> interface.</p>
+ <p>However, if we consider that the headphones are the audio output, and that the streaming DJ set is not a typical audio destination but an outgoing <code>MediaStream</code> passed on to the WebRTC API, it should be possible to implement this scenario, sending output to both headphones and the stream and gradually sending sound from one to the other without affecting the exact state of playback and processing of a source. With the Web Audio API, this can be achieved by using the <code>createMediaStreamDestination()</code> interface.</p>
</li>
- <li>This scenario makes heavy usage of audio analysis capabilities, both for automation purposes (beat detection and beat matching) and visualization (spectrum, level and other abstract visualization modes).</li>
- <li>The requirement for pitch/speed change are not currently covered by the Web Audio API's native processing nodes. Such processing would probably have to be handled with custom processing nodes.</li>
+ <li>This scenario makes heavy use of audio analysis capabilities, both for automation purposes (beat detection and beat matching) and visualization (spectrum, level and other abstract visualization modes).</li>
+ <li>The requirement for pitch/speed change is not currently covered by the Web Audio API's native processing nodes. Such processing would probably have to be handled with custom processing nodes.</li>
@@ -299,7 +299,7 @@
<section>
<h3>Playful sonification of user interfaces</h3>
- <p>A child is visiting a social website designed for kids. The playful, colorful HTML interface is accompanied by sound effects played as the child hovers or clicks on some of the elements of the page. For example, when filling in a form the sound of a typewriter can be heard as the child types in the form field. Some of the sounds are spatialized and have a different volume depending on where and how the child interacts with the page. When an action triggers a download visualised with a progress bar, a gradually rising pitch sound accompanies the download and another sound (ping!) is played when the download is complete.</p>
+ <p>A child is visiting a social website designed for kids. The playful, colorful HTML interface is accompanied by sound effects played as the child hovers over or clicks on some of the elements of the page. For example, when filling in a form the sound of a typewriter can be heard as the child types in the form field. Some of the sounds are spatialized and have a different volume depending on where and how the child interacts with the page. When an action triggers a download visualized with a progress bar, a sound of gradually rising pitch accompanies the download and another sound (ping!) is played when the download is complete.</p>
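+ <p>The rising-pitch download sound could be produced with an oscillator whose frequency tracks the reported progress, as in the sketch below (<code>ctx</code>, <code>onProgress</code>, <code>onComplete</code> and <code>pingBuffer</code> are hypothetical names for objects the page would supply):</p>
+ <pre><code>
+ // Sketch: map download progress (0..1) to a rising tone, then play a short "ping".
+ var osc = ctx.createOscillator();
+ var level = ctx.createGain();
+ level.gain.value = 0.2;                            // keep interface sounds quiet
+ osc.connect(level);
+ level.connect(ctx.destination);
+ osc.start(0);
+
+ function onProgress(fraction) {                    // called as the download advances
+   var freq = 220 + 660 * fraction;                 // 220 Hz rising to 880 Hz
+   osc.frequency.setTargetAtTime(freq, ctx.currentTime, 0.05);
+ }
+
+ function onComplete() {
+   osc.stop();
+   var ping = ctx.createBufferSource();             // the "ping!" sample, decoded earlier
+   ping.buffer = pingBuffer;
+   ping.connect(ctx.destination);
+   ping.start(0);
+ }
+ </code></pre>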
<h4>Notes and Implementation Considerations</h4>
<ol>
@@ -313,13 +313,13 @@
<section>
<h3>Podcast on a flight</h3>
- <p>A user is subscribed to a podcast, and has previously
+ <p>A traveler is subscribed to a podcast, and has previously
downloaded an audio book on his device using the podcast's
web-based application. The audio files are stored locally on
- the device, giving simple and convenient access to episodic
+ his device, giving simple and convenient access to episodic
content whenever the user wishes to listen.</p>
- <p>The user is sitting in an airplane for a 2-hour flight. The user opens
+ <p>Sitting in an airplane for a 2-hour flight, he opens
the podcast application in his HTML browser and sees that the episode he has
selected lasts 3 hours. The application offers a speed-up feature that allows
the speech to be delivered at a faster than normal speed without pitch distortion
@@ -348,7 +348,7 @@
- <p>While editing, the online tool must ensure that the audio and video playback are synchronized, allowing the editor to insert audio samples at the right time. As the length of one of the songs is slightly different from the video segment she is matching it with, she can synchronize the two by slightly speeding up or slowing down the audio track. The final soundtrack is mixed down into the final soundtrack, added to the video as a replacement for the original audio track, and synced with the video track.</p>
+ <p>While editing, the online tool must ensure that the audio and video playback are synchronized, allowing the editor to insert audio samples at the right time. As the length of one of the songs is slightly different from the video segment she is matching it with, she can synchronize the two by slightly speeding up or slowing down the audio track. The audio tracks are then mixed down into the final soundtrack, which is added to the video as a replacement for the original audio track and synced with the video track.</p>
- <p>Once the audio description and commentary are recorded, the film, displayed in a HTML web page, can be played with its original audio track (embedded in the video container) or with any of the audio commentary tracks loaded from a different source and synchronised with the video playback. When there's audio on the commentary track, the main track volume is reduced (ducked) gradually and smoothly brought back to full volume when the commentary / description track is silent. The visitor can switch between audio tracks on the fly, without affecting the video playback. Pausing the video playback also pauses the commentary track, which then remains in sync when playback resumes.</p>
+ <p>Once the audio description and commentary are recorded, the film, displayed in an HTML web page, can be played with its original audio track (embedded in the video container) or with any of the audio commentary tracks loaded from a different source and synchronized with the video playback. When there is audio on the commentary track, the main track's volume is gradually reduced (ducked), then smoothly brought back to full volume when the commentary/description track is silent. The visitor can switch between audio tracks on the fly, without affecting the video playback. Pausing the video playback also pauses the commentary track, which then remains in sync when playback resumes.</p>
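+ <p>One plausible wiring for the ducked commentary playback routes both media elements through a single audio graph; the sketch below assumes <code>ctx</code> is an <code>AudioContext</code> and that <code>videoEl</code> and <code>commentaryEl</code> are the film and commentary media elements, with the ducking trigger supplied by the application:</p>
+ <pre><code>
+ // Sketch: route the film's video element and the commentary audio element
+ // through one AudioContext so the main track can be ducked under the commentary.
+ var mainSource = ctx.createMediaElementSource(videoEl);
+ var commentarySource = ctx.createMediaElementSource(commentaryEl);
+ var mainGain = ctx.createGain();
+
+ mainSource.connect(mainGain);
+ mainGain.connect(ctx.destination);
+ commentarySource.connect(ctx.destination);
+
+ // Called when the commentary track starts or stops speaking.
+ function duckMainTrack(commentaryActive) {
+   mainGain.gain.setTargetAtTime(commentaryActive ? 0.3 : 1.0,
+                                 ctx.currentTime, 0.25);   // smooth duck and restore
+ }
+
+ // Keep the tracks in sync: pausing the film also pauses the commentary.
+ videoEl.addEventListener('pause', function () { commentaryEl.pause(); });
+ videoEl.addEventListener('play', function () {
+   commentaryEl.currentTime = videoEl.currentTime;          // re-align on resume
+   commentaryEl.play();
+ });
+ </code></pre>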
<h4>Notes and Implementation Considerations</h4>