Combined original UC5 and UC6 into new Scenario 5 that unifies both.
author Joe Berkovitz <joe@noteflight.com>
Sat, 04 Aug 2012 08:45:54 -0400
changeset 101 484560a4e887
parent 100 769706d7cb99
child 102 fc5a41473884
reqs/Overview.html
--- a/reqs/Overview.html	Fri Aug 03 15:04:58 2012 -0400
+++ b/reqs/Overview.html	Sat Aug 04 08:45:54 2012 -0400
@@ -229,107 +229,49 @@
   
       <h4>Notes and Implementation Considerations</h4>
       <ol>
-        <li>As with the Video Chat Application scenario, streaming and local device discovery and access within this scenario are handled by the <a href="http://www.w3.org/TR/webrtc/" title="WebRTC 1.0: Real-time Communication Between Browsers">Web Real-Time Communication API</a>. The local audio processing in this scenario highlights the requirement that <em>RTC streams and Web Audio be tightly integrated</em>. Incoming MediaStreams must be able to be exposed as audio sources, and audio destinations must be able to yield an outgoing RTC stream. For example, the broadcaster's browser employs a set of incoming MediaStreams from microphones, remote participants, etc., locally processes their audio through a graph of <code>AudioNodes</code>, and directs the output to an outgoing MediaStream representing the live mix for the show.</li>
-        <li>Building this application requires the application of <em>gain control</em>, <em>panning</em>, <em>audio effects</em> and <em>blending</em> of multiple <em>mono and stereo audio sources</em> to yield a stereo mix. Some relevant features in the API include <code>AudioGainNode</code>, <code>ConvolverNode</code>, <code>AudioPannerNode</code>.</li>
+        <li><p>As with the Video Chat Application scenario, streaming and local device discovery and access within this scenario are handled by the <a href="http://www.w3.org/TR/webrtc/" title="WebRTC 1.0: Real-time Communication Between Browsers">Web Real-Time Communication API</a>. The local audio processing in this scenario highlights the requirement that <em>RTC streams and Web Audio be tightly integrated</em>. Incoming MediaStreams must be able to be exposed as audio sources, and audio destinations must be able to yield an outgoing RTC stream. For example, the broadcaster's browser employs a set of incoming MediaStreams from microphones, remote participants, etc., locally processes their audio through a graph of <code>AudioNodes</code>, and directs the output to an outgoing MediaStream representing the live mix for the show.</p></li>
+        <li><p>Building this application requires applying <em>gain control</em>, <em>panning</em>, <em>audio effects</em> and <em>blending</em> to multiple <em>mono and stereo audio sources</em> to yield a stereo mix. Relevant features in the API include <code>AudioGainNode</code>, <code>ConvolverNode</code> and <code>AudioPannerNode</code> (a sketch of such a mixing graph follows this list).</p></li>
         
-        <li><em>Noise gating</em> (suppressing output when a source's level falls below some minimum threshold) is highly desirable for microphone inputs to avoid stray room noise being included in the broadcast mix. This could be implemented as a custom algorithm using a <code>JavaScriptAudioNode</code>.</li>
-        <li>To drive the visual feedback to the broadcaster on audio source activity and to control automatic ducking, this scenario needs a way to easily <em>detect the time-averaged signal level</em> on a given audio source.</li>
-        <li>Ducking affects the level of multiple audio sources at once, which implies the ability to associate a single <em>dynamic audio parameter</em> to the gain associated with these sources' signal paths.  The specification's <code>AudioGain</code> interface provides this.</li>
-        <li>Smooth muting requires the ability to <em>smoothly automate gain changes</em> over a time interval, without using browser-unfriendly coding techniques like tight loops or high-frequency callbacks. The <em>parameter automation</em> features associated with <code>AudioParam</code> are useful for this kind of feature.</li>
-        <li>Pausing and resuming the show on the audience side implies the ability to <em>buffer data received from audio sources</em> in the processing graph, and also to <em>send buffered data to audio destinations</em>.</li>
-        <li>The functionality for audio speed changing, a custom algorithm, requires the ability to <em>create custom audio transformations</em> using a browser programming language (e.g. <code>JavaScriptAudioNode</code>).</li>
-        <li>There is a standard way to access a set of <em>metadata properties for media resources</em> with the following W3C documents:
-          <ul><li> <a href="http://www.w3.org/TR/mediaont-10/" title="http://www.w3.org/TR/mediaont-10/">Ontology for Media Resources 1.0</a>. This document defines a core set of metadata properties for media resources, along with their mappings to elements from a set of existing metadata formats.
-          </li><li> <a href="http://www.w3.org/TR/mediaont-api-1.0/" title="http://www.w3.org/TR/mediaont-api-1.0/">API for Media Resources 1.0</a>. This API provides developers with a convenient access to metadata information stored in different metadata formats. It provides means to access the set of metadata properties defined in the Ontology for Media Resources 1.0 specification. 
-          </li></ul>
-        </li>
+        <li><p><em>Noise gating</em> (suppressing output when a source's level falls below some minimum threshold) is highly desirable for microphone inputs to avoid stray room noise being included in the broadcast mix. This could be implemented as a custom algorithm using a <code>JavaScriptAudioNode</code>.</p></li>
+        <li><p>To drive the visual feedback to the broadcaster on audio source activity and to control automatic ducking, this scenario needs a way to easily <em>detect the time-averaged signal level</em> on a given audio source.</p></li>
+        <li><p>Ducking affects the level of multiple audio sources at once, which implies the ability to associate a single <em>dynamic audio parameter</em> with the gain applied to these sources' signal paths. The specification's <code>AudioGain</code> interface provides this.</p></li>
+        <li><p>Smooth muting requires the ability to <em>smoothly automate gain changes</em> over a time interval, without using browser-unfriendly coding techniques like tight loops or high-frequency callbacks. The <em>parameter automation</em> features associated with <code>AudioParam</code> are useful for this kind of feature.</p></li>
+        <li><p>Pausing and resuming the show on the audience side implies the ability to <em>buffer data received from audio sources</em> in the processing graph, and also to <em>send buffered data to audio destinations</em>.</p></li>
+        <li><p>The functionality for audio speed changing, a custom algorithm, requires the ability to <em>create custom audio transformations</em> in a browser programming language (e.g. JavaScript via a <code>JavaScriptAudioNode</code>).</p></li>
+        <li><p>There is a standard way to access a set of <em>metadata properties for media resources</em>, defined by the following W3C documents:</p>
+          <ul><li><p><a href="http://www.w3.org/TR/mediaont-10/" title="http://www.w3.org/TR/mediaont-10/">Ontology for Media Resources 1.0</a>. This document defines a core set of metadata properties for media resources, along with their mappings to elements from a set of existing metadata formats.</p></li>
+          <li><p><a href="http://www.w3.org/TR/mediaont-api-1.0/" title="http://www.w3.org/TR/mediaont-api-1.0/">API for Media Resources 1.0</a>. This API provides developers with convenient access to metadata information stored in different metadata formats. It provides means to access the set of metadata properties defined in the Ontology for Media Resources 1.0 specification.</p></li></ul>
+        </li>
       </ol>
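+
+      <p>A minimal sketch of how the streaming, mixing, ducking and smooth-mute points above might be wired together. It assumes that <code>micStream</code> and <code>remoteStream</code> (MediaStream objects) have already been obtained via getUserMedia and the WebRTC peer connection, and it uses current Web Audio API names such as <code>GainNode</code>, <code>createMediaStreamSource()</code> and <code>createMediaStreamDestination()</code>; the draft cited in this document used older names such as <code>AudioGainNode</code>. The helper functions <code>duck()</code> and <code>smoothMute()</code> are illustrative, not part of the API.</p>
+      <pre>
+// Sketch of the broadcaster's local mix. micStream and remoteStream are
+// assumed to exist already (getUserMedia / peer connection).
+var context = new AudioContext();
+
+// Expose incoming MediaStreams as sources in the processing graph.
+var micSource    = context.createMediaStreamSource(micStream);
+var remoteSource = context.createMediaStreamSource(remoteStream);
+
+// Per-source channel strips: gain plus pan.
+var micGain    = context.createGain();
+var micPan     = context.createPanner();
+var remoteGain = context.createGain();
+
+// A single ducking gain shared by all background sources.
+var duckGain = context.createGain();
+
+// Dry mix bus, parallel reverb path, and master output.
+var dryMix    = context.createGain();
+var reverb    = context.createConvolver();   // impulse response assigned elsewhere
+var masterOut = context.createGain();
+
+micSource.connect(micGain);
+micGain.connect(micPan);
+micPan.setPosition(-0.3, 0, 0);              // slight left pan for the host microphone
+micPan.connect(dryMix);
+
+remoteSource.connect(remoteGain);
+remoteGain.connect(duckGain);
+duckGain.connect(dryMix);
+
+dryMix.connect(masterOut);                   // dry path
+dryMix.connect(reverb);
+reverb.connect(masterOut);                   // parallel wet (reverb) path
+
+// The mixed output becomes an outgoing MediaStream for the broadcast,
+// and is also monitored locally.
+var broadcastOut = context.createMediaStreamDestination();
+masterOut.connect(broadcastOut);
+masterOut.connect(context.destination);
+// broadcastOut.stream would then be attached to the outgoing peer connection.
+
+// Automatic ducking: one parameter change lowers every background source at once.
+function duck(on) {
+  var now = context.currentTime;
+  duckGain.gain.setValueAtTime(duckGain.gain.value, now);
+  duckGain.gain.linearRampToValueAtTime(on ? 0.2 : 1.0, now + 0.1);
+}
+
+// Smooth mute: ramp a gain to zero over 50 ms instead of cutting abruptly.
+function smoothMute(gainNode) {
+  var now = context.currentTime;
+  gainNode.gain.setValueAtTime(gainNode.gain.value, now);
+  gainNode.gain.linearRampToValueAtTime(0, now + 0.05);
+}
+</pre>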
     </section>
       
-      <section>      
-      <h3>UC 5: writing music on the web </h3>
-      <p>A user is employing a web-based application to create and edit a musical score written in conventional Western notation, guitar tablature or a beat grid. The score is complex, several minutes long, and incorporates multiple instrument sounds.
-      </p><p>When the user starts up the application, it initializes quickly. There is no long pause to load large volumes of audio media. In fact, because the user has run this application before and some assets have been preloaded, there is almost no wait at all. The score promptly appears on the screen as a set of interactive objects including measures, notes, clefs, and many other musical symbols.
-      </p><p>In the course of using such an application, the user often selects existing notes and enters new ones.  As they do so, the program provides audio feedback on their actions by providing individual, one-shot playback of the manipulated notes.
-      </p><p>The user occasionally stops editing and wishes to hear playback of some or all of the score they are working on to take stock of their work. At this point the program performs sequenced playback of a portion of the document. The playback is a rich audio realization of the score, as a set of notes and other sonic events performed with a high degree of musical accuracy in terms of rhythm, pitch, dynamics, timbre, articulation, and so on. Some simple effects such as instrument panning and room reverb are applied for a more realistic and satisfying effect.
-      </p><p>During playback a moving cursor indicates the exact point in the music that is being heard at each moment.
-      </p><p>The user decides to add a new part to the score. In doing so, the program plays back samples of various alternative instrumental sounds for feedback purposes, to assist the user in selecting the desired sound for use within the score.
-      </p><p>At some point the user exports an MP3 or WAV file from the program for some other purpose. This file contains the same audio rendition of the score that is played interactively when the user requested it earlier.
-      </p>
-      <h4>UC5 — Priority </h4>
-      <pre> <i>Priority: <b>LOW</b></i>
-      </pre>
-      <p>… consensus reached during the teleconference on <a href="http://www.w3.org/2012/02/13-audio-minutes" title="http://www.w3.org/2012/02/13-audio-minutes">13 Feb 2012</a>. 
-      </p>
-      <p>General consensus that while this is an interesting use case, there is no clamor to facilitate it entirely and urgently.</p>
-
-      <h4>UC5 — Other Requirements TBA? </h4>
-      <p>The context of a music writing application introduces some additional high level requirements on a sequencing/synthesis subsystem:
-      </p>
-      <ul><li> It is necessary to coordinate visual display with sequenced playback of the document, such as a moving cursor or highlighting effect applied to notes. This implies the need to programmatically determine the exact time offset of the sound being physically rendered through the computer's audio output channel. This time offset must, in turn, have a well-defined relationship to time offsets in prior API requests to schedule various notes at various times.
-      </li></ul>
-      <ul><li> It is necessary to be able to stop and start all types of playback with low latency. To avoid sudden clicks, pops or other artifacts it should be possible to apply a fade out curve of an arbitrary length.  
-      </li></ul>
-      <ul><li> Sequenced playback must make light enough demands on browser processing to support user interaction with the music writing application without degrading the experience.
-      </li></ul>
-      <ul><li> Initialization of resources required for sequenced playback must not impose an unacceptable startup delay on the application. 
-      </li></ul>
-      <ul><li> To export an audio file, it is highly desirable (to allow faster-than-real-time rendering, for example) that the audio rendering pipeline be able to yield buffers of sample frames directly, rather than being forced to an audio device destination. Built-in codecs to translate these buffers to standard audio file output formats are also desirable.
-      </li></ul>
-      
-      </section>
-      
       <section>
       
       
-      <h3>UC 6: wavetable synthesis of a virtual music instrument </h3>
-      <p>Of necessity this use case is a bit more abstract, since it describes a component that can underlie a number of the other use cases in this document including UC 2, UC 3 and UC 5.
-      </p><p>A user is employing an application that requires polyphonic, multi-timbral performance of musical events loosely referred to as "notes", where a note is a synthesized instrument sound. The class of such applications includes examples such as:
-      </p>
-      <ul><li> sheet music editors playing back a score document
-      </li><li> novel music creation environments, e.g. beat grids or "virtual instruments" interpreting touch gestures in real time to control sound generation
-      </li><li> games that generate or stitch together musical sequences to accompany the action (see UC 2)
-      </li><li> applications that include the ability to render standard MIDI files as sound (see UC 3)
-      </li></ul>
-      <p>To make this use case more specific and more useful, we'll stipulate that the synthesizer is implemented using several common data structures and software components:
-      </p>
-      <ul><li> A time-ordered list data structure of some kind, specifying a set of musical events or "notes" in terms of high-level parameters rather than audio samples. This data may be dynamically generated by the application (as in UC 5) or defined in advance by a document such as a MIDI file. A single list entry supplies a tuple of parameters, including:
-      <ul><li> onset time
-      </li><li> duration
-      </li><li> instrument choice
-      </li><li> volume
-      </li><li> pitch
-      </li><li> articulation
-      </li><li> time-varying modulation 
-      </li></ul>
-      </li></ul>
-      <ul><li> A set of "instrument definition" data structures referenced by the above notes. These definitions are data structures that, when coupled with the information in a single note, suffice to generate a sample-accurate rendering of that note alone. 
-      </li></ul>
-      <ul><li> A set of "performance parameters" that govern output at a high level, such as gain or EQ settings affecting the mix of various instruments in the performance.
-      </li></ul>
-      <ul><li> A "music synthesizer" procedure that interprets all of the above structures and employs the HTML5 Audio API to realize a complete musical performance of the notes, the instrument definitions and the performance parameters, in real time.
-      </li></ul>
-      <p>Nailing down the nature of an instrument definition turns is crucial for narrowing the requirements further. This use case therefore that an instrument definition is a "wavetable instrument", which is an approach that uses a small number of short samples in conjunction with looping, frequency shifting, envelopes and modulators. This is a good choice because it is ubiquitous in the software synthesis world and it's a little more demanding in terms of requirements than direct algorithmic generation of waveforms or FM synthesis.
-      </p><p>Our use case's wavetable instrument includes the following elements:
-      </p>
-      <ul><li> A list of audio buffers (commonly called "root samples") containing recorded instrument sounds at specific pitch levels. Each buffer is associated with a  "target pitch/velocity range", meaning that a note whose pitch and velocity fall within the given range will utilize this root sample, with an appropriate gain and sample-rate adjustment, as its sound source.
-      </li><li> Looping parameters providing sample-accurate start and end loop points within a root sample, so that it can generate a note of arbitrary duration even though the root sample is compact. This intra-sample looping ability is absolutely required for most instruments, except for short-duration percussion sounds.
-      </li><li> Parameters that determine a note-relative gain envelope based on the note's parameters. Such an envelope normally follows an exponential trajectory during key intervals in the lifetime of the note known as attack (a short period following onset), decay (an arbitrarily long period following attack), and release (a short period following the end of the note). After the primary sound source, this envelope is the most musically salient attribute of an instrument.
-      </li><li> Parameters that similarly derive an LP filter cutoff-frequency envelope, similarly based on the note parameters. This is often used to enhance the gain envelope, as many physical instruments include more high-frequency components at the start of a note.
-      </li><li> Parameters that supply additional modulation during the course of the note to the pitch-shift or attenuation applied to its sound source. Often the modulation is a low-frequency triangle wave, or a single exponential ramp between points. This technique supplies interpretive and articulatory effects like vibrato or glissando.
-      </li></ul>
-      <h4>UC6 — Priority </h4>
-      <pre> <i>Priority: <b>HIGH</b></i>
-      </pre>
-      <p>… <a href="http://lists.w3.org/Archives/Public/public-audio/2012JanMar/0259.html" title="http://lists.w3.org/Archives/Public/public-audio/2012JanMar/0259.html">Under discussion</a>. 
-      </p><p>From the input gathered so far, there seems to be a reasonable amount of interest in the capabilities detailed in this use case.
-      </p>
+      <h3>Scenario 5: Music Creation Environment with Sampled Instruments</h3>
+      <p>A user is employing a web-based application to create and edit a musical composition. The user interface for composing can take a number of forms ranging from a beat grid or piano-roll display to conventional Western notation. Whatever the visual representation, the key idea of the scenario is that the user is editing a document that is sonically rendered as a series of precisely timed and modulated audio events (notes) that collectively make up a piece of music.</p>
+      <p>The user occasionally stops editing and wishes to hear playback of some or all of the score they are working on to take stock of their work. At this point the program performs sequenced playback of some portion of the document. Some simple effects such as instrument panning and room reverb are also applied for a more realistic and satisfying effect.</p>
+      <p>Compositions in this editor employ a set of instrument samples, i.e. a pre-existing library of recorded audio snippets. Any given snippet is a brief audio recording of a note played on an instrument with some specific and known combination of pitch, dynamics and articulation. The combinations in the library are necessarily limited in number to avoid bandwidth and storage overhead. During playback, the editor must simulate the sound of each instrument playing its part in the composition.  This is done by transforming the available pre-recorded samples from their original pitch, duration and volume to match the characteristics prescribed by each note in the composed music.  These per-note transformations must also be scheduled to be played at the times prescribed by the composition.</p>
+      <p>During playback a moving cursor indicates the exact point in the music that is being heard at each moment.</p>
+      <p>At some point the user exports an MP3 or WAV file from the program for some other purpose. This file contains the same audio rendition of the score that is played interactively when the user requested it earlier.</p>
+
+      <h4>Notes and Implementation Considerations</h4>
+      <ol>
+        <li><p>Instrument samples must be able to be loaded into memory for fast processing during music rendering. These pre-loaded audio snippets must have a one-to-many relationship with objects in the API representing specific notes, to avoid duplicating the same sample in memory for each note in a composition that is rendered with it. The API's <code>AudioBuffer</code> and <code>AudioBufferSourceNode</code> interfaces address this requirement (the first sketch following this list illustrates this pattern).</p></li>
+        <li><p>It must be possible to schedule large numbers of individual events over a long period of time, each of which is a transformation of some original audio sample, without degrading real-time browser performance. The API's graph-based approach makes the construction of any given transformation practical, by supporting simple recipes for creating subgraphs built around a sample's pre-loaded <code>AudioBuffer</code>.  These subgraphs can be constructed and scheduled to be played in the future. In one approach to supporting longer compositions, the construction and scheduling of future events can be kept "topped up" via periodic timer callbacks, to avoid the overhead of creating huge graphs all at once.</p></li>
+        <li><p>A given sample must be able to be arbitrarily transformed in pitch and volume to match a note in the music. <code>AudioBufferSourceNode</code>'s <code>playbackRate</code> attribute provides the pitch-change capability, while <code>AudioGainNode</code> allows the volume to be adjusted.</p></li>
+        <li><p>A given sample must be able to be arbitrarily transformed in duration (without changing its pitch) to match a note in the music. <code>AudioBufferSourceNode</code>'s looping parameters provide sample-accurate start and end loop points, allowing a note of arbitrary duration to be generated even though the original recording may be brief.</p></li>
+        <li><p>Looped samples by definition do not have a clean ending. To avoid an abrupt glitchy cutoff at the end of a note, a gain and/or filter envelope must be applied. Such envelopes normally follow an exponential trajectory during key time intervals in the life cycle of a note. The <code>AudioParam</code> features of the API in conjunction with <code>AudioGainNode</code> and <code>BiquadFilterNode</code> support this requirement.</p></li>
+        <li><p> It is necessary to coordinate visual display with sequenced playback of the document, such as a moving cursor or highlighting effect applied to notes. This implies the need to programmatically determine the exact time offset within the performance of the sound being currently rendered through the computer's audio output channel. This time offset must, in turn, have a well-defined relationship to time offsets in prior API requests to schedule various notes at various times. The API provides such a capability in the <code>AudioContext.currentTime</code> attribute.</p></li>
+        <li><p>To export an audio file, the audio rendering pipeline must be able to yield buffers of sample frames directly, rather than being forced to an audio device destination. Built-in codecs to translate these buffers to standard audio file output formats are also desirable (the second sketch following this list outlines one possible approach).</p></li>
+        <li><p>Typical per-channel effects such as stereo pan control must be readily available (<code>AudioPannerNode</code>).</p></li>
+        <li><p>Typical master bus effects such as room reverb must be readily available (<code>ConvolverNode</code>).</p></li>
+      </ol>
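+
+      <p>A minimal sketch of how items 1-6 above might fit together: one shared <code>AudioBuffer</code> per root sample, a per-note subgraph built around an <code>AudioBufferSourceNode</code>, and a look-ahead scheduler that keeps future events "topped up". The sketch uses current Web Audio API names such as <code>GainNode</code> and <code>start()</code>; the draft cited in this document used older names such as <code>AudioGainNode</code> and <code>noteOn()</code>. Identifiers such as <code>instrument</code>, <code>noteList</code>, <code>playNote()</code> and <code>pump()</code> are illustrative, not part of the API.</p>
+      <pre>
+// Sketch: rendering scheduled notes from a pre-loaded instrument sample.
+var context = new AudioContext();
+
+// Master bus; per-channel pan and room reverb would hang off this node.
+var masterBus = context.createGain();
+masterBus.connect(context.destination);
+
+// One decoded AudioBuffer per root sample, shared by every note that uses it.
+var instrument = {
+  buffer: null,       // AudioBuffer, assigned once decodeAudioData() completes
+  rootPitch: 60,      // MIDI pitch at which the root sample was recorded
+  loopStart: 0.25,    // seconds: sample-accurate loop region within the recording
+  loopEnd: 0.75
+};
+
+// Illustrative composition data, sorted by onset (seconds); velocity is 0..1.
+var noteList = [
+  { onset: 0.0, duration: 0.5, pitch: 64, velocity: 0.8 }
+  // ...
+];
+
+// Build and schedule the per-note subgraph: source, gain envelope, master bus.
+function playNote(note, when) {
+  var source = context.createBufferSource();
+  source.buffer = instrument.buffer;                          // shared, not copied
+  source.playbackRate.value =
+      Math.pow(2, (note.pitch - instrument.rootPitch) / 12);  // pitch transformation
+  source.loop = true;                                         // arbitrary duration
+  source.loopStart = instrument.loopStart;
+  source.loopEnd = instrument.loopEnd;
+
+  // Gain envelope: short attack, then a release ramp to avoid a glitchy cutoff.
+  var env = context.createGain();
+  env.gain.setValueAtTime(0, when);
+  env.gain.linearRampToValueAtTime(note.velocity, when + 0.01);
+  env.gain.setValueAtTime(note.velocity, when + note.duration - 0.05);
+  env.gain.linearRampToValueAtTime(0, when + note.duration);
+
+  source.connect(env);
+  env.connect(masterBus);
+  source.start(when);
+  source.stop(when + note.duration);
+}
+
+// Look-ahead scheduler: schedule only the next second of notes on each timer
+// tick, so long compositions never require building one huge graph up front.
+var LOOKAHEAD = 1.0;
+var startTime = context.currentTime + 0.1;
+var nextIndex = 0;
+function pump() {
+  var horizon = context.currentTime + LOOKAHEAD;
+  while (nextIndex !== noteList.length) {
+    var note = noteList[nextIndex];
+    if (startTime + note.onset >= horizon) break;   // beyond the look-ahead window
+    playNote(note, startTime + note.onset);
+    nextIndex = nextIndex + 1;
+  }
+}
+setInterval(pump, 250);
+
+// Cursor synchronization: the score time currently being heard.
+function currentScoreTime() {
+  return context.currentTime - startTime;
+}
+</pre>
+      <p>For the export requirement (item 7), one possible approach under later revisions of the API is to rebuild the same note graph against an <code>OfflineAudioContext</code>, which renders faster than real time into an <code>AudioBuffer</code>; encoding the resulting sample frames to WAV or MP3 is not built into the API and would be done in script or on a server. The sizes below are illustrative.</p>
+      <pre>
+// Sketch: faster-than-real-time rendering for file export.
+var lengthSeconds = 120;
+var sampleRate = 44100;
+var offline = new OfflineAudioContext(2, lengthSeconds * sampleRate, sampleRate);
+
+// ...build the same kind of note graph as above, with nodes created on offline...
+
+offline.oncomplete = function (event) {
+  var rendered = event.renderedBuffer;      // AudioBuffer holding the full mix
+  var left = rendered.getChannelData(0);    // Float32Array of sample frames
+  // Hand the channel data to a WAV writer or other encoder here (not shown).
+};
+offline.startRendering();
+</pre>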
 
     </section>
-    
+
     <section>
-
-
       <h3>UC 7: Audio / Music Visualization </h3>
       <p>A user is playing back audio or video media from the webpage of their favorite artist or a popular online music streaming service. The visualization responds to the audio in real-time and can be enjoyed by the user(s) in a leisurely setting such as: at home, a bar/restaurant/lobby, or traveling with an HTML5 capable mobile device. The visualization layers can be written using complimentary web technologies such as the WebGL Canvas, where 3D objects are synchronized with the audio and mixed with Video and other web content using JavaScript.
       </p><p>The webpage can presents a graphic visualization layers such as: