add convolution reverb design document
authorcrogers@google.com
Tue, 13 Sep 2011 12:49:11 -0700
changeset 6 dadd018d7a34
parent 5 79fa66345278
child 7 3fc08c5eca76
add convolution reverb design document
webaudio/convolution.html
webaudio/fft-convolver.png
webaudio/reverb-convolver.png
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/webaudio/convolution.html	Tue Sep 13 12:49:11 2011 -0700
@@ -0,0 +1,184 @@
+<html xmlns="http://www.w3.org/1999/xhtml">
+
+<head>
+  <title>Convolution Architecture</title>
+
+    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+
+
+  <body>
+
+   <h3>Convolution Reverb</h3>
+<p>
+    A <a href="http://en.wikipedia.org/wiki/Convolution_reverb">convolution reverb</a> can be used to simulate an acoustic space with very high quality.
+      It can also be used as the basis for creating a vast number of unique and interesting special effects.  This technique is widely used
+      in modern professional audio and motion picture production, and is an excellent choice to create room effects in a game engine.
+</p>
+    <p>
+    Creating a well-optimized real-time convolution engine is one of the more challenging parts of the Web Audio API implementation.
+  When convolving an input audio stream of unknown (or theoretically infinite) length, the <a href="http://en.wikipedia.org/wiki/Overlap-add_method">overlap-add</a> approach is used, chopping the
+  input stream into pieces of length L, performing the convolution on each piece, then re-constructing the output signal by delaying each result and summing.
+</p>
+
+  <h4>Overlap-Add Convolution</h4>
+  <img src="http://upload.wikimedia.org/wikipedia/commons/7/77/Depiction_of_overlap-add_algorithm.png">
+  
+  
+  
+  <p>
+    <br>
+    Direct convolution is far too computationally expensive due to the extremely long impulse responses typically used.  Therefore an approach using
+    <a href="http://en.wikipedia.org/wiki/FFT">FFTs</a> must be used.  But naively doing a standard overlap-add FFT convolution using an FFT of size N with L=N/2, where N is chosen to be at least twice the length of the convolution kernel (zero-padding the kernel) to perform each convolution operation in the diagram above would incur a substantial input to output pipeline latency on the order of L samples.  Because of the enormous audible delay, this simple method cannot be used.  Aside from the enormous delay, the size N of the FFT could be extremely large.  For example, with an impulse response of 10 seconds at 44.1Khz, N would equal 1048576 (2^20).
+      This would take a very long time to evaluate.  Furthermore, such large FFTs are not practical due to substantial phase errors.
+
+    </p>
+
+
+    <h4>Optimizations and Tricks</h4>
+
+
+    <p>
+    There exist several clever tricks which break the impulse response into smaller pieces, performing separate convolutions, then
+    combining the results (exploiting the property of linearity).  The best ones use a divide and conquer approach using different size FFTs and a direct
+    convolution for the initial (leading) portion of the impulse response to achieve a zero-latency output.    There are additional optimizations which can be done exploiting the
+    fact that the tail of the reverb typically contains very little or no high-frequency energy.  For this part, the convolution may be done at a lower sample-rate...
+    </p>
+
+    <p>
+    Performance can be quite good, easily
+    done in real-time without creating undo stress on modern mid-range CPUs.  A multi-threaded implementation is really required if low (or zero) latency is required because of the way the buffering / processing chunking works.  Achieving good performance requires a highly optimized FFT algorithm.
+    </p>
+
+    <h4>Multi-channel convolution</h4>
+    <p>
+      It should be noted that a convolution reverb typically involves two convolution operations, with separate impulse responses for the left and right channels in 
+      the stereo case.  For 5.1 surround, at least five separate convolution operations are necessary to generate output for each of the five channels.
+    </p>
+
+    <h4>Impulse Responses</h4>
+    <p>
+      Similar to other assets such as JPEG images, WAV sound files, MP4 videos, shaders, and geometry, impulse responses can be considered as multi-media assets.  As with these other
+      assets, they require work to produce, and the high-quality ones are considered valuable.  For example, a company called Audio Ease makes a fairly expensive ($500 - $1000)
+      product called <a href="http://www.audioease.com/Pages/Altiverb/AltiverbMain.html">Altiverb</a> 
+      containing several nicely recorded impulse responses along with a convolution reverb engine.
+      </p>
+
+
+    <h3>Convolution Engine Implementation</h3>
+    <p>
+    
+    
+    <h4>FFTConvolver (short convolutions)</h4>
+
+    <p>
+    The <code>FFTConvolver</code> is able to do short convolutions with the FFT size N being at least twice as large as the
+    length of the short impulse response.  It incurs a latency of N/2 sample-frames.  Because of this latency and performance considerations,
+    it is not suitable for long convolutions.  Multiple instances of this building block can be used to perform extremely long convolutions.
+    </p>
+    
+    <img src="fft-convolver.png">
+    <br><br><br>
+
+    <h4>ReverbConvolver (long convolutions)</h4> 
+    The <code>ReverbConvolver</code> is able to perform extremely long real-time convolutions on a single audio channel.
+    It uses multiple <code>FFTConvolver</code> objects as well as an input buffer and an accumulation buffer.  Note that it's
+    possible to get a multi-threaded implementation by exploiting the parallelism.  Also note that the leading sections of the long
+    impulse response are processed in the real-time thread for minimum latency.  In theory it's possible to get zero latency if the
+    very first FFTConvolver is replaced with a DirectConvolver (not using a FFT).
+    
+
+    <br>
+    <img src="reverb-convolver.png"> 
+
+    <br>
+    <h4>Reverb Effect (with matrixing)</h4>
+    <img src="reverb-matrixing.png">
+
+    </p>
+
+<br><br>
+
+
+    <h3>Recording Impulse Responses</h3>
+    </a>
+
+
+    <img src="impulse-response.png">
+    <br><br>
+
+    The most <a href="http://pcfarina.eng.unipr.it/Public/Papers/226-AES122.pdf">modern</a>
+     and accurate way to record the impulse response of a real acoustic space is to use
+    a long exponential sine sweep.  The test-tone can be as long as 20 or 30 seconds, or longer.
+
+
+
+
+      Several recordings of the
+    test tone played through a speaker can be made with microphones placed and oriented at various positions in the room.  It's important
+    to document speaker placement/orientation, the types of microphones, their settings, placement, and orientations for each recording taken.
+
+    <p>
+    Post-processing is required for each of these recordings by performing an inverse-convolution with the test tone,
+    yielding the impulse response of the room with the corresponding microphone placement.  These impulse responses are then
+    ready to be loaded into the convolution reverb engine to re-create the sound of being in the room.
+    </p>
+
+    <h3>Tools</h3>
+    Two command-line tools have been written:
+    </p>
+     <code>generate_testtones</code> generates an exponential sine-sweep test-tone and its inverse.  Another
+    tool <code>convolve</code> was written for post-processing.  With these tools, anybody with recording equipment can record their own impulse responses.
+      To test the tools in practice, several recordings were made in a warehouse space with interesting
+    acoustics.  These were later post-processed with the command-line tools.
+    </p>
+
+    <pre>
+
+    % generate_testtones -h
+    Usage: generate_testtone
+    	[-o /Path/To/File/To/Create] Two files will be created: .tone and .inverse
+    	[-rate <sample rate>] sample rate of the generated test tones
+    	[-duration <duration>] The duration, in seconds, of the generated files
+    	[-min_freq <min_freq>] The minimum frequency, in hertz, for the sine sweep
+
+    % convolve -h
+    Usage: convolve input_file impulse_response_file output_file
+        </pre>
+
+
+
+
+    <br>
+
+
+    <h3>Recording Setup</h3>
+    <img src="recording-setup.png">
+    <br><br>
+
+    Audio Interface: Metric Halo Mobile I/O 2882 
+
+    <br><br><br><br>
+
+
+    <img src="microphones-speaker.png">
+    <br><br>
+
+    <img src="microphone.png">
+    <img src="speaker.png">
+    <br><br>
+    Microphones: AKG 414s, Speaker: Mackie HR824
+
+    <br><br><br>
+
+    <h3>The Warehouse Space</h3>
+
+    <img src="warehouse.png">
+    <br><br>
+
+    </div>
+
+
+
+
+  </body>
+  </html>
Binary file webaudio/fft-convolver.png has changed
Binary file webaudio/reverb-convolver.png has changed