Write more about audio.

author Steinar H. Gunderson <sgunderson@bigfoot.com>

Wed, 9 Nov 2016 00:58:55 +0000 (01:58 +0100)

committer Steinar H. Gunderson <sgunderson@bigfoot.com>

Wed, 9 Nov 2016 00:58:55 +0000 (01:58 +0100)
diff --git a/audio.rst b/audio.rst

index 2aedd366252354165e8e56a858800a6301dce6da..e1b8d19100d91aa28a1b85c3822cb97a5cd192f0 100644
--- a/audio.rst
+++ b/audio.rst
@@ -45,13 +45,84 @@ can do. (In fact, simple mode constructs a multichannel setup
  behind-the-scenes and then runs the multichannel audio code.)
  
  
+Audio meters
+------------
+
+.. image:: images/level-meters.png
+
+When setting overall audio levels, there are two important goals:
+To keep a reasonable **perceived loudness**, and to **avoid clipping**.
+Both are more subtle to measure than one would initially assume,
+and there are many ways to misstep. In particular, pretty much any
+naïve way of measuring loudness will fail; human hearing is, for instance,
+much more sensitive in some frequencies than others.
+
+`EBU R128 <https://tech.ebu.ch/loudness>`_ provides solid solutions
+to both problems. It specifies a precise algorithm to calculate
+both *momentary* loudness (over short and medium time intervals;
+Nageru uses the short measurement), and a *loudness range* over an
+arbitrary amount of time. The loudness is measured in LU (loudness
+units), which is a relative unit very much like decibels; there's
+also LUFS (loudness unit relative to full scale), which is the number of
+LU compared to a given reference.
+
+EBU R128 specifies a *target loudness* (0 LU) of -23 LUFS +/- 1 LU;
+if you keep your stream within this and don't have a huge range
+in general, it will have a reasonable loudness on most viewers'
+setups. The left meter shows the momentary loudness (over the short
+400 ms intervals), and the right meter shows the loudness range,
+with the target shown as a box. If you are within the target,
+the box turns green; otherwise, it is red. Both meters show
+1 LU as one segment, with the highest value being +9 LU
+(compared to the reference level) and the lowest being -18 LU.
+
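The relationship between LUFS, LU and the meter range described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Nageru's code; the function names are hypothetical, and the -23 LUFS target, the +/- 1 LU tolerance and the -18 to +9 LU meter range are taken from the text.

```python
TARGET_LUFS = -23.0   # R128 target loudness, displayed as 0 LU
TOLERANCE_LU = 1.0    # allowed deviation from the target

def lufs_to_lu(lufs, reference=TARGET_LUFS):
    """Express an absolute loudness (LUFS) relative to the reference (LU)."""
    return lufs - reference

def within_target(lufs):
    """True if the loudness is inside the target box (green), else red."""
    return abs(lufs_to_lu(lufs)) <= TOLERANCE_LU

def meter_segment(lufs, lowest=-18, highest=9):
    """Round to whole LU and clamp to the range the meter can display."""
    return max(lowest, min(highest, round(lufs_to_lu(lufs))))
```

For example, a measurement of -20 LUFS is +3 LU relative to the target, which is outside the box.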
+Even if the overall loudness is correct, one needs to avoid clipping;
+if samples go outside the allowed range, it will sound like clicking
+or popping (or if many do, as extreme distortion). However,
+just measuring the value of every single sample is not good enough;
+since the client might do its own resampling and processing,
+we also need to account for *inter-sample peaks*. Nageru, in line
+with R128 recommendations, oversamples the audio by 4x and writes
+the highest peak (in dBFS) below the left meter. Anything above
+the R128 limit of -0.1 dBFS will make the meter turn red to alert
+the operator that clipping has occurred. (In practice, this should
+rarely happen due to the limiter; see the next section.)
+
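To see why sample values alone can underestimate the peak, consider a sine wave at a quarter of the sample rate whose samples all land at about 0.707 of its true amplitude. The sketch below reconstructs intermediate points with a Hann-windowed sinc interpolator at 4x; the interpolator and all names are assumptions made for illustration, and Nageru's own resampler is not shown here.

```python
import math

def sinc(x):
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def oversampled_peak(samples, factor=4, half_width=16):
    """Largest absolute value of the signal evaluated at `factor` points
    per sample, skipping the edges where the filter would run out of data."""
    peak = 0.0
    for n in range(half_width, len(samples) - half_width):
        for k in range(factor):
            t = n + k / factor
            acc = 0.0
            for m in range(n - half_width + 1, n + half_width + 1):
                d = t - m
                window = 0.5 * (1.0 + math.cos(math.pi * d / half_width))
                acc += samples[m] * sinc(d) * window
            peak = max(peak, abs(acc))
    return peak

def to_dbfs(amplitude):
    return 20.0 * math.log10(amplitude)

# Full-scale sine at fs/4, phase-shifted so no sample hits the crest.
signal = [math.sin(2 * math.pi * n / 4 + math.pi / 4) for n in range(64)]
sample_peak = max(abs(x) for x in signal)
true_peak = oversampled_peak(signal)
```

Here the sample peak reads roughly -3 dBFS, while the oversampled estimate comes out close to 0 dBFS, so a meter that only looked at raw samples would miss the overshoot.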
+You can click the reset (RST) button to reset all the meters, including
+the peak measurement.
+
+Finally, the very top contains a **correlation meter** measuring
+the correlation between the left and right channel, which is
+useful for checking the stereo image. It goes from -1 at the very
+left (the channels are exact opposites of each other), via 0 in
+the middle (the channels are totally uncorrelated), to +1 at
+the very right (the channels are exactly the same). All of these
+are indications of common issues:
+
+  * A correlation meter that sits at exactly zero typically means
+    either the left or the right channel (or both) is silent.
+  * A correlation meter that sits at exactly +1 typically means
+    you are sending a mono stream. This could be intentional
+    (if you e.g. have only a single microphone), but if not,
+    it could indicate either a loose connector or stereo channels
+    panned wrong.
+  * Finally, a correlation meter that sits at negative values
+    for longer periods of time indicates that one of the channels
+    is inverted (the phase is wrong), and could sound odd on
+    speaker setups. However, certain kinds of reverb or other
+    effects could also cause this, so it could be benign.
+
+A healthy stereo stream will usually have a correlation somewhere
+around 0.7–0.8, and this section is marked in green.
+
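The three cases in the list above fall out of a normalized cross-correlation over a block of samples. The following is a minimal sketch of that idea, assuming a plain block-based estimate; how Nageru actually smooths or filters its measurement is not covered here, and the function name is hypothetical.

```python
import math

def correlation(left, right):
    """Normalized correlation between two equally long sample blocks.
    Returns 0.0 if either channel is silent, since the ratio is undefined."""
    num = sum(l * r for l, r in zip(left, right))
    den = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    if den == 0.0:
        return 0.0
    return num / den

tone = [math.sin(2 * math.pi * 440 * n / 48000) for n in range(480)]
mono = correlation(tone, tone)                       # same signal in both channels
inverted = correlation(tone, [-x for x in tone])     # one channel phase-inverted
one_silent = correlation(tone, [0.0] * len(tone))    # right channel silent
```

With these inputs, `mono` is +1, `inverted` is -1 and `one_silent` is 0, matching the meter positions described above.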
+
  The audio strip
  ---------------
  
-.. image:: images/basic-ui.png
+.. image:: images/audio-strip.png
  
-The audio strip contains the processing chain for the audio from
-start to end. Note that by default, everything is enabled;
+The audio strip contains controls for the processing chain for the audio from
+start to end, left to right. Note that by default, everything is enabled;
  if you have a premade audio mix that you are confident that you
  want 1:1 into the stream, you can start Nageru with the “--flat-audio”
  flag, that instead starts with everything disabled.
@@ -64,11 +135,43 @@ noise that is not related to the speaker's voice. (If you were
  producing music, you'd probably want it there to make room for
  music *under* it, but then you'd want it higher than the default 120 Hz.)
  
-(TODO: write more)
-
+Next comes a chain of no less than four compressors. They are
+based on the same basic structure, but have very different settings,
+and fill very different roles.
+
+The first compressor is the **gain staging**, or auto-leveler;
+it is very slow, with 500 ms attack time and 20 second release time.
+Its purpose is to set the overall level for the next compressor
+in the chain (so that it is slightly over its threshold);
+if you have a pretty consistent input signal, you can uncheck
+the “Auto” box and just set a static value manually.
+
+The second compressor is the **actual compressor**. It is much
+faster, with typical voice settings (5 ms attack, 40 ms release).
+It has the effect of making the voice sound a bit tighter,
+more level and overall better; if you have multiple things
+in the mix, it will also bring them somewhat closer together.
+(In general, a compressor gives the signal less dynamic range
+by making the loud parts quieter, which allows you to apply more gain in
+a later stage, so that it can get louder overall. It's a bit
+paradoxical if you're not used to it.)
+
+You can adjust the threshold if you wish, or disable the compressor
+altogether if your signal is already mastered. Note that if the
+gain staging is not set so that this compressor gets an input signal
+that's loud enough, it won't do anything to it.
+
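The role of the threshold can be illustrated with the static curve of a textbook compressor: below the threshold the level passes through unchanged, and above it the excess is divided by the ratio. This is a generic sketch that ignores attack and release; the threshold and ratio values in the example are made up and are not Nageru's settings.

```python
def compressed_level_db(input_db, threshold_db, ratio):
    """Output level in dB for a given input level, static curve only."""
    if input_db <= threshold_db:
        return input_db  # too quiet: the compressor does nothing
    return threshold_db + (input_db - threshold_db) / ratio

quiet = compressed_level_db(-30.0, threshold_db=-20.0, ratio=4.0)
loud = compressed_level_db(-12.0, threshold_db=-20.0, ratio=4.0)
near_ceiling = compressed_level_db(0.0, threshold_db=-4.0, ratio=1000.0)
```

The quiet input stays at -30 dB, which is why the gain staging needs to bring the signal above the threshold first; the loud input is brought down from -12 dB to -18 dB. With a very high ratio, the output barely rises above the threshold at all.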
+At this point, the mastering section begins; for simple audio,
+the distinction won't matter, but for multichannel, the previous
+effects are separate per-bus and the remaining are applied
+after the mix. (More on this below.) The mastering section begins
+with a **limiter**, basically a compressor with very high ratio.
+It's there as an emergency brake for really loud signals
+that got through the other compressors; a classic example is a
+speaker suddenly coughing, or a very loud bass drum. This prevents
+both clipping and blowing out the listeners' ears.
  
-Audio meters
-------------
+(TODO: write more)
  
  
  Multichannel mode