Write more about audio.

diff --git a/audio.rst b/audio.rst
index 050f8aae19bbfbd81c2ecd53c7d359f1ae318c16..65fe9af432a20b130fd5ba81901eeebd87051720 100644
--- a/audio.rst
+++ b/audio.rst
@@ -1,2 +1,276 @@
-Audio (not written yet)
-=======================
+Audio
+=====
+
+Audio is the most important part of video. It is also the most
+neglected part in most amateur productions; it is easy to care
+about full-HD productions but never remember to give the speaker
+a microphone. Your stream can live with blurry or murky pictures,
+but it cannot live with people not hearing what's being said.
+
+Nageru aims to give the operator meaningful, useful controls for
+processing and mixing audio, with a focus on voice. There are two
+modes for audio processing, namely *simple* and *multichannel*;
+they are selectable from the audio menu.
+
+Be aware that a mix that sounds good on a PA system will not
+necessarily sound good on a stream; PA systems often have rather
+different audio characteristics than a set of home speakers or
+headphones, and there will also frequently be other sounds in the
+room that remove some of the typical “dryness”. However, for simple
+use, reusing such a mix isn't the worst choice you can make.
+
+
+Simple mode
+-----------
+
+**Simple** audio mode is the default, and was the only mode available
+up until Nageru 1.4.0. Despite its name, it contains a powerful
+audio processing chain; however, in many cases, you won't need to
+understand or twiddle any of the knobs available.
+
+Simple mode allows input from only a single source, and that source
+has to be one of the capture cards. (You choose which one by right-clicking
+on its channel and selecting it as audio source.) The two typical
+cases where this is useful are:
+
+  * When you simply take in audio from one of the cameras,
+    possibly by way of external microphone, or
+  * When you have an external mixer and can embed its output
+    in one of the video inputs.
+
+If you want more than one audio source at a time, or if you want
+to use ALSA inputs, you will need to use multichannel mode; it is
+more complicated, but it is a strict superset of what the simple mode
+can do. (In fact, simple mode constructs a multichannel setup
+behind-the-scenes and then runs the multichannel audio code.)
+
+
+.. _audio-meters:
+
+Audio meters
+------------
+
+.. image:: images/level-meters.png
+
+When setting overall audio levels, there are two important goals:
+To keep a reasonable **perceived loudness**, and to **avoid clipping**.
+Both are more subtle to measure than one would initially assume,
+and there are many ways to misstep. In particular, pretty much any
+naïve way of measuring loudness will fail; human hearing is, for instance,
+much more sensitive in some frequencies than others.
+
+`EBU R128 <https://tech.ebu.ch/loudness>`_ provides solid solutions
+to both problems. It specifies a precise algorithm to calculate
+both a *momentary* loudness (over short and medium time intervals;
+Nageru uses the short measurement) and a *loudness range* over an
+arbitrary amount of time. The loudness is measured in LU (loudness
+units), which is a relative unit very much like decibels; there's
+also LUFS (loudness units relative to full scale), which is the number
+of LU relative to digital full scale.
+
+EBU R128 specifies a *target loudness* (0 LU) of -23 LUFS +/- 1 LU;
+if you keep your stream within this and don't have a huge range
+in general, it will have a reasonable loudness on most viewers'
+setups. The left meter shows the momentary loudness (over the short
+400 ms intervals), and the right meter shows the loudness range,
+with the target shown as a box. If you are within the target,
+the box turns green; otherwise, it is red. Both meters show
+1 LU as one segment, with the highest value being +9 LU
+(compared to the reference level) and the lowest being -18 LU.
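+
+As a rough illustration of how LU and LUFS relate, the following Python
+sketch converts an absolute LUFS reading to LU relative to the R128 target
+and checks it against the +/- 1 LU window. The function and constant names
+are made up for this example and are not part of Nageru.
+

```python
R128_TARGET_LUFS = -23.0   # 0 LU according to EBU R128
R128_TOLERANCE_LU = 1.0    # the +/- 1 LU window around the target


def lufs_to_lu(lufs):
    """Express an absolute loudness (LUFS) relative to the R128 target (LU)."""
    return lufs - R128_TARGET_LUFS


def within_target(lufs):
    """True if the loudness lies inside the target window."""
    return abs(lufs_to_lu(lufs)) <= R128_TOLERANCE_LU
```

+
+For instance, a reading of -20 LUFS corresponds to +3 LU, which falls
+outside the target window.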
+
+Even if the overall loudness is correct, one needs to avoid clipping;
+if samples go outside the allowed range, it will sound as clicking
+or popping (or if many do, as extreme distortion). However,
+just measuring the value of every single sample is not good enough;
+since the client might do its own resampling and processing,
+we also need to account for *inter-sample peaks*. Nageru, in line
+with R128 recommendations, oversamples the audio by 4x and writes
+the highest peak (in dBFS) below the left meter. Anything above
+the R128 limit of -0.1 dBFS will make the meter turn red to alert
+the operator that clipping has occurred. (In practice, this should
+rarely happen due to the limiter; see the next section.)
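+
+To see why per-sample peaks are not enough, here is a small Python sketch
+that estimates the peak after 4x oversampling. It uses a Hann-windowed sinc
+interpolator written for this example; the resampler Nageru actually uses is
+not described here, so treat the filter (and the ``half_width`` parameter) as
+assumptions.
+

```python
import math


def _sinc(x):
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)


def sample_peak_dbfs(samples):
    """Peak of the raw samples, in dBFS."""
    peak = max(abs(s) for s in samples)
    return 20.0 * math.log10(peak) if peak > 0.0 else float("-inf")


def oversampled_peak_dbfs(samples, factor=4, half_width=32):
    """Peak after interpolating `factor` points per sample, in dBFS."""
    n = len(samples)
    peak = 0.0
    for i in range(n):
        lo = max(0, i - half_width + 1)      # samples outside the buffer
        hi = min(n, i + half_width + 1)      # are treated as silence
        for k in range(factor):
            t = i + k / factor               # fractional sample position
            acc = 0.0
            for m in range(lo, hi):
                d = t - m
                window = 0.5 * (1.0 + math.cos(math.pi * d / half_width))
                acc += samples[m] * _sinc(d) * window
            peak = max(peak, abs(acc))
    return 20.0 * math.log10(peak) if peak > 0.0 else float("-inf")


# A tone at a quarter of the sample rate whose samples all miss the crest.
tone = [math.sin(math.pi / 2.0 * n + math.pi / 4.0) for n in range(256)]
```

+
+For this tone, ``sample_peak_dbfs(tone)`` reports about -3 dBFS, while
+``oversampled_peak_dbfs(tone)`` lands close to 0 dBFS, i.e. roughly 3 dB
+higher than the samples alone suggest.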
+
+You can click the reset (RST) button to reset all the meters, including
+the peak measurement.
+
+Finally, the very top contains a **correlation meter** measuring
+the correlation between the left and right channel, which is
+useful for checking the stereo image. It goes from -1 at the very
+left (the channels are exact opposites of each other), via 0 in
+the middle (the channels are totally uncorrelated), to +1 at
+the very right (the channels are exactly the same). All of these
+are indications of common issues:
+
+  * A correlation meter that sits at exactly zero typically means
+    either the left or the right channel (or both) is silent.
+  * A correlation meter that sits at exactly +1 typically means
+    you are sending a mono stream. This could be intentional
+    (if you e.g. have only a single microphone), but if not,
+    it could indicate either a loose connector or stereo channels
+    panned wrong.
+  * Finally, a correlation meter that sits at negative values
+    for longer periods of time indicates that one of the channels
+    is inverted (the phase is wrong), and could sound odd on
+    speaker setups. However, certain kinds of reverb or other
+    effects could also cause this, so it could be benign.
+
+A healthy stereo stream will usually have a correlation somewhere
+around 0.7–0.8, and this section is marked in green.
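+
+The three cases above can be reproduced with a minimal Python sketch of a
+normalized correlation between the two channels. This is only an
+illustration over one block of samples; a real meter would typically work
+on a sliding, smoothed window, and the function name is hypothetical.
+

```python
import math


def stereo_correlation(left, right):
    """Normalized correlation between two equally long sample sequences."""
    cross = sum(l * r for l, r in zip(left, right))
    energy_l = sum(l * l for l in left)
    energy_r = sum(r * r for r in right)
    denom = math.sqrt(energy_l * energy_r)
    if denom == 0.0:
        return 0.0   # at least one channel is silent
    return cross / denom
```

+
+Feeding the same signal to both channels gives +1, feeding an inverted copy
+to one channel gives -1, and a silent channel gives 0.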
+
+
+The audio strip
+---------------
+
+.. image:: images/audio-strip.png
+
+The audio strip contains controls for the processing chain for the audio from
+start to end, left to right. Note that by default, everything is enabled;
+if you have a premade audio mix that you are confident that you
+want 1:1 into the stream, you can start Nageru with the “--flat-audio”
+flag, which instead starts with everything disabled.
+
+The first step in the pipeline is a **lo-cut** (or equivalently,
+highpass) filter. The exact cutoff frequency is a bit of a matter
+of taste (and also depends on the speaker), but the main point
+is that it gets rid of low-frequency hum and a lot of the background
+noise that is not related to the speaker's voice. (If you were
+producing music, you'd probably want it there to make room for
+music *under* it, but then you'd want it higher than the default 120 Hz.)
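+
+To illustrate what a lo-cut does, here is a first-order highpass in Python.
+It is deliberately the simplest filter of this kind, and it is an assumption
+made for illustration; the filter type and slope Nageru uses are not
+specified in this text. Only the 120 Hz default comes from above.
+

```python
import math


def lo_cut(samples, sample_rate=48000.0, cutoff_hz=120.0):
    """First-order RC-style highpass; returns the filtered samples."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = []
    prev_in = 0.0
    prev_out = 0.0
    for x in samples:
        # Pass changes in the input, let the steady part decay away.
        y = alpha * (prev_out + x - prev_in)
        out.append(y)
        prev_in = x
        prev_out = y
    return out
```

+
+A constant (DC) offset decays to zero after passing through this filter,
+while content well above the cutoff, such as a 5 kHz tone, passes almost
+unchanged.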
+
+Next comes a chain of no less than four compressors. They are
+based on the same basic structure, but have very different settings,
+and fill very different roles.
+
+The first compressor is the **gain staging**, or auto-leveler;
+it is very slow, with 500 ms attack time and 20 second release time.
+Its purpose is to set the overall level for the next compressor
+in the chain (so that it is slightly over its threshold);
+if you have a pretty consistent input signal, you can uncheck
+the “Auto” box and just set a static value manually.
+
+The second compressor is the **actual compressor**. It is much
+faster, with typical voice settings (5 ms attack, 40 ms release).
+It has the effect of making the voice sound a bit tighter,
+more level and overall better; if you have multiple things
+in the mix, it will also bring them somewhat closer together.
+(In general, a compressor gives the signal less dynamic range
+by making the loud parts quieter, which allows you to gain it up more in
+a later stage, so that it can get louder overall. It's a bit
+paradoxical if you're not used to it.)
+
+You can adjust the threshold if you wish, or disable the compressor
+altogether if your signal is already mastered. Note that if the
+gain staging is not set so that this compressor gets an input signal
+that's loud enough, it won't do anything to it.
+
+At this point, the mastering section begins; for simple audio,
+the distinction won't matter, but for multichannel, the previous
+effects are separate per-bus and the remaining are applied
+after the mix. (More on this below.) The mastering section begins
+with a **limiter**, basically a compressor with very high ratio.
+It's there as an emergency brake for really loud sounds
+that got through the other compressors; a classic example is a
+speaker suddenly coughing, or a very loud bass drum. This prevents
+both clipping and blowing out the listeners' ears.
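+
+The relationship between a compressor and a limiter can be sketched with the
+static gain curve alone. The Python function below ignores attack and
+release times entirely, and the threshold and ratio values in the usage note
+are hypothetical, since this text does not give Nageru's actual settings.
+

```python
def compress_db(level_db, threshold_db, ratio):
    """Static compressor curve: map an input level (dB) to an output level (dB)."""
    if level_db <= threshold_db:
        return level_db                      # below threshold: untouched
    # Above threshold, every `ratio` dB of input yields 1 dB of output.
    return threshold_db + (level_db - threshold_db) / ratio
```

+
+With a threshold of -20 dB and a ratio of 4, an input at -10 dB comes out at
+-17.5 dB, while an input at -30 dB is left alone. With a very high ratio
+(say 1000), anything above the threshold is held almost exactly at the
+threshold, which is the limiter behavior described above.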
+
+At this point, the audio signal is *almost* where we'd like it
+to be, but the overall sound level might not be quite right.
+All the previous compressors have been working in the objective
+domain, but as explained in the :ref:`previous section <audio-meters>`,
+this does not necessarily correspond to the desired overall
+audio loudness. (Their default levels have been calibrated so
+that they end up around 0 LU for typical speech content,
+but they could easily miss by a few LU in many cases.)
+
+Thus, there's a final **makeup gain** at the end to compensate
+for these issues. When the “Auto” checkbox is ticked, which is
+by default, it will very slowly (filter constant of 30 seconds)
+adjust itself so that the overall level goes toward 0 LU,
+i.e., the reference level. It is so slow because the R128 calculations
+inherently must go over a certain amount of time (what we want
+to change with this gain is the *overall* sound level,
+not the *immediate* one). In periods where the makeup gain is
+far off, such as when the stream is all silent, it doesn't update
+at all. As with the other knobs, you can uncheck the “Auto”
+checkbox and tune this yourself if you want to.
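+
+One way to picture the automatic makeup gain is as a slow one-pole filter
+that nudges the gain until the measured loudness sits at 0 LU. The Python
+sketch below assumes the loudness is measured after the makeup gain, and
+``max_offset_lu`` is a hypothetical cutoff for the "far off" case; only the
+30-second constant comes from the text above.
+

```python
import math


def update_makeup_gain(gain_db, measured_lu, dt, time_constant=30.0,
                       max_offset_lu=15.0):
    """Nudge the makeup gain so the measured loudness drifts toward 0 LU."""
    if abs(measured_lu) > max_offset_lu:
        return gain_db            # far off (e.g. silence): leave the gain alone
    alpha = 1.0 - math.exp(-dt / time_constant)
    return gain_db - alpha * measured_lu
```

+
+If the signal is steadily 6 LU too quiet, repeated calls move the gain toward
+about +6 dB over a few minutes; during silence the gain stays where it was.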
+
+
+Multichannel mode
+-----------------
+
+**Multichannel mode** expands on simple audio mode by allowing you
+to have multiple *buses* of audio. (In a sense, it could more accurately
+be called “multibus mode” instead, but the name would be too confusing.)
+A bus in Nageru is a pair of channels (left/right), sourced from
+a video capture or ALSA card. The channel mapping is flexible; my USB
+sound card has 18 channels, for instance, and you can use that to make
+several buses. Each bus has a name (for instance, something like
+“Blue microphone” or “Speaker PC”), which is just for convenience;
+Nageru doesn't care what you write here, but the labels are useful
+for the operator.
+
+Input mappings
+''''''''''''''
+
+.. image:: images/input-mapping.png
+
+The input mapping dialog should be pretty much self-explanatory;
+you can use the + button to add a new bus, and the - button to remove
+the currently selected one (you select by clicking on it). The up and
+down buttons rearrange the order by moving the currently selected bus
+up or down, if possible.
+
+Because mappings can be tedious to set up, you wouldn't want to recreate
+a complicated one every time you started Nageru. Therefore, mappings
+can be saved and loaded from disk; the stored file is a
+`protocol buffer <https://developers.google.com/protocol-buffers/>`_
+in textual format. You can also load one at start with the
+“--input-mapping” parameter, which also implies multichannel mode
+(--multichannel).
+
+Nageru strives to keep the mapping consistent even
+in the face of a changed environment; for instance, if you unplug and
+replug a USB sound card, Nageru will attempt to keep the buses that were
+mapped to that card mapped to it. (While the card is unplugged, the main
+display will show
+the relevant buses as “(disconnected)”.) Similarly, if an ALSA device
+is taken by another program on startup and cannot be accessed by Nageru,
+it will mark it as “(busy)” and try again in the background. However,
+there are edge cases where Nageru simply cannot do the right thing,
+for instance if you unplug two identical cards and plug them back
+in the reverse order; USB cards don't carry any kind of serial number
+or other forms of unique identification.
+
+
+The audio views
+'''''''''''''''
+
+.. image:: images/audio-view-selector.png
+
+Once multichannel mode is active, a little selector shows up to the right,
+just below the level meters. The arrows (or equivalently, the PgUp/PgDown
+keys on the keyboard) allow you to select between two views:
+
+  * In the **compact audio view** (which is the default), each bus is
+    represented only by its label, its peak meter (see below) and its
+    fader. This takes up little screen real estate, and allows the video channels
+    to be visible. This is the typical view you'd use once you've set up
+    everything and are actually doing live video editing; the controls
+    from the full audio view are still in effect, but you cannot see or
+    interact with them.
+
+  * The **full audio view** contains a lot more controls, but leaves no
+    room for the video channels. These are useful when you are doing initial
+    setup of your mix, or if you want to go back and tune something.
+    The full audio view will be described in detail in the following section;
+    the interpretation of the corresponding controls in the compact audio view
+    is the same.
+
+.. image:: images/audio-bus-controls.png
+
+There's one set each of these controls for every bus.
+
+(TODO: write more)
+
+
+MIDI control
+------------