Audio
=====

Audio is the most important part of video. It is also the most
neglected part in most amateur productions; it is easy to care
about full-HD productions but never remember to give the speaker
a microphone. Your stream can live with blurry or murky pictures,
but it cannot live with people not hearing what's being said.

Nageru aims to give the operator meaningful, useful controls for
processing and mixing audio, with a focus on voice. There are two
modes for audio processing, namely *simple* and *multichannel*;
they are selectable from the audio menu.

Be aware that a mix that sounds good on a PA system will not
necessarily sound good on a stream; PA systems often have rather
different audio characteristics than a set of home speakers or
headphones, and there will also frequently be other sounds in the
room that remove some of the typical “dryness”. However, for simple
use, reusing such a mix isn't the worst choice you can make.


Simple mode
-----------

**Simple** audio mode is the default, and was the only mode available
up until Nageru 1.4.0. Despite its name, it contains a powerful
audio processing chain; however, in many cases, you won't need to
understand or twiddle any of the knobs available.

Simple mode allows input from only a single source, and that source
has to be one of the capture cards. (You choose which one by right-clicking
on its channel and selecting it as the audio source.) The two typical
cases where this is useful are:

  * When you simply take in audio from one of the cameras,
    possibly by way of an external microphone, or
  * When you have an external mixer and can embed its output
    in one of the video inputs.

If you want more than one audio source at a time, or if you want
to use ALSA inputs, you will need to use multichannel mode; it is
more complicated, but it is a strict superset of what the simple mode
can do. (In fact, simple mode constructs a multichannel setup
behind the scenes and then runs the multichannel audio code.)


.. _audio-meters:

Audio meters
------------

.. image:: images/level-meters.png

When setting overall audio levels, there are two important goals:
to keep a reasonable **perceived loudness**, and to **avoid clipping**.
Both are more subtle to measure than one would initially assume,
and there are many ways to misstep. In particular, pretty much any
naïve way of measuring loudness will fail; human hearing is, for instance,
much more sensitive to some frequencies than to others.

`EBU R128 <https://tech.ebu.ch/loudness>`_ provides solid solutions
to both problems. It specifies a precise algorithm to calculate
both *momentary* loudness (over short and medium time intervals;
Nageru uses the short measurement) and a *loudness range* over an
arbitrary amount of time. The loudness is measured in LU (loudness
units), which is a relative unit very much like decibels; there's
also LUFS (loudness units relative to full scale), which is the number
of LU relative to digital full scale.

EBU R128 specifies a *target loudness* (0 LU) of -23 LUFS +/- 1 LU;
if you keep your stream within this and don't have a huge range
in general, it will have a reasonable loudness on most viewers'
setups. The left meter shows the momentary loudness (over the short
400 ms intervals), and the right meter shows the loudness range,
with the target shown as a box. If you are within the target,
the box turns green; otherwise, it is red. Both meters show
1 LU as one segment, with the highest value being +9 LU
(compared to the reference level) and the lowest being -18 LU.
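
To make the arithmetic concrete, the following Python sketch converts an absolute LUFS reading into LU relative to the -23 LUFS reference. It then checks the reading against the ±1 LU target window and clamps it to the -18 to +9 LU span of the meters. This is not Nageru's code, and all names in it are hypothetical. It also judges a single loudness value, which simplifies how the real target box is evaluated.

.. code-block:: python

   REFERENCE_LUFS = -23.0  # EBU R128 target loudness, displayed as 0 LU
   TOLERANCE_LU = 1.0      # target window is +/- 1 LU around the reference
   METER_MIN_LU = -18.0    # lowest value shown on the meters
   METER_MAX_LU = 9.0      # highest value shown on the meters


   def lufs_to_lu(lufs):
       """Convert absolute loudness (LUFS) to LU relative to the reference."""
       return lufs - REFERENCE_LUFS


   def within_target(lufs):
       """True if the loudness lies inside the +/- 1 LU target window."""
       return abs(lufs_to_lu(lufs)) <= TOLERANCE_LU


   def meter_value(lufs):
       """Clamp to the range the meters can display (one segment per LU)."""
       return min(max(lufs_to_lu(lufs), METER_MIN_LU), METER_MAX_LU)

For instance, a reading of -20 LUFS shows up as +3 LU, three segments above the reference and outside the green box.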

Even if the overall loudness is correct, one needs to avoid clipping;
if samples go outside the allowed range, it will sound like clicking
or popping (or, if many do, like extreme distortion). However,
just measuring the value of every single sample is not good enough;
since the client might do its own resampling and processing,
we also need to account for *inter-sample peaks*. Nageru, in line
with R128 recommendations, oversamples the audio by 4x and writes
the highest peak (in dBFS) below the left meter. Anything above
the R128 limit of -0.1 dBFS will make the meter turn red to alert
the operator that clipping has occurred. (In practice, this should
rarely happen due to the limiter; see the next section.)
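
Inter-sample peaks can be estimated by reconstructing the values between the original samples. The Python sketch below does the 4x oversampling with a Hann-windowed sinc interpolator. That interpolator is a stand-in chosen for simplicity. It is not the exact filter recommended by the standard or used by Nageru, and the function names are hypothetical.

.. code-block:: python

   import math


   def sample_peak_dbfs(samples):
       """Highest absolute sample value in dBFS (1.0 is full scale)."""
       peak = max((abs(s) for s in samples), default=0.0)
       return 20.0 * math.log10(peak) if peak > 0.0 else float("-inf")


   def true_peak_dbfs(samples, oversample=4, half_taps=32):
       """Estimate the inter-sample ("true") peak in dBFS.

       Every sample interval is split into `oversample` points, and the
       in-between values are reconstructed with a Hann-windowed sinc
       interpolator using up to half_taps samples on each side.
       """
       n = len(samples)
       peak = 0.0
       for i in range(n):
           peak = max(peak, abs(samples[i]))  # the original sample itself
           for k in range(1, oversample):
               t = i + k / oversample  # fractional position after sample i
               value = 0.0
               for j in range(max(0, i - half_taps + 1), min(n, i + half_taps + 1)):
                   d = t - j  # never an integer here, so no division by zero
                   window = 0.5 + 0.5 * math.cos(math.pi * d / half_taps)
                   value += samples[j] * window * math.sin(math.pi * d) / (math.pi * d)
               peak = max(peak, abs(value))
       return 20.0 * math.log10(peak) if peak > 0.0 else float("-inf")


   def clipped(samples, limit_dbfs=-0.1):
       """True if the estimated true peak exceeds the -0.1 dBFS limit."""
       return true_peak_dbfs(samples) > limit_dbfs

A sine wave at a quarter of the sample rate, sampled at 45° phase offsets, is the classic case. Every sample sits at about 0.707 (-3 dBFS), yet the waveform between the samples reaches full scale. Only the oversampled measurement catches that.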

You can click the reset (RST) button to reset all the meters, including
the peak measurement.

Finally, the very top contains a **correlation meter** measuring
the correlation between the left and right channel, which is
useful for checking the stereo image. It goes from -1 at the very
left (the channels are exact opposites of each other), via 0 in
the middle (the channels are totally uncorrelated), to +1 at
the very right (the channels are exactly the same). Readings that
stay at or near these values can indicate common issues:

  * A correlation meter that sits at exactly zero typically means
    either the left or the right channel (or both) is silent.
  * A correlation meter that sits at exactly +1 typically means
    you are sending a mono stream. This could be intentional
    (if you e.g. have only a single microphone), but if not,
    it could indicate either a loose connector or stereo channels
    panned incorrectly.
  * Finally, a correlation meter that sits at negative values
    for longer periods of time indicates that one of the channels
    is inverted (the phase is wrong), and could sound odd on
    speaker setups. However, certain kinds of reverb or other
    effects could also cause this, so it could be benign.

A healthy stereo stream will usually have a correlation somewhere
around 0.7–0.8, and this section is marked in green.
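
A common way to compute such a meter is the normalized cross-correlation of the two channels over a block of samples. The Python sketch below shows that general technique; it is an assumption, not Nageru's implementation. A real meter would also smooth the value over time. When one channel is silent, the sketch returns 0 by convention, matching the first case in the list above.

.. code-block:: python

   import math


   def stereo_correlation(left, right):
       """Normalized correlation of two equally long channels, in [-1, +1].

       +1: identical channels (mono), 0: uncorrelated, -1: one channel is
       the inverse of the other. By convention, returns 0.0 when either
       channel is completely silent (the ratio is undefined there).
       """
       sum_lr = sum(l * r for l, r in zip(left, right))
       sum_ll = sum(l * l for l in left)
       sum_rr = sum(r * r for r in right)
       denominator = math.sqrt(sum_ll * sum_rr)
       if denominator == 0.0:
           return 0.0
       return sum_lr / denominator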


The audio strip
---------------

.. image:: images/audio-strip.png

The audio strip contains the controls for the audio processing chain,
laid out from start to end, left to right. Note that by default, everything
is enabled; if you have a premade audio mix that you are confident you
want 1:1 into the stream, you can start Nageru with the “--flat-audio”
flag, which instead starts with everything disabled.

The first step in the pipeline is a **lo-cut** (or equivalently,
highpass) filter. The exact cutoff frequency is a bit of a matter
of taste (and also depends on the speaker), but the main point
is that it gets rid of low-frequency hum and a lot of the background
noise that is not related to the speaker's voice. (If you were
producing music, you'd probably want it there to make room for
music *under* it, but then you'd want it higher than the default 120 Hz.)
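
As an illustration of what a lo-cut does, here is a second-order (12 dB/octave) highpass biquad at the default 120 Hz cutoff. It uses the formulas from Robert Bristow-Johnson's Audio EQ Cookbook. The filter order, the Butterworth Q of 1/√2 and the 48 kHz sample rate are assumptions made for this sketch, and Nageru's actual filter may differ.

.. code-block:: python

   import math


   def highpass_coefficients(cutoff_hz, sample_rate, q=1.0 / math.sqrt(2.0)):
       """Biquad highpass coefficients (Audio EQ Cookbook), normalized to a0 = 1."""
       w0 = 2.0 * math.pi * cutoff_hz / sample_rate
       cos_w0 = math.cos(w0)
       alpha = math.sin(w0) / (2.0 * q)
       a0 = 1.0 + alpha
       b = ((1.0 + cos_w0) / 2.0 / a0, -(1.0 + cos_w0) / a0, (1.0 + cos_w0) / 2.0 / a0)
       a = (-2.0 * cos_w0 / a0, (1.0 - alpha) / a0)
       return b, a


   def lo_cut(samples, cutoff_hz=120.0, sample_rate=48000):
       """Filter a list of samples (direct form I)."""
       (b0, b1, b2), (a1, a2) = highpass_coefficients(cutoff_hz, sample_rate)
       x1 = x2 = y1 = y2 = 0.0  # previous inputs and outputs
       out = []
       for x in samples:
           y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
           x2, x1 = x1, x
           y2, y1 = y1, y
           out.append(y)
       return out

With these settings, 50 Hz mains hum is attenuated by roughly 15 dB. A 1 kHz tone, well above the cutoff, passes essentially unchanged.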

Next comes a chain of no less than four compressors. They are
based on the same basic structure, but have very different settings,
and fill very different roles.
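
The shared structure can be sketched as a feed-forward compressor with two parts. A level detector is smoothed with separate attack and release time constants. A gain computer then reduces anything above the threshold according to the ratio. This Python sketch is a generic textbook design, not Nageru's code. The choice of peak (rather than RMS) detection and one-pole smoothing are assumptions.

.. code-block:: python

   import math


   def smoothing_coeff(time_s, sample_rate):
       """One-pole coefficient for a time constant given in seconds."""
       return math.exp(-1.0 / (time_s * sample_rate))


   def compress(samples, threshold_db, ratio, attack_s, release_s,
                sample_rate=48000):
       """Feed-forward compressor: smoothed peak detector plus gain computer."""
       attack = smoothing_coeff(attack_s, sample_rate)
       release = smoothing_coeff(release_s, sample_rate)
       envelope = 0.0
       out = []
       for x in samples:
           level = abs(x)
           # Follow rising levels with the attack time, falling ones with release.
           coeff = attack if level > envelope else release
           envelope = coeff * envelope + (1.0 - coeff) * level
           level_db = 20.0 * math.log10(max(envelope, 1e-9))
           # Above the threshold, only 1/ratio of the excess level is kept.
           excess_db = level_db - threshold_db
           gain_db = -excess_db * (1.0 - 1.0 / ratio) if excess_db > 0.0 else 0.0
           out.append(x * 10.0 ** (gain_db / 20.0))
       return out

Only the parameters differ between the stages. For instance, the actual compressor described below would correspond to ``attack_s=0.005, release_s=0.040``. The gain staging would correspond to the far slower ``attack_s=0.5, release_s=20.0``. Any threshold and ratio you pass here are illustrative, not Nageru's defaults.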

The first compressor is the **gain staging**, or auto-leveler;
it is very slow, with 500 ms attack time and 20-second release time.
Its purpose is to set the overall level for the next compressor
in the chain (so that it is slightly over its threshold);
if you have a pretty consistent input signal, you can uncheck
the “Auto” box and just set a static value manually.

The second compressor is the **actual compressor**. It is much
faster, with typical voice settings (5 ms attack, 40 ms release).
It has the effect of making the voice sound a bit tighter,
more level and overall better; if you have multiple things
in the mix, it will also bring them somewhat closer together.
(In general, a compressor gives the signal less dynamic range
by making its loudest parts quieter, which allows you to apply more
gain in a later stage, so that it can get louder overall. It's a bit
paradoxical if you're not used to it.)
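
A worked example with illustrative numbers (not Nageru's defaults) shows
why. A static compressor with threshold :math:`T` and ratio :math:`R`
maps an input level :math:`L_\text{in}` above the threshold to

.. math::

   L_\text{out} = T + \frac{L_\text{in} - T}{R}

With :math:`T = -20` dBFS and :math:`R = 4`, a loud passage at -10 dBFS
comes out at

.. math::

   L_\text{out} = -20 + \frac{-10 - (-20)}{4} = -17.5\ \text{dBFS}

A quiet passage at -30 dBFS is below the threshold and stays at
-30 dBFS. The gap between the two passages shrinks from 20 dB to
12.5 dB. A later gain of +7.5 dB therefore puts the loud passage back
at -10 dBFS and lifts the quiet one to -22.5 dBFS. The peak level is
the same, but the signal is louder overall.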

You can adjust the threshold if you wish, or disable the compressor
altogether if your signal is already mastered. Note that if the
gain staging is not set so that this compressor gets an input signal
that's loud enough, it won't do anything to it.

At this point, the mastering section begins; for simple audio,
the distinction won't matter, but for multichannel, the previous
effects are separate per-bus and the remaining ones are applied
after the mix. (More on this below.) The mastering section begins
with a **limiter**, basically a compressor with a very high ratio.
It's there as an emergency brake for really loud peaks
that got through the other compressors; a classic example is a
speaker suddenly coughing, or a very loud bass drum. This prevents
both clipping and blowing out the listeners' ears.
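
Taking the ratio to infinity gives a limiter. The Python sketch below is a deliberately simple brick-wall version with instant attack and exponential release. The ceiling and release values are hypothetical. Practical limiters, possibly including Nageru's, typically add lookahead and smoother gain changes to avoid audible distortion.

.. code-block:: python

   import math


   def limit(samples, ceiling=0.9, release_s=0.05, sample_rate=48000):
       """Brick-wall limiter: instant attack, exponential release.

       The gain drops at once to whatever keeps the current sample at or
       below `ceiling`, then recovers towards 1.0 with the release time.
       """
       release = math.exp(-1.0 / (release_s * sample_rate))
       gain = 1.0
       out = []
       for x in samples:
           gain = 1.0 - release * (1.0 - gain)  # recover towards unity gain
           if abs(x) > ceiling:
               gain = min(gain, ceiling / abs(x))  # instant attack
           out.append(x * gain)
       return out

Try it with a simulated cough: a short burst at twice full scale in the middle of quiet speech. Every output sample stays at or below the ceiling, and afterwards the gain recovers towards 1.0.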

At this point, the audio signal is *almost* where we'd like it
to be, but the overall sound level might not be quite right.
All the previous compressors have been working in the objective
domain, but as explained in the :ref:`previous section <audio-meters>`,
this does not necessarily correspond to the desired overall
audio loudness. (Their default levels have been calibrated so
that they end up around 0 LU for typical speech content,
but they could easily miss by a few LU in many cases.)

Thus, there's a final **makeup gain** at the end to compensate
for these issues. When the “Auto” checkbox is ticked, which it is
by default, it will very slowly (filter constant of 30 seconds)
adjust itself so that the overall level goes toward 0 LU,
i.e., the reference level. It is so slow because the R128 calculations
inherently must go over a certain amount of time (what we want
to change with this gain is the *overall* sound level,
not the *immediate* one). In periods where the measured loudness is
far off, such as when the stream is all silent, it doesn't update
at all. As with the other knobs, you can uncheck the “Auto”
checkbox and tune this yourself if you want to.
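
The automatic makeup gain can be sketched as a one-pole filter. Each step nudges the gain towards the value that would bring the measured loudness to 0 LU, and the gain is frozen while the measurement is far off. The 30-second time constant comes from the text above. The -15 LU freeze threshold and the function name are hypothetical assumptions for this Python sketch.

.. code-block:: python

   import math


   def update_makeup_gain(gain_db, loudness_lu, dt_s, time_constant_s=30.0,
                          freeze_below_lu=-15.0):
       """Advance a slowly adapting makeup gain by one time step.

       gain_db:     current makeup gain in dB
       loudness_lu: measured loudness after the makeup gain, relative to
                    the reference (0 LU means on target)
       dt_s:        seconds since the previous update
       """
       if loudness_lu < freeze_below_lu:
           # Measurement far below target (e.g. silence): leave the gain alone.
           return gain_db
       # One-pole filter: close a small fraction of the remaining error
       # each step, so that the gain settles with the given time constant.
       alpha = 1.0 - math.exp(-dt_s / time_constant_s)
       return gain_db - alpha * loudness_lu

Suppose the chain consistently delivers -4 LU and this function is called every 100 ms. After 30 seconds the gain has moved about 63% of the way to +4 dB, and after a few minutes it is essentially all the way there. During silence, the gain is left untouched.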


Multichannel mode
-----------------

**Multichannel mode** expands on simple audio mode by allowing you
to have multiple *buses* of audio. (In a sense, it could more accurately
be called “multibus mode” instead, but the name would be too confusing.)
A bus in Nageru is a pair of channels (left/right), sourced from
a video capture or ALSA card. The channel mapping is flexible; my USB
sound card has 18 channels, for instance, and you can use that to make
several buses. Each bus has a name (for instance, something like
“Blue microphone” or “Speaker PC”), which is just for convenience;
Nageru doesn't care what you write here, but the labels are useful
for the operator.
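
Conceptually, a bus is just a label plus a choice of device and two channel indices. The Python sketch below shows that idea. It also shows how one bus's stereo pair could be pulled out of interleaved multichannel audio, such as the audio from an 18-channel USB card. The class, field and function names are hypothetical and do not reflect Nageru's actual protocol buffer format.

.. code-block:: python

   from dataclasses import dataclass


   @dataclass
   class Bus:
       """Hypothetical description of one bus: a labeled stereo pair."""
       name: str           # free-form label, e.g. "Blue microphone"
       device: str         # capture card or ALSA card the audio comes from
       left_channel: int   # channel index on that device
       right_channel: int  # channel index on that device


   def extract_bus(frames, num_channels, bus):
       """Return (left, right) sample lists for one bus.

       `frames` holds interleaved audio: sample 0 of every channel,
       then sample 1 of every channel, and so on.
       """
       left = frames[bus.left_channel::num_channels]
       right = frames[bus.right_channel::num_channels]
       return left, right

Several ``Bus`` objects can point at different channel pairs of the same device. This is how one multichannel card can feed several buses.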

Input mappings
''''''''''''''

.. image:: images/input-mapping.png

The input mapping dialog should be pretty much self-explanatory;
you can use the + button to add a new bus, and the - button to remove
the currently selected one (you select by clicking on it). The up and
down buttons rearrange the order by moving the currently selected bus
up or down, if possible.

Because mappings can be tedious to set up, you wouldn't want to set up
a complicated one every time you started Nageru. Therefore, mappings
can be saved to and loaded from disk; the stored file is a
`protocol buffer <https://developers.google.com/protocol-buffers/>`_
in textual format. You can also load one at start with the
“--input-mapping” parameter, which also implies multichannel mode
(--multichannel).

Nageru strives to keep the mapping consistent even
in the face of a changed environment; for instance, if you unplug and
replug a USB sound card, Nageru will attempt to keep the buses that were
mapped to that card still mapped to it. (While the card is unplugged, the
main display will show the relevant buses as “(disconnected)”.) Similarly,
if an ALSA device is taken by another program on startup and cannot be
accessed by Nageru, it will mark it as “(busy)” and try again in the
background. However, there are edge cases where Nageru simply cannot do
the right thing, for instance if you unplug two identical cards and plug
them back in the reverse order; USB cards don't carry any kind of serial
number or other forms of unique identification.


The audio views
'''''''''''''''

.. image:: images/audio-view-selector.png

Once multichannel mode is active, a little selector shows up to the right,
just below the level meters. The arrows (or equivalently, the PgUp/PgDown
keys on the keyboard) allow you to select between two views:

  * In the **compact audio view** (which is the default), each bus is
    represented only by its label, its peak meter (see below) and its
    fader. This takes up little screen real estate, and allows the video
    channels to be visible. This is the typical view you'd use once you've set up
    everything and are actually doing live video editing; the controls
    from the full audio view are still in effect, but you cannot see or
    interact with them.

  * The **full audio view** contains a lot more controls, but leaves no
    room for the video channels. It is useful when you are doing initial
    setup of your mix, or if you want to go back and tune something.
    The full audio view will be described in detail in the following section;
    the interpretation of the corresponding controls in the compact audio view
    is the same.

.. image:: images/audio-bus-controls.png

There is one set of these controls for every bus.

(TODO: write more)


MIDI control
------------