Audio
=====

Audio is the most important part of video. It is also the most
neglected part in most amateur productions; it is easy to care
about full-HD productions but never remember to give the speaker
a microphone. Your stream can live with blurry or murky pictures,
but it cannot live with people not hearing what's being said.

Nageru aims to give the operator meaningful, useful controls for
processing and mixing audio, with a focus on voice. There are two
modes for audio processing, namely *simple* and *multichannel*;
they are selectable from the audio menu.

Be aware that a mix that sounds good on a PA system will not
necessarily sound good on a stream; PA systems often have rather
different audio characteristics than a set of home speakers or
headphones, and there will also frequently be other sounds in the
room that remove some of the typical “dryness”. However, for simple
use, reusing such a mix isn't the worst choice you can make.


Simple mode
-----------

**Simple** audio mode is the default. Despite its name, it contains a powerful
audio processing chain; however, in many cases, you won't need to
understand or twiddle any of the knobs available.

Simple mode allows input from only a single source, and that source
has to be one of the capture cards. (You choose which one by right-clicking
on its channel and selecting it as audio source.) The two typical
cases where this is useful are:

  * When you simply take in audio from one of the cameras,
    possibly by way of an external microphone, or
  * When you have an external mixer and can embed its output
    in one of the video inputs.

If you want more than one audio source at a time, or if you want
to use ALSA inputs, you will need to use multichannel mode; it is
more complicated, but it is a strict superset of what the simple mode
can do. (In fact, simple mode constructs a multichannel setup
behind the scenes and then runs the multichannel audio code.)


.. _audio-meters:

Audio meters
------------

.. image:: images/level-meters.png

When setting overall audio levels, there are two important goals:
To keep a reasonable **perceived loudness**, and to **avoid clipping**.
Both are more subtle to measure than one would initially assume,
and there are many ways to misstep. In particular, pretty much any
naïve way of measuring loudness will fail; human hearing is, for instance,
much more sensitive to some frequencies than to others.

`EBU R128 <https://tech.ebu.ch/loudness>`_ provides solid solutions
to both problems. It specifies a precise algorithm to calculate both
*momentary* loudness (over short and medium time intervals, 400 ms and
3 s respectively; Nageru uses the short measurement) and a *loudness range*
over an arbitrary amount of time. The loudness is measured in LU (loudness
units), which is a relative unit very much like decibels; there's
also LUFS (loudness units relative to full scale), which is the number of
LU relative to digital full scale.
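
To make the relation between LUFS and LU concrete, here is a minimal sketch
of the ITU-R BS.1770 block loudness formula that R128 builds on, followed by
the conversion to LU against the -23 LUFS target. It assumes the samples are
already K-weighted and leaves out the K-weighting filter and the gating that a
real meter needs; the function names are hypothetical and not taken from Nageru.

.. code-block:: python

   import math

   R128_TARGET_LUFS = -23.0  # the R128 reference level, i.e. 0 LU


   def block_loudness_lufs(left, right):
       """Loudness of one block of stereo samples, in LUFS.

       Assumes the samples are already K-weighted; a real meter applies the
       BS.1770 K-weighting filter first. Left and right both have weight 1.0.
       """
       mean_square_l = sum(s * s for s in left) / len(left)
       mean_square_r = sum(s * s for s in right) / len(right)
       total = mean_square_l + mean_square_r
       if total == 0.0:
           return float("-inf")  # digital silence
       return -0.691 + 10.0 * math.log10(total)


   def lufs_to_lu(lufs, reference=R128_TARGET_LUFS):
       """LU is relative: the distance in LUFS from the reference level."""
       return lufs - reference


   # A 400 ms block at 48 kHz of a constant, already K-weighted 0.1 signal.
   block = [0.1] * 19200
   level = block_loudness_lufs(block, block)
   print(f"{level:.2f} LUFS is {lufs_to_lu(level):+.2f} LU")

Because LU are relative, a programme measured at -20 LUFS sits at +3 LU,
that is, 3 LU above the target.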

EBU R128 specifies a *target loudness* (0 LU) of -23 LUFS +/- 1 LU;
if you keep your stream within this and don't have a huge range
in general, it will have a reasonable loudness on most viewers'
setups. The left meter shows the momentary loudness (over the short
400 ms intervals), and the right meter shows the loudness range,
with the target shown as a box. If you are within the target,
the box turns green; otherwise, it is red. Both meters show
1 LU as one segment, with the highest value being +9 LU
(compared to the reference level) and the lowest being -18 LU.

Even if the overall loudness is correct, one needs to avoid clipping;
if samples go outside the allowed range, it will sound like clicking
or popping (or if many do, like extreme distortion). However,
just measuring the value of every single sample is not good enough;
since the client might do its own resampling and processing,
we also need to account for *inter-sample peaks*. Nageru, in line
with R128 recommendations, oversamples the audio by 4x and writes
the highest peak (in dBFS) below the left meter. Anything above
the R128 limit of -0.1 dBFS will make the meter turn red to alert
the operator that clipping has occurred. (In practice, this should
rarely happen due to the limiter; see the next section.)
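
The following minimal sketch shows why sample peaks are not enough. It
reconstructs the waveform at 4x the sample rate with a Hann-windowed sinc
interpolator, which is an assumption standing in for the specific oversampling
filter a BS.1770 true-peak meter uses, so the results are approximate; the
function names are hypothetical.

.. code-block:: python

   import math


   def to_dbfs(amplitude):
       """Convert a linear amplitude (1.0 = full scale) to dBFS."""
       return 20.0 * math.log10(amplitude) if amplitude > 0.0 else float("-inf")


   def windowed_sinc(t, half_width):
       """Hann-windowed sinc interpolation kernel, zero for |t| >= half_width."""
       if abs(t) >= half_width:
           return 0.0
       window = 0.5 * (1.0 + math.cos(math.pi * t / half_width))
       if t == 0.0:
           return window
       return window * math.sin(math.pi * t) / (math.pi * t)


   def sample_peak_dbfs(samples):
       return to_dbfs(max(abs(s) for s in samples))


   def true_peak_dbfs(samples, factor=4, half_width=16):
       """Peak of the waveform reconstructed at `factor` times the sample rate."""
       n = len(samples)
       peak = 0.0
       for i in range(n):
           for phase in range(factor):
               t = i + phase / factor  # position between sample i and i + 1
               lo = max(0, i - half_width + 1)
               hi = min(n, i + half_width + 1)
               value = sum(samples[k] * windowed_sinc(t - k, half_width)
                           for k in range(lo, hi))
               peak = max(peak, abs(value))
       return to_dbfs(peak)


   # A tone at a quarter of the sample rate, sampled 45 degrees away from its
   # crests: every sample lands at about +/-0.707, but the waveform reaches 1.0.
   tone = [math.sin(math.pi / 2 * n + math.pi / 4) for n in range(64)]
   print(f"sample peak {sample_peak_dbfs(tone):.2f} dBFS, "
         f"true peak {true_peak_dbfs(tone):.2f} dBFS")

For this tone every sample sits at about -3 dBFS, yet the reconstructed
waveform peaks close to 0 dBFS; a meter that only looked at samples would
underestimate the peak by roughly 3 dB.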

You can click the reset (RST) button to reset all the meters, including
the peak measurement.

Finally, the very top contains a **correlation meter** measuring
the correlation between the left and right channel, which is
useful for checking the stereo image. It goes from -1 at the very
left (the channels are exact opposites of each other), via 0 in
the middle (the channels are totally uncorrelated), to +1 at
the very right (the channels are exactly the same). Readings that
stay put for a long time often point to common issues:

  * A correlation meter that sits at exactly zero typically means
    that either the left or the right channel (or both) is silent.
  * A correlation meter that sits at exactly +1 typically means
    you are sending a mono stream. This could be intentional
    (if you e.g. have only a single microphone), but if not,
    it could indicate either a loose connector or stereo channels
    panned wrong.
  * Finally, a correlation meter that sits at negative values
    for longer periods of time indicates that one of the channels
    is inverted (the phase is wrong), and the result could sound odd on
    speaker setups. However, certain kinds of reverb or other
    effects could also cause this, so it could be benign.

A healthy stereo stream will usually have a correlation somewhere
around 0.7–0.8, and this section is marked in green.
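
The three readings in the list above can be reproduced with a normalized
cross-correlation of the two channels. This sketch is an assumption about the
general technique, not Nageru's exact code: it does not subtract the mean
(audio is normally centered around zero anyway), and a real meter would
compute it over a sliding window with some smoothing. The function name is
hypothetical.

.. code-block:: python

   import math


   def stereo_correlation(left, right):
       """Normalized correlation of two channels, from -1.0 to +1.0.

       Returns 0.0 when either channel is silent, where it is undefined.
       """
       cross = sum(x * y for x, y in zip(left, right))
       energy = sum(x * x for x in left) * sum(y * y for y in right)
       if energy == 0.0:
           return 0.0
       return cross / math.sqrt(energy)


   signal = [math.sin(0.05 * n) for n in range(2000)]
   inverted = [-s for s in signal]
   silence = [0.0] * len(signal)
   print(stereo_correlation(signal, signal))    # mono: close to +1
   print(stereo_correlation(signal, inverted))  # one channel inverted: close to -1
   print(stereo_correlation(signal, silence))   # one channel silent: 0.0

A typical stereo recording lands between these extremes, since the two
channels are similar but not identical.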

.. _audio-strip:

The audio strip
---------------

.. image:: images/audio-strip.png

The audio strip contains controls for the processing chain for the audio from
start to end, left to right. Note that by default, everything is enabled;
if you have a pre-made audio mix that you want to pass 1:1 into the stream,
you can start Nageru with the “--flat-audio” flag, which instead starts
with everything disabled.

The first step in the pipeline is a **lo-cut** (or equivalently,
highpass) filter. The exact cutoff frequency is a bit of a matter
of taste (and also depends on the speaker), but the main point
is that it gets rid of low-frequency hum and a lot of the background
noise that is not related to the speaker's voice. (If you were
producing music, you'd probably want it there to make room for
music *under* it, but then you'd want it higher than the default 120 Hz.)
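
A lo-cut can be sketched as a second-order highpass filter. The version below
uses the well-known highpass biquad from Robert Bristow-Johnson's Audio EQ
Cookbook with a Butterworth Q of 1/√2; that filter design, the 48 kHz sample
rate and the function names are assumptions for illustration, and only the
120 Hz default cutoff comes from the text above.

.. code-block:: python

   import math


   def highpass_coefficients(cutoff_hz, sample_rate, q=1.0 / math.sqrt(2.0)):
       """Audio EQ Cookbook highpass biquad, normalized so that a0 == 1."""
       w0 = 2.0 * math.pi * cutoff_hz / sample_rate
       alpha = math.sin(w0) / (2.0 * q)
       cos_w0 = math.cos(w0)
       a0 = 1.0 + alpha
       b = ((1.0 + cos_w0) / 2.0 / a0, -(1.0 + cos_w0) / a0, (1.0 + cos_w0) / 2.0 / a0)
       a = (-2.0 * cos_w0 / a0, (1.0 - alpha) / a0)
       return b, a


   def lo_cut(samples, cutoff_hz=120.0, sample_rate=48000):
       """Filter a mono signal (direct form I biquad)."""
       (b0, b1, b2), (a1, a2) = highpass_coefficients(cutoff_hz, sample_rate)
       x1 = x2 = y1 = y2 = 0.0
       out = []
       for x in samples:
           y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
           x2, x1 = x1, x
           y2, y1 = y1, y
           out.append(y)
       return out


   def tone(freq_hz, seconds=1.0, sample_rate=48000):
       return [math.sin(2.0 * math.pi * freq_hz * n / sample_rate)
               for n in range(int(seconds * sample_rate))]


   # Compare the settled amplitude (last 100 ms) of hum and of a voice-range tone.
   hum = lo_cut(tone(50.0))[-4800:]
   voice = lo_cut(tone(1000.0))[-4800:]
   print(f"50 Hz hum: {max(map(abs, hum)):.3f}, 1 kHz tone: {max(map(abs, voice)):.3f}")

Run on a second of 50 Hz mains hum, this filter leaves only about 17% of the
amplitude (roughly -15 dB), while a 1 kHz tone in the voice range passes
essentially unchanged.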

Next comes a chain of no less than four compressors. They are
based on the same basic structure, but have very different settings,
and fill very different roles.

The first compressor is the **gain staging**, or auto-leveler;
it is very slow, with 500 ms attack time and 20 second release time.
Its purpose is to set the overall level for the next compressor
in the chain (so that it is slightly over its threshold);
if you have a pretty consistent input signal, you can uncheck
the “Auto” box and just set a static value manually.
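
The attack and release times can be illustrated with a one-pole level follower
that reacts with one time constant when the level rises and another when it
falls. This is a common way to build such a detector, not necessarily
Nageru's; the sketch treats the attack and release times as time constants
(the time to cover about 63% of a step), though compressors differ in how they
define them, and the function names are hypothetical.

.. code-block:: python

   import math


   def smoothing_coefficient(time_constant_s, sample_rate):
       """Per-sample factor that leaves 1/e of a step after time_constant_s."""
       return math.exp(-1.0 / (time_constant_s * sample_rate))


   def follow_envelope(levels, attack_s, release_s, sample_rate=48000):
       """Track `levels` with the attack constant upward, release downward."""
       attack = smoothing_coefficient(attack_s, sample_rate)
       release = smoothing_coefficient(release_s, sample_rate)
       envelope = 0.0
       out = []
       for level in levels:
           coef = attack if level > envelope else release
           envelope = coef * envelope + (1.0 - coef) * level
           out.append(envelope)
       return out


   burst = [1.0] * 48000  # one second of constant level at 48 kHz
   slow = follow_envelope(burst, attack_s=0.5, release_s=20.0)     # gain staging
   fast = follow_envelope(burst, attack_s=0.005, release_s=0.040)  # compressor
   print(f"after 1 s: gain staging {slow[-1]:.3f}, compressor {fast[-1]:.3f}")

After one second of steady input, the gain staging detector has covered only
about 86% of the step, while the compressor's detector has long since settled;
the slow one follows the overall level rather than individual syllables.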

The second compressor is the **actual compressor**. It is much
faster, with typical voice settings (5 ms attack, 40 ms release).
It has the effect of making the voice sound a bit tighter,
more level and overall better; if you have multiple things
in the mix, it will also bring them somewhat closer together.
(In general, a compressor gives the signal less dynamic range
by making its loudest parts quieter, which allows you to raise the gain
more in a later stage, so that it can get louder overall. It's a bit
paradoxical if you're not used to it.)

You can adjust the threshold if you wish, or disable the compressor
altogether if your signal is already mastered. Note that if the
gain staging is not set so that this compressor gets an input signal
that's loud enough, it won't do anything to it.
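
The compressor's static behavior can be summarized by its gain curve: below
the threshold nothing happens, and above it the excess is divided by the ratio.
The sketch below is a hard-knee curve with hypothetical settings (threshold
-26 dBFS, ratio 4:1), not Nageru's defaults; many compressors also soften the
knee around the threshold.

.. code-block:: python

   def gain_change_db(level_db, threshold_db, ratio):
       """Static hard-knee compressor curve: gain change in dB (zero or negative).

       Above the threshold, each `ratio` dB of extra input yields only 1 dB
       of extra output; below it, the signal is left alone.
       """
       if level_db <= threshold_db:
           return 0.0
       excess = level_db - threshold_db
       return excess / ratio - excess


   for level in (-40.0, -26.0, -18.0, -10.0):
       out = level + gain_change_db(level, threshold_db=-26.0, ratio=4.0)
       print(f"in {level:6.1f} dBFS -> out {out:6.1f} dBFS")

Inputs at -26 and -10 dBFS, 16 dB apart, come out at -26 and -22 dBFS, only
4 dB apart. The result is quieter, but with its peaks tamed, a later gain
stage can raise the whole signal without the loud parts clipping. It also
shows why an input that never reaches the threshold is left untouched.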

At this point, the mastering section begins; for simple audio,
the distinction won't matter, but for multichannel, the previous
effects are separate per-bus and the remaining ones are applied
after the mix. (More on this below.) The mastering section begins
with a **limiter**, basically a compressor with very high ratio.
It's there as an emergency brake for really loud sounds
that got through the other compressors; a classic example is a
speaker suddenly coughing, or a very loud bass drum. This prevents
both clipping and blowing out the listeners' ears.
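
The emergency-brake behavior can be sketched as a peak limiter whose gain
drops instantly when a sample would exceed a ceiling and then recovers
exponentially. This is a simplified illustration with a hypothetical ceiling
and release time; it works on sample values only and has no lookahead, which
real limiters commonly use to avoid audible distortion.

.. code-block:: python

   import math


   def limit(samples, ceiling=0.5, release_s=0.2, sample_rate=48000):
       """Peak limiter with instant attack and exponential release (sketch)."""
       release = math.exp(-1.0 / (release_s * sample_rate))
       gain = 1.0
       out = []
       for s in samples:
           # Largest gain that keeps this sample at or below the ceiling.
           allowed = ceiling / abs(s) if abs(s) > ceiling else 1.0
           if allowed < gain:
               gain = allowed  # attack: clamp immediately
           else:
               # Release: drift back toward unity, but never above `allowed`.
               gain = min(release * gain + (1.0 - release), allowed)
           out.append(s * gain)
       return out


   # A speech-level tone with a sudden "cough" that is four times louder.
   signal = [0.2 * math.sin(0.1 * n) for n in range(4800)]
   for n in range(2000, 2100):
       signal[n] *= 4.0
   limited = limit(signal)
   print(f"input peak {max(map(abs, signal)):.2f}, "
         f"output peak {max(map(abs, limited)):.2f}")

The first 2000 samples pass untouched; only the loud burst is pulled down to
the ceiling, after which the gain slowly returns to unity.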

At this point, the audio signal is *almost* where we'd like it
to be, but the overall sound level might not be quite right.
All the previous compressors have been working in the objective
domain, but as explained in the :ref:`previous section <audio-meters>`,
this does not necessarily correspond to the desired overall
audio loudness. (Their default levels have been calibrated so
that they end up around 0 LU for typical speech content,
but they could easily miss by a few LU in many cases.)

Thus, there's a final **makeup gain** at the end to compensate
for these issues. When the “Auto” checkbox is ticked, which it is
by default, it will very slowly (time constant of 30 seconds)
adjust itself so that the overall level goes toward 0 LU,
i.e., the reference level. It is so slow because the R128 calculations
inherently must go over a certain amount of time (what we want
to change with this gain is the *overall* sound level,
not the *immediate* one). In periods where the makeup gain is
far off, such as when the stream is all silent, it doesn't update
at all. As with the other knobs, you can uncheck the “Auto”
checkbox and tune this yourself if you want to.
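
The automatic makeup gain can be sketched as a first-order filter that nudges
the gain toward whatever would bring the output to 0 LU. Since LU behave like
decibels, a loudness error in LU can be applied directly as a gain change in
dB. The 30 second time constant comes from the text above; the hold threshold
that stands in for “far off”, the 100 ms update interval and the function name
are hypothetical.

.. code-block:: python

   import math


   def update_makeup_gain(gain_db, loudness_lu, dt_s,
                          time_constant_s=30.0, max_error_lu=12.0):
       """One update step of a slow automatic makeup gain.

       `loudness_lu` is the measured loudness of the output (after the
       current gain), relative to the -23 LUFS reference. If the error is
       implausibly large, as during silence, the gain is held instead.
       max_error_lu is a hypothetical threshold.
       """
       error_lu = -loudness_lu  # how far the output is from 0 LU
       if abs(error_lu) > max_error_lu:
           return gain_db
       alpha = 1.0 - math.exp(-dt_s / time_constant_s)
       return gain_db + alpha * error_lu


   source_lu = -3.0  # hypothetical programme, 3 LU below target before the gain
   gain = 0.0
   for _ in range(1200):  # two minutes of updates every 100 ms
       gain = update_makeup_gain(gain, source_lu + gain, dt_s=0.1)
   print(f"makeup gain after 2 minutes: {gain:+.2f} dB")
   print(update_makeup_gain(gain, -70.0, dt_s=0.1) == gain)  # silence: held

Starting 3 LU too quiet, the gain is close to the needed +3 dB after two
minutes, while a silent stream leaves it untouched.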


Multichannel mode
-----------------

**Multichannel mode** expands on simple audio mode by allowing you
to have multiple *buses* of audio. (In a sense, it could more accurately
be called “multibus mode” instead, but the name would be too confusing.)
A bus in Nageru is a pair of channels (left/right), sourced from
a video capture or ALSA card. The channel mapping is flexible; my USB
sound card has 18 channels, for instance, and you can use that to make
several buses. Each bus has a name (for instance, something like
“Blue microphone” or “Speaker PC”), which is just for convenience;
Nageru doesn't care what you write here, but the labels are useful
for the operator.


Input mappings
''''''''''''''

.. image:: images/input-mapping.png

The input mapping dialog should be pretty much self-explanatory;
you can use the + button to add a new bus, and the - button to remove
the currently selected one (you select by clicking on it). The up and
down buttons rearrange the order by moving the currently selected bus
up or down, if possible. Note that you can create a mono bus by
assigning the same input channel to the left and right inputs.
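
The bus concept can be summarized as a small data structure: a named pair of
input channel indices on one capture or ALSA device, where mapping the same
channel to both sides makes a mono bus. The class and field names below are
hypothetical illustrations, not Nageru's actual mapping format.

.. code-block:: python

   from dataclasses import dataclass


   @dataclass
   class Bus:
       name: str          # free-form label, only shown to the operator
       device: str        # hypothetical identifier of the capture or ALSA card
       left_channel: int  # input channel index feeding the left side
       right_channel: int

       @property
       def is_mono(self) -> bool:
           # Same input channel on both sides: the bus is provably mono.
           return self.left_channel == self.right_channel

       def extract(self, frame):
           """Pick this bus' (left, right) pair out of one multichannel frame."""
           return frame[self.left_channel], frame[self.right_channel]


   # Two buses sharing one 18-channel USB sound card.
   buses = [
       Bus("Blue microphone", "usb-card", 0, 0),
       Bus("Speaker PC", "usb-card", 4, 5),
   ]
   frame = [0.1 * ch for ch in range(18)]  # one sample per input channel
   for bus in buses:
       print(bus.name, bus.extract(frame), "mono" if bus.is_mono else "stereo")

Because several buses can point into the same device, one multichannel card
can feed as many buses as it has useful channel pairs.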

Because mappings can be tedious to set up, you wouldn't want to set up
a complicated one every time you started Nageru. Therefore, mappings
can be saved to and loaded from disk; the stored file is a
`protocol buffer <https://developers.google.com/protocol-buffers/>`_
in textual format. You can also load one at start with the
“--input-mapping” parameter, which also implies multichannel mode
(--multichannel).

Nageru strives to keep the mapping consistent even
in the face of a changed environment; for instance, if you unplug and
replug a USB sound card, Nageru will attempt to keep the buses that were
mapped to that card still mapped to it. (While the card is unplugged, the
main display will show the relevant buses as “(disconnected)”.) Similarly,
if an ALSA device is taken by another program on startup and cannot be
accessed by Nageru, it will mark it as “(busy)” and try again in the
background. However, there are edge cases where Nageru simply cannot do
the right thing, for instance if you unplug two identical cards and plug
them back in the reverse order; USB cards don't carry any kind of serial
number or other forms of unique identification.

.. _audio-views:

The audio views
'''''''''''''''

.. image:: images/audio-view-selector.png

Once multichannel mode is active, the audio view selector (in the upper right,
just below the level meters) gains a third option. The arrows (or equivalently,
the PgUp/PgDown keys on the keyboard) allow you to select between those three views:

  * In the **compact audio view** (which is the default), each bus is
    represented only by its label, its peak meter (see below) and its
    fader. This takes up little screen real estate, and allows the video channels
    to be visible. This is the typical view you'd use once you've set up
    everything and are actually doing live video editing; the controls
    from the full audio view are still in effect, but you cannot see or
    interact with them.

  * The **video grid display** does not have any audio controls,
    but tries to use as much screen real estate as possible on the video channels
    only. In particular, it can put the channels in multiple rows if that
    facilitates larger previews, which can be useful if you have many channels.

  * The **full audio view** (only available in multichannel mode) contains a
    lot more controls, but leaves no room for the video channels. It is useful
    when you are doing initial setup of your mix, or if you want to go back and
    tune something. The full audio view is described in detail below;
    the interpretation of the corresponding controls in the compact audio view
    is the same.

.. image:: images/audio-bus-controls.png

There's one set each of these controls for every bus. The most
important parts of the mix are given the most screen real estate,
so even though the way through the signal chain is left-to-right
top-to-bottom, we'll go over it in the opposite direction.

By far the most important part is the audio level, so the **fader** naturally is
very prominent. (Note that the scale is nonlinear; you want more resolution
in the most important area.) Changing a fader with the mouse or keyboard is
possible, and probably most people will be doing that, but Nageru also
supports physical faders on USB MIDI controllers (see :ref:`midi-control`).
There's a mute button if you just want to silence a bus temporarily; it has
exactly the same effect as pulling the fader all the way down, i.e., it will
make the bus go all silent.

Then there's the **peak meter** to the left of that. For each bus, unlike
for the meters used for mastering (see :ref:`audio-meters`),
you don't want to know loudness; you want to know recording levels,
so this is a peak meter, *not* a loudness meter. (There's some holdoff
so you can see the actual peaks over a short period.) In particular,
you don't want the bus to send clipped data to the master
(which would happen if you set it too high). Nageru can handle
this situation pretty well (unlike most digital mixers, it mixes in
full 32-bit floating-point so there's no internal clipping,
and the limiter described in :ref:`audio-strip` will usually save you),
but it's still not a good place to be in, so if the bus peaks,
the **historical peak label** under the meter turns red.
If you want to reset it, click on it using the mouse.
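
A peak meter with holdoff and a sticky historical peak could be structured as
below. The hold time, the decay rate and the 0 dBFS threshold for turning the
historical label red are hypothetical values chosen for illustration, since
the text above does not state Nageru's, and the class name is hypothetical too.

.. code-block:: python

   import math


   class PeakMeter:
       """Per-bus peak meter with peak hold and a resettable historical peak.

       hold_s, decay_db_per_s and red_above_dbfs are hypothetical defaults.
       """

       def __init__(self, hold_s=1.0, decay_db_per_s=20.0, red_above_dbfs=0.0):
           self.hold_s = hold_s
           self.decay_db_per_s = decay_db_per_s
           self.red_above_dbfs = red_above_dbfs
           self.displayed_dbfs = float("-inf")   # what the meter bar shows
           self.held_for_s = 0.0
           self.historical_dbfs = float("-inf")  # what the label under it shows

       def update(self, block, block_s):
           """Feed one block of samples that covers block_s seconds."""
           peak = max(abs(s) for s in block)
           peak_dbfs = 20.0 * math.log10(peak) if peak > 0.0 else float("-inf")
           self.historical_dbfs = max(self.historical_dbfs, peak_dbfs)
           if peak_dbfs >= self.displayed_dbfs:
               # New, higher peak: show it and restart the hold period.
               self.displayed_dbfs = peak_dbfs
               self.held_for_s = 0.0
           elif self.held_for_s < self.hold_s:
               # Keep showing the old peak for a while.
               self.held_for_s += block_s
           else:
               # Hold expired: let the display fall, but not below the signal.
               fallen = self.displayed_dbfs - self.decay_db_per_s * block_s
               self.displayed_dbfs = max(peak_dbfs, fallen)

       @property
       def historical_is_red(self):
           return self.historical_dbfs > self.red_above_dbfs

       def reset_historical(self):
           """Equivalent of clicking the historical peak label."""
           self.historical_dbfs = float("-inf")


   meter = PeakMeter()
   meter.update([0.5, -0.25], block_s=0.01)
   meter.update([0.1], block_s=0.01)  # quieter block: the 0.5 peak is still held
   meter.update([1.3], block_s=0.01)  # over full scale, e.g. a fader set too high
   print(f"{meter.displayed_dbfs:.1f} dBFS, label red: {meter.historical_is_red}")
   meter.reset_historical()

Keeping the historical peak separate from the displayed one lets the bar fall
back while the label keeps reporting the overload until the operator resets it.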

The peak meter doubles as an input peak check during
setup; if you turn off all the effects and set the fader to neutral, you can
see if the input hits peak or not, and then adjust it down. The left and right
channels are shown separately, so you can see if they are approximately
the same level or even completely mono.

The **compressor** is well-known from the simple audio mode, but in this view,
it also has a **reduction meter**, so that you can see whether it kicks in or not.
(This is also nonlinear, and each step is marked with the number of decibels
the compressor had to reduce the signal.) Most casual users
would want to just leave the gain staging and compressor settings alone, but
a skilled audio engineer will know how to adjust these to each speaker's
antics; some speak at a pretty even volume and thus can get a bit of
headroom, while some are much more variable and need tighter settings.

Nearly at the top (and nearly first in the chain), there's the EQ section. The **lo-cut** is again
well-known from the simple audio mode (the filter is separate for each
bus, while the cutoff **frequency** is shared across all buses),
but there's now also a simple **three-band EQ** per bus. Ask the speaker
to talk normally for a bit, and tweak the controls until it sounds good.
People have different voices and different ways of holding the microphone,
and if you have a reasonable ear, you can use the EQ to your advantage to
make them sound a little more even on the stream. Either that, or just
put it in neutral, and the entire EQ code will be bypassed.

Finally (or, well, first), there's the **stereo width** knob.
At the default, 100%, it makes no change to the signal, but if you turn it
to 0% (at the middle), the signal becomes perfect mono. Between these two,
there's a range where the channels leak partially over into each other.
This can be useful if you have a very hard-panned signal (say, two microphones
that point in diametrically opposite directions), which can sound odd when
the listener is using headphones. Going further to the left, at -100%, the
left and right channels are exactly swapped, and between -100% and 0% the
swapped channels again leak partially into each other. The range between
-100% and 0% is for convenience only, as you could achieve the same effect by
swapping the two channels in the input mapping. Note that the entire control
is grayed out if the signal is provably mono (i.e., the same input channel is
mapped to both left and right).
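
The behavior of the width knob can be expressed as a 2x2 mixing matrix. The
sketch below uses a plain linear crossfade that matches the three fixed points
described above (100% unchanged, 0% mono, -100% swapped); that particular law
and the function name are assumptions, since Nageru's exact curve is not
stated, and real implementations may use a constant-power variant instead.

.. code-block:: python

   def apply_stereo_width(left, right, width):
       """Mix left/right for a width from -1.0 (-100%) to 1.0 (100%).

       1.0 leaves the signal alone, 0.0 gives the mono average on both sides,
       and -1.0 swaps the channels; values in between leak partially.
       """
       if not -1.0 <= width <= 1.0:
           raise ValueError("width must be between -1.0 and 1.0")
       direct = (1.0 + width) / 2.0  # share of each channel kept on its own side
       cross = (1.0 - width) / 2.0   # share that leaks over to the other side
       return direct * left + cross * right, cross * left + direct * right


   # Feed a hard-left signal (1.0 on the left, silence on the right).
   for width in (1.0, 0.5, 0.0, -0.5, -1.0):
       print(f"{width:+.0%}", apply_stereo_width(1.0, 0.0, width))

The symmetry of the matrix is also why the negative half of the range adds
nothing new: it equals the positive half applied to swapped inputs.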


.. _midi-control:

MIDI controllers
----------------

If you are doing audio work beyond just setting up a mix and letting it
stay there, dragging controls with the mouse can feel limiting. There's
a wide range of controllers out there that have physical faders and knobs
you can twiddle for a much more tactile feel, ranging from about
$50 to more than $5000. (For reference, Nageru has been tested with the
`Akai MIDImix <http://www.akaipro.com/product/midimix>`_ and the
`Korg nanoKONTROL2 <http://www.korg.com/us/products/computergear/nanokontrol2/>`_,
and both work fine, although the nanoKONTROL2 needs some one-time Korg-specific
SysEx commands before the lights and buttons will work with Nageru.)
Nageru supports these in multichannel mode only.

For historical reasons, these speak the MIDI protocol as if they
were instruments, and thus, Nageru refers to them as **MIDI controllers**.
However, you won't really notice; they come with USB plugs to transport
the MIDI data, so you just plug them in and have Nageru automatically
talk to them. (For simplicity, Nageru will assume *any* MIDI device
connected to your machine is such a controller.)

Since different controllers have different numbers of faders, knobs,
buttons and lights, you will need to make a mapping. However, just like
with the audio input mapping, this can be done once and then saved
to disk for later loading. (You can load a given mapping on startup
using the “--midi-mapping” flag.) The dialog, loaded with the included
preset for the Akai MIDImix, looks like this:

.. image:: images/midi-controller-setup.png

There are three types of controls, which correspond to different types
of MIDI events:

  * **Controllers** map directly to MIDI controllers (the value in the
    dialog is the controller number), which take continuous
    values from 0 to 127. (Unfortunately, MIDI
    was made in the 80s, when 7-bit precision was seen as enough.)
    They are typically used for faders and knobs.

  * **Buttons** are one-shot events that map to MIDI note-on events,
    and the value in the dialog is the MIDI note number (also from
    0 to 127). They are similar to mouse buttons in that they don't
    have an on or off state (the MIDI note-off events are ignored).
    A typical example would be a mute button that can be pressed to
    either mute or unmute a channel.

  * **Lights** are *output* events where Nageru can send feedback
    to the controller (and by extension, the user), represented by
    MIDI note-on and note-off events (to turn the light on or off).
    A typical example would be a mute light, which is on when a
    channel is muted.
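
At the byte level, the three types correspond to standard MIDI 1.0 channel
messages: Control Change (status 0xB0 to 0xBF), Note On (0x90 to 0x9F, where
velocity 0 conventionally means Note Off) and Note Off (0x80 to 0x8F). The
sketch below classifies raw messages along those lines and builds light
messages; the function names, and the use of full velocity to switch a light
on, are assumptions, since some controllers interpret velocity as a color.

.. code-block:: python

   def classify_midi(message):
       """Turn a raw 3-byte MIDI channel message into a mapping-style event.

       Returns ("controller", number, value_0_to_1), ("button", note) or None.
       """
       status, data1, data2 = message
       kind = status & 0xF0  # upper nibble: message type; lower nibble: channel
       if kind == 0xB0:  # Control Change: controller number, 7-bit value
           return ("controller", data1, data2 / 127.0)
       if kind == 0x90 and data2 > 0:  # Note On (velocity 0 means Note Off)
           return ("button", data1)
       return None  # Note Off and everything else are ignored


   def light_message(note, on, channel=0):
       """Raw MIDI bytes that switch a controller light on or off."""
       if on:
           return bytes([0x90 | channel, note, 127])  # Note On, full velocity
       return bytes([0x80 | channel, note, 0])        # Note Off


   print(classify_midi(bytes([0xB0, 19, 127])))  # hypothetical fader 19 at the top
   print(classify_midi(bytes([0x90, 1, 127])))   # button on note 1 pressed
   print(classify_midi(bytes([0x80, 1, 0])))     # release: ignored
   print(light_message(3, on=True).hex())

Dividing by 127 maps the 7-bit controller range onto 0.0 to 1.0, which makes
the coarse resolution mentioned above easy to see: a fader has only 128 steps.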

In addition, each event can be *per-bus* or *global*. It can be a bit
confusing that even the global events can be set once per-bus,
but this is merely a convenience, allowing you to bind multiple
physical controls to the same global controller; for global controllers,
the bus number(s) you use for your mapping do not matter.

The combination of controller type and per-bus/global constitutes
a **mapping group**, clearly marked and collapsible in the UI.


Creating and updating mappings
''''''''''''''''''''''''''''''

Unless you have a reference sheet for your MIDI controller, specifying which
controller and note numbers the different physical knobs and faders
emit, inputting these numbers by hand can be a frustrating procedure.
(Actually, even with a reference sheet, it probably is.) Thus, the preferred
way is by autosensing; select the given mapping with the mouse
and use the control you want to bind it to, and Nageru automatically
fills it in.

Also, most devices support many channels, with very similar structure
in their controller and/or note numbers. Once you've filled out one
and then started filling out another one, Nageru can guess for you;
if it thinks it can make a reasonable guess (i.e., find a consistent
offset from its left or right neighbor), the “Guess bus” and/or
“Guess group” buttons will be clickable. This can save considerable
amounts of time, although it is advisable to check Nageru's guess for
at least the first guessed channel. In particular, some controllers
do not have a consistent offset between channels across all of their
controls (making “Guess bus” give the wrong answer), only within each
mapping group, so there, you must limit yourself to guessing only a
single group at a time (using “Guess group”).
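
The guessing idea can be sketched as follows: compare the entries already
known for the current bus with those of its neighbor, and if they all differ
by the same offset, apply that offset to the neighbor's remaining entries.
The function and the control names are hypothetical, and Nageru's actual
heuristics may differ in details.

.. code-block:: python

   def guess_bus(neighbor, partial):
       """Fill in a bus mapping from its neighbor's, assuming a constant offset.

       `neighbor` maps control names to MIDI numbers for an already-mapped bus;
       `partial` holds the entries learned so far for the current bus.
       Returns the completed mapping, or None if no consistent offset exists.
       """
       offsets = {partial[name] - neighbor[name] for name in partial if name in neighbor}
       if len(offsets) != 1:
           return None  # nothing to go on, or the known entries disagree
       offset = offsets.pop()
       guess = {name: number + offset for name, number in neighbor.items()}
       if any(not 0 <= number <= 127 for number in guess.values()):
           return None  # would fall outside the 7-bit MIDI range
       guess.update(partial)
       return guess


   bus1 = {"fader": 19, "knob_high": 16, "knob_mid": 17, "mute_button": 1}
   print(guess_bus(bus1, {"fader": 23}))

This also shows how a per-bus guess can go wrong: if faders on some controller
are 4 apart but buttons only 3 apart, the guessed mute button is off by one,
which is why guessing one mapping group at a time is the safer choice there.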

Lights currently cannot be learned, so some trial and error is needed.
(If there are buttons associated with the light, though, a good place
to start is using the same note number.) However, just like the input
controls, they can be guessed once you have all the mapping you want
for a neighboring bus and partial information about the current one.


Controller banks, and UI visibility
'''''''''''''''''''''''''''''''''''

Many MIDI controllers do not have enough faders and knobs for every
Nageru function you might want to control; some even contain only
one fader or one knob. Thus, Nageru supports assigning a physical
control to multiple functions, through **controller banks**.
If a mapping is assigned to a controller bank, it is only active
when that bank is active. The act of switching banks is in itself
an action that can be initiated from the MIDI controller; in fact,
that is currently the only way to switch them.

A typical example would be having a knob that in bank 1 is assigned
to gain, and in bank 2 to cutoff (which happens to be a global control,
as described earlier). This way, one can switch between
the two banks and have both functions accessible from the MIDI controller.
Similarly, buttons can be reused by assigning them to multiple banks.
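
Bank-dependent dispatch can be sketched as a lookup keyed on the physical
control and the bank, where a mapping without a bank stays active in every
bank. Everything here (class name, the bank-switch buttons, the function
labels) is a hypothetical illustration of the example above, not Nageru's
implementation.

.. code-block:: python

   class BankedMapping:
       """Route MIDI controller events to functions depending on the active bank."""

       def __init__(self):
           self.active_bank = 1
           # (controller number, bank or None for all banks) -> function name
           self.controllers = {}
           # note number -> bank that the button switches to
           self.bank_buttons = {}

       def map_controller(self, number, function, bank=None):
           self.controllers[(number, bank)] = function

       def on_controller(self, number, value):
           """Return (function, value) for an incoming event, or None if unmapped."""
           function = (self.controllers.get((number, self.active_bank))
                       or self.controllers.get((number, None)))
           return (function, value) if function else None

       def on_button(self, note):
           if note in self.bank_buttons:
               # Only the bank changes here; no function is updated until the
               # physical control is moved again.
               self.active_bank = self.bank_buttons[note]


   mapping = BankedMapping()
   mapping.map_controller(16, "gain (bus 1)", bank=1)
   mapping.map_controller(16, "lo-cut frequency", bank=2)
   mapping.map_controller(19, "fader (bus 1)")  # active in every bank
   mapping.bank_buttons = {25: 1, 26: 2}

   print(mapping.on_controller(16, 64))  # bank 1: the knob controls gain
   mapping.on_button(26)
   print(mapping.on_controller(16, 64))  # bank 2: the same knob sets the cutoff
   print(mapping.on_controller(19, 100))

Because a bank switch only changes which function the next event reaches,
nothing jumps to the knob's current position until the knob is actually moved.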

Note that when switching banks, the associated function(s) are
*not* immediately updated; this happens only when you move the control.
Otherwise, a bank switch would cause a host of unwanted changes,
as it is unlikely that you would want the control in the exact
same position for both functions. (There is a similar problem
when starting up Nageru for the first time, where the controllers
are not necessarily in the place matching Nageru's startup settings.)
Some more expensive controllers support *motorized faders*, where
the host can tell the control to move to the right place
and thus solve the problem, but Nageru does not currently support them.

.. image:: images/highlight.png

To help you know which bank is active (or even that you have a MIDI
controller connected at all), the currently mapped controls have
a green **activity highlight**. When you switch banks, the highlight
also updates; a control is only highlighted if its mapping is
active in the currently selected bank. This way, it is easy to see
which controls are currently controllable by MIDI, and which
are not.