Audio
=====

Audio is the most important part of video. It is also the most
neglected part in most amateur productions; it is easy to care
about full-HD productions but never remember to give the speaker
a microphone. Your stream can live with blurry or murky pictures,
but it cannot live with people not hearing what's being said.

Nageru aims to give the operator meaningful, useful controls for
processing and mixing audio, with a focus on voice. There are two
modes for audio processing, namely *simple* and *multichannel*;
they are selectable from the audio menu.

Be aware that a mix that sounds good on a PA system will not
necessarily sound good on a stream; PA systems often have rather
different audio characteristics than a set of home speakers or
headphones, and there will also frequently be other sounds in the
room that remove some of the typical “dryness”. However, for simple
use, reusing such a mix isn't the worst choice you can make.


Simple mode
-----------

**Simple** audio mode is the default, and was the only mode available
up until Nageru 1.4.0. Despite its name, it contains a powerful
audio processing chain; however, in many cases, you won't need to
understand or twiddle any of the knobs available.

Simple mode allows input from only a single source, and that source
has to be one of the capture cards. (You choose which one by right-clicking
on its channel and selecting it as audio source.) The two typical
cases where this is useful are:

  * When you simply take in audio from one of the cameras,
    possibly by way of an external microphone, or
  * When you have an external mixer and can embed its output
    in one of the video inputs.

If you want more than one audio source at a time, or if you want
to use ALSA inputs, you will need to use multichannel mode; it is
more complicated, but it is a strict superset of what the simple mode
can do. (In fact, simple mode constructs a multichannel setup
behind the scenes and then runs the multichannel audio code.)


Audio meters
------------

.. image:: images/level-meters.png

When setting overall audio levels, there are two important goals:
to keep a reasonable **perceived loudness**, and to **avoid clipping**.
Both are more subtle to measure than one would initially assume,
and there are many ways to misstep. In particular, pretty much any
naïve way of measuring loudness will fail; human hearing is, for instance,
much more sensitive in some frequencies than others.

`EBU R128 <https://tech.ebu.ch/loudness>`_ provides solid solutions
to both problems. It specifies a precise algorithm to calculate
both a *momentary* loudness (over short and medium time intervals;
Nageru uses the short measurement) and a *loudness range* over an
arbitrary amount of time. The loudness is measured in LU (loudness
units), which is a relative unit very much like decibels; there's
also LUFS (loudness units relative to full scale), which is the number of
LU compared to a given reference.

EBU R128 specifies a *target loudness* (0 LU) of -23 LUFS +/- 1 LU;
if you keep your stream within this and don't have a huge range
in general, it will have a reasonable loudness on most viewers'
setups. The left meter shows the momentary loudness (over the short
400 ms intervals), and the right meter shows the loudness range,
with the target shown as a box. If you are within the target,
the box turns green; otherwise, it is red. Both meters show
1 LU as one segment, with the highest value being +9 LU
(compared to the reference level) and the lowest being -18 LU.
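
The relationship between LUFS, LU and the meter range is plain arithmetic,
and can be sketched as below. The function and constant names are made up
for illustration, and nothing here is taken from Nageru's own code; only
the numbers (-23 LUFS, +/- 1 LU, +9 and -18 LU) come from the text above.

```python
TARGET_LUFS = -23.0   # EBU R128 target loudness, i.e. 0 LU
TOLERANCE_LU = 1.0    # allowed deviation from the target
METER_MAX_LU = 9.0    # top segment of the meter
METER_MIN_LU = -18.0  # bottom segment of the meter


def lufs_to_lu(loudness_lufs: float) -> float:
    """Express an absolute loudness (LUFS) relative to the target (LU)."""
    return loudness_lufs - TARGET_LUFS


def within_target(loudness_lufs: float) -> bool:
    """True if the loudness is inside the -23 LUFS +/- 1 LU window."""
    return abs(lufs_to_lu(loudness_lufs)) <= TOLERANCE_LU


def meter_segment(loudness_lufs: float) -> int:
    """Clamp to the meter's range and round to whole-LU segments."""
    lu = max(METER_MIN_LU, min(METER_MAX_LU, lufs_to_lu(loudness_lufs)))
    return round(lu)
```

For instance, a measurement of -20 LUFS is +3 LU, which is outside the
target window, while anything quieter than -41 LUFS simply sits at the
bottom segment.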

Even if the overall loudness is correct, one needs to avoid clipping;
if samples go outside the allowed range, it will sound like clicking
or popping (or if many do, like extreme distortion). However,
just measuring the value of every single sample is not good enough;
since the client might do its own resampling and processing,
we also need to account for *inter-sample peaks*. Nageru, in line
with R128 recommendations, oversamples the audio by 4x and writes
the highest peak (in dBFS) below the left meter. Anything above
the R128 limit of -0.1 dBFS will make the meter turn red to alert
the operator that clipping has occurred. (In practice, this should
rarely happen due to the limiter; see the next section.)
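
To see why inter-sample peaks matter, here is a minimal sketch that
compares the plain sample peak with a peak measured on a 4x finer grid.
It assumes a generic Hann-windowed sinc interpolator with an arbitrarily
chosen kernel width; the resampler Nageru actually uses is not described
here, and all names are hypothetical.

```python
import math

OVERSAMPLE = 4   # 4x oversampling, as in the text
HALF_WIDTH = 16  # taps on each side of the kernel (an arbitrary choice)


def kernel(t: float) -> float:
    """Hann-windowed sinc, evaluated at a distance of t samples."""
    if t == 0.0:
        return 1.0
    if abs(t) >= HALF_WIDTH:
        return 0.0
    sinc = math.sin(math.pi * t) / (math.pi * t)
    window = 0.5 * (1.0 + math.cos(math.pi * t / HALF_WIDTH))
    return sinc * window


def to_dbfs(amplitude: float) -> float:
    """Convert a linear amplitude (1.0 = full scale) to dBFS."""
    return 20.0 * math.log10(amplitude) if amplitude > 0.0 else -math.inf


def sample_peak(samples):
    """Largest magnitude among the stored samples only."""
    return max(abs(s) for s in samples)


def true_peak(samples):
    """Largest magnitude on a 4x finer time grid.

    The first and last HALF_WIDTH samples are skipped so that the
    kernel never reaches outside the buffer.
    """
    peak = 0.0
    for n in range(HALF_WIDTH, len(samples) - HALF_WIDTH):
        for step in range(OVERSAMPLE):
            frac = step / OVERSAMPLE
            value = sum(
                samples[k] * kernel(n + frac - k)
                for k in range(n - HALF_WIDTH + 1, n + HALF_WIDTH + 1)
            )
            peak = max(peak, abs(value))
    return peak


# A full-scale tone at a quarter of the sample rate, shifted by 45 degrees,
# so that every stored sample lands beside the crest of the wave.
tone = [math.sin(math.pi / 2 * n + math.pi / 4) for n in range(128)]
print(f"sample peak: {to_dbfs(sample_peak(tone)):.2f} dBFS")
print(f"true peak:   {to_dbfs(true_peak(tone)):.2f} dBFS")
```

The stored samples of this tone never exceed about 0.707 (roughly 3 dB
below full scale), yet the waveform they describe reaches full scale
between samples, which is what the oversampled measurement picks up.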

You can click the reset (RST) button to reset all the meters, including
the peak measurement.

Finally, the very top contains a **correlation meter** measuring
the correlation between the left and right channels, which is
useful for checking the stereo image. It goes from -1 at the very
left (the channels are exact opposites of each other), via 0 in
the middle (the channels are totally uncorrelated), to +1 at
the very right (the channels are exactly the same). All of these
are indications of common issues:

  * A correlation meter that sits at exactly zero typically means
    that either the left or the right channel (or both) is silent.
  * A correlation meter that sits at exactly +1 typically means
    you are sending a mono stream. This could be intentional
    (if you e.g. have only a single microphone), but if not,
    it could indicate either a loose connector or stereo channels
    panned wrong.
  * Finally, a correlation meter that sits at negative values
    for longer periods of time indicates that one of the channels
    is inverted (the phase is wrong), and could sound odd on
    speaker setups. However, certain kinds of reverb or other
    effects could also cause this, so it could be benign.

A healthy stereo stream will usually have a correlation somewhere
around 0.7–0.8, and this section is marked in green.
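
The three cases in the list above fall out of the usual normalized
correlation formula. The sketch below computes it over one block of
samples; it is only an illustration (real meters typically smooth the
value over time, and how Nageru computes it is not covered here), and
the function name is made up.

```python
import math


def correlation(left, right):
    """Normalized correlation between two equally long sample blocks."""
    cross = sum(l * r for l, r in zip(left, right))
    energy_left = sum(l * l for l in left)
    energy_right = sum(r * r for r in right)
    if energy_left == 0.0 or energy_right == 0.0:
        return 0.0  # a silent channel: report zero correlation
    return cross / math.sqrt(energy_left * energy_right)


tone = [math.sin(2 * math.pi * n / 100) for n in range(100)]
inverted = [-s for s in tone]
silence = [0.0] * len(tone)

print(correlation(tone, tone))      # same signal on both channels (mono)
print(correlation(tone, inverted))  # one channel inverted
print(correlation(tone, silence))   # one channel silent
```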


The audio strip
---------------

.. image:: images/audio-strip.png

The audio strip contains controls for the processing chain for the audio from
start to end, left to right. Note that by default, everything is enabled;
if you have a premade audio mix that you are confident that you
want 1:1 into the stream, you can start Nageru with the ``--flat-audio``
flag, which instead starts with everything disabled.

The first step in the pipeline is a **lo-cut** (or equivalently,
highpass) filter. The exact cutoff frequency is a bit of a matter
of taste (and also depends on the speaker), but the main point
is that it gets rid of low-frequency hum and a lot of the background
noise that is not related to the speaker's voice. (If you were
producing music, you'd probably want it there to make room for
music *under* it, but then you'd want it higher than the default 120 Hz.)
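
As a rough illustration of what a lo-cut does, here is about the simplest
highpass filter there is: a first-order one, with the 120 Hz default from
the text as cutoff. This is a sketch only; the 48 kHz sample rate is an
assumption, the function name is made up, and a real lo-cut (Nageru's
included) may well use a steeper filter.

```python
import math


def lo_cut(samples, cutoff_hz=120.0, sample_rate=48000.0):
    """First-order highpass filter (RC-style), returning a new list."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = []
    prev_in = 0.0
    prev_out = 0.0
    for x in samples:
        # Pass changes in the input through, but let the output decay
        # towards zero so that slow (low-frequency) content fades away.
        y = alpha * (prev_out + x - prev_in)
        out.append(y)
        prev_in, prev_out = x, y
    return out


rate = 48000
hum = [math.sin(2 * math.pi * 50 * n / rate) for n in range(rate)]
voice = [math.sin(2 * math.pi * 1000 * n / rate) for n in range(rate)]

# Compare steady-state peaks (second half of each one-second signal).
hum_peak = max(abs(v) for v in lo_cut(hum)[rate // 2:])
voice_peak = max(abs(v) for v in lo_cut(voice)[rate // 2:])
print(f"50 Hz hum after lo-cut:    {hum_peak:.2f}")
print(f"1 kHz tone after lo-cut:   {voice_peak:.2f}")
```

The 50 Hz hum comes out considerably weaker than it went in, while the
1 kHz tone passes almost untouched.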

Next comes a chain of no less than four compressors. They are
based on the same basic structure, but have very different settings,
and fill very different roles.

The first compressor is the **gain staging**, or auto-leveler;
it is very slow, with 500 ms attack time and 20 second release time.
Its purpose is to set the overall level for the next compressor
in the chain (so that it is slightly over its threshold);
if you have a pretty consistent input signal, you can uncheck
the “Auto” box and just set a static value manually.
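
To get a feeling for what such attack and release times mean, the sketch
below tracks a level with a one-pole smoother that reacts at one speed
when the level rises and another when it falls. It assumes the common
reading of attack and release times as exponential time constants; the
exact definition Nageru uses is not given here, and the function names
are made up.

```python
import math


def smoothing_coeff(time_s, sample_rate=48000.0):
    """Per-sample coefficient for a one-pole smoother with this time constant."""
    return math.exp(-1.0 / (time_s * sample_rate))


def follow_level(levels, attack_s=0.5, release_s=20.0, sample_rate=48000.0):
    """Track a level with separate attack (rising) and release (falling) speeds."""
    attack = smoothing_coeff(attack_s, sample_rate)
    release = smoothing_coeff(release_s, sample_rate)
    envelope = 0.0
    out = []
    for level in levels:
        coeff = attack if level > envelope else release
        envelope = coeff * envelope + (1.0 - coeff) * level
        out.append(envelope)
    return out


# Half a second of signal followed by 20 seconds of silence, at a
# deliberately low rate of 100 level measurements per second.
rate = 100
levels = [1.0] * 50 + [0.0] * 2000
envelope = follow_level(levels, sample_rate=rate)
print(f"after 0.5 s of signal:  {envelope[49]:.3f}")
print(f"after 20 s of silence:  {envelope[-1]:.3f}")
```

With these settings the tracked level gets most of the way up within half
a second of signal, but takes many seconds to come back down, which is
why short pauses in speech do not make the auto-leveler crank up the gain.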

The second compressor is the **actual compressor**. It is much
faster, with typical voice settings (5 ms attack, 40 ms release).
It has the effect of making the voice sound a bit tighter,
more level and overall better; if you have multiple things
in the mix, it will also bring them somewhat closer together.
(In general, a compressor gives the signal less dynamic range
by making its loudest parts quieter, which allows you to apply
more gain in a later stage, so that it can get louder overall.
It's a bit paradoxical if you're not used to it.)
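
The paradox is easier to see with numbers. The sketch below applies a
static compressor curve and then a fixed gain to one quiet and one loud
level. The threshold, ratio and gain values are arbitrary illustrative
choices, not Nageru's settings, and the function name is made up.

```python
def compress_db(level_db, threshold_db=-20.0, ratio=4.0):
    """Static compressor curve: above the threshold, levels rise 1/ratio as fast."""
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio


quiet_db, loud_db = -30.0, -6.0
makeup_db = 10.0  # gain applied after compression

quiet_out = compress_db(quiet_db) + makeup_db
loud_out = compress_db(loud_db) + makeup_db

range_before = loud_db - quiet_db
range_after = loud_out - quiet_out

print(f"quiet: {quiet_db} dB -> {quiet_out} dB")
print(f"loud:  {loud_db} dB -> {loud_out} dB")
print(f"dynamic range: {range_before} dB -> {range_after} dB")
```

The loud part ends up about where it started, but the quiet part has come
up by 10 dB, so the signal as a whole is perceived as louder.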

You can adjust the threshold if you wish, or disable the compressor
altogether if your signal is already mastered. Note that if the
gain staging is not set so that this compressor gets an input signal
that's loud enough, it won't do anything to it.

At this point, the mastering section begins; for simple audio,
the distinction won't matter, but for multichannel, the previous
effects are separate per-bus and the remaining are applied
after the mix. (More on this below.) The mastering section begins
with a **limiter**, basically a compressor with a very high ratio.
It's there as an emergency brake for really loud sounds
that got through the other compressors; a classic example is a
speaker suddenly coughing, or a very loud bass drum. This prevents
both clipping and blowing out the listeners' ears.

(TODO: write more)


Multichannel mode
-----------------


MIDI control
------------