X-Git-Url: https://git.sesse.net/?a=blobdiff_plain;f=audio.rst;h=65fe9af432a20b130fd5ba81901eeebd87051720;hb=d7d47cba9a794efe23d4ac53f108628ca221935f;hp=2aedd366252354165e8e56a858800a6301dce6da;hpb=403d734c4564e560a85e329fccaf93700b4c1794;p=nageru-docs
diff --git a/audio.rst b/audio.rst
index 2aedd36..65fe9af 100644
--- a/audio.rst
+++ b/audio.rst
@@ -45,13 +45,86 @@ can do.
 (In fact, simple mode constructs a multichannel setup behind-the-scenes
 and then runs the multichannel audio code.)
 
+.. _audio-meters:
+
+Audio meters
+------------
+
+.. image:: images/level-meters.png
+
+When setting overall audio levels, there are two important goals:
+To keep a reasonable **perceived loudness**, and to **avoid clipping**.
+Both are more subtle to measure than one would initially assume,
+and there are many ways to misstep. In particular, pretty much any
+naïve way of measuring loudness will fail; human hearing is, for instance,
+much more sensitive to some frequencies than to others.
+
+`EBU R128 `_ provides solid solutions
+to both problems. It specifies a precise algorithm to calculate
+both *momentary* loudness (over short and medium time intervals;
+Nageru uses the short measurement) and a *loudness range* over an
+arbitrary amount of time. The loudness is measured in LU (loudness
+units), which is a relative unit very much like decibels; there's
+also LUFS (loudness units relative to full scale), an absolute scale
+that counts LU relative to digital full scale.
+
+EBU R128 specifies a *target loudness* (0 LU) of -23 LUFS +/- 1 LU;
+if you keep your stream within this and don't have a huge range
+in general, it will have a reasonable loudness on most viewers'
+setups. The left meter shows the momentary loudness (over the short
+400 ms intervals), and the right meter shows the loudness range,
+with the target shown as a box. If you are within the target,
+the box turns green; otherwise, it is red. Both meters show
+1 LU as one segment, with the highest value being +9 LU
+(compared to the reference level) and the lowest being -18 LU.
+
+Even if the overall loudness is correct, one needs to avoid clipping;
+if samples go outside the allowed range, it will sound like clicking
+or popping (or, if many do, like extreme distortion). However,
+just measuring the value of every single sample is not good enough;
+since the client might do its own resampling and processing,
+we also need to account for *inter-sample peaks*. Nageru, in line
+with R128 recommendations, oversamples the audio by 4x and writes
+the highest peak (in dBFS) below the left meter. Anything above
+the R128 limit of -0.1 dBFS will make the meter turn red to alert
+the operator that clipping has occurred. (In practice, this should
+rarely happen due to the limiter; see the next section.)
+
+You can click the reset (RST) button to reset all the meters, including
+the peak measurement.
+
+Finally, the very top contains a **correlation meter** measuring
+the correlation between the left and right channels, which is
+useful for checking the stereo image. It goes from -1 at the very
+left (the channels are exact opposites of each other), via 0 in
+the middle (the channels are totally uncorrelated), to +1 at
+the very right (the channels are exactly the same). Certain readings
+can indicate common issues:
+
+ * A correlation meter that sits at exactly zero typically means
+   that either the left or the right channel (or both) is silent.
+ * A correlation meter that sits at exactly +1 typically means
+   you are sending a mono stream. This could be intentional
+   (if you e.g. have only a single microphone), but if not,
+   it could indicate either a loose connector or stereo channels
+   panned incorrectly.
+ * Finally, a correlation meter that sits at negative values
+   for longer periods of time indicates that one of the channels
+   is inverted (the phase is wrong), which could sound odd on
+   speaker setups. However, certain kinds of reverb or other
+   effects could also cause this, so it could be benign.
+
+A healthy stereo stream will usually have a correlation somewhere
+around 0.7–0.8, and this section is marked in green.
+
+
 The audio strip
 ---------------
 
-.. image:: images/basic-ui.png
+.. image:: images/audio-strip.png
 
-The audio strip contains the processing chain for the audio from
-start to end. Note that by default, everything is enabled;
+The audio strip contains the controls for the audio processing chain, from
+start to end, left to right. Note that by default, everything is enabled;
 if you have a premade audio mix that you are confident that you want
 1:1 into the stream, you can start Nageru with the “--flat-audio” flag,
 that instead starts with everything disabled.
@@ -64,16 +137,140 @@ noise that is not related to the speaker's voice.
 (If you were producing music, you'd probably want it there to make room
 for music *under* it, but the you'd want it higher than the default 120 Hz.)
 
-(TODO: write more)
+Next comes a chain of no less than four compressors. They are
+based on the same basic structure, but have very different settings,
+and fill very different roles.
 
+The first compressor is the **gain staging**, or auto-leveler;
+it is very slow, with 500 ms attack time and 20 second release time.
+Its purpose is to set the overall level for the next compressor
+in the chain (so that it is slightly over its threshold);
+if you have a pretty consistent input signal, you can uncheck
+the “Auto” box and just set a static value manually.
 
-Audio meters
-------------
+The second compressor is the **actual compressor**. It is much
+faster, with typical voice settings (5 ms attack, 40 ms release).
+It has the effect of making the voice sound a bit tighter,
+more level and overall better; if you have multiple things
+in the mix, it will also bring them somewhat closer together.
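The attack and release times quoted above control how quickly a compressor reacts when the signal crosses its threshold and how quickly it lets go afterwards. The following is a minimal sketch of a generic feed-forward compressor, not Nageru's implementation; it assumes mono float samples, a hard knee, a hypothetical -20 dBFS threshold and 4:1 ratio, and a one-pole smoother on the gain reduction. The function names are illustrative only.

```python
import math


def db_to_linear(db: float) -> float:
    """Convert a level in dB to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def linear_to_db(x: float) -> float:
    """Convert a sample magnitude to dBFS, clamping to avoid log(0)."""
    return 20.0 * math.log10(max(abs(x), 1e-9))


def compress(samples, sample_rate, threshold_db=-20.0, ratio=4.0,
             attack_ms=5.0, release_ms=40.0):
    """Hypothetical hard-knee, feed-forward compressor for mono samples.

    The 5 ms / 40 ms defaults mirror the voice settings in the text;
    the threshold and ratio are illustrative assumptions.
    """
    # One-pole smoothing coefficients; each time constant is the time
    # needed to cover about 63% of a step change.
    attack_coeff = math.exp(-1.0 / (sample_rate * attack_ms / 1000.0))
    release_coeff = math.exp(-1.0 / (sample_rate * release_ms / 1000.0))

    gain_reduction_db = 0.0  # smoothed gain reduction, always >= 0
    out = []
    for x in samples:
        overshoot_db = linear_to_db(x) - threshold_db
        # Static curve: above the threshold, the output only rises
        # 1/ratio dB for every dB the input rises.
        target_db = overshoot_db * (1.0 - 1.0 / ratio) if overshoot_db > 0.0 else 0.0
        # Fast attack when more reduction is needed, slower release otherwise.
        coeff = attack_coeff if target_db > gain_reduction_db else release_coeff
        gain_reduction_db = coeff * gain_reduction_db + (1.0 - coeff) * target_db
        out.append(x * db_to_linear(-gain_reduction_db))
    return out


# A steady 0 dBFS signal is 20 dB over the threshold; at 4:1 the
# compressor settles at 15 dB of gain reduction.
loud = compress([1.0] * 48000, sample_rate=48000)
print(round(linear_to_db(loud[-1]), 2))  # prints -15.0
```

Slowing the same envelope logic down to something like `attack_ms=500.0, release_ms=20000.0` gives it roughly the character of the gain-staging stage, although a real auto-leveler must also be able to apply positive gain, which this sketch never does.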
+(In general, a compressor gives the signal less dynamic range
+by making its loudest parts quieter, which allows you to apply more
+gain in a later stage, so that it can get louder overall. It's a bit
+paradoxical if you're not used to it.)
+
+You can adjust the threshold if you wish, or disable the compressor
+altogether if your signal is already mastered. Note that if the
+gain staging does not give this compressor an input signal that is
+loud enough to cross its threshold, it won't do anything to it.
+
+At this point, the mastering section begins; for simple audio,
+the distinction won't matter, but for multichannel, the previous
+effects are applied separately per bus, and the remaining ones are
+applied after the mix. (More on this below.) The mastering section begins
+with a **limiter**, basically a compressor with a very high ratio.
+It's there as an emergency brake for really loud peaks
+that got through the other compressors; a classic example is a
+speaker suddenly coughing, or a very loud bass drum. This prevents
+both clipping and blowing out the listeners' ears.
+
+At this point, the audio signal is *almost* where we'd like it
+to be, but the overall sound level might not be quite right.
+All the previous compressors have been working in the objective
+domain, but as explained in the :ref:`previous section `,
+this does not necessarily correspond to the desired overall
+audio loudness. (Their default levels have been calibrated so
+that they end up around 0 LU for typical speech content,
+but they could easily miss by a few LU in many cases.)
+
+Thus, there's a final **makeup gain** at the end to compensate
+for these issues. When the “Auto” checkbox is ticked, which it is
+by default, it will very slowly (filter constant of 30 seconds)
+adjust itself so that the overall level goes toward 0 LU,
+i.e., the reference level. It is so slow because the R128 calculations
+inherently must go over a certain amount of time (what we want
+to change with this gain is the *overall* sound level,
+not the *immediate* one). In periods where the makeup gain is
+far off, such as when the stream is all silent, it doesn't update
+at all. As with the other knobs, you can uncheck the “Auto”
+checkbox and tune this yourself if you want to.
 
 
 Multichannel mode
 -----------------
 
+**Multichannel mode** expands on simple audio mode by allowing you
+to have multiple *buses* of audio. (In a sense, it could more accurately
+be called “multibus mode”, but that name would be too confusing.)
+A bus in Nageru is a pair of channels (left/right), sourced from
+a video capture or ALSA card. The channel mapping is flexible; my USB
+sound card has 18 channels, for instance, and you can use that to make
+several buses. Each bus has a name (for instance, something like
+“Blue microphone” or “Speaker PC”), which is just for convenience;
+Nageru doesn't care what you write here, but the labels are useful
+for the operator.
+
+Input mappings
+''''''''''''''
+
+.. image:: images/input-mapping.png
+
+The input mapping dialog should be pretty much self-explanatory;
+you can use the + button to add a new bus, and the - button to remove
+the currently selected one (you select a bus by clicking on it). The up and
+down buttons rearrange the order by moving the currently selected bus
+up or down, if possible.
+
+Because mappings can be tedious to set up, you wouldn't want to recreate
+a complicated one every time you start Nageru. Therefore, mappings
+can be saved to and loaded from disk; the stored file is a
+`protocol buffer `_
+in textual format. You can also load one at startup with the
+“--input-mapping” parameter, which also implies multichannel mode
+(--multichannel).
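As a rough illustration of what a mapping expresses, here is a hypothetical sketch in which each bus names one device and picks two of its channels, and the bus's stereo pair is pulled out of that device's interleaved samples. The `Bus` class, its fields and `extract_bus()` are invented for this example; they do not mirror the protocol buffer schema Nageru actually stores.

```python
from dataclasses import dataclass


@dataclass
class Bus:
    """Hypothetical bus definition: a label plus a left/right channel pair."""
    name: str           # free-form label; only meaningful to the operator
    device: str         # hypothetical device identifier (capture or ALSA card)
    left_channel: int   # 0-based channel index on that device
    right_channel: int


def extract_bus(interleaved, num_channels, bus):
    """Split one device's interleaved samples into (left, right) for a bus."""
    for ch in (bus.left_channel, bus.right_channel):
        if not 0 <= ch < num_channels:
            raise ValueError(f"{bus.name}: channel {ch} not on {bus.device}")
    frames = len(interleaved) // num_channels
    left = [interleaved[f * num_channels + bus.left_channel] for f in range(frames)]
    right = [interleaved[f * num_channels + bus.right_channel] for f in range(frames)]
    return left, right


# Two buses carved out of one 18-channel USB card (channel numbers are made up).
mapping = [
    Bus("Blue microphone", "usb-card", 0, 1),
    Bus("Speaker PC", "usb-card", 4, 5),
]
# Three frames of fake audio: the value encodes frame * 100 + channel.
samples = [float(f * 100 + c) for f in range(3) for c in range(18)]
for bus in mapping:
    print(bus.name, extract_bus(samples, 18, bus))
```

Note that the label plays no part in the extraction, matching the point that Nageru doesn't care what a bus is called.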
+
+Nageru strives to keep the mapping consistent even
+if the environment changes; for instance, if you unplug and
+replug a USB sound card, Nageru will attempt to keep the buses that
+were mapped to that card mapped to it. (While the card is unplugged,
+the main display will show the relevant buses as “(disconnected)”.)
+Similarly, if an ALSA device is taken by another program on startup
+and cannot be accessed by Nageru, it will mark it as “(busy)”
+and try again in the background. However,
+there are edge cases where Nageru simply cannot do the right thing,
+for instance if you unplug two identical cards and plug them back
+in the reverse order; USB cards don't carry any kind of serial number
+or other form of unique identification.
+
+
+The audio views
+'''''''''''''''
+
+.. image:: images/audio-view-selector.png
+
+Once multichannel mode is active, a little selector shows up to the right,
+just below the level meters. The arrows (or equivalently, the PgUp/PgDown
+keys on the keyboard) allow you to select between two views:
+
+ * In the **compact audio view** (which is the default), each bus is
+   represented only by its label, its peak meter (see below) and its
+   fader. This takes up little screen real estate, and allows the video channels
+   to remain visible. This is the typical view you'd use once you've set up
+   everything and are actually doing live video editing; the controls
+   from the full audio view are still in effect, but you cannot see or
+   interact with them.
+
+ * The **full audio view** contains a lot more controls, but leaves no
+   room for the video channels. This view is useful when you are doing initial
+   setup of your mix, or if you want to go back and tune something.
+   The full audio view will be described in detail in the following section;
+   the interpretation of the corresponding controls in the compact audio view
+   is the same.
+
+.. image:: images/audio-bus-controls.png
+
+There is one set of these controls for every bus.
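Conceptually, each bus's fader is a gain applied before the buses are summed; as described for the audio strip, the per-bus effects run before this sum and the mastering section (limiter and makeup gain) runs after it. A minimal sketch, assuming fader positions are expressed in dB with -inf meaning fully muted; `fader_gain()` and `mix_buses()` are hypothetical names, not Nageru functions.

```python
import math


def fader_gain(db: float) -> float:
    """Convert a fader position in dB to a linear gain (-inf dB = muted)."""
    return 0.0 if db == -math.inf else 10.0 ** (db / 20.0)


def mix_buses(buses, fader_db):
    """Apply each bus's fader, then sum all buses into one stereo mix.

    buses:    list of (left, right) sample lists, all of equal length
    fader_db: one fader position per bus, in dB
    """
    if len(buses) != len(fader_db):
        raise ValueError("need exactly one fader per bus")
    length = len(buses[0][0]) if buses else 0
    mix_left = [0.0] * length
    mix_right = [0.0] * length
    for (left, right), db in zip(buses, fader_db):
        gain = fader_gain(db)
        for i in range(length):
            mix_left[i] += gain * left[i]
            mix_right[i] += gain * right[i]
    # In the chain described in the text, the mastering section
    # (limiter, makeup gain) would process this sum next.
    return mix_left, mix_right


voice = ([0.5, 0.5], [0.5, 0.5])
music = ([0.2, -0.2], [0.2, -0.2])
print(mix_buses([voice, music], [0.0, -6.0]))
```

A fader at -6 dB scales its bus to roughly half amplitude (10^(-6/20) ≈ 0.50), which is why small fader moves in dB feel even across the whole range.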
+ +(TODO: write more) + MIDI control ------------