For complex themes, building the multitude of chains one might need
has become very bothersome, with tricky Lua scripting and non-typesafe
multidimensional tables.
To alleviate this somewhat, we introduce a concept called Scenes.
A Scene is pretty much an EffectChain with a better name and significantly
more functionality. In particular, scenes don't consist of single Effects;
they consist of blocks, which can hold any number of alternatives for
Effects. On finalize, we will instantiate all possible variants of
EffectChains behind the scenes, much as the Lua code used to have to do
by hand, but this is transparent to the theme.
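The variant instantiation is essentially a Cartesian product over each
block's alternatives. A minimal C++ sketch of the idea (all names are
hypothetical; the theme-facing API is Lua, and the real code is more
involved):

    #include <cstddef>
    #include <functional>
    #include <memory>
    #include <vector>

    struct Effect {};
    using EffectFactory = std::function<std::unique_ptr<Effect>()>;

    // A block holds any number of alternative effects; assumed to be
    // nonempty here.
    struct Block {
        std::vector<EffectFactory> alternatives;
    };

    struct Scene {
        std::vector<Block> blocks;

        // On finalize, enumerate every combination of per-block
        // choices ("odometer" style) and compile one chain for each.
        void finalize()
        {
            std::vector<size_t> choice(blocks.size(), 0);
            for (;;) {
                instantiate_chain(choice);
                size_t i = 0;
                while (i < blocks.size() &&
                       ++choice[i] == blocks[i].alternatives.size()) {
                    choice[i++] = 0;
                }
                if (i == blocks.size()) break;
            }
        }

        void instantiate_chain(const std::vector<size_t> &choice)
        {
            // Build and compile the EffectChain that picks
            // blocks[i].alternatives[choice[i]] for each i.
        }
    };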
In particular, this means that inputs are much more flexible. Instead of
having to make separate chains for regular inputs, deinterlaced inputs,
video inputs and CEF inputs, you now just make an input, and can connect
any type to it at runtime (or “display” it, as the operation is now
called). Output is also
flexible; by default, any scene will get both Y'CbCr and RGBA versions
compiled. (In both cases, you can make non-flexible versions to reduce
the number of different instantiations. This can be a good idea in
complex chains.)
This also does away with the concept of the prepare function for a chain;
any effect settings are snapshotted when you return from get_scene() (the new
name for get_chain(), obviously), so you don't need to worry about
capturing anything, nor about the threading issues you used to get.
All existing themes will continue to work unmodified for the time being,
but it is strongly recommended to migrate from EffectChain to Scene.
Defer creation of effects until they are added to a chain.
Right now, this is pretty much a no-op, but it is a necessary building
block for making auto-chains (chains that can specialize depending on
the conditions).
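A sketch of what the deferral can look like (hypothetical names, not
the actual code): the chain takes a factory rather than a finished
effect, and only invokes it once the effect is actually added.

    #include <functional>
    #include <memory>
    #include <vector>

    struct Effect { virtual ~Effect() = default; };
    using EffectFactory = std::function<std::unique_ptr<Effect>()>;

    class Chain {
    public:
        void add_effect(EffectFactory factory)
        {
            // Construction happens here, when an auto-chain knows
            // which concrete variant it needs, not at the point where
            // the theme declared the effect.
            effects.push_back(factory());
        }

    private:
        std::vector<std::unique_ptr<Effect>> effects;
    };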
After we made the different ImageInput instances share OpenGL textures,
some of them would assume they could output sRGB. Fix in the same way
we've done for video textures, i.e., use sRGBSwitchingInput.
Keeping the pure OpenGL path does not seem to actually buy us any
compatibility, so we simplify things a bit for an upcoming fix.
Strangely enough, this does not seem to be in any OpenGL standard.
Sampler objects, which we also require, are, though.
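Presumably the extension in question is GL_EXT_texture_sRGB_decode;
combined with sampler objects (core since OpenGL 3.3), it lets one
GL_SRGB8_ALPHA8 texture be sampled either linearized or raw, per draw
call, without keeping two copies of the texture. A sketch of the
mechanism (not the actual Nageru code):

    #include <epoxy/gl.h>  // Or whichever GL loader is in use.

    void setup_srgb_switching_samplers(GLuint samplers[2])
    {
        glGenSamplers(2, samplers);
        // samplers[0]: convert from sRGB to linear when sampling.
        glSamplerParameteri(samplers[0], GL_TEXTURE_SRGB_DECODE_EXT,
                            GL_DECODE_EXT);
        // samplers[1]: return the raw, still-sRGB-encoded values.
        glSamplerParameteri(samplers[1], GL_TEXTURE_SRGB_DECODE_EXT,
                            GL_SKIP_DECODE_EXT);
    }

    // At draw time, pick the behavior without re-uploading anything:
    //   glBindSampler(texture_unit, samplers[want_linear ? 0 : 1]);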
Make the ImageInput cache store textures, not images.
This saves on a lot of texture memory when the same image is used
in multiple chains, but perhaps more importantly, will allow us to
decouple ImageInputs from which images they show later.
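Conceptually, the cache presumably becomes something like this
(hypothetical sketch; the real code has eviction, error handling, and
so on):

    #include <epoxy/gl.h>
    #include <map>
    #include <mutex>
    #include <string>

    struct CachedTexture {
        GLuint texture_num;
        int refcount = 0;
    };

    std::mutex cache_mu;
    std::map<std::string, CachedTexture> texture_cache;  // Under cache_mu.

    // The same image used in several chains is uploaded only once.
    GLuint acquire_texture(const std::string &pathname)
    {
        std::lock_guard<std::mutex> lock(cache_mu);
        auto it = texture_cache.find(pathname);
        if (it == texture_cache.end()) {
            CachedTexture entry;
            glGenTextures(1, &entry.texture_num);
            // ... decode the image and upload it (glTexImage2D) here ...
            it = texture_cache.emplace(pathname, entry).first;
        }
        ++it->second.refcount;
        return it->second.texture_num;
    }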
Fairly untested, but should work both on single-track export and
on realtime output. No audio stretching or pitch shifting, so audio only
plays when we're at regular speed. Note: There's no monitor output yet,
so the Futatabi operator will be deaf. There are also no VU bars.
Use vaCreateImage + vaPutImage instead of vaDeriveImage.
Seemingly, this largely fixes the L3 issues I've been seeing, taking
CPU usage down from ~2.1–2.2 to ~1.4 cores.
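Roughly, the change looks like this (a sketch with error handling and
per-plane strides elided; not the literal code):

    #include <cstring>
    #include <va/va.h>

    // Instead of vaDeriveImage(), which maps the surface's own storage
    // (seemingly the slower path for CPU writes here), allocate a
    // separate VAImage, fill it, and let the driver transfer it into
    // the surface.
    void upload_frame(VADisplay dpy, VASurfaceID surface,
                      VAImageFormat *format, int width, int height,
                      const uint8_t *data, size_t len)
    {
        VAImage image;
        vaCreateImage(dpy, format, width, height, &image);

        uint8_t *ptr;
        vaMapBuffer(dpy, image.buf, (void **)&ptr);
        memcpy(ptr, data, len);  // In reality, plane by plane.
        vaUnmapBuffer(dpy, image.buf);

        vaPutImage(dpy, surface, image.image_id,
                   0, 0, width, height,   // Source rectangle.
                   0, 0, width, height);  // Destination rectangle.
        vaDestroyImage(dpy, image.image_id);
    }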
A test run with eight full 1080p59.94 inputs demonstrates that it can
be done without the GPU keeling over, although there are some issues with
VA-API threading.
When uploading MJPEG data to VA-API, do it directly into the buffer.
Besides the obvious benefit of spending less time copying, this has two
positive effects:
- The VA-API thread is no longer a choke point; uploading can happen from
multiple cores.
- With one copy less, we seem to be reducing L3 cache pressure a bit;
at some point between five and six 1080p sources, we “fall off a cliff”
wrt. the L3 and start thrashing. This doesn't fix the issue, but alleviates
it somewhat.
All in all, we seem to go down from ~2.6 to ~2.1–2.2 cores used with one
720p channel and five 1080p channels. I haven't tried saturating channels
yet to see how many we can actually encode.
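Schematically, the split is something like this (hypothetical
structure): the map-and-copy runs on whichever thread produced the
frame, and only the cheap submission is funneled through the single
VA-API thread.

    #include <va/va.h>
    #include <condition_variable>
    #include <mutex>
    #include <queue>

    struct PendingFrame {
        VASurfaceID surface;
        VAImage image;  // Already filled in by the producer thread.
        int width, height;
    };

    std::mutex mu;
    std::condition_variable frame_ready;
    std::queue<PendingFrame> pending;  // Guarded by mu.

    // Called from any producer core once it has filled the VAImage.
    void queue_frame(const PendingFrame &frame)
    {
        std::lock_guard<std::mutex> lock(mu);
        pending.push(frame);
        frame_ready.notify_one();
    }

    // The single VA-API thread only submits; it no longer copies.
    void va_thread_func(VADisplay dpy)
    {
        for (;;) {
            std::unique_lock<std::mutex> lock(mu);
            frame_ready.wait(lock, [] { return !pending.empty(); });
            PendingFrame f = pending.front();
            pending.pop();
            lock.unlock();

            vaPutImage(dpy, f.surface, f.image.image_id,
                       0, 0, f.width, f.height,
                       0, 0, f.width, f.height);
        }
    }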
This is, surprisingly, the most useful for VA-API decodes; they can
have long latency at 1080p, and Futatabi's dropping scheme sometimes
caused massive unfairness. Our system doesn't pipeline all that
nicely, so just having multiple threads was the simplest solution.
The risk is that we now access VA-API from multiple threads, which
has a tendency to tickle bugs, but we'll see.
Of course, for CPU decoding, you will also benefit.
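The structure is presumably along these lines (names hypothetical):
N independent workers, each owning its own decoder context, so one
slow hardware decode doesn't stall the rest.

    #include <atomic>
    #include <cstddef>
    #include <thread>
    #include <vector>

    struct JPEGFrame { /* compressed data, pts, ... */ };

    struct Decoder {
        // Stands in for a per-thread VA-API context or CPU decoder.
        void decode(const JPEGFrame &frame) { /* ... */ }
    };

    void decode_all(const std::vector<JPEGFrame> &frames, int num_threads)
    {
        std::atomic<size_t> next{0};
        std::vector<std::thread> workers;
        for (int i = 0; i < num_threads; ++i) {
            workers.emplace_back([&] {
                Decoder decoder;  // One context per thread.
                for (size_t idx;
                     (idx = next.fetch_add(1)) < frames.size(); ) {
                    decoder.decode(frames[idx]);
                }
            });
        }
        for (std::thread &t : workers) t.join();
    }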
For FFmpeg inputs, add an option for playing as fast as possible.
This is intended for live streams, where setting a rate of 2.0 or
similar would cause it to spew errors and keep resetting the clock.
This mode
is automatically activated if rate >= 10.0.
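Conceptually, the mode is just a threshold check in the playback loop
(illustrative sketch only):

    #include <chrono>
    #include <thread>

    constexpr double kPlayAsFastAsPossibleRate = 10.0;

    void maybe_sleep_until(double rate,
                           std::chrono::steady_clock::time_point target)
    {
        if (rate >= kPlayAsFastAsPossibleRate) {
            // Live-stream mode: never wait, never reset the clock.
            return;
        }
        std::this_thread::sleep_until(target);
    }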
This reportedly gives much more stable delay, as the timer is sampled
when the audio actually arrives in the kernel. Patch by Yann Dubreuil,
from the BreizhCamp repository.
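The ALSA mechanism here is presumably snd_pcm_status(), which reports a
driver-side timestamp instead of whatever clock_gettime() returns by the
time userspace gets around to reading the samples (sketch only):

    #include <alsa/asoundlib.h>

    // Requires timestamps to be enabled on the PCM first, e.g. via
    // snd_pcm_sw_params_set_tstamp_mode().
    bool get_kernel_timestamp(snd_pcm_t *pcm, struct timespec *ts)
    {
        snd_pcm_status_t *status;
        snd_pcm_status_alloca(&status);
        if (snd_pcm_status(pcm, status) < 0) {
            return false;
        }
        snd_pcm_status_get_htstamp(status, ts);  // Taken by the driver.
        return true;
    }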
Make it possible to queue and play clips with no cue-out set (infinite
clips). Note that you can only have one of them in the clip list for the
time being.