Fix some jerkiness when playing back with no interpolation.
When hitting a frame exactly, we'd choose [pts_of_previous_frame, pts]
as our candidate range, and always pick the lower. This would make us
jerk around a lot, which would be a particular problem with audio.
Fix by taking a two-pronged approach: First, if we hit a frame exactly,
just send [pts,pts]. Second, activate snapping even when interpolation
is off.
We still have a problem when playing back audio in the case of dropped
video frames (we'll output the same audio twice), but better audio
handling is somewhat outside the scope of this commit.
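
The first prong can be illustrated with a minimal Python sketch. The function
name candidate_range and the sorted list of frame timestamps are assumptions
made for illustration, not the actual player code, and snapping is not shown.

```python
from bisect import bisect_left

def candidate_range(frame_pts, pts):
    """Return (lower, upper) frame timestamps bracketing the playback position.

    frame_pts is a sorted, non-empty list of frame timestamps (hypothetical layout).
    """
    i = bisect_left(frame_pts, pts)
    if i < len(frame_pts) and frame_pts[i] == pts:
        # Exact hit: both ends are the same frame, so there is no lower
        # candidate to fall back to.
        return (pts, pts)
    if i == 0:
        # Before the first frame: clamp to it.
        return (frame_pts[0], frame_pts[0])
    if i == len(frame_pts):
        # After the last frame: clamp to it.
        return (frame_pts[-1], frame_pts[-1])
    return (frame_pts[i - 1], frame_pts[i])
```

With the old behavior, an exact hit at pts=100 would have yielded the range
(0, 100) in this example; here it yields (100, 100).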
Support disabling optional effects if a given other effect is _enabled_.
There are some restrictions (see the comments), but this is generally
useful if two effects are mutually exclusive, e.g., an overlay that can
be at one of many different points in the chain.
Make it possible for the theme to override the status line.
This is done by declaring a function format_status_line, which receives
the text that would normally be there (disk space left), as well as the
length of the current recording file in seconds. It can then return
whatever it would like.
My own code, but inspired by a C++ patch by Alex Thomazo in the Breizhcamp
repository (which did it by hardcoding a different status line in C++).
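
Real themes are written in Lua; the Python sketch below only mirrors the
contract described above (the default text and the file length in seconds go
in, one string comes out). The argument names and the output format are
assumptions.

```python
def format_status_line(disk_space_text, file_length_seconds):
    """Build a custom status line from the default text and the recording length."""
    total = int(file_length_seconds)
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    # Keep the default disk space text and append the recording length.
    return f"{disk_space_text} | recording: {hours}:{minutes:02d}:{seconds:02d}"
```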
Make it possible to call set_channel_name() for live and preview.
The use case for this is if you want to copy the channel name to
preview or similar. Does not affect the legacy channel_name()
callback (it is still guaranteed never to get 0 or 1).
Probably doesn't affect the analyzer; I haven't tested.
Add disable_if_always_disabled() to Block objects.
This allows the theme to specify that a given effect only makes sense
if another effect is enabled; e.g. a crop that only makes sense if
immediately followed by a resize. This can cut down the number of
instantiations in some cases.
Also change things so that 0 is no longer always the canonical choice
when disabling a block is a possibility. In situations where blocks
disable each other transitively, this could reduce the number of
instantiations further.
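
To see why such dependencies cut down the number of instantiations, here is a
small Python sketch that counts the on/off combinations of optional blocks
that remain valid. It is a brute-force illustration, not how the chains are
actually enumerated; count_instantiations and the (dependent, prerequisite)
pair format are hypothetical.

```python
from itertools import product

def count_instantiations(num_blocks, requires):
    """Count on/off combinations of optional blocks that respect dependencies.

    requires is a list of (dependent, prerequisite) index pairs: the dependent
    block may only be enabled when the prerequisite block is enabled.
    """
    count = 0
    for state in product([False, True], repeat=num_blocks):
        # A combination is valid if no dependent is on while its prerequisite is off.
        if all(state[pre] or not state[dep] for dep, pre in requires):
            count += 1
    return count
```

Two independent optional blocks give four combinations; if block 0 (say, the
crop) requires block 1 (the resize), only three remain.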
This allows you to prune away entire sections of the chain; the typical
case is if you have an OverlayEffect(a, b) and want to disable that.
In the disabled versions of the chain, the OverlayEffect will be replaced
with an IdentityEffect that passes through a only and leaves the entire
subgraph under b noninstantiated.
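
A minimal Python sketch of the pruning effect, assuming the chain is modelled
as a dictionary from each node to the nodes it reads from. The node names and
the reachable_effects helper are made up for illustration.

```python
def reachable_effects(graph, root):
    """Return the set of nodes reachable from root by following input edges."""
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, []))
    return seen

# Each node lists the nodes it takes as inputs (hypothetical chain).
enabled = {
    "overlay": ["a", "b"],
    "a": ["a_input"],
    "b": ["b_resize"],
    "b_resize": ["b_input"],
}
# Disabled variant: the overlay becomes an identity that only reads a.
disabled = dict(enabled, overlay=["a"])

needed_when_enabled = reachable_effects(enabled, "overlay")
needed_when_disabled = reachable_effects(disabled, "overlay")
```

In the disabled variant, nothing under b is reachable from the output, so none
of it needs to be instantiated.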
For complex themes, building the multitude of chains one might need
has become very bothersome, with tricky Lua scripting and non-typesafe
multidimensional tables.
To alleviate this somewhat, we introduce a concept called Scenes.
A Scene is pretty much an EffectChain with a better name and significantly
more functionality. In particular, scenes don't consist of single Effects;
they consist of blocks, which can hold any number of alternatives for
Effects. On finalize, we will instantiate all possible variants of
EffectChains behind-the-scenes, like the Lua code used to have to do itself,
but this is transparent to the theme.
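
Conceptually, instantiating every variant is a Cartesian product over the
alternatives in each block. The Python sketch below shows only that idea; the
block and effect names are placeholders, and the real finalize step is not
written this way.

```python
from itertools import product

def enumerate_variants(blocks):
    """blocks maps a block name to its list of alternative effect names.

    Returns one dict per concrete chain, choosing exactly one alternative
    for every block.
    """
    names = list(blocks)
    return [
        dict(zip(names, choice))
        for choice in product(*(blocks[name] for name in names))
    ]

blocks = {
    "input": ["LiveInput", "DeinterlacedInput", "VideoInput"],
    "scale": ["ResampleEffect", "IdentityEffect"],
}
variants = enumerate_variants(blocks)
```

Three input alternatives times two scaling alternatives give six concrete
chains, which is the bookkeeping the theme previously had to do by hand.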
In particular, this means that inputs are much more flexible. Instead of
having to make separate chains for regular inputs, deinterlaced inputs,
video inputs and CEF inputs, you now just make an input, and can connect
any type to it at runtime (or “display”, as it's now called). Output is also
flexible; by default, any scene will get both Y'CbCr and RGBA versions
compiled. (In both cases, you can make non-flexible versions to reduce
the number of different instantiations. This can be a good idea in
complex chains.)
This also does away with the concept of the prepare function for a chain;
any effect settings are snapshotted when you return from get_scene() (the new
name for get_chain(), obviously), so you don't need to worry about capturing
anything or get threading issues like you used to.
All existing themes will continue to work unmodified for the time being,
but it is strongly recommended to migrate from EffectChain to Scene.
Defer creation of effects until they are added to a chain.
Right now, this is pretty much a no-op, but it is a necessary building
block for making auto-chains (chains that can specialize depending on
the conditions).
After we made the different ImageInput instances share OpenGL textures,
some of them would assume they could output sRGB. Fix this the same way
we did for video textures, i.e., use sRGBSwitchingInput.
Keeping the pure OpenGL path does not seem to actually buy us any
compatibility, so we simplify things a bit for an upcoming fix.
Strangely enough, this does not seem to be in any OpenGL standard.
Sampler objects, which we also require, are, though.
Make the ImageInput cache store textures, not images.
This saves on a lot of texture memory when the same image is used
in multiple chains, but perhaps more importantly, will allow us to
decouple ImageInputs from which images they show later.
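
The sharing can be sketched in Python as a cache keyed by image path, where
the upload happens at most once per image. TextureCache, fake_upload and the
integer texture handles are stand-ins, not the actual ImageInput code.

```python
class TextureCache:
    """Cache keyed by image path; each image is uploaded at most once."""

    def __init__(self, upload):
        self._upload = upload  # callable: path -> texture handle (hypothetical)
        self._textures = {}

    def get(self, path):
        if path not in self._textures:
            self._textures[path] = self._upload(path)
        return self._textures[path]

uploads = []

def fake_upload(path):
    uploads.append(path)
    return len(uploads)  # stand-in for an OpenGL texture number

cache = TextureCache(fake_upload)
tex_a = cache.get("logo.png")  # first chain using the image
tex_b = cache.get("logo.png")  # second chain gets the same texture back
```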
Fairly untested, but should work both on single-track export and
on realtime output. No audio stretching or pitch shift, so only
plays when we're at regular speed. Note: There's no monitor output yet,
so the Futatabi operator will be deaf. There are also no VU bars.
Use vaCreateImage + vaPutImage instead of vaDeriveImage.
Seemingly, this largely fixes the L3 cache issues I've been seeing, taking
CPU usage down from ~2.1–2.2 to ~1.4 cores.
A test run with eight full 1080p59.94 inputs demonstrates that it can
be done without the GPU keeling over, although there are some issues with
VA-API threading.
When uploading MJPEG data to VA-API, do it directly into the buffer.
Besides the obvious benefit of spending less time copying, this has two
positive effects:
- The VA-API thread is no longer a choke point; uploading can happen from
  multiple cores.
- With one copy less, we seem to be reducing L3 cache pressure a bit;
  at some point between five and six 1080p sources, we “fall off a cliff”
  wrt. the L3 and start thrashing. This doesn't fix the issue, but alleviates
  it somewhat.
All in all, we seem to go down from ~2.6 to ~2.1–2.2 cores used with one
720p channel and five 1080p channels. I haven't tried saturating channels
yet to see how many we can actually encode.
This is, surprisingly, most useful for VA-API decodes; they can
have long latency at 1080p, and Futatabi's dropping scheme sometimes
caused massive unfairness. Our system doesn't pipeline all that
nicely, so just having multiple threads was the simplest solution.
The risk is that we now access VA-API from multiple threads, which
has a tendency to tickle bugs, but we'll see.
Of course, for CPU decoding, you will also benefit.