We read about twice as many as we should have; the others were
probably just set to 0.0, which has no effect but still burns
arithmetic, unless your driver happens to optimize very aggressively
for this (which I don't think anyone does anymore).
Properly restore the LC_NUMERIC locale after finalizing.
There were two issues here:
1. setlocale(LC_NUMERIC, "C") always returns "C", not the previous
   locale.
2. The return value of setlocale() may point into static storage,
which may be corrupted when we call into libGL, if e.g.
the shader compiler calls setlocale() on its own.
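As a rough sketch of the resulting pattern (illustrative only, not the actual
Movit code): query the current locale first, copy the string out of
setlocale()'s static buffer right away, and only then switch to "C":

    #include <clocale>
    #include <string>

    void finalize_with_c_locale()
    {
            // setlocale(LC_NUMERIC, "C") would return "C", not the old locale,
            // so query with NULL first. Copy the result immediately, since the
            // returned pointer may refer to static storage that a later
            // setlocale() call (e.g. from the shader compiler) can overwrite.
            std::string saved_locale = std::setlocale(LC_NUMERIC, nullptr);
            std::setlocale(LC_NUMERIC, "C");

            // ... generate shader code with %f-style formatting, call into libGL ...

            std::setlocale(LC_NUMERIC, saved_locale.c_str());
    }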
1. If you're missing some functionality, Movit will now tell you
on stderr what you're missing. (We might suppress this later
if it turns out that people want to init_movit() but are actually
fine with it failing.)
2. Use a table instead of repeated if-then logic, since this started
to become a bit messy after we added OpenGL-version-equivalence
checks.
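A hedged sketch of what such a table can look like (the entries, names and
messages here are made up for illustration; the real checks live in the init
code):

    #include <cstdio>
    #include <set>
    #include <string>

    struct Requirement {
            const char *extension;    // required extension...
            float min_gl_version;     // ...or the GL version that subsumes it
            const char *description;  // what to report on stderr if missing
    };

    static const Requirement requirements[] = {
            { "GL_ARB_texture_rg",      3.0f, "RG textures" },
            { "GL_ARB_fragment_shader", 2.0f, "fragment shaders" },
            { "GL_EXT_texture_sRGB",    2.1f, "sRGB textures" },
    };

    bool check_requirements(const std::set<std::string> &extensions, float gl_version)
    {
            bool ok = true;
            for (const Requirement &req : requirements) {
                    if (extensions.count(req.extension) == 0 && gl_version < req.min_gl_version) {
                            fprintf(stderr, "Movit: Missing %s (need %s or OpenGL %.1f).\n",
                                    req.description, req.extension, req.min_gl_version);
                            ok = false;
                    }
            }
            return ok;
    }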
Same rationale as with the offset; we need resampling for proper zoom.
The look at heavy zoom isn't _quite_ what I had hoped for (although it's OK),
and there's a hint of shimmering in the zoom center if there's high-contrast
material there. For now, I'll write off the latter as Lanczos ringing;
I'll need to see what it does to video eventually (only tested with stills).
This enables smooth (subpixel) panning that people frequently want for stills
and titles, but that you couldn't do in a subpixel fashion before (PaddingEffect
could only do integer pixel offsets).
The placement (ResampleEffect) might seem a bit off at first, but subpixel
offset needs resampling, and ResampleEffect already has all the logic in place
for that. We could have used the GPU's built-in bilinear resampling, of course,
but it doesn't look all that good for high-contrast situations (although working
in linear light should help some).
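A possible usage sketch (the parameter names for the offset are illustrative;
see resample_effect.h for the actual interface):

    // Pan a still by a fractional number of pixels via ResampleEffect.
    Effect *resample = chain->add_effect(new ResampleEffect());
    resample->set_int("width", 1280);    // output size, as before
    resample->set_int("height", 720);
    resample->set_float("left", 3.5f);   // subpixel offset
    resample->set_float("top", 0.25f);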
This is mainly a convenience so that you can change e.g. a left-to-right
wipe into a right-to-left wipe without having to add a separate inverting
effect to the luma. Suggested by Dan Dennedy.
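A possible usage sketch (the parameter name and type are illustrative):

    // Hypothetical: flip the direction of a luma wipe without inverting the luma.
    Effect *wipe = chain->add_effect(new LumaMixEffect(), first, second, luma);
    wipe->set_int("invert", 1);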
Make the ResourcePool hold FBOs as a per-context resource.
This is an attempt to get out of the FBO sharability mess (unfortunately
we can't just stop having persistent FBOs, due to NVidia performance).
We now require the client to tell us whenever a context is going away,
and we try to be more careful about not deleting them in the wrong context.
Also, we assumed FBO names were globally unique, which isn't necessarily
true, so they are now keyed per context.
For good measure, we were deleting FBOs off the freelist from the front,
not the back as we should have -- fixed.
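The core of the re-keying is roughly this (heavily simplified; the real
bookkeeping lives in ResourcePool):

    #include <map>
    #include <vector>

    typedef void *Context;         // stand-in for however contexts are identified
    typedef unsigned int FBOName;  // stand-in for GLuint; only unique per context

    // Keep the FBO freelist keyed per context instead of assuming the names
    // are globally unique.
    std::map<Context, std::vector<FBOName> > fbo_freelist;

    void context_going_away(Context ctx)
    {
            // The client has told us this context is going away; forget the
            // FBOs that belong to it (actually deleting them has to happen
            // with that context current), so we never touch an FBO from the
            // wrong context.
            fbo_freelist.erase(ctx);
    }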
We have a problem when trying to delete an EffectChain or ResourcePool:
the FBOs or VAOs we created may belong to a different context from the one
that is current at deletion time. Work around it
for now (unbreaking Kdenlive) by making VAOs non-persistent again,
and simply never deleting FBOs (leaking them).
A proper solution here will be hard, unfortunately, and will need some thought.
Many of the rows in the support texture are exactly the same,
so don't store the duplicates; gives a small performance boost.
In a sense, this is exactly the same property that GPUwave exploits
when drawing multiple quads at the lower level.
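The deduplication itself is straightforward; something like this sketch
(simplified, with illustrative names):

    #include <map>
    #include <utility>
    #include <vector>

    std::vector<std::vector<float> > unique_rows;  // rows actually stored
    std::map<std::vector<float>, int> row_index;   // row contents -> stored index
    std::vector<int> row_remap;                    // logical row -> stored row

    void add_row(const std::vector<float> &row)
    {
            std::map<std::vector<float>, int>::iterator it = row_index.find(row);
            if (it == row_index.end()) {
                    // First time we see this row; store it.
                    it = row_index.insert(std::make_pair(row, (int)unique_rows.size())).first;
                    unique_rows.push_back(row);
            }
            row_remap.push_back(it->second);
    }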
Stop the FFTPassEffect Repeat test after FFT size 128.
The reason is that the 256 test uses texture sizes of 256*31=7936,
and above ~3900, some cards (at least both my Intel and NVidia cards)
start having accuracy issues at some sizes. The test happens not to
fail on this for semi-obscure reasons, but that is mostly by accident,
and in any case, requiring 8k textures for a unit test is probably
asking a bit much.
Redo FBO association yet again, this time per-texture.
According to http://adrienb.fr/blog/wp-content/uploads/2013/04/PortingSourceToLinux.pdf,
you want one FBO per texture, not just per format. And indeed, I can measure a very slight
performance improvement on both NVidia and ATI for this.
Seemingly this _also_ costs on NVidia; the demo app is down 0.9 ms/frame or so.
This rapidly started approaching complexity worthy of the ResourcePool,
so I moved the functionality in there even though it's not context-shareable.
We support 1.10 (for OpenGL 2.1 cards), 1.30 (for OpenGL 3.2 core contexts),
and 3.00 ES (for GLES3). There's some code duplication, but thankfully
not a whole lot.
With this, we compile in core contexts without any warning from ATI's driver,
and should also, in theory, be GLES3-compliant (tested only against NVidia's
desktop driver so far).
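Conceptually, the choice of directive boils down to something like this
(illustrative; the real logic also adjusts things like in/out vs.
attribute/varying as needed):

    #include <string>

    std::string shader_version_directive(bool core_context, bool gles)
    {
            if (gles) {
                    return "#version 300 es\n";  // GLSL 3.00 ES, for GLES3
            } else if (core_context) {
                    return "#version 130\n";     // GLSL 1.30, for OpenGL 3.2 core contexts
            } else {
                    return "#version 110\n";     // GLSL 1.10, for OpenGL 2.1 cards
            }
    }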
Make handling of non-RGBA sRGB textures more consistent.
Previously, we'd ask the driver to convert these to RGBA, which maybe
isn't ideal, and certainly doesn't work with GLES. Now we send in
the right format for RGB and RGBA, and refuse hardware conversion for
single-channel formats (which GLES doesn't accept). I don't think this is optimal,
but finding a use-case for sRGB single-channel is a bit tricky anyway,
and the fallback is fast, too.
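A sketch of the format selection described above (simplified; the enums are
standard OpenGL, but the surrounding logic is illustrative):

    #include <epoxy/gl.h>  // or whichever GL header the build uses

    // Returns false if there is no sRGB internal format we can rely on
    // (e.g. single-channel data), in which case the caller falls back to
    // converting the data itself.
    bool srgb_internal_format_for(GLenum format, GLenum *internal_format)
    {
            switch (format) {
            case GL_RGB:
                    *internal_format = GL_SRGB8;
                    return true;
            case GL_RGBA:
                    *internal_format = GL_SRGB8_ALPHA8;
                    return true;
            default:
                    return false;
            }
    }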
Reduce the amount of arithmetic in the BlurEffect shader a bit.
We did additions and subtractions with zero, which is sort of a waste
on scalar architectures. Helps ever so slightly on the demo app on my NVidia
card (3–4%).
Seemingly creating and deleting them is crazy expensive on NVidia
(~3 ms for a create/delete pair), so 6dea8d2 caused a performance
regression at high frame rates. Now we instead keep one around per
context (they cannot be shared), which brings us basically back
to where we were performance-wise.
Make Phase take other Phases as inputs, not Nodes.
This was a refactoring I wanted to do for a while, but actually finding
the right structure was a bit tricky. In the process, the entire phase
generation logic was rewritten, but the separation between compilation
and Phase construction is much cleaner now, and the logic in general
is easier to follow with more use of explicit recursion.
I'm still not 100% happy about what might be overuse of output_node;
we still need to link Phase and Node (the link just goes the other way
now), but I'm not sure we need to use it in all the cases we currently do.
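For reference, a rough sketch of the structure after the refactoring (member
names are illustrative, not necessarily the exact ones in the code):

    #include <vector>

    class Node;

    struct Phase {
            std::vector<Node *> effects;  // nodes compiled into this phase's shader
            std::vector<Phase *> inputs;  // phases whose outputs we sample as textures
            Node *output_node;            // the node whose output this phase produces
    };

Phase construction can then recurse naturally: building the phase for a node
first builds (or reuses) the phases for every input that has to live in a
separate phase, and those become the new phase's inputs.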