This was long overdue; I was reluctant for a while since it's easier
to develop or debug shaders this way, but the solution is of course
to give priority to the filesystem.
It is now allowed to give "" as the data directory, which will only
ever look in the bundle. Most users should probably just do that.
Shaders are no longer installed by make install.
There is no API or ABI break, but of course, you cannot give ""
as the directory to older versions (it would essentially mean /).
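For illustration, initialization then becomes simply (a minimal sketch; error
handling elided, and the init_movit()/MOVIT_DEBUG_OFF names are as in Movit's
public init.h):

    #include <movit/init.h>

    // Passing "" as the data directory means the filesystem is never
    // consulted; only the shaders embedded in the library are used.
    if (!movit::init_movit("", movit::MOVIT_DEBUG_OFF)) {
        // Initialization failed (e.g., the GPU or driver is too old).
    }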
https://photosauce.net/blog/post/what-makes-srgb-a-special-color-space
alerted me to the fact that sRGB does not use exactly the same matrix as Rec. 709;
it demands specific, rounded-off values. The difference should be minute,
but now we are at least exact in the forward direction.
Note that this isn't an ABI break per se, even though the definition of the
COLORSPACE_sRGB enum changes; old software will simply continue
to treat it as Rec. 709, while newly compiled software will get the
corrected version.
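For reference, the rounded-off RGB-to-XYZ matrix that IEC 61966-2-1 specifies
for sRGB looks like this (the values are straight from the spec; the Eigen
formulation is just for illustration):

    #include <Eigen/Core>

    // sRGB's mandated RGB-to-XYZ matrix. Deriving the matrix from the
    // Rec. 709 primaries and the D65 white point gives slightly different
    // values in the later decimals; that difference is what this fixes.
    const Eigen::Matrix3d srgb_to_xyz((Eigen::Matrix3d() <<
        0.4124, 0.3576, 0.1805,
        0.2126, 0.7152, 0.0722,
        0.0193, 0.1192, 0.9505).finished());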
Fix an issue where temporary textures could be reused too early by a different thread.
When an EffectChain is done rendering (i.e., has submitted all of the GL
rendering commands that it needs to), it releases all of the temporary
textures it's used back to a common freelist.
However, if another thread needs a texture of the same size and format,
it could pick it off the freelist before the GPU had actually
completed rendering the first thread's command list, and start uploading
into it. This is undefined behavior in OpenGL, and can create garbled
output depending on timing and driver. (I've seen this on at least the
classic Intel Mesa driver.)
Fix by setting fences, so that getting a texture from the freelist
will have an explicit ordering versus the previous work. This increases the
size of ResourcePool::TextureFormat, but it is only ever used in a private
std::map. std::map is node-based (it has to be, since the C++ standard requires
its iterators to be stable), so sizeof(TextureFormat) does not factor
into sizeof(ResourcePool), and there is no ABI break. Verified by
checking on libstdc++.
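The mechanism in isolation (a simplified sketch, not ResourcePool's actual
code; the struct and function names are made up for the example):

    #include <epoxy/gl.h>
    #include <deque>

    struct FreelistedTexture {
        GLuint texture_num;
        GLsync fence;  // Signaled when the releasing context's commands are done.
    };

    void release_texture(std::deque<FreelistedTexture> *freelist, GLuint texture_num)
    {
        GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
        freelist->push_back(FreelistedTexture{texture_num, fence});
    }

    GLuint acquire_texture(std::deque<FreelistedTexture> *freelist)
    {
        FreelistedTexture tex = freelist->front();
        freelist->pop_front();
        // Server-side wait: orders all our subsequent commands after the
        // fence, without stalling the CPU.
        glWaitSync(tex.fence, 0, GL_TIMEOUT_IGNORED);
        glDeleteSync(tex.fence);
        return tex.texture_num;
    }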
Fix an issue where we'd add an unneeded bounce for mipmaps in some cases.
Specifically, if we didn't need to bounce for mipmaps, but bounced for
some other reason (e.g. resizing), we'd still propagate the mipmap
requirements. This could lead to yet another bounce earlier up the chain.
A typical case would be YCbCrInput -> GammaExpansionEffect -> ResizeEffect,
where we'd have a completely unnecessary bounce between the input and the
gamma expansion.
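For the record, a chain of roughly that shape could be set up like this (a
sketch; the format fields are abbreviated, and GammaExpansionEffect is
inserted automatically by the chain):

    #include <movit/effect_chain.h>
    #include <movit/ycbcr_input.h>
    #include <movit/resize_effect.h>

    movit::ImageFormat format;
    format.color_space = movit::COLORSPACE_REC_709;
    format.gamma_curve = movit::GAMMA_REC_709;
    movit::YCbCrFormat ycbcr_format;  // Subsampling, chroma siting etc. omitted here.

    movit::EffectChain chain(1280, 720);
    chain.add_input(new movit::YCbCrInput(format, ycbcr_format, 1280, 720));
    movit::Effect *resize = chain.add_effect(new movit::ResizeEffect());
    resize->set_int("width", 640);
    resize->set_int("height", 360);
    chain.add_output(format, movit::OUTPUT_ALPHA_FORMAT_POSTMULTIPLIED);
    chain.finalize();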
Fix more confusion with strong one-to-one effects and compute shaders.
The resulting unit test also found an issue where we could add a
dummy effect at the end but then end up not having a compute shader
there, which would cause OpenGL errors as the internal state got
confused. Fix that as well.
Loosen up some restrictions on strong one-to-one effects.
In particular, this allows strong one-to-one effects with multiple
inputs; this was forbidden by the comment, but not enforced anywhere,
and MixEffect was inadvertently marked as strong one-to-one. This actually
triggered at least three distinct bugs (thus three new tests).
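To illustrate what declaring strong one-to-one looks like for a custom effect
(a sketch, assuming the strong_one_to_one_sampling() virtual on movit::Effect;
MyMixLikeEffect and its shader file are made up):

    #include <movit/effect.h>
    #include <movit/util.h>

    class MyMixLikeEffect : public movit::Effect {
    public:
        std::string effect_type_id() const override { return "MyMixLikeEffect"; }
        std::string output_fragment_shader() override { return movit::read_file("my_mix.frag"); }

        // Multiple inputs are now allowed together with strong one-to-one.
        unsigned num_inputs() const override { return 2; }
        bool strong_one_to_one_sampling() const override { return true; }
    };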
Break phases when a node needs both to supply mipmaps and _not_ supply mipmaps.
In most cases, this means that we can keep a local copy that supplies
mipmaps and an RTT input that doesn't, or the other way around.
This is not a complete solution, since it doesn't take into account
the case where _both_ inputs will be RTT, but it's good enough for now.
Also requires us to distinguish between links between nodes in the same phase
and in different phases, or we'd get GLSL compile errors as we had both the
local and the RTT input being called e.g. “in0”.
This is significantly less hackish than making the few effects
that don't want mipmaps request a texture bounce and fiddle
with the sampling parameters themselves.
API change, although only for writing custom effects.
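For custom effects, the change presumably amounts to a tri-state mipmap
declaration instead of the old boolean (a sketch, assuming a
MipmapRequirements-style enum on movit::Effect; check effect.h for the
authoritative names):

    #include <movit/effect.h>
    #include <movit/util.h>

    class MyPointSampledEffect : public movit::Effect {
    public:
        std::string effect_type_id() const override { return "MyPointSampledEffect"; }
        std::string output_fragment_shader() override { return movit::read_file("my_effect.frag"); }

        // Mipmapped input would change our output, so refuse it outright
        // instead of requesting a bounce and adjusting sampler state by hand.
        MipmapRequirements needs_mipmaps() const override { return CANNOT_ACCEPT_MIPMAPS; }
    };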
Support chaining certain effects after compute shaders.
This rests on the notion of “strong one-to-one” that's very similar
to the older “one-to-one” concept, yet a bit stricter (in particular,
PaddingEffect is not strong one-to-one).
EffectChain::render_to_textures() ended up reusing FBOs in a way that
was not compatible with how the tester managed textures; this broke nearly all
unit tests on NVIDIA.
Support rendering compute shaders straight to textures (skipping the dummy phase).
There are lots of limitations currently (only one destination,
only GL_RGBA16F), but it's a good start. Curiously enough,
it doesn't really help anything on the deinterlacing benchmark
for my Haswell, but NVIDIA sees ~15% improvement.
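Usage would look roughly like this (a heavily hedged sketch: the exact entry
point and destination struct live in effect_chain.h, and the names below are
my assumption):

    // Hypothetical direct-to-texture call; only one destination and only
    // GL_RGBA16F are accepted for now.
    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexStorage2D(GL_TEXTURE_2D, 1, GL_RGBA16F, width, height);

    std::vector<movit::DestinationTexture> destinations;
    destinations.push_back(movit::DestinationTexture{tex, GL_RGBA16F});
    chain.render_to_texture(destinations, width, height);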
Add a texture barrier after dispatching each compute shader.
This is maybe a bit heavy-handed (there are cases where shaders could
run in parallel), but it's by far the simplest thing to do, since we
have zero control over what happens to the textures we use when they
are handed back to the resource pool.
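The pattern in isolation (a sketch; the exact barrier bits passed may differ):

    // After the dispatch, make the compute shader's image stores visible
    // to whatever touches the texture next (sampling, uploads, FBO use).
    glDispatchCompute(groups_x, groups_y, 1);
    glMemoryBarrier(GL_TEXTURE_FETCH_BARRIER_BIT | GL_TEXTURE_UPDATE_BARRIER_BIT |
                    GL_FRAMEBUFFER_BARRIER_BIT);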
Implement a compute shader version of DeinterlaceEffect.
This is currently a loss for grayscale (probably due to the extra
rgba16f bounce), but a win of about 30% on BGRA on my Haswell.
NVIDIA doesn't care much either way.
There are some performance mysteries remaining, but it's a good start.
NVIDIA requires this for the layout qualifier, and it's probably right.
Note that this required moving the unit tests to a core context,
due to Mesa's demands.