Fix an issue where the last pass would have been rendered with the sRGB flag set, which confused Qt applications running in certain NVIDIA configurations.
Add support for 10- and 12-bit planar Y'CbCr inputs.
This is mostly for completeness; at least for 10-bit, 10:10:10:2
should be a faster format. However, it's nice to allow direct
subsampled inputs _somehow_.
Allow adjusting the output Y'CbCr coefficients after finalize.
Primarily useful for Nageru, which may have to switch output modes at runtime.
Pretty much the same speed (just a single extra branch on a boolean uniform),
as constants and uniforms are typically the same speed and we're generally
ALU-bound.
Allow storing values in intermediate framebuffers as sqrt(x).
Together with GL_RGB10_A2, this would seem to be an even better tradeoff for
many chains than GL_SRGB8_ALPHA8 is, as long as you don't need intermediate
alpha. (We verify its accuracy with a unit test.)
This changes the API for specifying intermediate framebuffers, but that API
was never in a release, so it should be fine.
Also document a rather obscure problem where, if you can actually hold on to
non-linear values across a bounce buffer, you don't really want to store them
in sRGB encoding. (The square-root version actually avoids this problem.
I guess we could snoop on the type and do a similar thing if we see it's a
GL_SRGB* encoding, but it seems so obscure that we can ignore it for now.)
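To illustrate why sqrt(x) storage pairs well with a 10-bit channel, here is a minimal CPU-side sketch. It only models the quantization; the real work happens in shaders and in the GPU's framebuffer conversion. All function names are hypothetical and not part of the library's API.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Store sqrt(x) of a linear-light value in [0, 1] as a 10-bit code
// (the per-channel precision of GL_RGB10_A2).
inline uint16_t encode_sqrt10(float linear)
{
    assert(linear >= 0.0f && linear <= 1.0f);
    return static_cast<uint16_t>(std::lround(std::sqrt(linear) * 1023.0f));
}

// Undo the encoding: normalize the code to [0, 1] and square it.
inline float decode_sqrt10(uint16_t code)
{
    float v = code / 1023.0f;
    return v * v;
}

// For comparison: store the linear value directly in 10 bits.
inline uint16_t encode_linear10(float linear)
{
    assert(linear >= 0.0f && linear <= 1.0f);
    return static_cast<uint16_t>(std::lround(linear * 1023.0f));
}

inline float decode_linear10(uint16_t code)
{
    return code / 1023.0f;
}
```

With these definitions, a dark value such as 0.0004 collapses to zero when stored linearly, while the square-root encoding still recovers a value close to the original.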
This is similar to what we had earlier to just reuse the VAO,
but now with correct bindings no matter what vertex attributes
are assigned to what index, so that the (new) test passes.
This is actually slightly more efficient than what we had before,
since we don't look up the attributes by text anymore, and don't
reupload the VBO for each frame anymore. In practice, the effects
should be small.
The patch triggers a bug where, if the first phase doesn't need texture
coordinates, the rest of the phases don't get them either. (Or more generally,
if the vertex shader varying indices are not predictable, the patch does
the wrong thing.) Add a unit test and revert it for now; in time, we'll find a
way that's both low-overhead (the patch fixes a real problem) _and_ correct in
these cases.
I tried a few different things before I finally settled on this, in particular
Weston's 3-field deinterlacer (w3fdif). It's not perfect (see .h comments),
but it works overall pretty well.
We multiplied by 224/219 once too many, causing some small accuracy issues.
Furthermore, we also did this for full-range Y'CbCr, which obviously is wrong.
The issue was so small that the unit tests kept on passing (its investigation
was prompted by a test that failed on AMD cards, which is a separate issue).
After this, the Rec. 601 matrices match Wikipedia exactly, both for limited
and full range. Added unit tests for this.
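For reference, the textbook Rec. 601 RGB to Y'CbCr conversion for 8-bit output can be sketched as below, with each range scale factor applied exactly once. This is an illustration of the standard formulas, not the library's actual code; the struct and function names are hypothetical.

```cpp
#include <cassert>

struct YCbCr {
    double y, cb, cr;
};

// Convert R'G'B' in [0, 1] to 8-bit-scaled Rec. 601 Y'CbCr.
// Limited range: Y' in [16, 235], Cb/Cr in [16, 240].
// Full range: Y' in [0, 255], Cb/Cr centered on 128.
inline YCbCr rgb_to_ycbcr_601(double r, double g, double b, bool full_range)
{
    assert(r >= 0.0 && r <= 1.0 && g >= 0.0 && g <= 1.0 && b >= 0.0 && b <= 1.0);

    const double kr = 0.299, kb = 0.114, kg = 1.0 - kr - kb;

    double y = kr * r + kg * g + kb * b;         // in [0, 1]
    double cb = (b - y) / (2.0 * (1.0 - kb));    // in [-0.5, 0.5]
    double cr = (r - y) / (2.0 * (1.0 - kr));    // in [-0.5, 0.5]

    // The range scaling happens here, and only here.
    double y_scale = full_range ? 255.0 : 219.0;
    double c_scale = full_range ? 255.0 : 224.0;
    double y_offset = full_range ? 0.0 : 16.0;

    return { y_offset + y_scale * y, 128.0 + c_scale * cb, 128.0 + c_scale * cr };
}
```

Pure blue in limited range should then come out with Cb = 240, and white with Y' = 235 (or 255 in full range).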
Evidently ATI drivers use the freedom the standard gives them to assign
these in another order than they are specified in the shader source,
so we need to explicitly bind them, or YCbCrConversionEffectTest will fail
in the multi-output tests.
This was never intended to be there, and we don't install headers for it
(so no API/ABI break); it is actively harmful because it has a static
ResourcePool, which we attempt to destroy during shutdown (which causes
use of uninitialized memory as we try to get the current context).
Make get_current_context_identifier() understand EGL.
If we're using EGL and not GLX (typically because we're using GLES,
but also increasingly with desktop GL), we'd always return NULL.
This could cause FBOs to be confused between contexts.
The intended use case is to have Y'CbCr for encoding output but keep
RGBA around for easier preview. This causes a few effects to need to
send arrays around; it's a bit ugly to special-case them like this,
but I'm concerned about going generic wrt. how good various shader
compilers are at optimizing if we went full multi-model everywhere
(without having tested, though).
In practice, we haven't _actually_ supported this since we used integers
in ResampleEffect (and ResampleEffect is a pretty central effect),
so let's be honest with ourselves. (Also, we will soon start using arrays
in some cases, which are cumbersome pre-1.30.) I don't know of any drivers
that support all the other stuff we want but not GLSL 1.30 anyway;
it came with OpenGL 3.0, in 2008.
This actually isn't an ABI break, at least not on the C++ level.
In ycbcr_conversion_effect_test, use a non-float framebuffer.
This way, we let the card convert float-to-int, which we have reasonable
control over, as opposed to glReadPixels(), which is rather unpredictable.
Fixes unit test failures on Broadwell on Linux (Mesa 10.1).
In ResampleEffect, precompute the Lanczos function into a table.
A 2048-element table (with linear interpolation between the elements)
is seemingly enough to get the error down below float epsilon, and this
saves a lot of CPU time when computing large filter kernels.
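The table approach can be sketched roughly as follows. The class name, the table layout (2048 samples spread evenly over [0, a]) and the use of double precision are assumptions made for illustration; the actual ResampleEffect code may differ in all of these.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Direct evaluation of the Lanczos kernel with radius a:
// sinc(x) * sinc(x / a) for |x| < a, and 0 elsewhere.
inline double lanczos_direct(double x, double a)
{
    if (std::fabs(x) >= a) return 0.0;
    if (std::fabs(x) < 1e-9) return 1.0;
    const double pi = 3.14159265358979323846;
    double px = pi * x;
    return a * std::sin(px) * std::sin(px / a) / (px * px);
}

// Precomputed kernel with linear interpolation between samples.
class LanczosTable {
public:
    explicit LanczosTable(double a, std::size_t size = 2048)
        : a_(a), table_(size)
    {
        assert(a > 0.0 && size >= 2);
        for (std::size_t i = 0; i < size; ++i) {
            table_[i] = lanczos_direct(a * i / (size - 1), a);
        }
    }

    double lookup(double x) const
    {
        // The kernel is symmetric, so only |x| is tabulated.
        double pos = std::fabs(x) / a_ * (table_.size() - 1);
        if (pos >= table_.size() - 1) return 0.0;
        std::size_t idx = static_cast<std::size_t>(pos);
        double frac = pos - idx;
        return table_[idx] + frac * (table_[idx + 1] - table_[idx]);
    }

private:
    double a_;
    std::vector<double> table_;
};
```

Each lookup then costs one multiply, one table read pair and a linear blend, instead of two sin() calls and a division per filter tap.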
Fix a bug where combined fp16 weights would be horribly wrong.
Seemingly weights were always returned as float, and then cast
to fp16_int_t -- without proper conversion! And sum_sq_error
would be calculated based on the correct value, not the incorrectly
cast one.
It's a small miracle the unit tests didn't catch this; they didn't
until I started introducing small errors for another reason.
Most real-world testing seems to have hit fp32, and thus this
wasn't caught there either.
Also make fp16_int_t a struct so that it is not implicitly
convertible to/from numeric types, so this never ever can happen again.
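A minimal sketch of the struct idea follows. It assumes the old fp16_int_t was a typedef for a 16-bit integer, which is what made a plain numeric cast compile silently. The member name val and the helper to_fp16() are hypothetical, and the conversion is a simplified one for finite inputs only, not the library's own routine.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <type_traits>

// Wrapping the bits in a struct removes the implicit numeric conversions.
struct fp16_int_t {
    uint16_t val;
};

static_assert(!std::is_convertible<float, fp16_int_t>::value,
              "float must not silently become fp16 bits");
static_assert(!std::is_convertible<fp16_int_t, float>::value,
              "fp16 bits must not silently become a float");

// Convert a finite float to IEEE 754 half-precision bits,
// rounding to nearest (under the default rounding mode).
inline fp16_int_t to_fp16(float x)
{
    assert(std::isfinite(x));
    uint16_t sign = std::signbit(x) ? 0x8000 : 0;
    float mag = std::fabs(x);

    // Anything at or above 65520 rounds to infinity.
    if (mag >= 65520.0f) return { static_cast<uint16_t>(sign | 0x7c00) };

    // Below 2^-14: zero or subnormal, stored as a multiple of 2^-24.
    if (mag < 6.103515625e-05f) {
        long m = std::lrint(mag * 16777216.0f);
        return { static_cast<uint16_t>(sign | m) };
    }

    // Normal numbers: mag = frac * 2^exp with frac in [0.5, 1).
    int exp;
    float frac = std::frexp(mag, &exp);
    long m = std::lrint((frac * 2.0f - 1.0f) * 1024.0f);  // 10-bit mantissa
    int e = exp - 1 + 15;                                 // biased exponent
    // If m rounded up to 1024, the addition carries into the exponent.
    return { static_cast<uint16_t>(sign | ((e << 10) + m)) };
}
```

For comparison, static_cast<uint16_t>(1.0f) yields the integer 1, which read as half-precision bits is the smallest subnormal (about 6e-8) rather than 1.0, whose correct encoding is 0x3c00.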