Steinar H. Gunderson [Sun, 13 Dec 2015 12:26:01 +0000 (13:26 +0100)]
Fix a double scaling issue in Y'CbCr conversion.

We multiplied by 224/219 once too many, causing some small accuracy issues.
Furthermore, we also did this for full-range Y'CbCr, which obviously is wrong.
The issue was so small that the unit tests kept on passing (its investigation
was prompted by a test that failed on AMD cards, which is a separate issue).

After this, the Rec. 601 matrices match Wikipedia exactly, both for limited
and full range. Added unit tests for this.
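To make the fix concrete, here is a minimal C++ sketch of building the R'G'B' to Y'CbCr matrix so that the limited-range scale is applied exactly once and not at all for full range. The function and type names are hypothetical (not Movit's API), and the output is in 8-bit code values with the 16/128 offsets left out. With Rec. 601 constants (kr = 0.299, kb = 0.114), the limited-range rows come out as the familiar coefficients (65.481, 128.553, 24.966 for luma; -37.797, -74.203, 112.0 for Cb).

```cpp
#include <array>
#include <cassert>

using Matrix3 = std::array<std::array<double, 3>, 3>;

// Hypothetical helper: R'G'B' in [0, 1] -> Y'CbCr in 8-bit code values,
// excluding the 16/128 offsets. Limited range: luma spans 219 codes and
// chroma spans 224 codes, each scale applied exactly once. Full range:
// both span 255 codes, with no 224/219 factor anywhere.
Matrix3 rgb_to_ycbcr_matrix(double kr, double kb, bool full_range)
{
    assert(kr > 0.0 && kb > 0.0 && kr + kb < 1.0);
    const double kg = 1.0 - kr - kb;
    const double y_scale = full_range ? 255.0 : 219.0;
    const double c_scale = full_range ? 255.0 : 224.0;

    // Pb = 0.5 * (B' - Y') / (1 - kb) and Pr = 0.5 * (R' - Y') / (1 - kr),
    // both in [-0.5, 0.5] before scaling.
    const double cb = 0.5 / (1.0 - kb) * c_scale;
    const double cr = 0.5 / (1.0 - kr) * c_scale;

    Matrix3 m;
    m[0] = {{ kr * y_scale, kg * y_scale, kb * y_scale }};
    m[1] = {{ -kr * cb, -kg * cb, (1.0 - kb) * cb }};
    m[2] = {{ (1.0 - kr) * cr, -kg * cr, -kb * cr }};
    return m;
}
```

Multiplying the chroma rows by 224/219 a second time would push these coefficients off by about 2.3%, small enough to slip past loose test tolerances.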

Steinar H. Gunderson [Sat, 12 Dec 2015 13:31:40 +0000 (14:31 +0100)]
Explicitly bind fragment shader outputs in order.

Evidently ATI drivers use the freedom the standard gives them to assign
these in a different order from the one in the shader source,
so we need to explicitly bind them, or YCbCrConversionEffectTest will fail
in the multi-output tests.

Originally reported by Iwan Gabovitch.

Steinar H. Gunderson [Sat, 12 Dec 2015 11:27:21 +0000 (12:27 +0100)]
Add a hack to use #version 110 but keep using 130 features, for the benefit of OS X.

Steinar H. Gunderson [Sun, 22 Nov 2015 22:13:03 +0000 (23:13 +0100)]
Stop linking widgets.o into the shared library.

This was never intended to be there, and we don't install headers for it
(so no API/ABI break); it is actively harmful because it has a static
ResourcePool, which we attempt to destroy during shutdown (causing
use of uninitialized memory as we try to get the current context).

Steinar H. Gunderson [Sun, 22 Nov 2015 13:34:56 +0000 (14:34 +0100)]
Add the missing two array uniform types.

Steinar H. Gunderson [Sat, 21 Nov 2015 20:32:29 +0000 (21:32 +0100)]
Allow setting width/height on FlatInput and YCbCrInput after instantiation.

Steinar H. Gunderson [Sun, 1 Nov 2015 15:08:49 +0000 (16:08 +0100)]
Forgot to increment version.h for bounce override; doing so now.

Steinar H. Gunderson [Sun, 1 Nov 2015 01:09:56 +0000 (02:09 +0100)]
Add a function to let non-input effects override texture bounce.

Definitely read the comment before using; it is not for the faint
of heart. Also make ResampleEffect tolerate this kind of abuse.

Steinar H. Gunderson [Sun, 1 Nov 2015 01:02:10 +0000 (02:02 +0100)]
Add some earlier check_error() calls so that we do not get confusing behavior if there is already an error on entry to render_to_fbo().

Steinar H. Gunderson [Thu, 8 Oct 2015 19:13:44 +0000 (21:13 +0200)]
Install identity.frag; it is needed for ResizeEffect.

Steinar H. Gunderson [Wed, 7 Oct 2015 19:03:24 +0000 (21:03 +0200)]
Fix another #if issue, this time in dither_effect.frag. Reported by Dan Dennedy.

Steinar H. Gunderson [Wed, 7 Oct 2015 18:21:08 +0000 (20:21 +0200)]
Install the new GLSL 1.50 shaders.

Steinar H. Gunderson [Wed, 7 Oct 2015 18:12:25 +0000 (20:12 +0200)]
Make the demo program run with core contexts.

Also, if SDL2 is in use, actually _ask_ for a core context.

Steinar H. Gunderson [Wed, 7 Oct 2015 17:24:19 +0000 (19:24 +0200)]
Add separate shaders for GLSL 1.50.

Seemingly, Apple's drivers (in OS X) do not support GLSL 1.30 in
core contexts, only 1.50. Reported by Dan Dennedy.

Steinar H. Gunderson [Tue, 6 Oct 2015 18:08:05 +0000 (20:08 +0200)]
Fix GLSL compilation errors on some drivers.

Evidently #if FOO is illegal in GLSL if FOO is not defined,
unlike in C++. Reported by Dan Dennedy.

Steinar H. Gunderson [Mon, 5 Oct 2015 23:24:26 +0000 (01:24 +0200)]
Make get_current_context_identifier() understand EGL.

If we're using EGL and not GLX (typically because we're using GLES,
but also increasingly with desktop GL), we'd always return NULL.
This could cause FBOs to be confused between contexts.

Steinar H. Gunderson [Mon, 5 Oct 2015 22:06:14 +0000 (00:06 +0200)]
Disable dither explicitly per frame; fixes some weird artifacts I found.

Steinar H. Gunderson [Mon, 5 Oct 2015 21:08:35 +0000 (23:08 +0200)]
Call init_lanczos_table() once instead of checking for it all the time.

Steinar H. Gunderson [Mon, 5 Oct 2015 20:43:59 +0000 (22:43 +0200)]
Get rid of a bunch of STL inefficiencies in FBO freelist handling.

Steinar H. Gunderson [Mon, 5 Oct 2015 19:04:29 +0000 (21:04 +0200)]
Support GL_RGB565 targets.

Steinar H. Gunderson [Mon, 5 Oct 2015 18:49:10 +0000 (20:49 +0200)]
Bump version number after support for external OpenGL textures (it was forgotten).

Steinar H. Gunderson [Mon, 5 Oct 2015 17:49:51 +0000 (19:49 +0200)]
Make FlatInput and YCbCrInput support taking in external OpenGL textures.

Steinar H. Gunderson [Sun, 4 Oct 2015 00:44:13 +0000 (02:44 +0200)]
Unbreak make install after the last changes.

Steinar H. Gunderson [Sun, 4 Oct 2015 00:43:02 +0000 (02:43 +0200)]
Some small cleanups after we got rid of GLSL 1.10; we can now unify 1.30 and ES 3.00 in some places.

Steinar H. Gunderson [Sun, 4 Oct 2015 00:37:56 +0000 (02:37 +0200)]
Allow dual Y'CbCr/RGBA outputs.

The intended use case is to have Y'CbCr for encoding output but keep
RGBA around for easier preview. This causes a few effects to need to
send arrays around; it's a bit ugly to special-case them like this,
but I'm concerned about how well various shader compilers would
optimize if we went fully generic and multi-model everywhere
(without having tested, though).

ABI break due to changed EffectChain size.

Steinar H. Gunderson [Sun, 4 Oct 2015 00:05:33 +0000 (02:05 +0200)]
Remove support for GLSL 1.10.

In practice, we haven't _actually_ supported this since we used integers
in ResampleEffect (and ResampleEffect is a pretty central effect),
so let's be honest with ourselves. (Also, we will soon start using arrays
in some cases, which are cumbersome pre-1.30.) I don't know of any drivers
that support all the other stuff we want but not GLSL 1.30 anyway;
it came with OpenGL 3.0, in 2008.

This actually isn't an ABI break, at least not on the C++ level.

Steinar H. Gunderson [Fri, 25 Sep 2015 23:33:29 +0000 (01:33 +0200)]
In ycbcr_conversion_effect_test, use a non-float framebuffer.

This way, we let the card convert float-to-int, which we have reasonable
control over, as opposed to glReadPixels(), which is rather unpredictable.
Fixes unit test failures on Broadwell on Linux (Mesa 10.1).

Steinar H. Gunderson [Fri, 25 Sep 2015 23:29:40 +0000 (01:29 +0200)]
Fix a buffer overflow in ycbcr_conversion_effect_test.

Steinar H. Gunderson [Thu, 24 Sep 2015 16:44:01 +0000 (18:44 +0200)]
Release Movit 1.2.0.

Steinar H. Gunderson [Thu, 24 Sep 2015 00:12:40 +0000 (02:12 +0200)]
In ResampleEffect, precompute the Lanczos function into a table.

A 2048-element table (with linear interpolation between the elements)
is seemingly enough to get down to beyond float epsilon, and this
saves a lot of CPU time when computing large filter kernels.
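A minimal C++ sketch of the table approach, for illustration only. It assumes a Lanczos radius of a = 3 and a table spanning [0, a] with 2048 intervals (2049 samples, so the endpoint is included); the names and the exact table layout are assumptions, not Movit's code.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

constexpr int kLanczosTableSize = 2048;  // number of intervals over [0, a]
constexpr double kPi = 3.14159265358979323846;

double sinc(double x)
{
    if (std::fabs(x) < 1e-6) return 1.0;
    return std::sin(kPi * x) / (kPi * x);
}

// Direct evaluation: L(x) = sinc(x) * sinc(x / a) for |x| < a, else 0.
double lanczos_direct(double x, double a)
{
    x = std::fabs(x);
    return (x < a) ? sinc(x) * sinc(x / a) : 0.0;
}

class LanczosTable {
public:
    explicit LanczosTable(double a) : a_(a), table_(kLanczosTableSize + 1)
    {
        // Sample [0, a] at kLanczosTableSize + 1 evenly spaced points.
        for (int i = 0; i <= kLanczosTableSize; ++i) {
            table_[i] = lanczos_direct(a_ * i / kLanczosTableSize, a_);
        }
    }

    // Linear interpolation between the two nearest table entries.
    double operator()(double x) const
    {
        x = std::fabs(x);
        if (x >= a_) return 0.0;
        const double pos = x / a_ * kLanczosTableSize;
        const int i = static_cast<int>(pos);
        assert(i >= 0 && i < kLanczosTableSize);
        const double frac = pos - i;
        return table_[i] + frac * (table_[i + 1] - table_[i]);
    }

private:
    double a_;
    std::vector<double> table_;
};
```

The kernel is even, so only non-negative arguments need storing; each lookup then costs one multiply-add instead of two sin() calls.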

Steinar H. Gunderson [Wed, 23 Sep 2015 23:59:47 +0000 (01:59 +0200)]
Fix a bug where combined fp16 weights would be horribly wrong.

Seemingly weights were always returned as float, and then cast
to fp16_int_t -- without proper conversion! And sum_sq_error
would be calculated based on the correct value, not the broken-
casted one.

It's a small miracle the unit tests didn't catch this; they didn't
until I started introducing small errors for another reason.
Most real-world testing seems to have hit fp32, and thus this
wasn't caught there either.

Also make fp16_int_t a struct so that it is not implicitly
convertible to/from numeric types, so this never ever can happen again.
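A sketch of why wrapping the bits in a struct helps: `fp16_int_t h = 1.0f;` no longer compiles, so every conversion has to go through an explicit function. The struct name follows the commit message; the conversion functions are a simplified illustration (an assumption, not Movit's implementation): they handle zero and the normal fp16 range only, flush smaller values to zero, and truncate instead of rounding to nearest.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Raw fp16 bit pattern; deliberately not implicitly convertible to/from numbers.
struct fp16_int_t {
    std::uint16_t val;
};

// Simplified fp32 -> fp16 (normal range only, truncating the mantissa).
fp16_int_t fp32_to_fp16(float x)
{
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits));
    const std::uint32_t sign = (bits >> 16) & 0x8000u;
    const std::int32_t exponent = static_cast<std::int32_t>((bits >> 23) & 0xff) - 127 + 15;
    const std::uint32_t mantissa = bits & 0x7fffffu;
    fp16_int_t ret;
    if (exponent <= 0) {
        ret.val = static_cast<std::uint16_t>(sign);  // flush to signed zero
    } else {
        assert(exponent < 31 && "out of fp16 range in this sketch");
        ret.val = static_cast<std::uint16_t>(sign | (exponent << 10) | (mantissa >> 13));
    }
    return ret;
}

// fp16 -> fp32 for zero and normal values.
float fp16_to_fp32(fp16_int_t h)
{
    const std::uint32_t sign = (h.val & 0x8000u) << 16;
    const std::uint32_t exponent = (h.val >> 10) & 0x1f;
    const std::uint32_t mantissa = h.val & 0x3ffu;
    std::uint32_t bits = sign;
    if (exponent != 0) {
        bits |= ((exponent - 15 + 127) << 23) | (mantissa << 13);
    }
    float x;
    std::memcpy(&x, &bits, sizeof(x));
    return x;
}
```

With a plain integer typedef, `(fp16_int_t)0.5f` would silently produce 0 instead of the bit pattern 0x3800, which is exactly the class of bug described above.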

Steinar H. Gunderson [Tue, 22 Sep 2015 23:34:58 +0000 (01:34 +0200)]
Microoptimization in ResampleEffect.

Steinar H. Gunderson [Thu, 17 Sep 2015 23:48:10 +0000 (01:48 +0200)]
Add a mode for YCbCrInput where Cb and Cr are in the same texture.

Steinar H. Gunderson [Thu, 17 Sep 2015 18:05:43 +0000 (20:05 +0200)]
In ResampleEffect, be more aggressive about giving up on saving bilinear samples.

It turns out that for some kinds of loads, we can't use bilinearity at all
to our benefit, so we spend almost all of our time trying to go through
each line to see how much we can save. Simply send in the minimum number
so far when doing this evaluation to begin with, which means we'll effectively
short-circuit the entire thing pretty fast once we find a line that can save

Steinar H. Gunderson [Thu, 17 Sep 2015 17:28:43 +0000 (19:28 +0200)]
Reduce the amount of computation in combine_two_samples().

Mostly microoptimization, but seemingly this function is somewhat expensive.

Steinar H. Gunderson [Wed, 16 Sep 2015 22:39:43 +0000 (00:39 +0200)]
Add support for overriding the output origin.

For me, this was needed when I wanted to render directly into
VA-API's encoder buffers, which are always top-left origin (and FBOs
are always bottom-left origin).
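The difference between the two conventions comes down to a vertical flip. A tiny sketch with hypothetical helper names (not Movit's API):

```cpp
#include <cassert>

// Row y in a bottom-left-origin image (e.g. an OpenGL FBO) is row
// height - 1 - y in a top-left-origin buffer, and vice versa.
int flip_row(int y, int height)
{
    assert(y >= 0 && y < height);
    return height - 1 - y;
}

// The same flip in normalized texture coordinates, where t is in [0, 1].
float flip_texcoord(float t)
{
    return 1.0f - t;
}
```

Doing the flip while rendering avoids a separate pass to reorder the rows afterwards.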

Steinar H. Gunderson [Wed, 16 Sep 2015 21:51:30 +0000 (23:51 +0200)]
Add support for Y'CbCr output split between multiple textures.

This is useful primarily for avoiding copies in later stages;
e.g., when rendering directly into a video encoder buffer.
We support both full planar and NV12-style interleaved Cb+Cr.
You still have to subsample chroma yourself, though; we don't
really support chains that diverge except in the final output node
(and changing resolution would definitely need a bounce;
and even worse, one in a non-fp16 intermediate format).
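For reference, a sketch of the generic NV12-style layout (this is the common definition of NV12, not Movit code; the struct is hypothetical). It has a full-resolution Y' plane, followed by a single plane of interleaved Cb/Cr pairs at half resolution in both directions. As noted above, the chroma subsampling itself is the caller's job.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct NV12Frame {
    int width, height;
    std::vector<std::uint8_t> y;     // width * height bytes
    std::vector<std::uint8_t> cbcr;  // (width/2) * (height/2) Cb/Cr pairs

    NV12Frame(int w, int h)
        : width(w), height(h),
          y(std::size_t(w) * h), cbcr(std::size_t(w) * h / 2)
    {
        assert(w % 2 == 0 && h % 2 == 0);
    }

    // Store the chroma pair for the 2x2 luma block starting at (2*cx, 2*cy).
    void set_chroma(int cx, int cy, std::uint8_t cb, std::uint8_t cr)
    {
        assert(cx >= 0 && cx < width / 2 && cy >= 0 && cy < height / 2);
        const std::size_t idx = (std::size_t(cy) * (width / 2) + cx) * 2;
        cbcr[idx] = cb;
        cbcr[idx + 1] = cr;
    }
};
```

A 1920x1080 frame thus needs 2,073,600 bytes of luma plus 1,036,800 bytes of interleaved chroma.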

Steinar H. Gunderson [Wed, 16 Sep 2015 21:49:45 +0000 (23:49 +0200)]
Make EffectChainTester capable of testing chains with multiple outputs.

Steinar H. Gunderson [Wed, 16 Sep 2015 21:04:45 +0000 (23:04 +0200)]
In EffectChainTester, there is no point in making the FBO and texture persistent.

Steinar H. Gunderson [Wed, 16 Sep 2015 21:00:10 +0000 (23:00 +0200)]
Small refactoring in EffectChainTester.

Steinar H. Gunderson [Wed, 16 Sep 2015 20:33:09 +0000 (22:33 +0200)]
Support multi-texture FBOs in ResourcePool.

Steinar H. Gunderson [Wed, 16 Sep 2015 18:02:30 +0000 (20:02 +0200)]
Add some check_error() for shaders miscompiling.

Steinar H. Gunderson [Sun, 13 Sep 2015 23:18:38 +0000 (01:18 +0200)]
Reuse the VAO across all phases.

Steinar H. Gunderson [Sun, 13 Sep 2015 23:12:43 +0000 (01:12 +0200)]
Help the compiler out a tiny bit.

Steinar H. Gunderson [Sun, 13 Sep 2015 21:15:37 +0000 (23:15 +0200)]
Reduce the boilerplate around uniforms a bit.

Steinar H. Gunderson [Sun, 13 Sep 2015 18:40:58 +0000 (20:40 +0200)]
Cleanup: Make uniforms for RTT samplers like all other uniforms.

This also removes an ugly special-casing where one single place
in the entire code would call glUniform1i directly.

Steinar H. Gunderson [Sun, 13 Sep 2015 18:15:29 +0000 (20:15 +0200)]
Handle sampler2D uniforms specially.

We're going to need this soon, since sampler uniforms are special
in that they cannot be in a uniform block.

Steinar H. Gunderson [Sun, 13 Sep 2015 15:44:58 +0000 (17:44 +0200)]
Rework uniform setting.

One would think something as mundane as setting a few uniforms wouldn't
really mean much for performance, but seemingly this is not always so --
I had a real-world shader that counted no less than 55 uniforms.
Of course, not all of these were actually used, but we still have to go
through looking up the name etc. for every single one, every single frame.

Thus, we introduce a new way of dealing with uniforms: Register them before
finalization time, and then EffectChain can store their numbers once and
for all, instead of this repeated lookup. The system is also set up such
that we can go to uniform buffer objects (UBOs) in the very near future.

It's a bit unfortunate that uniform declaration now is removed from the
.frag files, where it sat very nicely, but the alternative would be to
try to parse GLSL, which I'm a bit wary of right now. All effects are
converted, leaving the set_uniform_* functions without any users, but
they are kept around for now in case external effects want them.

This seems to bring 1–2% speedup for my use case; hopefully UBOs will
bring a tiny bit more.
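The register-then-finalize pattern can be sketched without any GL dependency. In this sketch the two callbacks stand in for glGetUniformLocation() and glUniform1f(), and every name is hypothetical rather than Movit's actual API. Name lookups happen once in finalize(); each frame only walks (location, value) pairs.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

class UniformRegistry {
public:
    // Called while an effect is set up; `value` must outlive the registry.
    void register_float(const std::string &name, const float *value)
    {
        assert(!finalized_ && "register uniforms before finalization");
        uniforms_.push_back({name, value, -1});
    }

    // Called once, after the shader program has been linked.
    void finalize(const std::function<int(const std::string &)> &lookup)
    {
        for (Uniform &u : uniforms_) {
            u.location = lookup(u.name);
        }
        finalized_ = true;
    }

    // Called every frame: no string lookups, only (location, value) pairs.
    void upload(const std::function<void(int, float)> &set_uniform) const
    {
        assert(finalized_);
        for (const Uniform &u : uniforms_) {
            if (u.location != -1) {  // -1: the compiler optimized it away
                set_uniform(u.location, *u.value);
            }
        }
    }

private:
    struct Uniform {
        std::string name;
        const float *value;
        int location;
    };
    std::vector<Uniform> uniforms_;
    bool finalized_ = false;
};
```

Because the registry stores pointers to the effect's own members, effects keep updating plain member variables and never deal with uniform names per frame.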

Steinar H. Gunderson [Sun, 13 Sep 2015 15:39:52 +0000 (17:39 +0200)]
Add default constructors to Point2D/RGBTuple/RGBATuple, for convenience.

Steinar H. Gunderson [Sun, 13 Sep 2015 10:09:08 +0000 (12:09 +0200)]
Add a version header file to help clients that need to relate to multiple versions of Movit.

Steinar H. Gunderson [Wed, 9 Sep 2015 21:53:12 +0000 (23:53 +0200)]
Update README: It's now 2015. :-)

Needs to be done in time for 2016, of course.

Steinar H. Gunderson [Wed, 9 Sep 2015 21:51:48 +0000 (23:51 +0200)]
Add support for Y'CbCr output.

Currently only 8-bit and only 4:4:4 packed, but it should be a useful
building block.

Steinar H. Gunderson [Tue, 8 Sep 2015 23:28:40 +0000 (01:28 +0200)]
Prepare for better understanding of 10- and 12-bit Y'CbCr.

Seemingly there is trickiness in how to interpret the integer
values that is different from what you'll typically see in R'G'B'
(or just GPUs and TV standards differ on that point as well).
Add an explanatory comment, and add a data member to YCbCrFormat
to prepare for correct 10/12-bit level handling. We'll stay 8-bit
only for now, though, to avoid an API break for existing clients
for no good reason (there's no 10-bit input, really).
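One example of the kind of trickiness meant here, as a sketch under two assumptions: the GPU normalizes an n-bit unsigned-normalized texel by dividing by 2^n - 1, and TV standards define 10/12-bit levels as the 8-bit levels shifted left (black 16 becomes 64 in 10-bit, white 235 becomes 940). Under those assumptions, the "same" level does not normalize to the same float, so a correction factor is needed. The helper names are hypothetical.

```cpp
#include <cassert>

// What the GPU sees for an n-bit code value in a normalized texture.
double gpu_normalized(int code, int bits)
{
    assert(bits > 0 && bits < 31);
    assert(code >= 0 && code < (1 << bits));
    return code / static_cast<double>((1 << bits) - 1);
}

// Factor mapping a normalized n-bit value onto the scale that the 8-bit
// value of the same TV level would have: (2^n - 1) / (255 * 2^(n - 8)).
double rescale_to_8bit_scale(int bits)
{
    assert(bits >= 8 && bits < 31);
    return ((1 << bits) - 1) / (255.0 * (1 << (bits - 8)));
}
```

For example, 16/255 is about 0.06275 while 64/1023 is about 0.06256; multiplying the latter by 1023/1020 brings it back to exactly 16/255.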

Steinar H. Gunderson [Sun, 6 Sep 2015 22:10:48 +0000 (00:10 +0200)]
Minor optimization in ResampleEffect: Set less GL state.

In particular, if we can avoid it, use glTexSubImage2D instead of glTexImage2D.
This actually has a real effect, at least on Intel/Linux, where the driver seems
to stall on some mappings.

Of course, this only really helps for things like pans, not zooms.

Steinar H. Gunderson [Sat, 5 Sep 2015 22:57:25 +0000 (00:57 +0200)]
Make the PaddingEffect border 1-pixel soft.

Note that this is an API break; PaddingEffect now behaves differently
from before when it comes to fractional offsets.
But I feel this is more useful; it allows PaddingEffect to be used
more efficiently for moving things smoothly around.

Also add a concept of border offset which moves the border around
without changing the pixels; useful if you want the subpixel placement
to be done by ResampleEffect (put the integral offset into top/left
and then move the border by the fractional amount it missed).
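One way to get a 1-pixel soft edge, sketched as an assumption (the commit does not give Movit's formula): treat each pixel as a unit box centered at x and use the fraction of that box that lies inside the content area. As long as the content is at least one pixel wide, that fraction is the signed distance to the nearest edge plus 0.5, clamped to [0, 1].

```cpp
#include <algorithm>
#include <cassert>

// Coverage of the pixel centered at x by content occupying [left, right).
// A pixel center exactly on an edge gets 0.5.
double border_coverage(double x, double left, double right)
{
    assert(left <= right);
    const double c = std::min(x - left, right - x) + 0.5;
    return std::max(0.0, std::min(1.0, c));
}
```

A border offset then amounts to shifting left/right by a fraction without moving the pixels: with left = 10.25, the pixel centered at 10.5 gets coverage 0.75.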

Steinar H. Gunderson [Sat, 5 Sep 2015 19:04:31 +0000 (21:04 +0200)]
Fix a comment.

Steinar H. Gunderson [Sat, 5 Sep 2015 14:37:47 +0000 (16:37 +0200)]
Mark ResampleEffect as not one-to-one sampling.

The assumption is broken whenever a non-integral top or left parameter
is specified. Instead, make an IntegralResampleEffect that enforces
these parameters to be integers, and then mark it as one-to-one sampling.

Steinar H. Gunderson [Wed, 2 Sep 2015 23:46:58 +0000 (01:46 +0200)]
Collapse passes more aggressively in the face of size changes.

The motivating chain for this change was a case where we had
a SinglePassResampleEffect (the second half of a ResampleEffect)
feeding into a PaddingEffect, feeding into an OverlayEffect.
Currently, since the two former change output size, we'd bounce
to a temporary texture twice (output size changes would always
cause bounces).

However, this is needlessly conservative. The reason for bouncing
when changing output size is really if you want to get rid of
data by downscaling and then later upsampling, e.g. for a blur.
(It could also be useful for cropping, but we don't really use
that right now; PaddingEffect, which does crop, explicitly checks
the borders anyway to set the border color manually.) But in this case,
we are not downscaling at all, so we could just drop the bounce,
saving tons of texture bandwidth.

Thus, we add yet more parameters that effects can specify. First,
an effect can declare that it uses _one-to-one_ sampling; that is, that
it will only use its input as-is, without sampling between texels or
outside the border (so the different interpolation and border behavior
will be irrelevant). (Actually, almost all of our effects fall into this
category.) Second, a flag saying that even if an effect changes size,
it doesn't use virtual sizes (otherwise even a one-to-one effect
would de facto be sampling between texels). If these flags
are set on the input and the output respectively, we can avoid
the bounce, at least unless there's an effect that's _not_
one-to-one further up the chain.
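The decision rule above, as a rough C++ sketch. The flag names are invented for illustration, and "further up the chain" is read here as the effects that consume the producer's output within the same phase; both are assumptions rather than Movit's actual code.

```cpp
#include <cassert>
#include <vector>

struct EffectInfo {
    bool changes_output_size;
    bool one_to_one_sampling;  // reads input texels as-is, never between them
    bool uses_virtual_sizes;   // its size change relies on sampling between texels
};

// Must the output of `producer` be bounced to a texture before `downstream`?
// Old rule: any size change forces a bounce. New rule: skip the bounce if the
// producer doesn't use virtual sizes and every consumer samples one-to-one.
bool needs_bounce(const EffectInfo &producer, const std::vector<EffectInfo> &downstream)
{
    if (!producer.changes_output_size) return false;
    if (producer.uses_virtual_sizes) return true;
    for (const EffectInfo &e : downstream) {
        if (!e.one_to_one_sampling) return true;
    }
    return false;
}
```

In the motivating chain, a padding step feeding an overlay would then stay in one phase, while anything that blurs or resamples still gets its own texture.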

For my motivating case, this folded eight phases into four,
changing ~16.0 ms into ~10.6 ms rendering time. Seemingly
memory bandwidth is a really precious resource on my laptop's

Steinar H. Gunderson [Wed, 2 Sep 2015 22:00:46 +0000 (00:00 +0200)]
Convert an overly cut-and-pasted comment for AlphaDivisionEffect.

Steinar H. Gunderson [Wed, 2 Sep 2015 21:52:47 +0000 (23:52 +0200)]
Draw an oversized triangle instead of a quad.

This is mostly theoretical; I've never been able to measure any
sort of real change from this. But according to popular cargo-culting,
it might have an effect since there are fewer edge pixels to shade.
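One common choice of coordinates, given here as an assumption since the commit does not state Movit's: in normalized device coordinates, the triangle (-1,-1), (3,-1), (-1,3) contains the whole [-1, 1] square. Rasterization clips the rest, so one draw covers the viewport with no diagonal seam between two triangles. The sketch checks the coverage with a standard signed-area point-in-triangle test.

```cpp
#include <array>
#include <cassert>

struct Vec2 { double x, y; };

constexpr std::array<Vec2, 3> kFullscreenTriangle = {{ {-1.0, -1.0}, {3.0, -1.0}, {-1.0, 3.0} }};

// Maps NDC [-1, 1] linearly onto texture coordinates [0, 1] (so the
// oversized vertices get texture coordinates up to 2.0).
Vec2 ndc_to_texcoord(Vec2 p) { return { (p.x + 1.0) * 0.5, (p.y + 1.0) * 0.5 }; }

// True if p lies inside or on the edge of triangle (a, b, c).
bool inside_triangle(Vec2 p, Vec2 a, Vec2 b, Vec2 c)
{
    auto cross = [](Vec2 o, Vec2 u, Vec2 v) {
        return (u.x - o.x) * (v.y - o.y) - (u.y - o.y) * (v.x - o.x);
    };
    const double d1 = cross(a, b, p), d2 = cross(b, c, p), d3 = cross(c, a, p);
    const bool has_neg = d1 < 0 || d2 < 0 || d3 < 0;
    const bool has_pos = d1 > 0 || d2 > 0 || d3 > 0;
    return !(has_neg && has_pos);
}
```

All four viewport corners pass the test, and (1, 1) lies exactly on the hypotenuse.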

Steinar H. Gunderson [Tue, 1 Sep 2015 23:02:38 +0000 (01:02 +0200)]
Propagate size correctly across effects that change output size.

When propagating size information between effects in a phase,
we'd forget to check if the effect wanted to change size
and use that information instead of our own heuristics.
Fix that.

This is currently a no-op, since right now we always break a phase
when an effect changes output size, but there are very real situations
where we'd be fine with not doing so, so this patch paves the way
for that.

Steinar H. Gunderson [Mon, 31 Aug 2015 23:56:42 +0000 (01:56 +0200)]
Fix broken YCbCr subpixel positioning. Caught by the unit tests.

Steinar H. Gunderson [Mon, 31 Aug 2015 17:11:00 +0000 (19:11 +0200)]
Support timer queries for phases.

This is useful for debugging slow chains; it can give information
about which phase takes the most time. Right now there seems to be
~5 ms in one of my test chains that disappears into nothing
(i.e., it shows up in the fps counter with vsync off, but not in any
phase), but hopefully we can eventually solve that discrepancy.

Note that this is an ABI break.

Steinar H. Gunderson [Thu, 30 Jul 2015 15:53:54 +0000 (17:53 +0200)]
Add ycbcr.h to HDRS.

Reported by Dan Dennedy.

Steinar H. Gunderson [Thu, 30 Jul 2015 11:35:20 +0000 (13:35 +0200)]
Do some IEEE trickery to help the shader optimizer remove a sub or two in some YCbCr cases.

Steinar H. Gunderson [Thu, 30 Jul 2015 11:09:12 +0000 (13:09 +0200)]
Use std::scientific when outputting floats, so we do not get issues with 0.0 being output as 0 (which is an int, which cannot always be implicitly converted to float in GLSL).
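A small sketch of the std::scientific trick (the helper name is hypothetical). Scientific notation always includes a decimal point and an exponent, so the result is always a float literal in GLSL, never an int.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Format a float as a GLSL float literal, e.g. 0.0f -> "0.000000e+00".
std::string glsl_float_literal(float x)
{
    std::ostringstream ss;
    ss << std::scientific << x;
    return ss.str();
}
```

Without std::scientific, the default stream formatting prints 0.0f as "0" and 1.0f as "1".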

Steinar H. Gunderson [Thu, 30 Jul 2015 11:08:31 +0000 (13:08 +0200)]
If a shader fails to compile, output it for easier debugging.

Steinar H. Gunderson [Thu, 30 Jul 2015 10:48:40 +0000 (12:48 +0200)]
Add a missing entry in .gitignore.

Steinar H. Gunderson [Thu, 30 Jul 2015 10:40:36 +0000 (12:40 +0200)]
Add a unit test for luma interpolation in YCbCr422InterleavedInput.

Steinar H. Gunderson [Thu, 30 Jul 2015 10:40:11 +0000 (12:40 +0200)]
Add a small note on unit testing of ycbcr.cpp.

Steinar H. Gunderson [Thu, 30 Jul 2015 10:39:32 +0000 (12:39 +0200)]
Small whitespace fix.

Steinar H. Gunderson [Wed, 29 Jul 2015 23:38:38 +0000 (01:38 +0200)]
Add an effect for 4:2:2 interleaved YCbCr input (UYVY).

This is primarily motivated by the fact that DeckLink uses this format
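For reference, UYVY stores two pixels in four bytes: Cb, Y'0, Cr, Y'1. The pair shares one chroma sample. A CPU-side sketch of unpacking a row to 4:4:4 (hypothetical helper; it simply repeats each chroma sample for both pixels, whereas a GPU implementation would more likely interpolate chroma):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct YCbCrPixel { std::uint8_t y, cb, cr; };

// Unpack one UYVY row (Cb Y'0 Cr Y'1 per pixel pair) to 4:4:4.
std::vector<YCbCrPixel> unpack_uyvy_row(const std::vector<std::uint8_t> &row)
{
    assert(row.size() % 4 == 0);
    std::vector<YCbCrPixel> out;
    out.reserve(row.size() / 2);
    for (std::size_t i = 0; i < row.size(); i += 4) {
        const std::uint8_t cb = row[i], y0 = row[i + 1], cr = row[i + 2], y1 = row[i + 3];
        out.push_back({y0, cb, cr});
        out.push_back({y1, cb, cr});
    }
    return out;
}
```

Luma is at full horizontal resolution, which is why the luma interpolation is tested separately from chroma.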

Steinar H. Gunderson [Wed, 29 Jul 2015 23:28:24 +0000 (01:28 +0200)]
Rename the YCbCrInput test to YCbCrInputTest, for consistency.

Steinar H. Gunderson [Wed, 29 Jul 2015 11:53:59 +0000 (13:53 +0200)]
Small refactoring in YCbCrInput.

Steinar H. Gunderson [Wed, 29 Jul 2015 11:53:44 +0000 (13:53 +0200)]
Unbreak YCbCrInput (it needs to still support setting the "needs_mipmaps" int to zero).

Steinar H. Gunderson [Tue, 28 Jul 2015 23:28:30 +0000 (01:28 +0200)]
Allow inputs to say they cannot support mipmaps.

Really only FlatInput can easily support mipmaps; for things like YCbCrInput
that combine multiple inputs, it's hard (probably not downright impossible,
but at least not immediately obvious without thinking about it a bit) and for
FFTInput it makes no sense.

Thus, we allow an input to say that it can't do this, and then bounce it
to a texture if needed. Hopefully this should happen quite rarely.

Steinar H. Gunderson [Tue, 28 Jul 2015 12:01:45 +0000 (14:01 +0200)]
Save a mul in YCbCrInput by folding the scaling into the matrix.
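The folding relies on a simple identity: applying a per-channel scale s before a matrix M gives the same result as using the matrix M * diag(s), i.e. M with column j multiplied by s[j]. A sketch, assuming the scale is applied per channel right before the matrix (which exact scale YCbCrInput folds is not spelled out here):

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Vec3 = std::array<float, 3>;
using Mat3 = std::array<Vec3, 3>;  // row-major

Vec3 mul(const Mat3 &m, const Vec3 &v)
{
    Vec3 out{};
    for (std::size_t i = 0; i < 3; ++i) {
        for (std::size_t j = 0; j < 3; ++j) {
            out[i] += m[i][j] * v[j];
        }
    }
    return out;
}

// M * (s .* v) == (M * diag(s)) * v, so scale column j of M by s[j] once
// on the CPU and the shader can skip the separate multiply.
Mat3 fold_scale(Mat3 m, const Vec3 &s)
{
    for (std::size_t i = 0; i < 3; ++i) {
        for (std::size_t j = 0; j < 3; ++j) {
            m[i][j] *= s[j];
        }
    }
    return m;
}
```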

Steinar H. Gunderson [Mon, 30 Mar 2015 20:54:21 +0000 (22:54 +0200)]
Fix a C++11-related warning.

Steinar H. Gunderson [Sun, 29 Mar 2015 00:06:58 +0000 (01:06 +0100)]
Release Movit 1.1.3.

Steinar H. Gunderson [Sat, 7 Mar 2015 01:06:29 +0000 (02:06 +0100)]
Make read_file() thread-safe.

This is long overdue, of course; I knew this function was a quick hack,
but didn't realize it was a problem until Christophe Thommeret reported
an issue that looked a lot like this.

Steinar H. Gunderson [Sat, 7 Mar 2015 01:01:45 +0000 (02:01 +0100)]
Drop setting the locale altogether.

Trying to use sprintf and floats right in a portable manner is seemingly
impossible (MinGW doesn't support the per-thread locale stuff), so simply
do it a different way; stop sprintf-ing floats and use std::stringstream
instead. I dislike the iostream interface a lot, but it can do per-stream
locales, which is exactly what we want here.
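A sketch of the per-stream locale approach. To avoid depending on installed system locales, the example builds a hypothetical decimal-comma locale from a custom std::numpunct facet, standing in for something like de_DE. Imbuing the stream changes only that stream, so nothing process-wide (or visible to other threads or the driver) is touched.

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical stand-in for a user locale with a decimal comma.
struct DecimalCommaPunct : std::numpunct<char> {
    char do_decimal_point() const override { return ','; }
};

// Format a float using an explicit locale for this stream only.
std::string format_float(float x, const std::locale &loc)
{
    std::ostringstream ss;
    ss.imbue(loc);
    ss << x;
    return ss.str();
}
```

Shader generation would always pass std::locale::classic(), whatever the application's global locale is.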

Dan Dennedy [Thu, 5 Mar 2015 07:41:39 +0000 (23:41 -0800)]
Fix build on OS X and MinGW.

OS X requires the xlocale.h header to define locale_t:

MinGW does not include implementations for newlocale() and uselocale().
Instead, use the previous approach using setlocale().

Steinar H. Gunderson [Tue, 3 Mar 2015 22:03:54 +0000 (23:03 +0100)]
Use thread-local locale.

setlocale() affects the whole process, not just the current thread
as I assumed; uselocale() (available since glibc 2.3, so basically
forever) is per-thread, and also conveniently seems to avoid the
issue of the returned pointer being destroyed (unless the driver
uses the return value of uselocale() as a base, which I really hope
it doesn't).

I'm slightly worried that since this overrides setlocale(), buggy drivers
might get confused when they call setlocale() themselves and find that
something else takes precedence, but hopefully this won't be an issue.

Also add a unit test for locale handling while we're at it. It doesn't
test multi-threaded behavior, though, only the simple case.

Reported by Christophe Thommeret.

Steinar H. Gunderson [Mon, 23 Feb 2015 19:41:45 +0000 (20:41 +0100)]
In ResampleEffect, ignore near-zero weights when combining.

Steinar H. Gunderson [Mon, 23 Feb 2015 19:17:49 +0000 (20:17 +0100)]
Use the F16C instruction set when available.

For most users, this is mostly theoretical, as it requires compiling
with -march=native or similar. And these are definitely meant for
vectorizing, although it's still 2-3x as fast to use them as our own
software fallback.

These are supported starting from Haswell, and also by some AMD CPUs.

Steinar H. Gunderson [Mon, 23 Feb 2015 00:19:03 +0000 (01:19 +0100)]
Revert the optimization of the bilinear weights.

For the case where the resampling changed every frame (e.g. a zoom),
it just consumed too much CPU to be worth it, especially in memory
management; this is painful because it was an elegant solution to
a tricky problem, but it just has to go for now.

Also drop out to fp32 at the first sight of too-high error.

Steinar H. Gunderson [Sun, 22 Feb 2015 23:42:24 +0000 (00:42 +0100)]
Update a comment that wasn't really wrong, but less relevant in this context.

Steinar H. Gunderson [Sun, 22 Feb 2015 23:30:35 +0000 (00:30 +0100)]
Bring the variable names in optimize_sum_sq_error() closer to the comments.

Steinar H. Gunderson [Sun, 22 Feb 2015 23:20:49 +0000 (00:20 +0100)]
In ResampleEffect, optimize the bilinear weights on a global scale.

In addition to the individual weight optimization we do when combining samples,
this technique optimizes the weights as a whole, through some linear algebra.
This means it can take into account effects such as multiple bilinear samples
influencing the same coefficient (which normally should not happen, but might
nevertheless due to imprecisions in the stored texture coordinates), or
non-combined sample positions that can't hit the exact middle of the texel.

In practical tests, this is extremely effective; it often reduces the computed
sum of squared coefficient errors by as much as a factor of 1000, although I
haven't verified how often it actually saves us from having to do fp32 fallback
with the rather tight error bounds that are in place.

Steinar H. Gunderson [Sat, 21 Feb 2015 17:54:56 +0000 (18:54 +0100)]
Make ResampleEffect fall back to fp32 as needed.

This should kill all precision issues when zooming. There are still
a few tricks we can do to improve fp16, but that's primarily a
performance issue.

Steinar H. Gunderson [Sat, 21 Feb 2015 14:52:54 +0000 (15:52 +0100)]
Make combine_two_samples() into a template instead of having manual rounding checks.

Steinar H. Gunderson [Sat, 21 Feb 2015 14:33:33 +0000 (15:33 +0100)]
Fix combining in ResampleEffect again.

It was completely broken after the last patch, so we'd effectively
never combine.

Steinar H. Gunderson [Sat, 21 Feb 2015 14:26:57 +0000 (15:26 +0100)]
Add some fp16 conversion overloads, for making code that can be templatized across fp16 and fp32.

Steinar H. Gunderson [Sat, 21 Feb 2015 01:27:14 +0000 (02:27 +0100)]
When combining samples, take fp16 rounding into account.

This makes us somewhat more conservative in combining samples;
when we are near the lower/right edges of the image, we are starting
to get close to 1.0, and fp16 just doesn't have enough precision
to give us the 6 or 8 bits of subpixel precision we want (it is
hardly enough to address individual pixels!). In particular, this
can affect zooming with ResampleEffect, as reported by Christophe

This does not fix all cases (especially not non-power-of-two cases);
for that, we will probably need to be able to fall back to fp32
when we detect fp16 doesn't work well.
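The arithmetic behind "hardly enough to address individual pixels": fp16 stores a 10-bit mantissa, so for a value v in [2^e, 2^(e+1)) adjacent fp16 values are 2^(e-10) apart. Just below 1.0 that is 1/2048, the same as the texel spacing of a 2048-wide texture. A sketch (helper names hypothetical; normal fp16 range only):

```cpp
#include <cassert>
#include <cmath>

// Spacing between adjacent fp16 values around v (normal range only).
double fp16_spacing(double v)
{
    assert(v >= 6.103515625e-05 && v < 65504.0);  // smallest normal, max finite
    int e;
    std::frexp(v, &e);  // v = m * 2^e with m in [0.5, 1)
    return std::ldexp(1.0, (e - 1) - 10);
}

// Bits of subpixel precision at texture coordinate v for a texture `size`
// texels wide: log2(texel spacing / fp16 spacing). Can be zero or negative.
double subpixel_bits(double v, int size)
{
    return std::log2((1.0 / size) / fp16_spacing(v));
}
```

Near the right edge of a 1920-wide image this gives under 0.1 bits, far from the 6 to 8 bits wanted, while near coordinate 0.1 the same texture still gets about 4 bits.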

Steinar H. Gunderson [Fri, 20 Feb 2015 22:24:55 +0000 (23:24 +0100)]
In ResampleEffect, use a struct instead of manually fiddling with the two elements ourselves.

Steinar H. Gunderson [Thu, 15 Jan 2015 21:47:29 +0000 (22:47 +0100)]
Check for __APPLE__ instead of __DARWIN__.

Fixes compile with recent epoxy. Bug report and suggestion
by Dan Dennedy.

Steinar H. Gunderson [Mon, 22 Dec 2014 15:34:55 +0000 (16:34 +0100)]
Make number of BlurEffect taps configurable.

This can be useful if you are using blur as part of a larger effect
chain, where artifacts get masked by further processing.

Request and initial patch by Christophe Thommeret, although the patch
was redone from scratch.

Steinar H. Gunderson [Thu, 16 Oct 2014 20:07:29 +0000 (22:07 +0200)]
Fix some typos that would cause the sampler number not to be incremented.

Found by Christophe Thommeret, who also noticed these are most likely
harmless since both effects with the bug are typically last in their

Steinar H. Gunderson [Tue, 12 Aug 2014 21:02:03 +0000 (23:02 +0200)]
Release Movit 1.1.2.

Steinar H. Gunderson [Sat, 26 Jul 2014 23:17:12 +0000 (01:17 +0200)]
Correct the number of blur taps read.

We read about twice as many as we should have; the others were
probably just set to 0.0, which has no effect but still burns
arithmetic, unless your driver happens to optimize very aggressively
for this (which I don't think anyone does anymore).

Found by Christophe Thommeret.