Support chaining certain effects after compute shaders. This rests on the notion of “strong one-to-one” that's very similar to the older “one-to-one” concept, yet a bit stricter (in particular, PaddingEffect is not strong one-to-one).
Use C++11 override everywhere it is appropriate.
Make gamma polynomial constants into an array; slightly fewer uniforms to set, and it makes sense overall, since the constants belong so closely together.
Rework uniform setting. One would think something as mundane as setting a few uniforms wouldn't really matter much for performance, but seemingly this is not always so -- I had a real-world shader with no fewer than 55 uniforms. Of course, not all of these were actually used, but we still had to look up the name etc. for every single one, every single frame. Thus, we introduce a new way of dealing with uniforms: Register them before finalization time, and then EffectChain can store their numbers once and for all, instead of repeating the lookup. The system is also set up so that we can move to uniform buffer objects (UBOs) in the very near future. It's a bit unfortunate that uniform declarations are now removed from the .frag files, where they sat very nicely, but the alternative would be to try to parse GLSL, which I'm a bit wary of right now. All effects are converted, leaving the set_uniform_* functions without any users, but they are kept around for now in case external effects want them. This seems to bring a 1–2% speedup for my use case; hopefully UBOs will bring a tiny bit more.
Collapse passes more aggressively in the face of size changes. The motivating chain for this change was a case where we had a SinglePassResampleEffect (the second half of a ResampleEffect) feeding into a PaddingEffect, feeding into an OverlayEffect. Previously, since the two former change output size, we'd bounce to a temporary texture twice (output size changes would always cause bounces). However, this is needlessly conservative. The real reason for bouncing on an output size change is that an effect might want to get rid of data by downscaling and then later upsampling, e.g. for a blur. (It could also be useful for cropping, but we don't really use that right now; PaddingEffect, which does crop, explicitly checks the borders anyway to set the border color manually.) But in this case, we are not downscaling at all, so we could just drop the bounce, saving tons of texture bandwidth. Thus, we add yet more parameters that effects can specify: first, that an effect uses _one-to-one_ sampling, i.e., that it will only use its input as-is, without sampling between texels or outside the border (so the different interpolation and border behavior is irrelevant). (Actually, almost all of our effects fall into this category.) Second, a flag saying that even if an effect changes size, it doesn't use virtual sizes (otherwise, even a one-to-one effect would de facto be sampling between texels). If these flags are set on the input and the output respectively, we can avoid the bounce, at least unless there's an effect further up the chain that is _not_ one-to-one. For my motivating case, this folded eight phases into four, cutting rendering time from ~16.0 ms to ~10.6 ms. Seemingly memory bandwidth is a really precious resource on my laptop's GPU.
Switch from using GLEW to epoxy. The main reason is that we would like to support GL 3.2+ core contexts, and then later quite possibly GLES.
Move everything into “namespace movit”. This is a pretty hard API break, but it's probably the last big API break before 1.0, and some of the names (e.g. Effect, Input, ResourcePool) are really so generic that they should not be allowed to pollute the global namespace.
Another round of include-what-you-use.
Comment and README updates about Rec. 2020 in light of the accuracy test results.
Implement GammaExpansionEffect using ALU ops instead of texture lookups. In a standalone benchmark (on a Sandy Bridge laptop), this is pretty much a no-op performance-wise, but when more ops are put into the mix, it's a ~20% FPS win, and in a more realistic situation with multiple inputs etc., it's subjectively also a pretty clear win. The reason is probably that we generally are way overloaded on texture operations. Note that we had similar code before (before we started using the texture for lookup), but it used pow(), which is markedly slower than our fourth-degree polynomial approximation. We should probably do the same for GammaCompressionEffect.
Fix some outdated comments in GammaExpansionEffect.
Prefix include guards with _MOVIT to avoid clashes with external files.
Split out some private utilities into effect_util.cpp, so we do not need to include e.g. Eigen from effect.h.
Run include-what-you-use over all of movit. Some hand tuning.
Make the internal effects private to EffectChain. ColorspaceConversionEffect, DitherEffect, GammaExpansionEffect and GammaCompressionEffect are all supposed to be used by EffectChain only, so make them private; I've had reports of users trying to use these directly, leaving the framework in a confused state.
Add the rest of the files for the premultiplied alpha commit.
Output the graph in dot form at finalize time.
Comment all of *_effect.h.
Mark some functions in Effect as const.
Add a new framework for 1D-LUTs via fp16 textures. Make the gamma compression and expansion effects use it.