The motion search is one of the two major components of DIS. It works more or less
like you'd expect; there's a bunch of overlapping patches (8x8 or 12x12 pixels) in
a grid, and for each patch, there's a search to try to find the most similar patch
in the other image.

Unlike in a typical video codec, the DIS patch search is based on gradient descent;
conceptually, you start with an initial guess (the value from the previous level,
or the zero flow for the very first level), subtract the reference (“template”)
patch from the candidate, look at the gradient to see in what direction there is
a lower difference, and then inch a bit toward that direction. (There is seemingly
nothing like Adam, momentum or similar, but the searched value is only in two
dimensions, so perhaps it doesn't matter as much then.)

DIS makes a tweak to this concept. Since the procedure as outlined above requires
recomputing the gradient of the candidate patch every time it moves, DIS swaps the
roles and uses the reference patch as candidate (thus the “inverse” name), so that
_its_ gradient, which never changes, decides in which direction to move. (This is
a bit dodgy, but not _that_ dodgy; after all, the two patches are supposed to be
quite similar, so their surroundings and thus their gradients should be quite
similar too.) It's not entirely clear whether this is still a win on GPU, where
calculations are much cheaper, especially the way we parallelize the search, but
we've kept it around for now.

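To see the inverse trick in isolation, here's a minimal 1D sketch in Python. It is not part of the shader; `template`, `template_grad`, `image1` and `inverse_search_1d` are made-up names, and the Gaussian test signal is an assumption chosen so that the gradient is known analytically. In 1D the pseudo-Hessian collapses to a scalar, but the structure is the same: compute it once from the template gradient, then iterate.

```python
import math


def template(x):
    # Smooth 1D stand-in for the template patch: a Gaussian bump
    # centred in a 12-sample patch (assumed test signal).
    return math.exp(-((x - 5.5) ** 2) / (2.0 * 3.0 ** 2))


def template_grad(x):
    # Analytic derivative of template().
    return -(x - 5.5) / 9.0 * template(x)


def inverse_search_1d(true_shift, patch_size=12, num_iterations=8):
    # The "other image" is the template displaced so that
    # image1(x + true_shift) == template(x); the search should find true_shift.
    def image1(x):
        return template(x - true_shift)

    xs = range(patch_size)
    # Precomputed once, from the template gradient only.
    h = sum(template_grad(x) ** 2 for x in xs)

    u = 0.0  # Zero flow as the initial guess.
    for _ in range(num_iterations):
        # Template gradient times (warped - template), summed over the patch.
        du = sum(template_grad(x) * (image1(x + u) - template(x)) for x in xs)
        u -= du / h
    return u
```

For a displacement that is small relative to the width of the bump, such as `inverse_search_1d(1.5)`, the result should land very close to the true shift within the eight iterations. Large displacements are what the initial guess from the previous level is for.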
The inverse search is explained and derived in the supplementary material of the
paper, section A. Do note that there's a typo; the text under equation 9 claims
that the matrix H is n x n (where presumably n is the patch size), while in reality,
it's 2x2.

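Written out, with S(p) as the 1x2 row vector of the template gradient at pixel p (the symbols g_x, g_y and p are chosen here for illustration, not taken from the paper), the sum of outer products is:

```latex
S(p) = \begin{pmatrix} g_x(p) & g_y(p) \end{pmatrix}, \qquad
H = \sum_{p \in \text{patch}} S(p)^T S(p)
  = \begin{pmatrix}
      \sum_p g_x(p)^2        & \sum_p g_x(p)\, g_y(p) \\
      \sum_p g_x(p)\, g_y(p) & \sum_p g_y(p)^2
    \end{pmatrix}
```

So H stays 2x2 no matter how many pixels the patch has; the patch size only changes the number of terms in each sum.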
Our GPU parallelization is fairly dumb right now; we do one patch per fragment
(i.e., parallelize only over patches, not within each patch), which may not
be optimal. In particular, in the initial level, we only have 40 patches,
which is on the low side for a GPU, and the memory access patterns may also not
be ideal.

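const uint patch_size = 12;
const uint num_iterations = 8;

// Interpolated inputs and the output. (The types are inferred from how
// the names are used below.)
in vec2 flow_tc;
in vec2 patch_center;
out vec3 out_flow;

uniform sampler2D flow_tex, grad0_tex, image0_tex, image1_tex;
uniform vec2 inv_image_size, inv_prev_level_size;
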
void main()
{
    vec2 image_size = textureSize(image0_tex, 0);

    // Lock the patch center to an integer, so that we never get
    // any bilinear artifacts for the gradient. (NOTE: This assumes an
    // even patch size.) Then calculate the bottom-left texel of the patch.
    vec2 base = (round(patch_center * image_size) - (0.5f * patch_size - 0.5f))
        * inv_image_size;

    // First, precompute the pseudo-Hessian for the template patch.
    // This is the part where we really save by the inverse search
    // (i.e., we can compute it up-front instead of anew for each
    // iteration).
    //
    //   H = sum(S^T S)
    //
    // where S is the gradient at each point in the patch. Note that
    // this is an outer product, so we get a (symmetric) 2x2 matrix,
    // not a scalar.
    mat2 H = mat2(0.0f);
    vec2 grad_sum = vec2(0.0f);  // Used for patch normalization.
    float template_sum = 0.0f;
    for (uint y = 0; y < patch_size; ++y) {
        for (uint x = 0; x < patch_size; ++x) {
            vec2 tc = base + uvec2(x, y) * inv_image_size;
            vec2 grad = texture(grad0_tex, tc).xy;
            H[0][0] += grad.x * grad.x;
            H[1][1] += grad.y * grad.y;
            H[0][1] += grad.x * grad.y;

            template_sum += texture(image0_tex, tc).x;
            grad_sum += grad;
        }
    }
    H[1][0] = H[0][1];  // H is symmetric.

    // Make sure we don't get a singular matrix even if e.g. the picture is
    // all black. (The paper doesn't mention this, but the reference code
    // does it, and it seems like a reasonable hack to avoid NaNs. With such
    // an H, we'll go out-of-bounds pretty soon, though.)
    if (determinant(H) < 1e-6) {
        // Nudge the diagonal; the exact amount used here is an assumption.
        H[0][0] += 1e-6;
        H[1][1] += 1e-6;
    }

    mat2 H_inv = inverse(H);

    // Fetch the initial guess for the flow, and convert from the previous size to this one.
    vec2 initial_u = texture(flow_tex, flow_tc).xy * (image_size * inv_prev_level_size);
    vec2 u = initial_u;
    float mean_diff, first_mean_diff;

    for (uint i = 0; i < num_iterations; ++i) {
        vec2 du = vec2(0.0, 0.0);
        float warped_sum = 0.0f;
        vec2 u_norm = u * inv_image_size;  // In [0..1] coordinates instead of pixels.
        for (uint y = 0; y < patch_size; ++y) {
            for (uint x = 0; x < patch_size; ++x) {
                vec2 tc = base + uvec2(x, y) * inv_image_size;
                vec2 grad = texture(grad0_tex, tc).xy;
                float t = texture(image0_tex, tc).x;
                float warped = texture(image1_tex, tc + u_norm).x;
                du += grad * (warped - t);
                warped_sum += warped;
            }
        }

        // Subtract the mean for patch normalization. We've done our
        // sums without subtracting the means (because we didn't know them
        // beforehand), i.e.:
        //
        //   sum(S^T * ((x + µ1) - (y + µ2))) = sum(S^T * (x - y)) + (µ1 - µ2) sum(S^T)
        //
        // which gives trivially
        //
        //   sum(S^T * (x - y)) = [what we calculated] - (µ1 - µ2) sum(S^T)
        //
        // so we can just subtract away the mean difference here.
        mean_diff = (warped_sum - template_sum) * (1.0 / (patch_size * patch_size));
        du -= grad_sum * mean_diff;

        if (i == 0) {
            first_mean_diff = mean_diff;
        }

        // Do the actual update.
        u -= H_inv * du;
    }

    // Reject if we moved too far. Note that the paper says “too far” is the
    // patch size, but the DIS code uses half of a patch size. The latter seems
    // to give much better overall results.
    //
    // Also reject if the patch goes out-of-bounds (the paper does not mention this,
    // but the code does, and it seems to be critical to avoid really bad behavior
    // at the edges).
    //
    // (This local shadows the patch_center input; it is in pixels, after applying u.)
    vec2 patch_center = (base * image_size - 0.5f) + patch_size * 0.5f + u;
    if (length(u - initial_u) > (patch_size * 0.5f) ||
        patch_center.x < -(patch_size * 0.5f) ||
        image_size.x - patch_center.x < -(patch_size * 0.5f) ||
        patch_center.y < -(patch_size * 0.5f) ||
        image_size.y - patch_center.y < -(patch_size * 0.5f)) {
        u = initial_u;
        mean_diff = first_mean_diff;
    }

    // NOTE: The mean patch diff will be for the second-to-last patch position,
    // not for the final value of u. But hopefully, it will be very close.
    out_flow = vec3(u.x, u.y, mean_diff);
}