Allow symlinked frame files. Useful for testing.

[nageru] / variational_refinement.txt
diff --git a/variational_refinement.txt b/variational_refinement.txt

index 58d06639a647876d1197e99ed7918addca1b42f6..0392011cbd7576ba3d9665acc7d48c51b8e5fc2f 100644 (file)
--- a/variational_refinement.txt
+++ b/variational_refinement.txt
@@ -9,7 +9,7 @@ below.
  The general idea is fairly simple; we try to optimize the flow field
  as a whole, by minimizing some mathematical notion of badness expressed
  as an energy function. The one used in the dense inverse search paper
-[Kroeger05; se references below] has this form:
+[Kroeger16; se references below] has this form:
  
    E(U) = int( σ Ψ(E_I) + γ Ψ(E_G) + α Ψ(E_S) ) dx
  
@@ -27,7 +27,7 @@ so the word “refinement” is maybe not doing the method justice.
  One could just as well say that the motion search is a way of
  finding a reasonable starting point for the optimization.)
  
-The dense inverse search paper [Kroeger05; se references below] sets
+The dense inverse search paper [Kroeger16; se references below] sets
  up the energy terms as described by some motion tensors and normalizations,
  then says simply that it is optimized by “θ_vo fixed point iterations
  and θ_vi iterations of Successive Over Relaxation (SOR) for the linear
@@ -153,8 +153,8 @@ iteration)
  
  we have the much more manageable
  
-  k1 I_x²    u + k1 I_x I_y v = - k1 I_t
-  k1 I_x I_y u + k1 I_y²    v = - k1 I_t
+  k1 I_x²    u + k1 I_x I_y v = - k1 I_t I_x
+  k1 I_x I_y u + k1 I_y²    v = - k1 I_t I_y
  
  ie., two linear equations in u and v. Now, you will notice two
  immediate problems by this equation set:
@@ -304,8 +304,8 @@ They allow us to minimize expressions that contain x, y, u(x, y) _and_ the parti
  derivatives u_x(x, y) and u_y(x, y), although the answer becomes a differential
  equation.
  
-The Wikipedia page is, unfortunately, suitable for scaring small
-children, but the general idea is: Differentiate the expression by u_x
+The Wikipedia page is, unfortunately, not very beginner-friendly,
+but the general idea is: Differentiate the expression by u_x
  (yes, differentiating by a partial derivative!), negate it, and then
  differentiate the result by x. Then do the same thing by u_y and y,
  add the two results together and equate to zero. Mathematically
@@ -335,15 +335,15 @@ operator; this factor has to be discretized together with u before we can treat
  it as a constant. So instead, we'll define it as a function (called the
  _diffusivity_ at the given point):
  
-  g(x, y) = -1 / sqrt(u_x² + u_y² + v_x² + v_y² + ε²) = 0
+  g(x, y) = 1 / sqrt(u_x² + u_y² + v_x² + v_y² + ε²) = 0
  
  which gives us
  
-  ∂/∂x ( g(x, y) u_x ) + ∂/∂y ( g(x, y) u_y ) = 0
+  - ∂/∂x ( g(x, y) u_x ) - ∂/∂y ( g(x, y) u_y ) = 0
  
  We'll also have a similar equation for minimizing v, of course:
  
-  ∂/∂x ( g(x, y) v_x ) + ∂/∂y ( g(x, y) v_y ) = 0
+  - ∂/∂x ( g(x, y) v_x ) - ∂/∂y ( g(x, y) v_y ) = 0
  
  There's no normalization term β here, unlike the other terms; DeepFlow2
  adds one, but we're not including it here.
@@ -364,11 +364,11 @@ differently in the numerics anyway.)
  
  This gives us:
  
-  ∂/∂x ( g(x, y) (u0 + du)_x ) + ∂/∂y ( g(x, y) (u0 + du)_y ) = 0
+  - ∂/∂x ( g(x, y) (u0 + du)_x ) - ∂/∂y ( g(x, y) (u0 + du)_y ) = 0
  
  or
  
-  ∂/∂x ( g(x, y) du_x ) + ∂/∂y ( g(x, y) du_y ) = - ∂/∂x ( g(x, y) u0_x ) - ∂/∂y ( g(x, y) u0_y )
+  - ∂/∂x ( g(x, y) du_x ) - ∂/∂y ( g(x, y) du_y ) = ∂/∂x ( g(x, y) u0_x ) + ∂/∂y ( g(x, y) u0_y )
  
  where the right-hand side is effectively a constant for these purposes
  (although it still needs to be calculated anew for each iteration,
@@ -399,21 +399,21 @@ du' and dv' will be zero.)
  
  Now back to our equations, as we look at practical implementation:
  
-  ∂/∂x ( g(x, y) du_x ) + ∂/∂y ( g(x, y) du_y ) = - ∂/∂x ( g(x, y) u0_x ) - ∂/∂y ( g(x, y) u0_y )
-  ∂/∂x ( g(x, y) dv_x ) + ∂/∂y ( g(x, y) dv_y ) = - ∂/∂x ( g(x, y) v0_x ) - ∂/∂y ( g(x, y) v0_y )
+  - ∂/∂x ( g(x, y) du_x ) - ∂/∂y ( g(x, y) du_y ) = ∂/∂x ( g(x, y) u0_x ) + ∂/∂y ( g(x, y) u0_y )
+  - ∂/∂x ( g(x, y) dv_x ) - ∂/∂y ( g(x, y) dv_y ) = ∂/∂x ( g(x, y) v0_x ) + ∂/∂y ( g(x, y) v0_y )
  
-We can discretize the left-hand and right-hand side identically, so let's
-look only at
+We can discretize the left-hand and right-hand side identically (they differ
+only in signs and in variable), so let's look only at
  
-  ∂/∂x ( g(x, y) du_x ) + ∂/∂y ( g(x, y) du_y )
+  - ∂/∂x ( g(x, y) du_x ) - ∂/∂y ( g(x, y) du_y )
  
  [Brox05] equation (2.14) (which refers to a 1998 book, although I couldn't
  immediately find the equation in question in that book) discretizes this as
  
-    1/2 (g(x+1, y) + g(x, y)) (du(x+1, y) - du(x, y))
-  - 1/2 (g(x-1, y) + g(x, y)) (du(x, y) - du(x-1, y))
-  + 1/2 (g(x, y+1) + g(x, y)) (du(x, y+1) - du(x, y))
-  - 1/2 (g(x, y-1) + g(x, y)) (du(x, y) - du(x, y-1))
+  - 1/2 (g(x+1, y) + g(x, y)) (du(x+1, y) - du(x, y))
+  + 1/2 (g(x-1, y) + g(x, y)) (du(x, y) - du(x-1, y))
+  - 1/2 (g(x, y+1) + g(x, y)) (du(x, y+1) - du(x, y))
+  + 1/2 (g(x, y-1) + g(x, y)) (du(x, y) - du(x, y-1))
  
  It also mentions that it would be better to sample g at the half-way points,
  e.g. g(x+0.5, y), but that begs the question exactly how we'd do that, and
@@ -422,7 +422,7 @@ DeepFlow doesn't seem to care, so we stick with their version.
  Now we can finally let g use the values of the flow (note that this is the
  actual flow u and v, not du and dv!) from the previous iteration, as before:
  
-  g(x, y) = -1 / sqrt(u'_x² + u'_y² + v'_x² + v'_y² + ε²) = 0
+  g(x, y) = 1 / sqrt(u'_x² + u'_y² + v'_x² + v'_y² + ε²)
  
  The single derivatives in g(x) are approximated by standard central differences
  (see https://en.wikipedia.org/wiki/Finite_difference_coefficient), e.g.
@@ -444,18 +444,18 @@ Let's now define a smoothness constant between the neighbors (x,y) and (x1,y1):
  Collecting all the du(x, y) terms of the discretized equation above,
  ignoring the right-hand side, which is just a constant for us anyway:
  
-    s(x+1, y) (du(x+1, y) - du(x, y))
-  - s(x-1, y) (du(x, y) - du(x-1, y))
-  + s(x, y+1) (du(x, y+1) - du(x, y))
-  - s(x, y-1) (du(x, y) - du(x, y-1)) = C
+  - s(x+1, y) (du(x+1, y) - du(x, y))
+  + s(x-1, y) (du(x, y) - du(x-1, y))
+  - s(x, y+1) (du(x, y+1) - du(x, y))
+  + s(x, y-1) (du(x, y) - du(x, y-1)) = C
  
-    s(x+1, y) du(x+1, y) - s(x+1, y) du(x, y)
-  - s(x-1, y) du(x, y) + s(x-1, y) du(x-1, y)
-  + s(x, y+1) du(x, y+1) - s(x, y+1) du(x, y)
-  - s(x, y-1) du(x, y) + s(x, y-1) du(x, y-1) = C
+  - s(x+1, y) du(x+1, y) + s(x+1, y) du(x, y)
+  + s(x-1, y) du(x, y) - s(x-1, y) du(x-1, y)
+  - s(x, y+1) du(x, y+1) + s(x, y+1) du(x, y)
+  + s(x, y-1) du(x, y) - s(x, y-1) du(x, y-1) = C
  
-  - (s(x+1, y) + s(x-1, y) + s(x, y+1) + s(x, y-1)) du(x, y) =
-  - s(x+1, y) du(x+1, y) - s(x-1, y) du(x-1, y) - s(x, y+1) du(x, y+1) - s(x, y-1) du(x, y-1) + C
+  (s(x+1, y) + s(x-1, y) + s(x, y+1) + s(x, y-1)) du(x, y) =
+  s(x+1, y) du(x+1, y) + s(x-1, y) du(x-1, y) + s(x, y+1) du(x, y+1) + s(x, y-1) du(x, y-1) + C
  
  It is interesting to note that if s = 1 uniformly, which would be the case
  without our penalizer Ψ(a²), we would have the familiar discrete Laplacian,
@@ -465,36 +465,53 @@ neighbors.
  Now our equation system is finally complete and linear, and the rest is
  fairly pedestrian. The last term connects all the unknowns together,
  but we still solve them mostly as 2x2 matrices. The most basic iterative
-method is Gauss-Seidel, where we solve du(x, y) and dv(x,y) using the
+method is Jacobi, where we solve du(x, y) and dv(x,y) using the
  previous iteration's value for all other du/dv values. (That this converges
  at all it beyond this text to prove, but it does. Not that we bother
  iterating until it converges; a few iterations is good enough.)
-Jacobi iterations improve on this in that (surprisingly!) using this
+Gauss-Seidel iterations improve on this in that (surprisingly!) using this
  iteration's computed du/dv values if they're ready; this improves convergence,
-but is hard to parallelize. On the GPU, we render to the same texture as
-we render from; as per the OpenGL spec, this will give us undefined behavior
-on read (since our read/write sets are neither identical nor disjoint),
-but in practice, we'll get either the old value (Gauss-Seidel) or the
-new one (Jacobi); more likely, the former.
+but is hard to parallelize.
  
  Successive over-relaxation (SOR) improves further on this, in that it
  assumes that the solution moves towards the right value, so why not
-just go a bit further? That is, if Jacobi would tell you to increase
+just go a bit further? That is, if Gauss-Seidel would tell you to increase
  the flow by 1.0 pixel to the right, perhaps go 1.5 pixels to the right
  instead (this value is called ω). Again, the convergence proof is beyond the
  scope here, but SOR converges for any ω between 1 and 2 (1 gives plain
-Jacobi, and over 2, we risk overshooting and never converging). Optimal
+Gauss-Seidel, and over 2, we risk overshooting and never converging). Optimal
  ω depends on the equation system; DIS uses ω = 1.6, which presumably was
-measured, and should be OK for us too, even if we are closer to Gauss-Seidel
-than to Jacobi.
+measured, while we do ω = 1.8 (seems to be marginally better after some
+light testing).
+
+Efficient GPU implementation of SOR is not trivial; like noted before,
+Gauss-Seidel is inherently serial, which is a poor match for the GPU.
+Worse, doing SOR with Jacobi as base instead of Gauss-Seidel makes for
+an algorithm which simply does not converge. We solve this by using a
+method called red-black SOR (not to be confused with red-black binary
+trees). Conceptually, it assigns every unknown a color, with every other
+being red or black (similar to a checkerboard). Since red values now
+only depend on black values and vice versa, one can do all red values
+in parallel, then all black values, and so on. (This is equivalent to
+reordering the equation set; different such orderings can have different
+convergence speeds.)
+
+Our GPU SOR implementation is not overly efficient, so essentially one such
+half-iteration of red-black SOR costs the same as one full iteration of
+Jacobi but convergence is so much faster that it's worth it. Generally
+speaking, Gauss-Seidel converges twice as fast as Jacobi (ie., if Jacobi
+converges in N iterations, Gauss-Seidel does so in N/2), but SOR converges
+_geometrically_ faster, ie., in O(√N) iterations.
  
  Do note that the DeepFlow code does not fully use SOR or even Gauss-Seidel;
  it solves every 2x2 block (ie., single du/dv pair) using Cramer's rule,
-and then pushes that vector 80% further, SOR-style. This would be clearly
+and then pushes that vector 60% further, SOR-style. This would be clearly
  more accurate if we didn't have SOR in the mix (since du and dv would
  converge immediately relative to each other, bar Cramer's numerical issues),
  but I'm not sure whether it's better given SOR. (DIS changes this to a more
-traditional SOR formulation, which we also use.)
+traditional SOR formulation, which we also use. It doesn't seem to be much
+different in practical testing; perhaps minutely worse, but I haven't done
+a deep analysis here.)
  
  And that's it. References:
  
@@ -510,7 +527,7 @@ And that's it. References:
  [Fahad07]: Fahad, Morris: “Multiple Combined Constraints for Optical Flow
    Estimation”, in Proceedings of the 3rd International Conference on Advances
    in Visual Computing (ISVC), 2007
-[Kroeger05]: Kroeger, Timofte, Dai, van Gool: “Fast Optical Flow using Dense
+[Kroeger16]: Kroeger, Timofte, Dai, van Gool: “Fast Optical Flow using Dense
    Inverse Search”, in Proceedings of the European Conference on Computer Vision
    (ECCV), 2016
  [Weinzaepfel13]: Weinzaepfel, Revaud, Harchaoui, Schmid: “DeepFlow: Large