Update the SOR comment about twinned buffering.

author Steinar H. Gunderson <sgunderson@bigfoot.com>

Sat, 4 Aug 2018 19:20:26 +0000 (21:20 +0200)

committer Steinar H. Gunderson <sgunderson@bigfoot.com>

Sat, 4 Aug 2018 19:21:27 +0000 (21:21 +0200)
diff --git a/sor.frag b/sor.frag
index e1f86bbbfea8c3f6f019857369fe220eb69e5e4c..ef431d3346d96be4e0270ddd6ad4c9ebd1664a63 100644
--- a/sor.frag
+++ b/sor.frag
@@ -45,8 +45,9 @@ void main()
         // just immediately throws away half of the warp, but it helps convergence
         // a _lot_ (rough testing indicates that five iterations of SOR is as good
         // as ~50 iterations of Jacobi). We could probably do better by reorganizing
-       // the data into two-values-per-pixel, so-called “twinning buffering”,
-       // but it makes for rather annoying code in the rest of the pipeline.
+       // the data into two-values-per-pixel, so-called “twinned buffering”;
+       // seemingly, it helps Haswell by ~15% on the SOR code, but GTX 950 not at all
+       // (at least not on 720p). Presumably the latter is already bandwidth bound.
         int color = int(round(element_sum_idx)) & 1;
         if (color != phase) discard;