Combining introduces a small amount of error (we have to compensate
for the lack of subpixel sampling precision), so normalize after it
rather than before it. Also, do a second normalization pass,
which seemingly helps sometimes (probably due to inaccuracies in the
float sum).
This seems to eliminate about half of the precision loss, at least on Intel.
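
A rough sketch of why the order matters, not the actual code from this
change. It assumes that "combining" means merging two adjacent taps into
one bilinear sample, that the sampler offers 256 subpixel steps, and that
the compensation is a least-squares fit of the merged weight. The names
combine_pair and normalize are made up for the illustration, and it runs
in double precision, so the second pass has far less rounding to clean up
than it would in single-precision floats.

```python
def combine_pair(w1, w2, subpixel_steps=256):
    """Merge two adjacent taps into one bilinear tap: (offset, weight)."""
    z = w2 / (w1 + w2)  # ideal fractional offset toward the second tap
    # Assumption: the sampler can only place the offset on a coarse grid.
    z = round(z * subpixel_steps) / subpixel_steps
    # Compensate: least-squares weight that best matches (w1, w2) at the
    # rounded offset. This is what makes the weights stop summing to 1.
    w = (w1 * (1 - z) + w2 * z) / ((1 - z) ** 2 + z ** 2)
    return z, w


def normalize(taps, passes=2):
    """Scale the tap weights to sum to 1, repeating to absorb rounding."""
    for _ in range(passes):
        total = sum(w for _, w in taps)
        taps = [(z, w / total) for z, w in taps]
    return taps


kernel = [0.1, 0.4, 0.4, 0.1]  # sums to 1 before combining
combined = [combine_pair(kernel[i], kernel[i + 1])
            for i in range(0, len(kernel), 2)]

print(sum(w for _, w in combined))             # drifts slightly away from 1
print(sum(w for _, w in normalize(combined)))  # back to 1, up to rounding
```

Normalizing the four input weights first would not help here, since the
drift only appears once the offsets have been rounded and the weights
refitted.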