+ // Clear the 14 least significant bits of h_step to avoid
+ // divergence when h_step is accumulated BUF_SIZE times into
+ // a float variable that may or may not carry extra intermediate
+ // precision. Clearing roughly log2(BUF_SIZE) low-order bits
+ // gives the same result regardless of any extra precision
+ // in the accumulator.
+ clear_less_significant_bits((INTFLOAT *)h_step, 2 * 4, 14);