Affine transform refactoring.
author Maxim Molchanov <maksym.n.molchanov@gmail.com>
Wed, 6 Jan 2021 03:29:32 +0000 (05:29 +0200)
committer Joost VandeVondele <Joost.VandeVondele@gmail.com>
Fri, 8 Jan 2021 15:35:44 +0000 (16:35 +0100)
commit 23c385ec36f9d5a9514ec5b0811ec99d08b45e90
tree 649f681618259c1bb12cfb98a93789440f40797d
parent d21e421ad74cff3b157d156d6ea8fdee3634e75b

Reordered the weights so that the accumulated sums map directly onto the outputs.
Weights are grouped in blocks of four elements because four int8 values
(the weight type) occupy the same width as one int32 (the output type).
As a result, no horizontal additions are needed.
Grouped the AVX512, AVX2 and SSSE3 implementations together.
Removed repeated code.
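A scalar C++ sketch of the reordering idea follows. It is not the code from this commit: the dimensions kIn = 8 and kOut = 3 and the helpers scrambled_index, scramble, affine_blocked and affine_naive are hypothetical. A weight at row-major index out * kIn + in is moved so that the four weights for inputs 4k..4k+3 of one output sit next to each other, and the groups for all outputs follow one another inside input block k. The inner four-term sum stands in for one int32 lane of a SIMD multiply-add, so each lane already belongs to a single output and no horizontal reduction is required.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical dimensions, chosen only for illustration.
constexpr std::size_t kIn = 8;   // padded input dimensions, a multiple of 4
constexpr std::size_t kOut = 3;  // output dimensions

// Maps row-major index i = out * kIn + in to a blocked layout:
// [input block k][output][position within the block of four].
constexpr std::size_t scrambled_index(std::size_t i) {
    return (i / 4) % (kIn / 4) * kOut * 4  // input block k = in / 4
         + i / kIn * 4                     // output row
         + i % 4;                          // in % 4
}

// Reorders row-major int8 weights into the blocked layout.
std::vector<std::int8_t> scramble(const std::vector<std::int8_t>& w) {
    assert(w.size() == kOut * kIn);
    std::vector<std::int8_t> s(w.size());
    for (std::size_t i = 0; i < w.size(); ++i) s[scrambled_index(i)] = w[i];
    return s;
}

// Affine transform over the blocked layout: one int32 accumulator per output.
std::vector<std::int32_t> affine_blocked(const std::vector<std::int8_t>& w_scrambled,
                                         const std::vector<std::int32_t>& bias,
                                         const std::vector<std::uint8_t>& input) {
    std::vector<std::int32_t> acc(bias);
    for (std::size_t k = 0; k < kIn / 4; ++k) {
        const std::uint8_t* in4 = &input[4 * k];
        const std::int8_t* blk = &w_scrambled[k * kOut * 4];
        for (std::size_t o = 0; o < kOut; ++o) {
            // Four int8 * uint8 products summed into one int32, like one lane
            // of a multiply-add; the lane index is the output index.
            std::int32_t s = 0;
            for (std::size_t j = 0; j < 4; ++j)
                s += std::int32_t(blk[o * 4 + j]) * std::int32_t(in4[j]);
            acc[o] += s;
        }
    }
    return acc;
}

// Reference: plain row-major matrix-vector product plus bias.
std::vector<std::int32_t> affine_naive(const std::vector<std::int8_t>& w,
                                       const std::vector<std::int32_t>& bias,
                                       const std::vector<std::uint8_t>& input) {
    std::vector<std::int32_t> out(bias);
    for (std::size_t o = 0; o < kOut; ++o)
        for (std::size_t i = 0; i < kIn; ++i)
            out[o] += std::int32_t(w[o * kIn + i]) * std::int32_t(input[i]);
    return out;
}
```

Comparing affine_blocked(scramble(w), b, x) with affine_naive(w, b, x) for arbitrary weights is a quick way to check that the permutation preserves the result.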

An earlier version passed STC:

LLR: 2.97 (-2.94,2.94) {-0.25,1.25}
Total: 15336 W: 1495 L: 1355 D: 12486
Ptnml(0-2): 44, 1054, 5350, 1158, 62
https://tests.stockfishchess.org/tests/view/5ff60e106019e097de3eefd5

The speedup depends on the architecture; up to 4% was measured on an NNUE-only bench.

closes https://github.com/official-stockfish/Stockfish/pull/3287

No functional change
src/nnue/layers/affine_transform.h