AVX512, AVX2 and SSSE3 speedups
author mstembera <MissingEmail@email>
Sat, 12 Dec 2020 22:18:38 +0000 (14:18 -0800)
committer Joost VandeVondele <Joost.VandeVondele@gmail.com>
Mon, 14 Dec 2020 06:46:15 +0000 (07:46 +0100)
commit d862ba40692797031ec5b0d95e46bcfc5a80f06c
tree 1db65f8ed8f2d500653fae4eed5aa1837fc3d556
parent d706ae62d73d90c0f80cdccd58384a347295d549

Improves throughput by summing two intermediate dot products with 16-bit addition before widening the result to 32 bits.

Potential 16-bit saturation is detected, and this code path is avoided whenever it could occur.
Saturation cannot happen with the current nets,
but nets can be constructed that trigger the check.
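
The idea can be illustrated with a portable scalar model. This is only a sketch, not the code from this commit: the function names (saturate16, maddubs_lane, dot4_fused16, dot4_wide32, can_saturate16), the grouping of weights in fours, and the assumption that inputs lie in [0, 127] are all hypothetical choices made for illustration. It models a single 16-bit lane of a "multiply unsigned bytes by signed bytes and add adjacent pairs with saturation" step, sums two such lanes in 16 bits, and shows a worst-case check that decides whether the 16-bit sum is safe.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Clamp an int to the int16 range, as a saturating 16-bit add would.
inline std::int16_t saturate16(int v) {
    if (v > 32767) return 32767;
    if (v < -32768) return -32768;
    return static_cast<std::int16_t>(v);
}

// One 16-bit lane: u8 * i8 for two adjacent bytes, summed with saturation.
inline std::int16_t maddubs_lane(const std::uint8_t* in, const std::int8_t* w) {
    return saturate16(int(in[0]) * int(w[0]) + int(in[1]) * int(w[1]));
}

// Fused path: add two pair-products in 16 bits, then widen once to 32 bits.
inline std::int32_t dot4_fused16(const std::uint8_t* in, const std::int8_t* w) {
    const std::int16_t p0 = maddubs_lane(in, w);
    const std::int16_t p1 = maddubs_lane(in + 2, w + 2);
    return std::int32_t(saturate16(int(p0) + int(p1)));
}

// Reference path: widen each pair-product to 32 bits before adding.
inline std::int32_t dot4_wide32(const std::uint8_t* in, const std::int8_t* w) {
    return std::int32_t(maddubs_lane(in, w)) + std::int32_t(maddubs_lane(in + 2, w + 2));
}

// Worst-case check (assumption: every input is between 0 and 127).
// For each group of four weights, the largest possible sum uses input 127 on
// all positive weights, the smallest uses 127 on all negative weights.
inline bool can_saturate16(const std::vector<std::int8_t>& weights) {
    for (std::size_t i = 0; i + 4 <= weights.size(); i += 4) {
        int pos = 0, neg = 0;
        for (std::size_t j = 0; j < 4; ++j) {
            const int w = weights[i + j];
            if (w > 0) pos += 127 * w;
            else       neg += 127 * w;
        }
        if (pos > 32767 || neg < -32768) return true;
    }
    return false;
}
```

With small weights both paths agree, so the fused path saves one widening step per pair. With four weights of 127 and four inputs of 127, the wide path yields 64516 while the fused path clamps to 32767, which is the case the check is meant to exclude.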

STC https://tests.stockfishchess.org/tests/view/5fd40a861ac1691201888479
LLR: 2.94 (-2.94,2.94) {-0.25,1.25}
Total: 25544 W: 2451 L: 2296 D: 20797
Ptnml(0-2): 92, 1761, 8925, 1888, 106

about 5% speedup

closes https://github.com/official-stockfish/Stockfish/pull/3261

No functional change
src/nnue/layers/affine_transform.h