Use tiling to speed up accumulator refreshes and updates
author syzygy1 <3028851+syzygy1@users.noreply.github.com>
Wed, 16 Sep 2020 15:39:11 +0000 (17:39 +0200)
committer Joost VandeVondele <Joost.VandeVondele@gmail.com>
Thu, 17 Sep 2020 15:24:52 +0000 (17:24 +0200)
commit 8b8a510fd6a1a17b39b2d4b166f60ac7be0dab23
tree 694f9e90416d25daa8b91c6d28b33894c09e016d
parent 64a63464d7bc72a3aac33aa680cd2b2b240ff903

Perform the update and refresh operations tile by tile in a local
array of vectors. With a carefully chosen array size, the compiler
can keep the whole array in vector registers.
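
As a rough illustration of the tiling idea, the sketch below refreshes an
accumulator one tile at a time, using a small local array per tile. It is
not the code from this commit: it uses plain scalar C++ and relies on the
compiler to vectorize, and the names refresh_naive and refresh_tiled, the
weight layout, and the sizes kHalfDimensions = 256 and kTileHeight = 64 are
assumptions chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sizes for illustration only.
constexpr std::size_t kHalfDimensions = 256;  // accumulator width
constexpr std::size_t kTileHeight = 64;       // elements handled per tile
static_assert(kHalfDimensions % kTileHeight == 0,
              "tiles must cover the accumulator exactly");

// Reference version: every active feature streams over the whole
// accumulator, which lives in memory the entire time.
void refresh_naive(const std::vector<std::int16_t>& biases,
                   const std::vector<std::int16_t>& weights,
                   const std::vector<std::size_t>& active,
                   std::int16_t* accumulation) {
    for (std::size_t k = 0; k < kHalfDimensions; ++k)
        accumulation[k] = biases[k];
    for (std::size_t index : active) {
        assert((index + 1) * kHalfDimensions <= weights.size());
        const std::int16_t* row = weights.data() + index * kHalfDimensions;
        for (std::size_t k = 0; k < kHalfDimensions; ++k)
            accumulation[k] = static_cast<std::int16_t>(accumulation[k] + row[k]);
    }
}

// Tiled version: for each tile, load the biases into a small local array,
// add the matching slice of every active feature, then store the tile once.
// The local array is small enough that a compiler may keep it in registers.
void refresh_tiled(const std::vector<std::int16_t>& biases,
                   const std::vector<std::int16_t>& weights,
                   const std::vector<std::size_t>& active,
                   std::int16_t* accumulation) {
    for (std::size_t j = 0; j < kHalfDimensions / kTileHeight; ++j) {
        const std::size_t offset = j * kTileHeight;
        std::int16_t acc[kTileHeight];  // the local tile

        for (std::size_t k = 0; k < kTileHeight; ++k)
            acc[k] = biases[offset + k];

        for (std::size_t index : active) {
            assert((index + 1) * kHalfDimensions <= weights.size());
            const std::int16_t* slice =
                weights.data() + index * kHalfDimensions + offset;
            for (std::size_t k = 0; k < kTileHeight; ++k)
                acc[k] = static_cast<std::int16_t>(acc[k] + slice[k]);
        }

        for (std::size_t k = 0; k < kTileHeight; ++k)
            accumulation[offset + k] = acc[k];
    }
}
```

Both functions compute the same result; only the loop order differs. In
the tiled version each accumulator element is written to memory once per
refresh instead of once per active feature. Whether the tile really stays
in registers depends on the tile size, the target's register count, and
the compiler.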

Idea and original implementation by @sf-x.

STC: https://tests.stockfishchess.org/tests/view/5f623eec912c15f19854b855
LLR: 2.94 (-2.94,2.94) {-0.25,1.25}
Total: 4872 W: 623 L: 477 D: 3772
Ptnml(0-2): 14, 350, 1585, 450, 37

LTC: https://tests.stockfishchess.org/tests/view/5f62434e912c15f19854b860
LLR: 2.94 (-2.94,2.94) {0.25,1.25}
Total: 25808 W: 1565 L: 1401 D: 22842
Ptnml(0-2): 23, 1186, 10332, 1330, 33

closes https://github.com/official-stockfish/Stockfish/pull/3130

No functional change
src/nnue/nnue_feature_transformer.h