AVX-512 for smaller affine and feature transforms.
author: Tomasz Sobczyk <tomasz.sobczyk1997@gmail.com>
Tue, 3 Nov 2020 21:49:10 +0000 (22:49 +0100)
committer: Joost VandeVondele <Joost.VandeVondele@gmail.com>
Sat, 7 Nov 2020 15:49:49 +0000 (16:49 +0100)
commit: ba35c88ab84b959d41a67b3d8fcb40adc6537ec8
tree: 147b8d99c19b11b2f2a740be99b6f8f0e3d1cb4a
parent: 7fc47eeb6f6b5f3c5ff697e974093ff14413e42c

For the feature transformer, the code is analogous to the AVX2 path, since it adapts easily to the wider SIMD registers.
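
As a rough illustration of why this port is mostly mechanical, here is a scalar C++ sketch of an accumulator update with the register width as a template parameter. It is not the Stockfish code: it assumes 16-bit accumulator lanes and a hypothetical half-dimension of 256, and the names `kHalfDimensions`, `lanes16` and `add_column` are made up for this example. The inner loop stands in for one vector add (`_mm256_add_epi16` on AVX2, `_mm512_add_epi16` on AVX-512).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical accumulator size, chosen only for illustration.
constexpr std::size_t kHalfDimensions = 256;

// Number of int16 lanes in one register: 256-bit (AVX2) -> 16, 512-bit (AVX-512) -> 32.
template <std::size_t RegisterBits>
constexpr std::size_t lanes16() { return RegisterBits / 16; }

// Adds one weight column to the accumulator, one register's worth of lanes per step.
// Returns how many vector adds a real SIMD implementation would issue.
template <std::size_t RegisterBits>
std::size_t add_column(std::array<std::int16_t, kHalfDimensions>& acc,
                       const std::array<std::int16_t, kHalfDimensions>& column) {
  constexpr std::size_t kLanes = lanes16<RegisterBits>();
  static_assert(kHalfDimensions % kLanes == 0,
                "dimension must be a multiple of the register width");
  constexpr std::size_t kNumChunks = kHalfDimensions / kLanes;
  for (std::size_t chunk = 0; chunk < kNumChunks; ++chunk)
    // This inner loop models a single vector add over kLanes int16 lanes.
    for (std::size_t lane = 0; lane < kLanes; ++lane)
      acc[chunk * kLanes + lane] += column[chunk * kLanes + lane];
  return kNumChunks;
}
```

Only the width parameter changes between the two builds: with 256 lanes, a 256-bit register needs 16 adds per column and a 512-bit register needs 8, while the loop structure stays identical.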

For the smaller affine transforms, which have a 32-byte stride, we keep 2 columns in one zmm register. We also unroll more aggressively, so that at the end we perform 16 parallel horizontal additions on ymm slices, each consisting of 4 32-bit integers. The slices are embedded in 8 zmm registers.
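
The "two columns per zmm register" layout can be modelled in scalar C++. This is a simplified sketch, not the code from the commit. It assumes unsigned 8-bit inputs and signed 8-bit weights combined the way a `maddubs` (u8 × i8, adjacent pairs summed with int16 saturation) followed by a `madd` with ones (adjacent int16 pairs summed to int32) would combine them, so 32 input bytes produce 8 int32 partial sums, i.e. one 256-bit half of a 512-bit register. The type `Zmm32` and the functions below are hypothetical names, and the sketch reduces one register at a time instead of reproducing the deeper unrolling described above.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Emulated 512-bit register: 16 lanes of int32 (stands in for __m512i).
using Zmm32 = std::array<std::int32_t, 16>;

constexpr std::size_t kInputDims = 32;  // 32-byte stride: one row fills half a zmm

// Models maddubs + madd(ones) on one 256-bit half: 32 bytes -> 8 int32 partial sums.
void partial_sums_half(const std::uint8_t* input, const std::int8_t* row,
                       std::int32_t* out8) {
  for (std::size_t k = 0; k < 8; ++k) {
    std::int32_t acc = 0;
    for (std::size_t p = 0; p < 2; ++p) {
      const std::size_t j = 4 * k + 2 * p;
      // One maddubs lane: two u8*i8 products, summed and saturated to int16.
      std::int32_t s = static_cast<std::int32_t>(input[j]) * row[j] +
                       static_cast<std::int32_t>(input[j + 1]) * row[j + 1];
      s = std::clamp<std::int32_t>(s, INT16_MIN, INT16_MAX);
      acc += s;  // madd with ones: adjacent int16 lanes summed into int32
    }
    out8[k] = acc;
  }
}

// Two weight rows share one register: lanes 0..7 for row a, lanes 8..15 for row b.
Zmm32 two_rows_per_register(const std::uint8_t* input,
                            const std::int8_t* row_a, const std::int8_t* row_b) {
  Zmm32 reg{};
  partial_sums_half(input, row_a, reg.data());
  partial_sums_half(input, row_b, reg.data() + 8);
  return reg;
}

// Horizontal addition of each 256-bit half yields the two dot products.
std::array<std::int32_t, 2> hadd_halves(const Zmm32& reg) {
  std::array<std::int32_t, 2> out{};
  for (std::size_t lane = 0; lane < 16; ++lane) out[lane / 8] += reg[lane];
  return out;
}

// Affine transform over kOutputs rows, processing two rows per emulated register.
template <std::size_t kOutputs>
std::array<std::int32_t, kOutputs> affine_two_rows(
    const std::uint8_t* input, const std::int8_t (&weights)[kOutputs][kInputDims],
    const std::int32_t (&bias)[kOutputs]) {
  static_assert(kOutputs % 2 == 0, "rows are processed in pairs");
  std::array<std::int32_t, kOutputs> out{};
  for (std::size_t i = 0; i < kOutputs; i += 2) {
    const auto sums = hadd_halves(two_rows_per_register(input, weights[i], weights[i + 1]));
    out[i] = bias[i] + sums[0];
    out[i + 1] = bias[i + 1] + sums[1];
  }
  return out;
}
```

Because each row is only 32 bytes, a 512-bit register would be half empty with one row per register; packing two rows fills it, at the cost of reducing each half separately at the end.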

These changes provide a speedup of about 1.5% for AVX-512 builds.

Closes https://github.com/official-stockfish/Stockfish/pull/3218

No functional change.
src/nnue/layers/affine_transform.h
src/nnue/nnue_feature_transformer.h