
AVX-512 for smaller affine and feature transforms.

For the feature transformer, the code is analogous to the AVX2 version, since the wider SIMD registers were easy to adopt.

For the smaller affine transforms, which have a 32-byte stride, we keep 2 columns in one zmm register. We also unroll more aggressively, so that in the end we have to do 16 parallel horizontal additions on ymm slices, each consisting of 4 32-bit integers. The slices are embedded in 8 zmm registers.
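
The packing idea above can be illustrated with a portable scalar model. This is only a sketch under stated assumptions, not the code from this commit: it assumes 32 input bytes per output, weights stored contiguously with a 32-byte stride, and an even number of outputs. All names (Lanes512, multiply_two_slices, hadd_half, affine_two_per_register, affine_reference) are hypothetical. It models a single 512-bit register as 16 32-bit lanes and does not reproduce the 8-register unrolling or the int16 saturation of the real multiply-add instructions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kStride = 32;  // bytes of weights per output (the 32-byte stride)
constexpr std::size_t kLanes  = 16;  // 32-bit lanes in one 512-bit register

// Hypothetical stand-in for a zmm register viewed as 16 x int32.
using Lanes512 = std::array<std::int32_t, kLanes>;

// Models loading two adjacent 32-byte weight slices into one 512-bit register,
// multiplying by the input broadcast to both halves, and summing each group of
// 4 adjacent byte products into one 32-bit lane.
// Lanes 0..7 belong to the first output, lanes 8..15 to the second.
// Assumption: intermediate int16 saturation is ignored in this model.
Lanes512 multiply_two_slices(const std::int8_t* weights, const std::uint8_t* input) {
    Lanes512 acc{};
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
        const std::size_t half = lane / 8;        // which 256-bit half of the register
        const std::size_t col  = (lane % 8) * 4;  // first of the 4 bytes for this lane
        for (std::size_t k = 0; k < 4; ++k) {
            acc[lane] += static_cast<std::int32_t>(weights[half * kStride + col + k])
                       * static_cast<std::int32_t>(input[col + k]);
        }
    }
    return acc;
}

// Horizontal addition over one 256-bit half (8 lanes) of the modelled register.
std::int32_t hadd_half(const Lanes512& v, std::size_t half) {
    std::int32_t sum = 0;
    for (std::size_t i = 0; i < 8; ++i) sum += v[half * 8 + i];
    return sum;
}

// Computes two outputs per modelled 512-bit register.
void affine_two_per_register(const std::int8_t* weights, const std::int32_t* biases,
                             const std::uint8_t* input, std::int32_t* output,
                             std::size_t numOutputs) {
    assert(numOutputs % 2 == 0);  // assumption of this sketch
    for (std::size_t i = 0; i < numOutputs; i += 2) {
        const Lanes512 acc = multiply_two_slices(weights + i * kStride, input);
        output[i]     = biases[i]     + hadd_half(acc, 0);
        output[i + 1] = biases[i + 1] + hadd_half(acc, 1);
    }
}

// Straightforward one-output-at-a-time version, used to check the packed one.
void affine_reference(const std::int8_t* weights, const std::int32_t* biases,
                      const std::uint8_t* input, std::int32_t* output,
                      std::size_t numOutputs) {
    for (std::size_t i = 0; i < numOutputs; ++i) {
        std::int32_t sum = biases[i];
        for (std::size_t j = 0; j < kStride; ++j) {
            sum += static_cast<std::int32_t>(weights[i * kStride + j])
                 * static_cast<std::int32_t>(input[j]);
        }
        output[i] = sum;
    }
}
```

In this model each register yields two results, one per 256-bit half, so the horizontal additions for both halves can proceed side by side. The commit extends the same idea across 8 registers.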

These changes provide a speedup of about 1.5% for AVX-512 builds.

Closes https://github.com/official-stockfish/Stockfish/pull/3218

No functional change.

author	Tomasz Sobczyk <tomasz.sobczyk1997@gmail.com>
	Tue, 3 Nov 2020 21:49:10 +0000 (22:49 +0100)
committer	Joost VandeVondele <Joost.VandeVondele@gmail.com>
	Sat, 7 Nov 2020 15:49:49 +0000 (16:49 +0100)
commit	ba35c88ab84b959d41a67b3d8fcb40adc6537ec8
tree	147b8d99c19b11b2f2a740be99b6f8f0e3d1cb4a
parent	7fc47eeb6f6b5f3c5ff697e974093ff14413e42c

src/nnue/layers/affine_transform.h
src/nnue/nnue_feature_transformer.h