git.sesse.net Git - stockfish/commit

author	Gian-Carlo Pascutto <gcp@sjeng.org>
	Wed, 1 Dec 2021 22:36:14 +0000 (23:36 +0100)
committer	Joost VandeVondele <Joost.VandeVondele@gmail.com>
	Fri, 3 Dec 2021 07:51:06 +0000 (08:51 +0100)
commit	c9977aa0a89c83bf21651bffd3b6f10c344ccc46
tree	5f8fd3b7e343fef06503fe94bd5f93d3973d2a60
parent	c1f9a359e8e319d832ee5a55277dab996dd29d25

Add AVX-VNNI support for Alder Lake and later.

In their infinite wisdom, Intel axed AVX512 from Alder Lake
chips (well, not entirely, but we kind of want to use the Gracemont
cores for chess!) but still added VNNI support.
Confusingly enough, this is not the same as VNNI256 support.

This adds a specific AVX-VNNI target that will use this AVX-VNNI
mode, by prefixing the VNNI instructions with the appropriate VEX
prefix, and avoiding AVX512 usage.
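
As an illustration of what such an instruction computes, here is a scalar
sketch of one 256-bit vpdpbusd, one of the VNNI dot-product instructions.
That the engine relies on this particular instruction is an assumption, and
the names dpbusd_reference, Bytes32U, Bytes32S and Lanes8 are hypothetical
helpers for this sketch, not code from src/simd.h. The arithmetic is the same
whether the instruction is encoded with a VEX prefix (AVX-VNNI) or an EVEX
prefix (AVX512-VNNI); only the encoding and the CPU feature required differ.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical types modelling one 256-bit register viewed three ways.
using Bytes32U = std::array<std::uint8_t, 32>;  // 32 unsigned bytes
using Bytes32S = std::array<std::int8_t, 32>;   // 32 signed bytes
using Lanes8   = std::array<std::int32_t, 8>;   // 8 signed 32-bit lanes

// Scalar model of vpdpbusd on 256-bit operands: for each 32-bit lane,
// multiply 4 unsigned bytes of `a` by the matching 4 signed bytes of `b`,
// sum the products, and add the sum to the accumulator lane.
// The non-saturating form wraps around on overflow.
inline Lanes8 dpbusd_reference(Lanes8 acc, const Bytes32U& a, const Bytes32S& b) {
    for (std::size_t lane = 0; lane < 8; ++lane) {
        std::int32_t sum = 0;  // at most 4 * 255 * 128 in magnitude, cannot overflow
        for (std::size_t j = 0; j < 4; ++j) {
            sum += std::int32_t(a[4 * lane + j]) * std::int32_t(b[4 * lane + j]);
        }
        // Add in unsigned arithmetic so wraparound is well defined.
        acc[lane] = std::int32_t(std::uint32_t(acc[lane]) + std::uint32_t(sum));
    }
    return acc;
}
```

With every byte of a set to 2, every byte of b set to -3 and every
accumulator lane set to 10, each output lane is 10 + 4 * (2 * -3) = -14.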

This is about 1% faster on P cores:

Result of  20 runs
==================
base (./clang-bmi2   ) =    3306337  +/- 7519
test (./clang-vnni   ) =    3344226  +/- 7388
diff                   =     +37889  +/- 4153

speedup        = +0.0115
P(speedup > 0) =  1.0000

But a nice 3% faster on E cores:

Result of  20 runs
==================
base (./clang-bmi2   ) =    1938054  +/- 28257
test (./clang-vnni   ) =    1994606  +/- 31756
diff                   =     +56552  +/- 3735

speedup        = +0.0292
P(speedup > 0) =  1.0000

This was measured on Clang 13. GCC 11.2 appears to generate
worse code for Alder Lake, though the speedup on the E cores
is similar.

The engine can be bound specifically to the P or E cores. For example,
on Linux with an 8 P + 8 E setup such as the i9-12900K:
taskset -c 0-15 ./stockfish
taskset -c 16-23 ./stockfish
where the first call binds to the P-cores (logical CPUs 0-15, two hardware
threads per core) and the second to the E-cores (logical CPUs 16-23).

closes https://github.com/official-stockfish/Stockfish/pull/3824

No functional change

src/Makefile
src/simd.h