Use the F16C instruction set when available.
author Steinar H. Gunderson <sgunderson@bigfoot.com>
Mon, 23 Feb 2015 19:17:49 +0000 (20:17 +0100)
committer Steinar H. Gunderson <sgunderson@bigfoot.com>
Mon, 23 Feb 2015 19:17:49 +0000 (20:17 +0100)
commit 6be20704cdc7b64e37cc886b7872df58ef66eb1f
tree 09e894480a02ccf1bcd6c6c6248920574c08eef9
parent 641053e9fc86b2166e361a983075febc3bb69acd

For most users, this is mostly theoretical, since it requires compiling
with -march=native or similar. The instructions are really meant for
vectorized code, but even without vectorizing, they are still 2-3x as
fast as our own software fallback.

F16C is supported on Intel CPUs starting from Ivy Bridge, and also by
some AMD CPUs.
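
As a rough illustration of the compile-time selection described in this
message, here is a minimal sketch of an fp16-to-fp32 conversion. It is not
the code from fp16.cpp; the function names fp16_to_fp32() and
fp16_to_fp32_software() are hypothetical, and the fallback shown is a plain
reference decoder rather than the project's own. It assumes GCC or Clang,
which define __F16C__ when the target supports F16C (for instance with
-mf16c, or with -march=native on a capable CPU).

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

#ifdef __F16C__
#include <immintrin.h>  // _cvtsh_ss()
#endif

// Hypothetical portable decoder for an IEEE 754 binary16 value stored in
// a uint16_t (1 sign bit, 5 exponent bits, 10 mantissa bits).
static inline float fp16_to_fp32_software(uint16_t h)
{
	const bool negative = (h & 0x8000u) != 0;
	const int exponent = (h >> 10) & 0x1f;
	const int mantissa = h & 0x3ff;

	float value;
	if (exponent == 0) {
		// Zero or subnormal: mantissa * 2^-24.
		value = std::ldexp(static_cast<float>(mantissa), -24);
	} else if (exponent == 31) {
		// Infinity (mantissa == 0) or NaN (mantissa != 0).
		value = (mantissa == 0)
			? std::numeric_limits<float>::infinity()
			: std::numeric_limits<float>::quiet_NaN();
	} else {
		// Normal: (1024 + mantissa) * 2^(exponent - 15 - 10).
		value = std::ldexp(static_cast<float>(mantissa + 1024), exponent - 25);
	}
	return negative ? -value : value;
}

// Hypothetical dispatcher: uses the hardware instruction when the compiler
// was told it is available, and the software path otherwise.
static inline float fp16_to_fp32(uint16_t h)
{
#ifdef __F16C__
	return _cvtsh_ss(h);
#else
	return fp16_to_fp32_software(h);
#endif
}
```

Because the choice is made by the preprocessor, a binary built without
-mf16c, -march=native or similar always takes the software path, even on
a CPU that has the instructions. The opposite direction would follow the
same pattern with the _cvtss_sh() intrinsic.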
fp16.cpp
fp16.h