For most users, this is mostly theoretical, as it requires compiling
with -march=native or similar. And these are definitely meant for
vectorizing, although it's still 2-3x as fast to use them as our own
software fallback.
These are supported starting from Haswell, and also by some AMD CPUs.