It was due to a missing -msse compiler option!
Without this option the CPU silently discards
prefetcht2 instructions during execution.
Also added a (gcc documented) hack to prevent the Intel
compiler from optimizing away the prefetches.
Special thanks to Heinz for testing and suggesting
improvements, and to Jim for testing icc on Windows.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
### Compiler speed switches for both GCC and ICC. These settings are generally
### fast on a broad range of systems, but may be changed experimentally
### ==========================================================================
-GCCFLAGS = -O3
-ICCFLAGS = -fast
+GCCFLAGS = -O3 -msse
+ICCFLAGS = -fast -msse
### ==========================================================================
### Dependencies. Do not change
.depend:
- $(CXX) -MM $(OBJS:.o=.cpp) > $@
+ $(CXX) -msse -MM $(OBJS:.o=.cpp) > $@
#include <cassert>
#include <cmath>
#include <cstring>
#include "movegen.h"
#include "tt.h"
-#if defined(_MSC_VER)
-#include <xmmintrin.h>
-#endif
-
// The main transposition table
TranspositionTable TT;
void TranspositionTable::prefetch(const Key posKey) const {
-#if defined(_MSC_VER)
- char* addr = (char*)first_entry(posKey);
- _mm_prefetch(addr, _MM_HINT_T0);
- _mm_prefetch(addr+64, _MM_HINT_T0);
-#else
- // We need to force an asm volatile here because gcc builtin
- // is optimized away by Intel compiler.
- char* addr = (char*)first_entry(posKey);
- asm volatile("prefetcht0 %0" :: "m" (addr));
+#if defined(__INTEL_COMPILER) || defined(__ICL)
+ // This hack prevents prefetches from being optimized away by the
+ // Intel compiler. Both MSVC and gcc seem not to be affected.
+ __asm__ ("");
+#endif
+
+ char const* addr = (char*)first_entry(posKey);
+ _mm_prefetch(addr, _MM_HINT_T2);
+ _mm_prefetch(addr+64, _MM_HINT_T2); // 64 bytes ahead
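
For readers who want to try the prefetch pattern outside the engine, here is a minimal standalone sketch. It is not the code from this commit: `Table`, `Cluster`, and the `SKETCH_HAS_SSE_PREFETCH` macro are hypothetical names, the 128-byte cluster size is an assumption chosen so that `addr + 64` stays inside the entry, and the table size is assumed to be a power of two. The preprocessor guard reflects the point of the commit message: `_mm_prefetch` is only used when the compiler has SSE enabled (for example via `-msse` on 32-bit gcc).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Use the SSE intrinsic only when the compiler reports SSE support.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#  include <xmmintrin.h>   // _mm_prefetch, _MM_HINT_T2
#  define SKETCH_HAS_SSE_PREFETCH 1
#else
#  define SKETCH_HAS_SSE_PREFETCH 0
#endif

// Hypothetical stand-in for a table cluster (not the engine's real layout).
struct Cluster {
    std::uint64_t data[16];  // 128 bytes, i.e. two 64-byte cache lines
};

// Hypothetical table; entry_count is assumed to be a power of two.
struct Table {
    std::vector<Cluster> entries;

    explicit Table(std::size_t entry_count) : entries(entry_count) {}

    const Cluster* first_entry(std::uint64_t key) const {
        return &entries[key & (entries.size() - 1)];
    }

    // Hint that the cluster for `key` will be read soon. Returns the hinted
    // address so that callers can check the index computation.
    const char* prefetch(std::uint64_t key) const {
        const char* addr = reinterpret_cast<const char*>(first_entry(key));
#if SKETCH_HAS_SSE_PREFETCH
        _mm_prefetch(addr, _MM_HINT_T2);
        _mm_prefetch(addr + 64, _MM_HINT_T2);  // second cache line of the cluster
#else
        (void)addr;  // no SSE available: the hint is skipped
#endif
        return addr;
    }
};
```

A caller would issue `table.prefetch(key)` as soon as the key of the next position is known and do other work before probing the table. A prefetch is only a hint, so the program's results are the same with or without it.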