From 328098d027e7c35625acbb28e42732137c02d0c1 Mon Sep 17 00:00:00 2001
From: lucasart
Date: Fri, 20 Nov 2015 23:23:53 -0800
Subject: [PATCH] Fix TT comment and static_assert()

The comment is based on a misunderstanding of what unaligned memory access is.
Here is an article that explains it very clearly:
https://www.kernel.org/doc/Documentation/unaligned-memory-access.txt

No matter how we define TTEntry or TTCluster, there will never be any unaligned
memory access. This is because the compiler knows the alignment rules and makes
the necessary adjustments to ensure that unaligned memory access does not occur.

The issue being addressed here has nothing to do with unaligned memory access.
It is about cache performance.

In order to achieve best cache performance:
- we prefetch the cache line as soon as possible.
- we ensure that TT clusters do not spread across two cache lines. If they did,
  we would need to prefetch 2 cache lines, which could hurt cache performance.

Therefore the true conditions to achieve this are:
1/ The start address of the TT is cache line aligned.
   void TranspositionTable::resize() enforces this.
2/ The TT cluster size should *divide* the cache line size. Currently, we pack
   2 clusters per cache line. It used to be 1 before "TT sardines". It does not
   matter what the ratio is; all we want is to fit an integer number of
   clusters per cache line.

No functional change.

Resolves #506
---
 src/tt.h | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/src/tt.h b/src/tt.h
index a983805b..84a4b9fa 100644
--- a/src/tt.h
+++ b/src/tt.h
@@ -76,8 +76,9 @@ private:
 /// A TranspositionTable consists of a power of 2 number of clusters and each
 /// cluster consists of ClusterSize number of TTEntry. Each non-empty entry
 /// contains information of exactly one position. The size of a cluster should
-/// not be bigger than a cache line size. In case it is less, it should be padded
-/// to guarantee always aligned accesses.
+/// divide the size of a cache line size, to ensure that clusters never cross
+/// cache lines. This ensures best cache performance, as the cacheline is
+/// prefetched, as soon as possible.
 class TranspositionTable {
@@ -86,10 +87,10 @@ class TranspositionTable {
   struct Cluster {
     TTEntry entry[ClusterSize];
-    char padding[2]; // Align to the cache line size
+    char padding[2]; // Align to a divisor of the cache line size
   };
-  static_assert(sizeof(Cluster) == CacheLineSize / 2, "Cluster size incorrect");
+  static_assert(CacheLineSize % sizeof(Cluster) == 0, "Cluster size incorrect");
 public:
  ~TranspositionTable() { free(mem); }
-- 
2.39.2
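
The two conditions in the commit message (aligned table start, cluster size
dividing the cache line size) can be checked with a small standalone sketch.
This is not part of the patch. The 64-byte CacheLineSize, the ClusterSize of 3,
the 10-byte TTEntry stand-in, and the helpers crossesCacheLine() and
anyCrossing() are all assumptions made for illustration; only Cluster, its
2-byte padding, and the static_assert mirror the diff above.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed values for illustration; not taken from the patch itself.
constexpr std::size_t CacheLineSize = 64;
constexpr std::size_t ClusterSize = 3;

// Hypothetical stand-in for the real TTEntry: just 10 opaque bytes.
struct TTEntry { std::uint8_t bytes[10]; };

struct Cluster {
    TTEntry entry[ClusterSize];
    char padding[2]; // Align to a divisor of the cache line size
};

// Same form as the assertion introduced by the patch.
static_assert(CacheLineSize % sizeof(Cluster) == 0, "Cluster size incorrect");

// Hypothetical helper: true if the cluster at position `index`, in a table
// whose first byte is `base` bytes past a cache line boundary, touches two
// different cache lines.
inline bool crossesCacheLine(std::size_t base, std::size_t clusterBytes,
                             std::size_t index) {
    std::size_t first = base + index * clusterBytes;
    std::size_t last = first + clusterBytes - 1;
    return first / CacheLineSize != last / CacheLineSize;
}

// Hypothetical helper: true if any of the first `count` clusters crosses.
inline bool anyCrossing(std::size_t base, std::size_t clusterBytes,
                        std::size_t count) {
    for (std::size_t i = 0; i < count; ++i)
        if (crossesCacheLine(base, clusterBytes, i))
            return true;
    return false;
}
```

With these assumed sizes, anyCrossing(0, sizeof(Cluster), n) is false for any
n, since every cluster starts at a multiple of 32 bytes. A 48-byte cluster
(which does not divide 64) or a table starting 16 bytes past a cache line
boundary both produce a crossing at the second cluster, which is why both
conditions are needed.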