From: Marco Costalba <mcostalba@gmail.com>
Date: Mon, 29 Aug 2016 07:11:20 +0000 (+0200)
Subject: Use per-thread counterMoveHistory
X-Git-Url: https://git.sesse.net/?p=stockfish;a=commitdiff_plain;h=5c58d1f5cb4871595c07e6c2f6931780b5ac05b5

Use per-thread counterMoveHistory

Drops a scalability bottleneck due to memory contention
on a single table shared across threads. The effect becomes
noticeable with a high number of threads. Specifically,
we have a small regression with 7 threads at both 60 and
180 seconds TC:

10000 @ 60+0.6 th 7
ELO: -2.46 +-3.2 (95%) LOS: 6.5%
Total: 9896 W: 1037 L: 1107 D: 7752

5000 @ 180+0.6 th 7
ELO: -1.95 +-4.1 (95%) LOS: 17.7%
Total: 5000 W: 444 L: 472 D: 4084

We have a regression because the counterMoveHistory table is
quite big and it takes time for a single thread to fill it.
Sharing the table yields a higher fill rate and better
quality of moves, and up to 7 threads the benefits of sharing
more than compensate for the loss in speed due to contention.
Interestingly, even with a 3X longer TC, so with more time
for the single thread to catch up, the improvement is quite
limited and below noise level. It seems we really need a much
longer TC to saturate the table.
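
The fill-rate argument can be illustrated with a toy model. This
is only a sketch, not engine code: ToyTable, visit, filled_shared
and filled_private are hypothetical names, the workers run one
after another rather than in parallel, and each worker is assumed
to touch a disjoint range of slots (real searches overlap).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for CounterMoveHistoryStats: one counter per slot.
struct ToyTable {
    static constexpr std::size_t Slots = 1024;
    std::array<int, Slots> v{};

    void update(std::size_t slot, int bonus) {
        assert(slot < Slots);
        v[slot] += bonus;
    }

    // Number of slots that have received at least one non-zero update.
    std::size_t filled() const {
        std::size_t n = 0;
        for (int x : v)
            n += (x != 0);
        return n;
    }
};

// Worker w touches its own disjoint range of perWorker slots (an assumption
// made to keep the arithmetic obvious).
void visit(ToyTable& t, std::size_t w, std::size_t perWorker) {
    for (std::size_t i = 0; i < perWorker; ++i)
        t.update(w * perWorker + i, 1);
}

// Old layout: every worker writes into one shared table.
std::size_t filled_shared(std::size_t workers, std::size_t perWorker) {
    ToyTable shared;
    for (std::size_t w = 0; w < workers; ++w)
        visit(shared, w, perWorker);
    return shared.filled();
}

// New layout: each worker owns a table; returns the fill of the fullest one.
std::size_t filled_private(std::size_t workers, std::size_t perWorker) {
    std::vector<ToyTable> tables(workers);
    std::size_t best = 0;
    for (std::size_t w = 0; w < workers; ++w) {
        visit(tables[w], w, perWorker);
        if (tables[w].filled() > best)
            best = tables[w].filled();
    }
    return best;
}
```

With 7 workers visiting 100 slots each, the shared table ends up
with 700 filled slots while no private table has more than 100.
The model ignores the cost of contention, which is what the
per-thread layout removes.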

With a high number of threads it is another story:

5000 @ 60+0.6 th 22
ELO: 3.49 +-4.3 (95%) LOS: 94.6%
Total: 4880 W: 490 L: 441 D: 3949

2000 @ 60+0.6 th 32
ELO: 8.34 +-6.9 (95%) LOS: 99.1%
Total: 2000 W: 229 L: 181 D: 1590

As expected, the speed-up more than compensates for the lower
fill rate, and we expect that with tournament TC, where a single
thread is able to saturate the table, the difference will
be even stronger. For instance, the TCEC 9 super-final time
control will be 180 minutes + 15 seconds, and this scalability
improvement seems definitely the way to go.
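
For readers who want to check the ELO and LOS figures quoted
above against the W/L/D totals, here is a minimal sketch. It
assumes the usual logistic Elo model and the normal approximation
for LOS; the commit does not say which tool produced the numbers,
and elo_diff and los are illustrative names.

```cpp
#include <cassert>
#include <cmath>

// Elo difference implied by the match score, under the logistic model.
// Requires a score strictly between 0 and 1.
double elo_diff(int wins, int losses, int draws) {
    const double games = wins + losses + draws;
    assert(games > 0);
    const double score = (wins + 0.5 * draws) / games;
    return -400.0 * std::log10(1.0 / score - 1.0);
}

// Likelihood of superiority, normal approximation; draws do not enter.
double los(int wins, int losses) {
    assert(wins + losses > 0);
    const double z = (wins - losses) / std::sqrt(2.0 * (wins + losses));
    return 0.5 * (1.0 + std::erf(z));
}
```

For the 32-thread run, elo_diff(229, 181, 1590) gives about 8.34
and los(229, 181) about 0.991, in line with the reported values.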

So, summarizing:

GOOD:

Measured big improvement in high core scenario

Suitable for TCEC 9 superfinal (big hardware, very long TC)

Consistent and natural patch that extends to counterMoveHistory
what we already do for the remaining history tables, which are all per-thread

Non-functional change for the common case of a single core

Very simple (just 6 lines modified, no added ones)

BAD:

Small regression (within 2-3 ELO) with few threads and short TC

bench: 5341477
---

diff --git a/src/search.cpp b/src/search.cpp
index 542144bf..c09e23d6 100644
--- a/src/search.cpp
+++ b/src/search.cpp
@@ -157,7 +157,6 @@ namespace {
 
   EasyMoveManager EasyMove;
   Value DrawValue[COLOR_NB];
-  CounterMoveHistoryStats CounterMoveHistory;
 
   template <NodeType NT>
   Value search(Position& pos, Stack* ss, Value alpha, Value beta, Depth depth, bool cutNode);
@@ -208,13 +207,13 @@ void Search::init() {
 void Search::clear() {
 
   TT.clear();
-  CounterMoveHistory.clear();
 
   for (Thread* th : Threads)
   {
       th->history.clear();
       th->counterMoves.clear();
       th->fromTo.clear();
+      th->counterMoveHistory.clear();
   }
 
   Threads.main()->previousScore = VALUE_INFINITE;
@@ -807,7 +806,7 @@ namespace {
             if (pos.legal(move))
             {
                 ss->currentMove = move;
-                ss->counterMoves = &CounterMoveHistory[pos.moved_piece(move)][to_sq(move)];
+                ss->counterMoves = &thisThread->counterMoveHistory[pos.moved_piece(move)][to_sq(move)];
                 pos.do_move(move, st, pos.gives_check(move));
                 value = -search<NonPV>(pos, ss+1, -rbeta, -rbeta+1, rdepth, !cutNode);
                 pos.undo_move(move);
@@ -968,7 +967,7 @@ moves_loop: // When in check search starts from here
       }
 
       ss->currentMove = move;
-      ss->counterMoves = &CounterMoveHistory[moved_piece][to_sq(move)];
+      ss->counterMoves = &thisThread->counterMoveHistory[moved_piece][to_sq(move)];
 
       // Step 14. Make the move
       pos.do_move(move, st, givesCheck);
diff --git a/src/thread.h b/src/thread.h
index 8181163e..969fe635 100644
--- a/src/thread.h
+++ b/src/thread.h
@@ -66,11 +66,12 @@ public:
   Position rootPos;
   Search::RootMoves rootMoves;
   Depth rootDepth;
-  HistoryStats history;
-  MoveStats counterMoves;
   FromToStats fromTo;
   Depth completedDepth;
   std::atomic_bool resetCalls;
+  HistoryStats history;
+  MoveStats counterMoves;
+  CounterMoveHistoryStats counterMoveHistory;
 };