From: Marco Costalba
Date: Mon, 29 Aug 2016 07:11:20 +0000 (+0200)
Subject: Use per-thread counterMoveHistory
X-Git-Url: https://git.sesse.net/?p=stockfish;a=commitdiff_plain;h=5c58d1f5cb4871595c07e6c2f6931780b5ac05b5

Use per-thread counterMoveHistory

Drops a scalability bottleneck due to memory contention on a single
table shared across threads.

The effect becomes noticeable with a high number of threads.
Specifically, we have a small regression with 7 threads at both
60 and 180 seconds TC:

10000 @ 60+0.6 th 7
ELO: -2.46 +-3.2 (95%) LOS: 6.5%
Total: 9896 W: 1037 L: 1107 D: 7752

5000 @ 180+0.6 th 7
ELO: -1.95 +-4.1 (95%) LOS: 17.7%
Total: 5000 W: 444 L: 472 D: 4084

We have a regression because the counterMoveHistory table is quite
big and it takes time for a single thread to fill it. Sharing the
table yields a higher fill rate and better move quality, and up to
7 threads the benefits of sharing more than compensate for the loss
in speed due to contention.

Interestingly, even with a 3X longer TC, so with more time for a
single thread to catch up, the improvement is quite limited and
below noise level. It seems we really need a much longer TC to
saturate the table.

When we move to high thread counts it's another story:

5000 @ 60+0.6 th 22
ELO: 3.49 +-4.3 (95%) LOS: 94.6%
Total: 4880 W: 490 L: 441 D: 3949

2000 @ 60+0.6 th 32
ELO: 8.34 +-6.9 (95%) LOS: 99.1%
Total: 2000 W: 229 L: 181 D: 1590

As expected, the speed-up more than compensates for the lower fill
rate, and we expect that at tournament TC, where a single thread is
able to saturate the table, the difference will be even stronger.
For instance, the TCEC 9 super-final time control will be
180 minutes + 15 seconds, and this scalability improvement seems
definitely the way to go.
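
The ELO and LOS figures quoted above can be reproduced from the W/L/D counts. The following is a minimal sketch, assuming the usual logistic Elo model and a normal approximation for LOS computed on decisive games only. The struct and function names are hypothetical and are not part of Stockfish or its testing framework.

```cpp
#include <cmath>

// Hypothetical helper type: one match result as reported in the message.
struct MatchResult {
    int wins;
    int losses;
    int draws;
};

// Elo difference implied by the score fraction (logistic model, assumed).
double elo_diff(const MatchResult& r) {
    const double games = r.wins + r.losses + r.draws;
    const double score = (r.wins + 0.5 * r.draws) / games;
    return -400.0 * std::log10(1.0 / score - 1.0);
}

// Likelihood of superiority: normal approximation that ignores draws (assumed).
double los(const MatchResult& r) {
    const double decisive = r.wins + r.losses;
    return 0.5 * (1.0 + std::erf((r.wins - r.losses) / std::sqrt(2.0 * decisive)));
}
```

Applied to the 7-thread run at 60+0.6 (W: 1037, L: 1107, D: 7752) this gives roughly -2.46 Elo and a LOS of about 6.5%, and for the 32-thread run (W: 229, L: 181, D: 1590) roughly +8.34 Elo and about 99.1%, in line with the reported values.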

So, summarizing:

GOOD:

- Measured big improvement in high core scenario
- Suitable for TCEC 9 superfinal (big hardware, very long TC)
- Consistent and natural patch that extends to counterMoveHistory
  what we already do for remaining history tables, that are all
  per-thread
- Non functional change for the common case of a single core
- Very simple (just 6 lines modified, no added ones)

BAD:

- Small regression (within 2-3 ELO) with few threads and short TC

bench: 5341477
---

diff --git a/src/search.cpp b/src/search.cpp
index 542144bf..c09e23d6 100644
--- a/src/search.cpp
+++ b/src/search.cpp
@@ -157,7 +157,6 @@ namespace {

   EasyMoveManager EasyMove;
   Value DrawValue[COLOR_NB];
-  CounterMoveHistoryStats CounterMoveHistory;

   template <NodeType NT>
   Value search(Position& pos, Stack* ss, Value alpha, Value beta, Depth depth, bool cutNode);
@@ -208,13 +207,13 @@ void Search::init() {
 void Search::clear() {

   TT.clear();
-  CounterMoveHistory.clear();

   for (Thread* th : Threads)
   {
       th->history.clear();
       th->counterMoves.clear();
       th->fromTo.clear();
+      th->counterMoveHistory.clear();
   }

   Threads.main()->previousScore = VALUE_INFINITE;
@@ -807,7 +806,7 @@ namespace {
             if (pos.legal(move))
             {
                 ss->currentMove = move;
-                ss->counterMoves = &CounterMoveHistory[pos.moved_piece(move)][to_sq(move)];
+                ss->counterMoves = &thisThread->counterMoveHistory[pos.moved_piece(move)][to_sq(move)];
                 pos.do_move(move, st, pos.gives_check(move));
                 value = -search<NonPV>(pos, ss+1, -rbeta, -rbeta+1, rdepth, !cutNode);
                 pos.undo_move(move);
@@ -968,7 +967,7 @@ moves_loop: // When in check search starts from here
       }

       ss->currentMove = move;
-      ss->counterMoves = &CounterMoveHistory[moved_piece][to_sq(move)];
+      ss->counterMoves = &thisThread->counterMoveHistory[moved_piece][to_sq(move)];

       // Step 14. Make the move
       pos.do_move(move, st, givesCheck);
diff --git a/src/thread.h b/src/thread.h
index 8181163e..969fe635 100644
--- a/src/thread.h
+++ b/src/thread.h
@@ -66,11 +66,12 @@ public:
   Position rootPos;
   Search::RootMoves rootMoves;
   Depth rootDepth;
-  HistoryStats history;
-  MoveStats counterMoves;
   FromToStats fromTo;
   Depth completedDepth;
   std::atomic_bool resetCalls;
+  HistoryStats history;
+  MoveStats counterMoves;
+  CounterMoveHistoryStats counterMoveHistory;
 };
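
The ownership change made by the diff can be illustrated in isolation. This is only a sketch under simplifying assumptions: `HistoryTable`, `Worker`, `clear_all` and the table dimensions are hypothetical stand-ins, not the real `CounterMoveHistoryStats` or `Thread` types, and no actual threads are spawned. It shows that once the table is a member of each thread object, an update made by one worker is invisible to the others, which is both the source of the reduced contention and the reason each table fills more slowly.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Simplified, hypothetical dimensions; the real table is considerably larger.
constexpr std::size_t kPieces = 16;
constexpr std::size_t kSquares = 64;

// Stand-in for a history table indexed by [piece][destination square].
struct HistoryTable {
    std::array<std::array<int, kSquares>, kPieces> score{};

    void clear() {
        for (auto& row : score)
            row.fill(0);
    }

    void update(std::size_t piece, std::size_t to, int bonus) {
        score[piece][to] += bonus;
    }
};

// After the patch the table lives inside each thread object, so a thread
// only reads and writes its own copy.
struct Worker {
    HistoryTable counterMoveHistory;
};

// Same shape as the loop in Search::clear(): reset every worker's table.
void clear_all(std::vector<Worker>& workers) {
    for (Worker& w : workers)
        w.counterMoveHistory.clear();
}
```

With two workers, calling `workers[0].counterMoveHistory.update(3, 20, 5)` leaves `workers[1]` untouched, whereas with the single global table removed by this patch both would have observed the new score.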