as discussed in issue #1349, the way pages are allocated with calloc might imply some overhead on first write.
This overhead can be large and slow down the first search after a TT resize significantly, especially for large TT.
Using an explicit clear of the TT on resize fixes this problem.
Not implemented, but possibly useful for large TT, is to do this zero-ing using all search threads. Not only would this be faster, it could also lead to a more favorable memory allocation on numa systems with a first touch policy.
Upon changing the number of threads, make sure all threads are bound
The heuristic to avoid thread binding if less than 8 threads are requested resulted in the first 7 threads not being bound.
The branch was verified to yield a roughly 13% speedup by @CoffeeOne on the appropriate hardware and OS, and an earlier version of this patch tested well on his machine:
To make sure all threads (including mainThread) are bound as soon as the total number exceeds 7, recreate all threads on a change of thread number.
To do this, unify Threads::init, Threads::exit and Threads::set are unified in a single Threads::set function that goes through the needed steps.
The code includes several suggestions from @joergoster.
Allow for general transposition table sizes. (#1341)
For efficiency reasons current master only allows for transposition table sizes that are N = 2^k in size, the index computation can be done efficiently as (hash % N) can be written instead as (hash & 2^k - 1). On a typical computer (with 4, 8... etc Gb of RAM), this implies roughly half the RAM is left unused in analysis.
This issue was mentioned on fishcooking by Mindbreaker:
http://tests.stockfishchess.org/tests/view/5a3587de0ebc590ccbb8be04
Recently a neat trick was proposed to map a hash into the range [0,N[ more efficiently than (hash % N) for general N, nearly as efficiently as (hash % 2^k):
namely computing (hash * N / 2^32) for 32 bit hashes. This patch implements this trick and now allows for general hash sizes. Note that for N = 2^k this just amounts to using a different subset of bits from the hash. Master will use the lower k bits, this trick will use the upper k bits (of the 32 bit hash).
There is no slowdown as measured with [-3, 1] test:
1) the patch is implemented for a 32 bit hash (so that a 64 bit multiply can be used), this effectively limits the number of clusters that can be used to 2^32 or to 128Gb of transpostion table. That's a change in the maximum allowed TT size, which could bother those using 256Gb or more regularly.
2) Already in master, an excluded move is hashed into the position key in rather simple way, essentially only affecting the lower 16 bits of the key. This is OK in master, since bits 0-15 end up in the index, but not in the new scheme, which picks the higher bits. This is 'fixed' by shifting the excluded move a few bits up. Eventually a better hashing scheme seems wise.
Despite these two caveats, I think this is a nice improvement in usability.
Current master can yield different staticEvals depending on the path
used to reach the position. The reason for this is that the evaluation after a
null move is always computed subtracting 2 * Eval::Tempo, while this is not
the case for lazy or specialized evals. This patch always adds tempo to evals,
which doesn't affect playing strength:
Rocky640 [Sun, 17 Dec 2017 07:50:45 +0000 (02:50 -0500)]
Simplify other checks (#1337)
Replace an intricate definition with a more natural one.
Master was excluding squares occupied by a pawn which was blocked by a pawn.
This version excludes any squares occupied by a pawn which is blocked by "something"
Alain SAVARD [Sun, 10 Dec 2017 15:13:28 +0000 (10:13 -0500)]
Simplify other checks #1334
Simplify the other check penalty computation. Compared to current master,
a) it uses a 143 kingDanger penalty instead of S(10, 10) for the "otherCheck"
(credits to ElbertoOne for finding a suitable kingDanger range to replace the score
and to Guardian for showing this could also be a neutral change at LTC).
This makes our king safety model more consistent and simpler.
b) it might also score more than one "otherCheck" penalty for a given piece type instead of just one
c) it might score many pinned penalties instead of just one.
d) It also remove 3 conditionals and uses simpler expressions.
So it was tested as a SPRT[-3, 1]
Trying to improve on b) another attempt was made to score also the
"otherchecks" for piece types which had some safe checks, but this
failed STC http://tests.stockfishchess.org/tests/view/5a2c79e60ebc590ccbb8badd
A better contempt implementation for Stockfish (#1325)
* A better contempt implementation for Stockfish
The round 2 of TCEC season 10 demonstrated the benefit of having a nice contempt implementation: it gives the strongest programs in the tournament the ability to slow down the game when they feel the position is slightly worse, prefering to stay in a complicated (even if slightly risky) middle game rather than simplifying by force into a drawn endgame.
The current contempt implementation of Stockfish is inadequate, and this patch is an attempt to provide a better one.
Passed STC non-regression test against master:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 83360 W: 15089 L: 15075 D: 53196
http://tests.stockfishchess.org/tests/view/5a1bf2de0ebc590ccbb8b370
This contempt implementation is showing promising results in certains situations. For instance, it obtained a nice +30 Elo gain when playing with contempt=40 against Stockfish 7, compared to current master:
• master against SF 7 (20000 games at LTC): +121.2 Elo
• this patch with contempt=40 (20000 games at LTC): +154.11 Elo
This was the result of real cooperative work from the Stockfish team, with key ideas coming from Stefan Geschwentner (locutus2) and Chris Cain (ceebo) while most of the community helped with feedback and computer time.
In this commit the bench is unchanged by default, but you can test at home with the new contempt in the UCI options. The style of play will change a lot when using contempt different of zero (I repeat: not done in this version by default, however)!
The Stockfish team is still deliberating over the best default contempt value in self-play and the best contempt modeling strategy, to help users choosing a contempt value when playing against much weaker programs. These informations will be given in future commits when available :-)
syzygy1 [Mon, 4 Dec 2017 16:52:31 +0000 (17:52 +0100)]
Use a Direction enum for Square deltas
Currently the NORTH/WEST/SOUTH/EAST values are of type Square, but conceptually they are not squares but directions. This patch separates these values into a Direction enum and overloads addition and subtraction to allow adding a Square to a Direction (to get a new Square).
I have also slightly trimmed the possible overloadings to improve type safety. For example, it would normally not make sense to add a Color to a Color or a Piece to a Piece, or to multiply or divide them by an integer. It would also normally not make sense to add a Square to a Square.
Add the -fno-exceptions flag to the Makefile to avoid the unecessary exceptions support in the executable (we do not use any exception in Stockfish at the moment).
This change gives a 9.2% reduction in size for the executable binary.
syzygy [Sat, 25 Nov 2017 22:20:36 +0000 (23:20 +0100)]
Minor cleanup of search.cpp
Four very minor edits. Note that tte->save() uses posKey and
not pos.key() in other places.
Originally I also added a futility_move_counts() function to
make things more consistent with the futility_margin() and
reduction() functions. But then razor_margin[] should probably
also be turned into a function, etc. Maybe a good idea, maybe not.
So I did not include it.
Rocky640 [Sat, 11 Nov 2017 12:37:29 +0000 (07:37 -0500)]
Simplify some kingring penalties expressions
The new "weak" expression helps simplify the safe check calculations for rooks or minors, (but the end result for all the safe checks is the exactly the same as in current master)
The only functional change is for the "outer king ring" (for example, squares f3 g3 h3 when white king is on g1). In current master, there was a 191 penalty if any of these was not defended at all.
With this pr, there is this 191 penalty if any of these is not defended at all or is only defended by a white queen.
Tested as a simplification
STC
http://tests.stockfishchess.org/tests/view/59fb03d80ebc590ccbb89fee
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 66167 W: 12015 L: 11971 D: 42181
(against master (Update Copyright year inMakefile))
LTC
http://tests.stockfishchess.org/tests/view/5a0106ae0ebc590ccbb8a55f
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 15790 W: 2095 L: 1968 D: 11727
(against master (Handle BxN trade as good capture when history scor))
same as #1296 but rebased on latest master
bench: 5109559
ceebo [Wed, 8 Nov 2017 17:21:46 +0000 (17:21 +0000)]
Add comments to pos.see_ge()
In terms of technical changes this patch eliminates the return
statements from the main loop of pos.see_ge() and replaces two conditional
computations with a single bitwise negation.
Stockfish currently relies on the "filter_root_moves" function also
having the side effect of clamping Cardinality against MaxCardinality
(the actual piece count in the tablebases). So if we skip this function,
we will end up probing in the search even without tablebases installed.
We cannot bail out of this function before this check is done, so move
the MultiPV hack a few lines below.
IIvec [Sun, 15 Oct 2017 14:44:29 +0000 (16:44 +0200)]
Fix premature using of all available time in x/y TC
In x/y time controls there was a theoretical possibility
to use all available time few moves before the clock will
be updated with new time. This patch fixes that issue.
The first change (ss->statScore >= 0) does nothing.
The second change ((ss-1)->statScore >= 0 ) has a massive change.
(ss-1)->statScore is not set until (ss-1) begins to apply LMR to moves.
So we now increase the reduction for bad quiets when our opponent is
running through the first captures and the hash move.
Let ss->ply denote the number of plies from the root to the current node
This patch lets ss->ply be equal to 0 at the root of the search.
Currently, the root has ss->ply == 1, which is less intuitive:
- Setting the rootNode bool has to check (ss-1)->ply == 0.
- All mate values are off by one: the code seems to assume that mated-in-0
is -VALUE_MATE, mate-1-in-ply is VALUE_MATE-1, mated-in-2-ply is VALUE_MATE+2, etc.
But the mate_in() and mated_in() functions are called with ss->ply, which is 1 in
at the root.
- The is_draw() function currently needs to explain why it has "ply - 1 > i" instead
of simply "ply > i".
- The ss->ply >= MAX_PLY tests in search() and qsearch() already assume that
ss->ply == 0 at the root. If we start at ss->ply == 1, it would make more sense to
go up to and including ss->ply == MAX_PLY, so stop at ss->ply > MAX_PLY. See also
the asserts testing for 0 <= ss->ply && ss->ply < MAX_PLY.
The reason for ss->ply == 1 at the root is the line "ss->ply = (ss-1)->ply + 1" at
the start for search() and qsearch(). By replacing this with "(ss+1)->ply = ss->ply + 1"
we keep ss->ply == 0 at the root. Note that search() already clears killers in (ss+2),
so there is no danger in accessing ss+1.
I have NOT changed pv[MAX_PLY + 1] to pv[MAX_PLY + 2] in search() and qsearch().
It seems to me that MAX_PLY + 1 is exactly right:
- MAX_PLY entries for ss->ply running from 0 to MAX_PLY-1, and 1 entry for the
final MOVE_NONE.
I have verified that mate scores are reported correctly. (They were already reported
correctly due to the extra ply being rounded down when converting to moves.)
The value of seldepth output to the user should probably not change, so I add 1 to it.
(Humans count from 1, computers from 0.)
A small optimisation I did not include: instead of setting ss->ply in every invocation
of search() and qsearch(), it could be set once for all plies at the start of
Thread::search(). This saves a couple of instructions per node.
No functional change (unless the search searches a branch MAX_PLY deep), so bench
does not change.
Do not use the opposed flag for scoring backward and isolated pawns
in pawns.cpp, instead give a S(5,25) bonus for each opponent unopposed
weak pawns when we have a rook or a queen on the board.
STC run stopped after 113188 games:
LLR: 1.63 (-2.94,2.94) [0.00,5.00]
Total: 113188 W: 20804 L: 20251 D: 72133
http://tests.stockfishchess.org/tests/view/59b58e4d0ebc5916ff64b12e
In light of issue #1232, a test was performed about the value of '-fno-exceptions' and a second one of the combination '-fno-exceptions -fno-rtti'. It turns out these options are can be removed without introducing slowdown.
Prevent Stockfish from exiting if DTZ table is not present
During TB initialisation, Stockfish checks for the presence of WDL
tables but not for the presence of DTZ tables. When attempting to probe
a DTZ table, it is therefore possible that the table is not present.
In that case, Stockfish should neither exit nor report an error.
To verify the bug:
$ ./stockfish
setoption name SyzygyTable value <path_to_WDL_dir>
position fen 8/8/4r3/4k3/8/1K2P3/3P4/6R1 w - -
go infinite
Could not mmap() /opt/tb/regular/KRPPvKR.rtbz
$
(On my system, the WDL tables are in one directory and the DTZ tables
in another. If they are in the same directory, it will be difficult
to trigger the bug.)
The fix is trivial: check the file descriptor/handle after opening
the file.
If any thread found a 'mate in x' stop the search. Previously only
mainThread would do so. Requires the bestThread selection to be
adjusted to always prefer mate scores, even if the search depth is less.
I've tried to collect some data for this patch. On 30 cores, mate finding
seems 5-30% faster on average. It is not so easy to get numbers for this,
as the time to find a mate fluctuates significantly with multi-threaded runs,
so it is an average over 100 searches for the same position. Furthermore,
hash size and position make a difference as well.
Use less reduction for moves with larger moveCount if your
opponent did an unexpected (== high moveCount) move in the
previous ply... unexpected moves might need unexpected answers.
lucasart [Sun, 20 Aug 2017 11:59:46 +0000 (19:59 +0800)]
Restore safety margin of 60ms
What this patch does is:
* increase safety margin from 40ms to 60ms. It's worth noting that the previous
code not only used 60ms incompressible safety margin, but also an additional
buffer of 30ms for each "move to go".
* remove a whart, integrating the extra 10ms in Move Overhead value instead.
Additionally, this ensures that optimumtime doesn't become bigger than maximum
time after maximum time has been artificially discounted by 10ms. So it keeps
the code more logical.
Marco Costalba [Tue, 15 Aug 2017 08:05:22 +0000 (01:05 -0700)]
Restore perft
Rewrite perft to be placed naturally inside new
bench code. In particular we don't have special
custom code to run perft anymore but perft is
just a new parameter of 'go' command.
Marco Costalba [Wed, 16 Aug 2017 09:28:54 +0000 (02:28 -0700)]
Speed up Trevis CI
Avoid a couple of redundant rebuilds and compile
with 2 threads since travis gives 2vCPUs.
Also enable -O1 optimization for valgrind and
sanitizers, it should be safe withouth false
positives and it gives a very sensible speed
up, especially with valgrind.
The spee dup allow us to increase testing to
depth 10, useful for thread sanitizer.
This time management handles base time and movestogo cases separatelly. One can test one case without affecting the other. Also, increment usage can be tested separately without (necessarily) affecting sudden death or x moves in y seconds performance.
On stable machines there are no time losses on 0.1+0.001 time control (tested on i7 + Windows 10 platform).
Marco Costalba [Mon, 14 Aug 2017 16:12:16 +0000 (09:12 -0700)]
Fix incorrect StateInfo
We use Position::set() to set root position across
threads. But there are some StateInfo fields (previous,
pliesFromNull, capturedPiece) that cannot be deduced
from a fen string, so set() clears them and to not lose
the info we need to backup and later restore setupStates->back().
Note that setupStates is shared by threads but is accessed
in read-only mode.
I have not fixed all suggestions, for instance I still prefer
to declare the type instead of a spread use of 'auto'. I also
perfer good old 'typedef' to the new 'using' form.
I have not fixed some warnings in the last functions of
syzygy code because those are still the original functions
and need to be completely rewritten anyhow.
tthsqe12 [Sat, 12 Aug 2017 08:50:38 +0000 (10:50 +0200)]
Fix the handling of opposite bishops in KXK endgame evaluation
The case of three or more bishops against a long king must look at all of the
bishops and not just the first two in the piece lists. This patch makes sure
that the position is treated as a win when there are bishops on opposite
colors. This functional change is very small because bench remains the same.
Marco Costalba [Thu, 10 Aug 2017 19:32:50 +0000 (12:32 -0700)]
Re-apply the fix for Limits::ponder race
But this time correctly set Threads.ponder
We avoid using 'limits' for passing pondering
flag because we don't want to have 2 ponder
variables in search scope: Search::Limits.ponder
and Threads.ponder. This would be confusing also
because limits.ponder is set at the beginning of
the search and never changes, instead Threads.ponder
can change value asynchronously during search.
Limits::ponder was used as a signal between uci and search threads,
but is not an atomic variable, leading to the following race as
flagged by a sanitized binary.
Marco Costalba [Sun, 6 Aug 2017 11:43:02 +0000 (04:43 -0700)]
Fix some races and clarify the code
Better split code that should be run at
startup from code run at ucinewgame. Also
fix several races when 'bench', 'perft' and
'ucinewgame' are sent just after 'bestomve'
from the engine threads are still running.
Also use a specific UI thread instead of
main thread when setting up the Position
object used by UI uci loop. This fixes a
race when sending 'eval' command while searching.
We accept a race on 'setoption' to allow the
GUI to change an option while engine is searching
withouth stalling the pipe. Note that changing an
option while searchingg is anyhow not mandated by
UCI protocol.
as a lower level routine, movepicker should not depend on the
search stack or the thread class, removing a circular dependency.
Instead of copying the search stack into the movepicker object,
as well as accessing the thread class for one of the histories,
pass the required fields explicitly to the constructor (removing
the need for thread.h and implicitly search.h in movepick.cpp).
The signature is thus longer, but more explicit:
Also some renaming of histories structures while there.
Rocky640 [Wed, 2 Aug 2017 01:36:33 +0000 (18:36 -0700)]
Rework the "unsupported" penalty into a "supported" bonus
A pawn (according to all the searched positions of a bench run) is not supported 85% of the time,
(in current master it is either isolated, backward or "unsupported").
So it made sense to try moving the S(17, 8) "unsupported" penalty value into the base pawn value hoping for a more representative pawn value, and accordingly
a) adjust backward and isolated so that they stay more or less the same as master
b) increase the mg connected bonus in the supported case by S(17, 0) and let the Connected formula find a suitable eg value according to rank.
in the last month a couple of timeouts have been seen in travis valgrind testing, leading to undesired false positives. The precise cause of this is unclear: a normal valgrind instrumented run is about 6min, the timeout is 10min. Either there are rare hangs (not reproduced locally), or maybe the actual runtime fluctuates on the travis infrastructure (which uses VMs on AWS as far as I know). This patch leads to roughly a 2x speedup of the instrumented testing by reducing the depth from 10 to 9. If timeouts persist, it needs further analysis.
Instead of having Signals in the search namespace,
make the stop variables part of the Threads structure.
This moves more of the shared (atomic) variables towards
the thread-related structures, making their role more clear.