This patch give a small bonus to incite the attacking side to keep more
pawns on the board.
A consequence of this bonus is that Stockfish will tend to play positions
slightly more closed on average than master, especially when it believes
that it has an advantage.
To lower the risk of blockades where Stockfish start shuffling without
progress, we also implement a progressive decrease of the evaluation
value with the 50 moves counter (along with the necessary aging of the
transposition table to reduce the impact of the Graph History Interaction
problem): since the evaluation decreases during shuffling phases, the
engine will tend to examine the consequences of pawn breaks faster during
the search.
Lolligerhans [Tue, 12 Jan 2021 13:30:25 +0000 (14:30 +0100)]
Add penalty for doubled pawns in agile structure
Give an additional penalty of S(20, 10) for any doubled pawn if none of
the opponent's pawns is facing any of our
- pawns or
- pawn attacks;
that means, each of their pawns can push at least one square without
being captured.
This ignores their non-pawns pieces and attacks.
One possible justification: Their pawns' ability to push freely provides
options to react to our threats by changing their pawn structure. Our
doubled pawns however will likely lead to an exploitable weakness, even
if the pawn structure is not yet fixed.
Note that the notion of "their pawns not being fixed" is symmetric for
both players: If all of their pawns can push freely so can ours. All
pawns being freely pushable might just be an early-game-indicator.
However, it can trigger during endgame pawns races, where doubled pawns
are especially hindering, too.
Tomasz Sobczyk [Sun, 10 Jan 2021 06:30:40 +0000 (03:30 -0300)]
Optimize generate_moves
This change simplifies control flow in the generate_moves function which ensures the compiler doesn't duplicate work due to possibly not resolving pureness of the function calls. Also the biggest change is the removal of the unnecessary condition checking for empty b in a convoluted way. The rationale for removal of this condition is that computing attacks_bb with occupancy is not much more costly than computing pseudo attacks and overall the condition (also being likely unpredictable) is a pessimisation.
This is inspired by previous changes by @BM123499.
FauziAkram [Sat, 9 Jan 2021 23:31:09 +0000 (01:31 +0200)]
Bad Outpost Pawn Scale
Changed name from Bad Outpost to Uncontested Outpost
Scale Uncontested Outpost with number of pawns + Decrease Bishop PSQT values and general tuning
Credits for the decrease of the Bishop PSQT values: Fauzi
Credits for scaling Uncontested Outpost with number of pawns: Lolligerhans
Credits for the tunings: Fauzi
Vizvezdenec [Sat, 9 Jan 2021 14:42:58 +0000 (17:42 +0300)]
Refine stat based reductions
This patch separates stat based reductions for quiet moves in case of being in check and in case of not being in check.
We will be using sum of first continuation history and main history (similar to movepicker) instead of statScore for the first case.
BM123499 [Fri, 8 Jan 2021 17:03:26 +0000 (14:03 -0300)]
Rethink En Passant Evasion Capture
It now checks if it were a discovery attack instead of the attacking piece is the double-moved pawn.
As a side effect, certain illegal fens have different, and slightly more logical move generation.
There is no intend to maintain particular behavior for such non-reachable fens.
- "discovered check" (instead of "discovery check")
- "en passant" (instead of "en-passant")
- "pseudo-legal" before a noun (instead of "pseudo legal")
- "3-fold" (instead of "3fold")
Vizvezdenec [Sun, 10 Jan 2021 17:24:37 +0000 (20:24 +0300)]
Small code cleanup in LMR
In a recent patch we added comparing capture history to a number for LMR of captures.
Calling it via thisThread-> is not needed since capture history was already declared by this time -
so removing makes code slightly shorter and easier to follow.
MaximMolchanov [Mon, 11 Jan 2021 05:49:41 +0000 (07:49 +0200)]
Affine transform robust implementation
Size of the weights in the last layer is less than 512 bits. It leads to wrong data access for AVX512. There is no error because in current implementation it is guaranteed that there is an array of zeros after weights so zero multiplied by something is returned and sum is correct. It is a mistake that can lead to unexpected bugs in the future. Used AVX2 instructions for smaller input size.
MaximMolchanov [Wed, 6 Jan 2021 03:29:32 +0000 (05:29 +0200)]
Affine transform refactoring.
Reordered weights in such a way that accumulated sum fits to output.
Weights are grouped in blocks of four elements because four
int8 (weight type) corresponds to one int32 (output type).
No horizontal additions.
Grouped AVX512, AVX2 and SSSE3 implementations.
Repeated code was removed.
Vizvezdenec [Thu, 24 Dec 2020 22:11:09 +0000 (01:11 +0300)]
Do more LMR for captures
This patch enables LMR for all captures at allNodes that were not in PV.
Currently we do LMR for all captures at cutNodes so this is an expansion of this logic:
now we do LMR for all captures almost at all non-pv nodes,
excluding only allNodes that were in PV.
Moez Jellouli [Sun, 20 Dec 2020 21:28:23 +0000 (22:28 +0100)]
Correct Outflanking calculations in classical eval
Take signed value of rank difference between kings squares instead absolute value in outflanking calculation. This change correct evaluation of endgames with one king invading opponent last ranks.
FauziAkram [Sun, 20 Dec 2020 15:50:34 +0000 (17:50 +0200)]
Tweak the formulas for unsafeSquares
We give more bonus for a special case: If there are some enemy squares occupied
or attacked by the enemy on the passed pawn span,
but if they are all attacked by our pawn, use new intermediate factor 30.
The main credit goes to Rocky for the idea, with additional tuning and tests.
pb00067 [Mon, 14 Dec 2020 15:30:56 +0000 (16:30 +0100)]
Simplify condition for assigning static-eval based bonus
for quiet move ordering and simplify bonus formula.
Due to clamping the bonus to relative low values the impact on high
depths is minimal, thus the restriction to low depths seems not
necessary.
Also the condition of movecount in previous node seems to be not
determinant.
Vizvezdenec [Mon, 14 Dec 2020 00:49:04 +0000 (03:49 +0300)]
Increase reduction in case of stable best move
The idea of this patch is pretty simple - we already do more reductions
for non-PV and root nodes in case of stable best move for depth > 10.
This patch makes us do so if root depth if > 10 instead, which
is logical since best move changes (thus instability of it) is
counted at root, so it makes a lot of sense to use depth of the root.
pb00067 [Sun, 13 Dec 2020 20:23:30 +0000 (21:23 +0100)]
Merge static history into main history,
thus simplifying and reducing the memory footprint.
I believe using static diff for better move ordering is more suited for
low depths, so restrict writing to low depths.
Todo: probably the condition for writing can be simplified
mstembera [Sat, 12 Dec 2020 22:18:38 +0000 (14:18 -0800)]
AVX512, AVX2 and SSSE3 speedups
Improves throughput by summing 2 intermediate dot products using 16 bit addition before upconverting to 32 bit.
Potential saturation is detected and the code-path is avoided in this case.
The saturation can't happen with the current nets,
but nets can be constructed that trigger this check.
FauziAkram [Mon, 7 Dec 2020 17:28:47 +0000 (19:28 +0200)]
New Imbalance Tables Tweak
Imbalance tables tweaked to contain MiddleGame and Endgame values, instead of a single value.
The idea started from Fisherman, which requested my help to tune the values back in June/July,
so I tuned the values back then, and we were able to accomplish good results,
but not enough to pass both STC and LTC tests.
So after the recent changes, I decided to give it another shot, and I am glad that it was a successful attempt.
A special thanks goes also to mstembera, which notified me a simple way to let the patch perform a little better.
Use arithmetic right shift for sign extension in MMX and SSE2 paths
This appears to be slightly faster than using a comparison against zero
to compute the high bits, on both old (like Pentium III) and new (like
Zen 2) hardware.
Vizvezdenec [Sat, 5 Dec 2020 01:00:41 +0000 (04:00 +0300)]
Introduce static history
The idea of this patch can be described as following: we update static
history stats based on comparison of the static evaluations of the
position before and after the move. If the move increases static evaluation
it's assigned positive bonus, if it decreases static evaluation
it's assigned negative bonus. These stats are used in movepicker
to sort quiet moves.
Include scaling change as suggested by Dietrich Kappe,
the one who trained net for Komodo. According to him,
some nets may require different scaling in order to utilize its full strength.
syzygy1 [Sun, 29 Nov 2020 11:05:26 +0000 (12:05 +0100)]
Remove piece lists
This patch removes the incrementally updated piece lists from the Position object.
This has been tried before but always failed. My reasons for trying again are:
* 32-bit systems (including phones) are now much less important than they were some years ago (and are absent from fishtest);
* NNUE may have made SF less finely tuned to the order in which moves were generated.
Lolligerhans [Fri, 20 Nov 2020 17:09:41 +0000 (18:09 +0100)]
Refine rook penalty on closed files
+-----------------+
| . . . . . . . . | All files are closed. Some files are
| . . . . . o o . | more valuable for rooks, because
| . . . . o . . o | they might open in the future.
| . . . o x . . x |
| o . o x . x x . |
| x o x . . . . . | x our pawns
| . x . . . . . . | o their pawns
| . . . . . . . . | ^ rooks are scored higher on these files
+-----------------+
^ ^
Files containing none of our own pawns are open or half-open (otherwise
they are closed). Rooks on (half-)open files recieve a bonus for the
future potential to act along all ranks.
This commit refines the (relative) penalty of rooks on closed files.
Files that contain one of our blocked pawns are considered less likely
to open in the future; rooks on these files are now penalized stronger.
This bonus does not generally correlate with mobility. If the condition
is sufficiently refined in the future, it may be beneficial to adjust or
override mobility scores in some cases.
MaximMolchanov [Sat, 14 Nov 2020 00:55:29 +0000 (02:55 +0200)]
Calculate sum from first elements
in affine transform for AVX512/AVX2/SSSE3
The idea is to initialize sum with the first element instead of zero.
Reduce one add_epi32 and one set_zero SIMD instructions for each output dimension.
sum = 0; for i = 1 to n sum += a[i] ->
sum = a[1]; for i = 2 to n sum += a[i]
We now use the same guard condition (testing that we already have a defense with
a score better score than a TB loss) for all pruning heuristics in qsearch().
This allows some pruning when in check, but in a controlled way to ensure that
no wrong mate scores appear.
Tomasz Sobczyk [Tue, 3 Nov 2020 21:49:10 +0000 (22:49 +0100)]
AVX-512 for smaller affine and feature transforms.
For the feature transformer the code is analogical to AVX2 since there was room for easy adaptation of wider simd registers.
For the smaller affine transforms that have 32 byte stride we keep 2 columns in one zmm register. We also unroll more aggressively so that in the end we have to do 16 parallel horizontal additions on ymm slices each consisting of 4 32-bit integers. The slices are embedded in 8 zmm registers.
These changes provide about 1.5% speedup for AVX-512 builds.
The advantage of using the normal time management for a single legal move is that scores
reported for that move are reasonable, not searching leads to artifacts during games
(see e.g. https://tcec-chess.com/#div=sf&game=96&season=19)
The disadvantage of using normal time management of a single legal move is that thinking
times can be unnaturally long, making it 'painful to watch' in online tournaments.
This patch uses normal time management, but caps the used time to 500ms.
This should lead to reasonable scores, and be hardly perceptible.
J. Oster [Sun, 1 Nov 2020 17:33:17 +0000 (18:33 +0100)]
Fix incorrect pruning in qsearch
Only do countermove based pruning in qsearch if we already have a move with a better score than a TB loss.
This patch fixes a bug (started as 843a961) that incorrectly prunes moves if in check,
and adds an assert to make sure no wrong mate scores are given in the future.
It replaces a no-op moveCount check with a check for bestValue.
Initially discussed in #3171 and later in #3199, #3198 and #3210.
This PR effectively closes #3171
It also likely fixes #3196 where this causes user visible incorrect TB scores,
which probably result from these incorrect mate scores.
Tomasz Sobczyk [Wed, 28 Oct 2020 23:14:53 +0000 (00:14 +0100)]
Optimize affine transform for SSSE3 and higher targets.
A non-functional speedup. Unroll the loops going over
the output dimensions in the affine transform layers by
a factor of 4 and perform 4 horizontal additions at a time.
Instead of doing naive horizontal additions on each vector
separately use hadd and shuffling between vectors to reduce
the number of instructions by using all lanes for all stages
of the horizontal adds.
syzygy1 [Tue, 27 Oct 2020 18:22:41 +0000 (19:22 +0100)]
Do not skip non-recapture ttMove when in check
The qsearch() MovePicker incorrectly skips a non-recapture ttMove
when in check (if depth <= DEPTH_QS_RECAPTURES). This is clearly not
intended and can cause qsearch() to return a mate score when there
is no mate. Introduced in cad300c and 6596f0e, as observed by
joergoster in #3171 and #3198.
This PR fixes the bug by not skipping the non-recapture ttMove when in check.
syzygy1 [Tue, 20 Oct 2020 19:06:06 +0000 (21:06 +0200)]
More incremental accumulator updates
This patch was inspired by c065abd which updates the accumulator,
if possible, based on the accumulator of two plies back if
the accumulator of the preceding ply is not available.
With this patch we look back even further in the position history
in an attempt to reduce the number of complete recomputations.
When we find a usable accumulator for the position N plies back,
we also update the accumulator of the position N-1 plies back
because that accumulator is most likely to be helpful later
when evaluating positions in sibling branches.
By not updating all intermediate accumulators immediately,
we avoid doing too much work that is not certain to be useful.
Overall, roughly 2-3% speedup.
This patch makes the code more specific to the net architecture,
changing input features of the net will require additional changes
to the incremental update code as discussed in the PR #3193 and #3191.
Vizvezdenec [Sat, 17 Oct 2020 11:40:10 +0000 (13:40 +0200)]
Do more reductions for late quiet moves in case of consecutive fail highs.
Idea of this patch can be described as following - in case we have consecutive fail highs and we reach late enough moves at root node probability of remaining quiet moves being able to produce even bigger value than moves that produced previous cutoff (so ones that should be high in move ordering but now they fail to produce beta cutoff because we actually reached high move count) should be quiet small so we can reduce them more.
The new net is based on the previous net 04cf2b4ed1da but with the biases
for the 1st hidden layer tuned SPSA, see the SPSA session on fishtest there:
https://tests.stockfishchess.org/tests/view/5f875213dcdad978fe8c5211
Thanks to @vondele for writing out the net, see discussion in this thread:
https://github.com/mstembera/Stockfish/commit/432da86721647dff1d9426a7cdcfd2dbada8155e
Further tune the net parameters, now the last but one layer (32x32).
To limit the number of parameters optimized, the network layer was
decomposed using SVD, and the singular values were treated
as parameters and tuned.
We now include the total pawn count in the scaling factor for the output
of the NNUE evaluation network. This should have the effect of trying to
keep more pawns when SF has the advantage, but exchange them when she
is defending.
Thanks to Alexander Pagel (Lolligerhans) for the idea of using the
value of pawns to ease the comparison with the rest of the material
estimation.
Replace log(i) with log(i + 0.25 * log(i)). This increases especially for low values the reductions. But for bigger values there are nearly no changes.
Idea is that division by fraction of 2 is slightly faster than by other numbers so parameters are adjusted in a way that division in null move pruning depth reduction features dividing by 256 instead of dividing by 213.
Other than this patch is almost non-functional - difference starts to exist by depth 133.
an optimization of Sergio's nn-03744f8d56d8.nnue tuning the output layer (33 parameters) on game play.
WIP code to make layer parameters tunable is https://github.com/vondele/Stockfish/tree/optionOutput
Optimization itself is using https://github.com/vondele/nevergrad4sf
Writing of the modified net using WIP code based on the learner code https://github.com/vondele/Stockfish/tree/evalWrite
Most parameters in the output layer are changed only little (~5 for int8_t).
Current master uses a constant scale factor of 5/4 = 1.25 for the output
of the NNUE network, for compatibility with search and classical evaluation.
We modify this scale factor to make it dependent on the phase of the game,
going from about 1.5 in the opening to 1.0 for pure pawn endgames.
This helps Stockfish to avoid exchanges of pieces (heavy pieces in particular)
when she has the advantage, keeping more material on the board when attacking.
Introduce a small chance of switching to NNUE if PSQ imbalance is large but we have opposite colored bishops and the classical eval is struggling to win.
On Windows, Stockfish wouldn't launch in some GUI because we output some
info strings (about the use of large pages) before sending the 'uci'
command. It seems more robust to suppress these info strings, and instead
to add a proper section section in the Readme about large pages use.
- Clean signature of functions in namespace NNUE
- Add comment for countermove based pruning
- Remove bestMoveCount variable
- Add const qualifier to kpp_board_index array
- Fix spaces in get_best_thread()
- Fix indention in capture LMR code in search.cpp
- Rename TtmemDeleter to LargePageDeleter
Sami Kiminki [Sun, 30 Aug 2020 16:41:30 +0000 (19:41 +0300)]
Add large page support for NNUE weights and simplify TT mem management
Use TT memory functions to allocate memory for the NNUE weights. This
should provide a small speed-up on systems where large pages are not
automatically used, including Windows and some Linux distributions.
Further, since we now have a wrapper for std::aligned_alloc(), we can
simplify the TT memory management a bit:
- We no longer need to store separate pointers to the hash table and
its underlying memory allocation.
- We also get to merge the Linux-specific and default implementations
of aligned_ttmem_alloc().
Finally, we'll enable the VirtualAlloc code path with large page
support also for Win32.
Use tiling to speed up accumulator refreshes and updates
Perform the update and refresh operations tile by tile in a local
array of vectors. By selecting the array size carefully, we
achieve that the compiler keeps the whole array in vector registers.
This PR sets the "comp" variable simply to "clang",
which seems to be more consistent and allows a small simplification.
The PR also moves the section that sets "profile_make" and "profile_use" to after the NDK section,
which ensures that these variables are now set correctly for NDK/clang.
NNUE appears to provide a more stable eval than the classic eval,
so the time use dependencies on bestMoveChanges, fallingEval,
etc may need to change to make the best use of available time.
This change doubles the effect of totBestMoveChanges when giving
more time because the choice of best move is unstable.
No need to initialize StatScore at rootNode. Current Logic is redundant because at subsequent levels the grandchildren statScore is initialized to zero.
This patch doubles the moderate imbalance threshold and probability of using classical eval.
So now if imbalance is greater than PawnValueMg / 4 then there is a 1/8 chance of using classical eval.
Restore the default NNUE setting (enabled) after a bench command.
This also makes the resulting program settings independent of the
number of FENs that are being benched.
Bug fix in do_null_move() and NNUE simplification.
This fixes #3108 and removes some NNUE code that is currently not used.
At the moment, do_null_move() copies the accumulator from the previous
state into the new state, which is correct. It then clears the "computed_score"
flag because the side to move has changed, and with the other side to move
NNUE will return a completely different evaluation (normally with changed
sign but also with different NNUE-internal tempo bonus).
The problem is that do_null_move() clears the wrong flag. It clears the
computed_score flag of the old state, not of the new state. It turns out
that this almost never affects the search. For example, fixing it does not
change the current bench (but it does change the previous bench). This is
because the search code usually avoids calling evaluate() after a null move.
This PR corrects do_null_move() by removing the computed_score flag altogether.
The flag is not needed because nnue_evaluate() is never called twice on a position.
This PR also removes some unnecessary {}s and inserts a few blank lines
in the modified NNUE files in line with SF coding style.