cj5716 [Wed, 1 May 2024 10:31:38 +0000 (18:31 +0800)]
Some history fixes and tidy-up
This adds the functions `update_refutations` and `update_quiet_histories` to better distinguish the two. `update_quiet_stats` now just calls both of these functions.
The functional side of this patch is two-fold:
1. Stop refutations being updated when we carry out multicut
2. Update pawn history every time we update other quiet histories
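A minimal sketch of the resulting shape, with simplified stand-in types (the real signatures in search.cpp differ):

```cpp
// Illustrative stand-ins for Stockfish's real types.
struct Stack { int killerMove; };                          // refutation slot (simplified)
struct Histories { int quiet[64][64]; int pawn[64][64]; };

// Refutation tables (killers / countermoves) only; skipped for multicut.
void update_refutations(Stack* ss, int move) { ss->killerMove = move; }

// All quiet-move histories; pawn history is now updated alongside the others.
void update_quiet_histories(Histories& h, int from, int to, int bonus) {
    h.quiet[from][to] += bonus;
    h.pawn[from][to] += bonus;
}

// update_quiet_stats is now just the composition of the two.
void update_quiet_stats(Stack* ss, Histories& h, int from, int to, int move, int bonus) {
    update_refutations(ss, move);
    update_quiet_histories(h, from, to, bonus);
}
```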
Viren6 [Sat, 4 May 2024 16:29:23 +0000 (17:29 +0100)]
Introduce Quadruple Extensions
This patch introduces quadruple extensions, with the new condition of not ttPv. It also generalises all margins, so that extensions can still occur if conditions are only partially fulfilled, but with a stricter margin.
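A hedged sketch of the tiered margins; the margin constants are illustrative placeholders, not the tuned values in search.cpp:

```cpp
// Returns how many plies to extend a singular move: the deeper the tier,
// the stricter the assumed margin below singularBeta.
int extension_amount(int value, int singularBeta, bool ttPv) {
    if (value >= singularBeta)
        return 0;                                          // not singular
    int ext = 1;                                           // single extension
    if (value < singularBeta - 11)  ext = 2;               // double (illustrative margin)
    if (value < singularBeta - 78)  ext = 3;               // triple (illustrative margin)
    if (value < singularBeta - 140 && !ttPv) ext = 4;      // quadruple: new, only when not ttPv
    return ext;
}
```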
Michael Chaly [Sat, 4 May 2024 07:33:26 +0000 (10:33 +0300)]
Add extra bonuses to some moves that forced a fail low
The previous patch on this idea gave bonuses to these moves if the best value of the search was far below the current static evaluation.
This patch does a similar thing, but adds an extra bonus when the best value of the search is far below the static evaluation from before the previous move.
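A sketch of the two conditions side by side, with illustrative thresholds and scaling (the real bonus logic in search.cpp differs):

```cpp
// Extra history bonus for a move that forced a fail low. staticEvalNow is the
// current static evaluation, staticEvalPrev the one from before the previous move.
int fail_low_bonus(int base, int bestValue, int staticEvalNow, int staticEvalPrev) {
    int bonus = base;
    if (bestValue < staticEvalNow - 100)   // previous patch: far below current eval
        bonus += base / 2;
    if (bestValue < staticEvalPrev - 100)  // this patch: far below eval before previous move
        bonus += base / 2;
    return bonus;
}
```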
1) Fixes a bug introduced in
https://github.com/official-stockfish/Stockfish/pull/5194: only one
psqtOnly flag was used for the two perspectives, which caused the
wrong entries to be cleared and marked.
2) The finny caches should be cleared like histories and not at the
start of every search.
Saves a (currently) 800 KB allocation and deallocation when running
`eval`. Not particularly significant, and it has zero impact on play,
but the allocation is not necessary either.
Use capture history to better judge which sacrifices to explore
This idea has been bouncing around a while. @Vizvezdenec tried it a
couple years ago in Stockfish without results, but its recent arrival in
Ethereal inspired him and thence me to try it afresh in Stockfish.
(Also factor out the now-common code with futpruning for captures.)
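A sketch of the idea under assumed constants (the depth scaling and divisor are placeholders):

```cpp
// Capture history relaxes the SEE pruning threshold, so a sacrifice that has
// historically worked gets searched instead of being pruned outright.
bool prune_losing_capture(int depth, int seeValue, int captureHistory) {
    int threshold = -200 * depth + captureHistory / 32;  // illustrative constants
    return seeValue < threshold;                         // prune only below the adjusted bar
}
```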
More reduction at cut nodes which are not former PV nodes
The tt move and first killer are excluded.
This idea is based on the following LMR condition tuning
https://tests.stockfishchess.org/tests/view/66228bed3fe04ce4cefc0c71,
using only the two largest terms, P[0] and P[1].
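A minimal sketch of the condition; the reduction amount is illustrative:

```cpp
// Extra late-move reduction at cut nodes that were never PV nodes,
// excluding the tt move and the first killer as described above.
int extra_cutnode_reduction(bool cutNode, bool formerPv, bool isTtMove, bool isFirstKiller) {
    return (cutNode && !formerPv && !isTtMove && !isFirstKiller) ? 2 : 0;
}
```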
For each thread persist an accumulator cache for the network, where each
cache contains multiple entries for each of the possible king squares.
When the accumulator needs to be refreshed, the cached entry is used to more
efficiently update the accumulator, instead of rebuilding it from scratch.
This idea was first described by Luecx (author of Koivisto) and
is commonly referred to as "Finny Tables".
When the accumulator needs to be refreshed, instead of filling it with
biases and adding every piece from scratch, we...
1. Take the `AccumulatorRefreshEntry` associated with the new king bucket
2. Calculate the features to activate and deactivate (from differences
between bitboards in the entry and bitboards of the actual position)
3. Apply the updates on the refresh entry
4. Copy the content of the refresh entry accumulator to the accumulator
we were refreshing
5. Copy the bitboards from the position to the refresh entry, to match
the newly updated accumulator
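A condensed sketch of that refresh path, with heavily simplified types (the real entry also carries PSQT data, and the add/remove step applies network weights):

```cpp
#include <cstdint>
#include <cstring>

using Bitboard = uint64_t;
struct Accumulator { int16_t v[256]; };  // size illustrative

// One cached entry per king bucket (per thread).
struct AccumulatorRefreshEntry {
    Accumulator acc;
    Bitboard    pieces[2][6];  // [color][pieceType] the cache currently encodes
};

void refresh_from_cache(Accumulator& out, AccumulatorRefreshEntry& e,
                        const Bitboard posPieces[2][6]) {
    for (int c = 0; c < 2; ++c)
        for (int pt = 0; pt < 6; ++pt) {
            Bitboard toActivate   = posPieces[c][pt] & ~e.pieces[c][pt];  // step 2
            Bitboard toDeactivate = e.pieces[c][pt] & ~posPieces[c][pt];
            // step 3: apply feature adds/removes to e.acc (weight math omitted)
            (void)toActivate; (void)toDeactivate;
        }
    out = e.acc;                                         // step 4: copy entry -> accumulator
    std::memcpy(e.pieces, posPieces, sizeof(e.pieces));  // step 5: sync entry bitboards
}
```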
Parameter tune, also adding another tunable parameter (npmDiv) that can
vary between nets (bignet, smallnet, psqtOnly smallnet).
P.S.: The changed values are only the parameters where the different time
controls agree. In other words, the tunings tell us that moving these
specific values in this specific direction is good at all time controls,
so there should not be a high risk of regression at longer time controls.
Previously, a small regression from the uci refactoring made it possible to also get the node counter
after running a bench with perft, e.g. `./stockfish bench 1 1 5 current perft`.
We change the definition of "age" from "age of this position" to "age of this TT entry".
This way, even when saving the same position, we always prefer the new entry over the old one.
It makes more sense not to (potentially) change the developer's ASLR entropy setting just to make the test run through. This should be an active choice, even if the test might then fail locally for them.
The recent refactoring has shown some limitations of our testing, hence we add a couple more tests, including:
* expected mate score
* expected mated score
* expected in TB win score
* expected in TB loss score
* expected info line output
* expected info line output (wdl)
Small improvement of the elapsed time usage in search; makes the code easier to read overall.
Also, Search::Worker::iterative_deepening() now only checks the elapsed time once, instead of 3 times in a row.
* TB values can have a distance of 0, mainly when we are in a tb position but haven't found mate.
* Add a missing whitespace to UCIEngine::on_update_no_moves()
As a side note, it is still required for the static functions,
but these should be moved to a different namespace/class
later on, since sf kind of relies on them.
The assignment `(ss + 1)->excludedMove = Move::none()` can be simplified away, because when that line is reached, `(ss + 1)->excludedMove` is always already none. The only moment `stack[x]->excludedMove` is modified is during singular search, and it is reset to none right after the singular search is finished.
The same functionality is available by using COMPCXX, and having another variable which does the same is just confusing.
There was only one mention of this on the Stockfish Wiki, which has been changed to COMPCXX.
* Using an earlier L1-3072 net, and with triple extension margin manually set to 0: https://tests.stockfishchess.org/tests/view/65ffdf5d0ec64f0526c544f2 (~30k games)
* Continue tuning, but with the previous master net (L1-2560). https://tests.stockfishchess.org/tests/view/660663f00ec64f0526c59c41 (~27k games)
* Starting with the parameters from step 2, use the current L1-3072 net, and allow the triple extension margin to be tuned starting from 0: https://tests.stockfishchess.org/tests/view/660c16b8216a13d9498e7536 (40k games)
Disservin [Sat, 23 Mar 2024 09:22:20 +0000 (10:22 +0100)]
Transform search output to engine callbacks
Part 2 of the Split UCI into UCIEngine and Engine refactor.
This creates function callbacks for search to use when an update should occur.
The benching in uci.cpp, for example, uses this to extract the total nodes
searched.
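A toy sketch of the callback shape (names and fields are illustrative, not the refactor's exact API):

```cpp
#include <cstdint>
#include <functional>
#include <iostream>

struct InfoFull { int depth; int scoreCp; uint64_t nodes; };

struct SearchUpdates {
    std::function<void(const InfoFull&)> on_update_full;  // invoked by search
};

int main() {
    // A bench caller installs a callback that accumulates nodes
    // instead of printing a UCI info line.
    uint64_t totalNodes = 0;
    SearchUpdates updates;
    updates.on_update_full = [&](const InfoFull& i) { totalNodes += i.nodes; };

    updates.on_update_full({20, 35, 123456});  // as search would report
    std::cout << "total nodes: " << totalNodes << '\n';
}
```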
Disservin [Sun, 17 Mar 2024 11:33:14 +0000 (12:33 +0100)]
Split UCI into UCIEngine and Engine
This is another refactor which aims to decouple uci from stockfish. A new engine
class manages all engine-related logic and uci is a "small" wrapper around it.
In the future we should also try to remove the need for the Position object in
the uci, and replace the options with an actual options struct instead of using a
map. Also, the std::strings in the Info structs should be converted to string_view.
Update NNUE architecture to SFNNv9 and net nn-ae6a388e4a1a.nnue
Part 1: PyTorch Training, linrock
Trained with a 10-stage sequence from scratch, starting in May 2023:
https://github.com/linrock/nnue-tools/blob/master/exp-sequences/3072-10stage-SFNNv9.yml
While the training methods were similar to the L1-2560 training sequence,
the last two stages introduced min-v2 binpacks,
where bestmove capture and in-check position scores were not zeroed during minimization,
for compatibility with skipping SEE >= 0 positions and future research.
Training data can be found at:
https://robotmoon.com/nnue-training-data
This net was tested at epoch 679 of the 10th training stage:
https://tests.stockfishchess.org/tests/view/65f32e460ec64f0526c48dbc
Part 2: SPSA Training, Viren6
The net was then SPSA tuned.
This consisted of the output weights (32 * 8) and biases (8),
as well as the L3 biases (32 * 8) and L2 biases (16 * 8), 648 params in total.
The SPSA tune can be found here:
https://tests.stockfishchess.org/tests/view/65fc33ba0ec64f0526c512e3
With the help of Disservin, the initial weights were extracted with:
https://github.com/Viren6/Stockfish/tree/new228
The net was saved with the tuned weights using:
https://github.com/Viren6/Stockfish/tree/new241
Earlier nets of the SPSA failed STC compared to the base 3072 net of part 1:
https://tests.stockfishchess.org/tests/view/65ff356e0ec64f0526c53c98
Therefore it is suspected that the SPSA at VVLTC has
added extra scaling on top of the scaling of increasing the L1 size.
Current master triggers a gcc note:
parameter passing for argument of type 'std::pair<double, double>' when C++17 is enabled changed to match C++14 in GCC 10.1
While this is inconsequential and just informative (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111516), we can easily avoid it.
Disservin [Fri, 29 Mar 2024 09:49:53 +0000 (10:49 +0100)]
Improve prerelease creation workflow
In the last couple of months we sometimes saw duplicated prereleases uploaded to GitHub, possibly due to some racy behavior when concurrent jobs create a prerelease. This now creates an empty prerelease at the beginning of the CI and the binaries are later just attached to this one.
Michael Chaly [Thu, 28 Mar 2024 21:17:37 +0000 (00:17 +0300)]
Adjust best value after a pruned quiet move
Logic somewhat similar to how we adjust best value after pruned captures
in qsearch, but in search this patch does it after pruned quiet moves,
and not to the full futility value but to something in between
best value and futility value.
Values were found using two tunes, with the final values taken from the ltc tune after 62k games:
stc - https://tests.stockfishchess.org/tests/view/65fb526b0ec64f0526c50694
ltc - https://tests.stockfishchess.org/tests/view/65fd36e60ec64f0526c5214b
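The shape of the adjustment, as a sketch with an illustrative mixing weight (the tuned values come from the tests above):

```cpp
// After pruning a quiet move, pull bestValue part of the way toward the
// futility value instead of all the way (as done for captures in qsearch).
int adjusted_best_value(int bestValue, int futilityValue) {
    if (futilityValue > bestValue)
        return (bestValue + 3 * futilityValue) / 4;  // weight is illustrative
    return bestValue;
}
```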
Ideas for future work:
* tune these values with the other TM adjustments
* try narrower bands
* calculate adjustment for exact eval by interpolation
Muzhen Gaming [Sun, 17 Mar 2024 03:20:41 +0000 (11:20 +0800)]
VVLTC search tune
This set of parameters was derived from 3 tuning attempts:
https://tests.stockfishchess.org/tests/view/65d19ab61d8e83c78bfd8436 (80+0.8 x8, ~40k games)
Then tuned with one of linrock's early L1-3072 nets:
https://tests.stockfishchess.org/tests/view/65def7b04b19edc854ebdec8 (VVLTC, ~36k games)
Starting from the result of this tuning, the parameters were then tuned with the current master net:
https://tests.stockfishchess.org/tests/view/65f11c420ec64f0526c46fc4 (VVLTC, ~45k games)
Additionally, at the start of the third tuning phase, 2 parameters were manually changed:
Notably, the triple extension margin was decreased from 78 to 22. This idea was given by Vizvezdenec:
https://tests.stockfishchess.org/tests/view/65f0a2360ec64f0526c46752.
The PvNode extension margin was also adjusted from 50 to 40.
This tune also differs from previous tuning attempts by tuning the evaluation thresholds for smallnet and psqt-only.
The former was increased through the tuning, and this is hypothesized to scale better at VVLTC,
although there is not much evidence of it.
Robert Nurnberg [Sun, 17 Mar 2024 14:39:01 +0000 (15:39 +0100)]
Base WDL model on material count and normalize evals dynamically
This PR proposes to change the parameter dependence of Stockfish's
internal WDL model from full move counter to material count. In addition
it ensures that an evaluation of 100 centipawns always corresponds to a
50% win probability at fishtest LTC, whereas for master this holds only
at move number 32. See also
https://github.com/official-stockfish/Stockfish/pull/4920 and the
discussion therein.
The new model was fitted based on about 340M positions extracted from
5.6M fishtest LTC games from the last three weeks, involving SF versions
from e67cc979fd2c0e66dfc2b2f2daa0117458cfc462 (SF 16.1) to current
master.
The involved commands for
[WDL_model](https://github.com/official-stockfish/WDL_model) are:
```
./updateWDL.sh --firstrev e67cc979fd2c0e66dfc2b2f2daa0117458cfc462
python scoreWDL.py updateWDL.json --plot save --pgnName update_material.png --momType "material" --momTarget 58 --materialMin 10 --modelFitting optimizeProbability
```
The anchor `58` for the material count value was chosen to be as close
as possible to the observed average material count of fishtest LTC games
at move 32 (`43`), while not changing the value of
`NormalizeToPawnValue` compared to the move-based WDL model by more than
1.
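The model shape, as a sketch: the win probability is logistic in the evaluation, with both logistic parameters cubic polynomials in the scaled material count. The coefficients below are placeholders, not the fitted values:

```cpp
#include <algorithm>
#include <cmath>

// Returns the win rate in permille given an internal eval (cp) and the
// material count. At materialCount == 58 (the anchor), a(1.0) equals
// NormalizeToPawnValue, so a normalized 100cp eval maps to a 50% win rate.
int win_rate_permille(int evalCp, int materialCount) {
    double m = std::clamp(materialCount, 10, 78) / 58.0;
    double as[] = {0.0, 0.0, 0.0, 356.0};  // placeholder fit
    double bs[] = {0.0, 0.0, 0.0, 68.0};   // placeholder fit
    double a = ((as[0] * m + as[1]) * m + as[2]) * m + as[3];
    double b = ((bs[0] * m + bs[1]) * m + bs[2]) * m + bs[3];
    return int(0.5 + 1000 / (1 + std::exp((a - evalCp) / b)));
}
```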
The patch only affects the displayed cp and wdl values.
Michael Chaly [Fri, 15 Mar 2024 15:55:40 +0000 (18:55 +0300)]
Clamp history bonus to stats range
Before, one always had to keep track of the bonus one assigns to a history to stop
the stats from overflowing. This is a quality-of-life improvement, since such overflows would often go unnoticed during benching.
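A sketch of the clamped gravity-style update (the range D is illustrative):

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdlib>

// History update with the bonus clamped to the stats range, so callers no
// longer need to size their bonuses to avoid overflow; keeps |entry| <= D.
void update_stat(int16_t& entry, int bonus, int D = 16384) {
    bonus = std::clamp(bonus, -D, D);              // new: clamp to stats range
    entry += bonus - entry * std::abs(bonus) / D;  // usual gravity formula
}
```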
Lemmy [Sun, 10 Mar 2024 21:57:22 +0000 (07:57 +1000)]
Sudden Death - Improve TM
Due to the estimated move horizon of 50, once a sudden death game got below
1 second, the time allocation for optimumTime would go negative
and SF would start instamoving.
To counter this, once limits.time is below 1 second, the move horizon
will start decreasing at a rate of 1 move per 20ms. This was just what
seemed a reasonable rate of decay.
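A sketch of that decay, assuming limits.time is in milliseconds:

```cpp
#include <algorithm>
#include <cstdint>

// Estimated move horizon: 50 normally; below 1000 ms it shrinks by one
// move per 20 ms, so optimumTime can no longer go negative.
int move_horizon(int64_t timeLeftMs) {
    if (timeLeftMs >= 1000)
        return 50;
    return int(std::clamp<int64_t>(timeLeftMs / 20, 1, 50));
}
```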
Disservin [Tue, 12 Mar 2024 17:20:19 +0000 (18:20 +0100)]
Fix Raspberry Pi Compilation
Reported by @Torom over discord.
> dev build fails on Raspberry Pi 5 with clang
```
clang++ -o stockfish benchmark.o bitboard.o evaluate.o main.o misc.o movegen.o movepick.o position.o search.o thread.o timeman.o tt.o uci.o ucioption.o tune.o tbprobe.o nnue_misc.o half_ka_v2_hm.o network.o -fprofile-instr-generate -latomic -lpthread -Wall -Wcast-qual -fno-exceptions -std=c++17 -fprofile-instr-generate -pedantic -Wextra -Wshadow -Wmissing-prototypes -Wconditional-uninitialized -DUSE_PTHREADS -DNDEBUG -O3 -funroll-loops -DIS_64BIT -DUSE_POPCNT -DUSE_NEON=8 -march=armv8.2-a+dotprod -DUSE_NEON_DOTPROD -DGIT_SHA=627974c9 -DGIT_DATE=20240312 -DARCH=armv8-dotprod -flto=full
/tmp/lto-llvm-e9300e.o: in function `_GLOBAL__sub_I_network.cpp':
ld-temp.o:(.text.startup+0x704c): relocation truncated to fit: R_AARCH64_LDST64_ABS_LO12_NC against symbol `gEmbeddedNNUEBigEnd' defined in .rodata section in /tmp/lto-llvm-e9300e.o
/usr/bin/ld: ld-temp.o:(.text.startup+0x704c): warning: one possible cause of this error is that the symbol is being referenced in the indicated code as if it had a larger alignment than was declared where it was defined
ld-temp.o:(.text.startup+0x7068): relocation truncated to fit: R_AARCH64_LDST64_ABS_LO12_NC against symbol `gEmbeddedNNUESmallEnd' defined in .rodata section in /tmp/lto-llvm-e9300e.o
/usr/bin/ld: ld-temp.o:(.text.startup+0x7068): warning: one possible cause of this error is that the symbol is being referenced in the indicated code as if it had a larger alignment than was declared where it was defined
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [Makefile:1051: stockfish] Error 1
make[2]: Leaving directory '/home/torsten/chess/Stockfish_master/src'
make[1]: *** [Makefile:1058: clang-profile-make] Error 2
make[1]: Leaving directory '/home/torsten/chess/Stockfish_master/src'
make: *** [Makefile:886: profile-build] Error 2
```
Disservin [Sat, 9 Mar 2024 13:42:37 +0000 (14:42 +0100)]
Refactor Network Usage
Continuing from PR #4968, this update improves how Stockfish handles network
usage, making it easier to manage and modify networks in the future.
With the introduction of a dedicated Network class, creating networks has become
straightforward. See uci.cpp:
```cpp
NN::NetworkBig({EvalFileDefaultNameBig, "None", ""}, NN::embeddedNNUEBig)
```
The new `Network` encapsulates all network-related logic, significantly reducing
the complexity previously required to support multiple network types, such as
the distinction between small and big networks #4915.
Note: The small net and psqt-only thresholds have been moved to
evaluate.h. The reasoning is that these values are used in both
`evaluate.cpp` and `evaluate_nnue.cpp`, and thus unifying their usage
avoids inconsistencies during testing, where one occurrence is changed
without the other (this happened during the search tune SPRT).
Disservin [Sat, 9 Mar 2024 11:14:57 +0000 (12:14 +0100)]
Assorted cleanups
- fix naming convention for `workingDirectory`
- use type alias for `EvalFiles` everywhere
- move `ponderMode` into `LimitsType`
- move limits parsing into standalone static function
- when running go mate x in losing positions, master always goes to the
maximal depth, arguably against what the UCI protocol demands
- when running go mate x in winning positions with multiple
threads, master may return non-mate scores from the search (this issue
has been present in stockfish since at least sf16)
The issues are fixed by (a) also checking if the score is mate -x,
(b) only letting mainthread stop the search for go mate x commands,
and (c) not looking for a best thread but using mainthread as per the default.
Related: niklasf/python-chess#1070
More diagnostics can be found in peregrineshahin#6 (comment)
rn5f107s2 [Thu, 15 Feb 2024 22:01:02 +0000 (23:01 +0100)]
Reduce futility_margin if opponent's last move was bad
This reduces the futility_margin if our opponent's last move was bad by
around ~1/3 when not improving and ~1/2.7 when improving, the idea being
to retroactively futility prune moves that were played but turned out
to be bad. A bad move is defined as their staticEval before their
move being lower than our staticEval now. If the depth is 2 and we are
improving, the opponent-worsening flag is not set, in order to not risk
having too low a futility_margin, due to the fact that when these
conditions are met the futility_margin already drops quite low.
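A sketch of the resulting margin computation; the base multiplier and the exact fractions are illustrative:

```cpp
// futility_margin reduced when the opponent's last move worsened their
// position; the flag is suppressed at depth 2 while improving, since the
// margin is already quite low there.
int futility_margin(int depth, bool improving, bool oppWorsening) {
    int margin = 118 * depth;  // illustrative base
    bool worsening = oppWorsening && !(depth == 2 && improving);
    if (worsening)
        margin -= improving ? int(margin / 2.7) : margin / 3;
    return margin;
}
```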
Created by retraining the previous main net `nn-b1a57edbea57.nnue` with:
- some of the same options as before:
- ranger21, more WDL skipping, 15% more loss when Q is too high
- removal of the huge 514G pre-interleaved binpack
- removal of SF-generated dfrc data (dfrc99-16tb7p-filt-v2.min.binpack)
- interleaving many binpacks at training time
- training with some bestmove capture positions where SEE < 0
- increased usage of torch.compile to speed up training by up to 40%
This particular net was reached at epoch 759. Use of more torch.compile decorators
in nnue-pytorch model.py than in the previous main net training run sped up training
by up to 40% on Tesla gpus when using recent pytorch compiled with cuda 12:
https://github.com/linrock/nnue-tools/blob/7fb9831/Dockerfile
Skipping positions with bestmove captures where static exchange evaluation is >= 0
is based on the implementation from Sopel's NNUE training & experimentation log:
https://docs.google.com/document/d/1gTlrr02qSNKiXNZ_SuO4-RjK4MXBiFlLE6jvNqqMkAY
Experiment 293 - only skip captures with see>=0
Positions with bestmove captures where score == 0 are always skipped for
compatibility with minimized binpacks, since the original minimizer sets
scores to 0 for slight improvements in compression.
The trainer branch used was:
https://github.com/linrock/nnue-pytorch/tree/r21-more-wdl-skip-15p-more-loss-high-q-skip-see-ge0-torch-compile-more
Binpacks were renamed so that sorting by name orders them chronologically by default.
The binpack data are otherwise the same as the binpacks with similar names under the
prior naming convention.
Training data can be found at:
https://robotmoon.com/nnue-training-data/
FauziAkram [Mon, 26 Feb 2024 15:08:22 +0000 (18:08 +0300)]
Simplify Time Management
Instead of having a formula for using extra time with larger increments,
simply set it to 1 when the increment is lower than 0.5s and to 1.1 when
the increment is higher.
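As a sketch, the whole rule collapses to a step function (threshold in milliseconds, names illustrative):

```cpp
// Extra-time factor from the increment: flat 1.0 below 0.5 s, 1.1 above.
double increment_factor(int incrementMs) {
    return incrementMs < 500 ? 1.0 : 1.1;
}
```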
Today, we have the pleasure to announce Stockfish 16.1. As always, you can
freely download it at https://stockfishchess.org/download and use it in the GUI
of your choice[1].
Don't forget to join our Discord server[2] to get in touch with the community of
developers and users of the project!
*Quality of chess play*
In our testing against its predecessor, Stockfish 16.1 shows a notable
improvement in performance, with an Elo gain of up to 27 points and winning over
2 times more game pairs[3] than it loses.
*Update highlights*
*Improved evaluation*
- Updated neural network architecture: The neural network architecture has
undergone two updates and is currently in its 8th version[4].
- Removal of handcrafted evaluation (HCE): This release marks the removal of the
traditional handcrafted evaluation and the transition to a fully neural
network-based approach[5].
- Dual NNUE: For the first time, Stockfish includes a secondary neural
network[6], used to quickly evaluate positions that are easily decided.
*UCI Options removed*
`Use NNUE` and `UCI_AnalyseMode`[7] have been removed as they no longer had any
effect. `SlowMover`[8] has also been removed in favor of `Move Overhead`.
*More binaries*
We now offer 13 new binaries. These new binaries include `avx512`, `vnni256`,
`vnni512`, `m1-apple-silicon`, and `armv8-dotprod`, which take advantage of
specific CPU instructions for improved performance.
For most users, using `sse41-popcnt` (formerly `modern`), `avx2`, or `bmi2`
should be enough, but if your CPU supports these new instructions, feel free to
try them!
*Development changes*
- Updated testing book: This new book[9], now derived exclusively from the open
Lichess database[10], is 10 times larger than its predecessor, and has been
used to test potential improvements to Stockfish over the past few months.
- Consolidation of repositories: Aiming to simplify access to our resources, we
have moved most Stockfish-related repositories into the official Stockfish
organization[11] on GitHub.
- Growing maintainer team: We welcome Disservin[12] to the team of maintainers
of the project! This extra pair of hands will ensure the lasting success of
Stockfish.
*Thank you*
The Stockfish project builds on a thriving community of enthusiasts (thanks
everybody!) who contribute their expertise, time, and resources to build a free
and open-source chess engine that is robust, widely available, and very strong.
We would like to express our gratitude for the 10k stars[13] that light up our
GitHub project! Thank you for your support and encouragement – your recognition
means a lot to us.
We invite our chess fans to join the Fishtest testing framework[14], and
programmers to contribute to the project either directly to Stockfish[15] (C++),
to Fishtest[16] (HTML, CSS, JavaScript, and Python), to our trainer
nnue-pytorch[17] (C++ and Python), or to our website[18] (HTML, CSS/SCSS, and
JavaScript).
```
Look recursively in directory pgns for games from SPRT tests using books
matching "UHO_4060_v..epd|UHO_Lichess_4852_v1.epd" for SF revisions
between 8e75548f2a10969c1c9211056999efbcebe63f9a (from 2024-02-17
17:11:46 +0100) and HEAD (from 2024-02-17 17:13:07 +0100).
Based on 127920843 positions from 2109240 games, NormalizeToPawnValue
should change from 345 to 356.
```
The patch only affects the UCI-reported cp and wdl values.
Created by retraining the previous main net `nn-baff1edbea57.nnue` with:
- some of the same options as before: ranger21, more WDL skipping
- the addition of T80 nov+dec 2023 data
- increasing loss by 15% when prediction is too high, up from 10%
- use of torch.compile to speed up training by over 25%
Epoch 819 trained with the above config led to this PR. Use of torch.compile
decorators in nnue-pytorch model.py was found to speed up training by at least
25% on Ampere gpus when using recent pytorch compiled with cuda 12:
https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch
See recent main net PRs for more info on
- ranger21 and more WDL skipping: https://github.com/official-stockfish/Stockfish/pull/4942
- increasing loss when Q is too high: https://github.com/official-stockfish/Stockfish/pull/4972
Training data can be found at:
https://robotmoon.com/nnue-training-data/
Disservin [Sun, 11 Feb 2024 13:19:44 +0000 (14:19 +0100)]
Use node counting to early stop search
This introduces a form of node counting which can
be used to further tweak the usage of our search
time.
The current approach stops the search when almost
all nodes are searched on a single move.
The idea originally came from Koivisto, but the
implementation is a bit different: Koivisto scales
the optimal time by the node effort and then
determines if the search should be stopped.
We just scale down the `totalTime` and stop the
search if we exceed it and the effort is large
enough.
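A sketch of that stop rule; both thresholds are illustrative placeholders for the tuned values:

```cpp
#include <cstdint>

// Stop early when nearly all nodes went into a single move and the search
// has already consumed most of a scaled-down time budget.
bool node_count_stop(int64_t elapsedMs, int64_t totalTimeMs,
                     uint64_t bestMoveNodes, uint64_t totalNodes) {
    double effort = totalNodes ? double(bestMoveNodes) / double(totalNodes) : 0.0;
    return effort > 0.95 && elapsedMs > totalTimeMs * 3 / 4;
}
```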