This patch reuses the threatenedPieces variable (which is calculated in the movepicker)
to apply smaller search-tree reductions to moves which escape a capture.
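A minimal sketch of how such an adjustment could look, assuming `threatenedPieces` is a bitboard of our attacked pieces filled in by the movepicker; the exact condition and names are illustrative, not the committed code:

```
#include <cstdint>

using Bitboard = std::uint64_t;

// Illustrative only: reduce one ply less when the moving piece starts on
// a threatened square and ends on a safe one, i.e. the move escapes a capture.
int adjusted_reduction(int r, int from, int to,
                       Bitboard threatenedPieces, Bitboard threatenedSquares) {
    if ((threatenedPieces & (1ULL << from)) && !(threatenedSquares & (1ULL << to)))
        r = r > 0 ? r - 1 : 0;
    return r;
}
```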
This removes the restriction that no hashfull information is printed within the first second of a search.
On modern systems, a non-zero value is returned within 6 ms with default settings.
This patch simplifies the formulas used to compute the trend and optimism values before each search iteration.
As a side effect, this removes the parameters which make the relationship between the displayed evaluation value
and the expected game result asymmetric.
I've also provided links to the results of isotonic regression analysis of the relationship between the evaluation and game result (statistical data and a graph) for both tests, which demonstrate that the new version has a more symmetric relationship:
STC: [Data and graph](https://github.com/official-stockfish/Stockfish/discussions/4150#discussioncomment-3548954)
LTC: [Data and graph](https://github.com/official-stockfish/Stockfish/discussions/4150#discussioncomment-3626311)
See also https://github.com/official-stockfish/Stockfish/issues/4142
Michael Chaly [Sat, 10 Sep 2022 09:30:25 +0000 (12:30 +0300)]
Do less singular extensions for former PVnode
This patch reintroduces, in a slightly different form, logic that was
simplified a while ago: use a bigger singular extension offset in
case of a non-PV node that was formerly a PV node.
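As a rough sketch of the idea (the margin values here are illustrative, not the tuned constants): the offset subtracted from ttValue grows when a non-PV node carries the former-PV flag, so such nodes need stronger evidence before extending.

```
// Illustrative margins only: former PV nodes (ttPv set at a non-PV node)
// get a larger offset, making the singular condition harder to satisfy.
int singular_beta(int ttValue, int depth, bool ttPv, bool pvNode) {
    int offset = 3 + 2 * (ttPv && !pvNode);
    return ttValue - offset * depth / 2;
}
```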
Relatively soon, servers with 512 threads will be available 'quite commonly';
anticipate even more threads, and increase our current maximum from 512 to 1024.
A bug fix plus a non-functional speed optimization: Position::key_after(Move m) is now
consistent with Position::key(), thus prefetching correct TT entries, which speeds things up.
Related PR #3759
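A self-contained sketch of why the two functions must agree, using standard Zobrist hashing with toy tables (not Stockfish's actual ones): the key after a quiet move or simple capture is derived with the same xor scheme `key()` uses, so the prefetched TT slot is the one the search later probes.

```
#include <cstdint>
#include <random>

using Key = std::uint64_t;

Key psq[16][64];   // toy piece-square Zobrist table
Key sideKey;       // side-to-move key

void init_zobrist() {
    std::mt19937_64 rng(20211101);
    for (auto& row : psq)
        for (auto& k : row)
            k = rng();
    sideKey = rng();
}

// Key of the position after moving piece pc from 'from' to 'to',
// capturing 'captured' (0 = none), computed without making the move.
Key key_after(Key current, int pc, int from, int to, int captured) {
    Key k = current ^ sideKey;
    if (captured)
        k ^= psq[captured][to];
    return k ^ psq[pc][from] ^ psq[pc][to];
}
```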
In general, the history update bonus is decreased by 11%, which gives a slower saturation speed.
In addition, for the main history only, the divisor is halved (used history values are doubled to maintain the same maximum),
which has the opposite effect on saturation speed.
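A sketch of the gravity-style update these constants feed into (the divisor value is a placeholder): the entry drifts toward the bonus and saturates at ±divisor, so halving the divisor halves the cap, which doubling the used values compensates for.

```
#include <cstdint>
#include <cstdlib>

// Gravity-style history update: |entry| stays below 'divisor', so a
// smaller divisor means faster saturation at a lower cap.
void update_history(std::int16_t& entry, int bonus, int divisor /* e.g. 16384 */) {
    entry += bonus - entry * std::abs(bonus) / divisor;
}
```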
Michael Chaly [Sat, 30 Jul 2022 23:04:23 +0000 (02:04 +0300)]
Do more TT cutoffs in case of exact bound
The idea is that these TT entries are considered more valuable in the TT replacement scheme - they always overwrite other entries. So it makes sense for them to produce more aggressive cutoffs.
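The bound test this builds on can be sketched as follows; since an exact bound is the bitwise OR of the upper and lower bounds, an exact entry passes the check on either side of the window:

```
enum Bound { BOUND_NONE = 0, BOUND_UPPER = 1, BOUND_LOWER = 2,
             BOUND_EXACT = BOUND_UPPER | BOUND_LOWER };

// True if the stored value and bound prove a result outside the window:
// an EXACT entry always does, regardless of which side ttValue falls on.
bool tt_cutoff_allowed(int ttValue, Bound ttBound, int beta) {
    return ttBound & (ttValue >= beta ? BOUND_LOWER : BOUND_UPPER);
}
```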
using trainer branch https://github.com/glinscott/nnue-pytorch/pull/208 with a slightly
tweaked loss function (power 2.5 instead of 2.6), otherwise same training as in
the previous net update https://github.com/official-stockfish/Stockfish/pull/4100
If the elapsed time is close to the available time, the time management thread can signal that the next iterations should be searched at the same depth (Threads.increaseDepth = false). While the rootDepth increases, the adjustedDepth is kept constant with the searchAgainCounter.
In exceptional cases, when threading is used and the master thread, which controls the time management, signals not to increase depth but itself takes a long time to finish the iteration, the helper threads can search repeatedly at the same depth. These searches finish more and more quickly, eventually leading to helper threads that report a rootDepth of MAX_DEPTH (245). The latter is not optimal: it is confusing for the user, stops the search on these threads, and leads to an incorrect bias in the thread voting scheme, probably with only a small impact on strength.
This behavior was observed almost two years ago,
see https://github.com/official-stockfish/Stockfish/issues/2717
This patch fixes #2717 by ensuring the effective depth increases once every four iterations,
even if increaseDepth is false.
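A sketch of the mechanism after the fix, following the description above (the exact scaling factor is an assumption): only about three quarters of the counter is subtracted, so the adjusted depth gains a ply every fourth iteration even while increaseDepth stays false.

```
#include <algorithm>

// Inside the iterative deepening loop (sketch):
//   if (!Threads.increaseDepth)
//       searchAgainCounter++;
int adjusted_depth(int rootDepth, int searchAgainCounter) {
    // Subtracting 3/4 of the counter lets depth advance every 4th iteration.
    return std::max(1, rootDepth - 3 * (searchAgainCounter + 1) / 4);
}
```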
Depth 245 searches (for non-trivial positions) were indeed absent with this patch,
but frequent with master in the tests below:
https://discord.com/channels/435943710472011776/813919248455827515/994872720800088095
Total pgns: 2173
Base: 2867
Patch: 0
It passed non-regression testing in various setups.
Special thanks to miguel-I, Disservin, ruicoelhopedro and others for analysing the problem
and the data, and for coming up with the key insight needed to fix this longstanding issue.
An oversight changed the corresponding float division to integer division in a previous tune (https://github.com/official-stockfish/Stockfish/commit/442c40b43de8ede1e424efa674c8d45322e3b43c); it is stronger to keep the original float division.
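A two-line illustration of the difference (the value is hypothetical):

```
// Integer division silently drops the fraction the tuned formula relied on.
int v = 1500;
double keepsFraction = v / 1024.0;  // ~1.465
int    truncated     = v / 1024;    // 1
```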
Michael Chaly [Tue, 5 Jul 2022 11:15:34 +0000 (14:15 +0300)]
Simplify away FRC correction term
Since the new net is trained partially using FRC data, the part of the adjustment that penalises bishops locked in the corner is no longer needed - the net should "know" these things itself much better.
this PR is being made from court. Today, Tord and Stéphane, with broad support
of the developer community are defending their complaint, filed in Munich, against ChessBase.
With their products Houdini 6 and Fat Fritz 2, both Stockfish derivatives,
ChessBase repeatedly violated the Stockfish GPLv3 license. Tord and Stéphane have permanently
terminated their license with ChessBase. Today we have the opportunity to present
our evidence to the judge and enforce that termination. To read up, have a look at our blog post
https://stockfishchess.org/blog/2022/public-court-hearing-soon/ and
https://stockfishchess.org/blog/2021/our-lawsuit-against-chessbase/
This PR introduces a net trained with an enhanced data set and a modified loss function in the trainer.
A slight adjustment for the scaling was needed to get a pass on standard chess.
Local testing at a fixed 25k nodes resulted in
Test run1026/easy_train_data/experiments/experiment_2/training/run_0/nn-epoch799.nnue
localElo: 4.2 +- 1.6
The real strength of the net is in FRC and DFRC chess where it gains significantly.
This is due to the mixing in of a significant fraction of DFRC training data in the final training round. The net is
trained using the easy_train.py script in the following way:
where the data set used (Leela-dfrc_n5000.binpack) is a combination of our previous best data set (mix of Leela and some SF data) and DFRC data, interleaved to form:
The data is available in https://drive.google.com/drive/folders/1S9-ZiQa_3ApmjBtl2e8SyHxj4zG4V8gG?usp=sharing
Leela mix: https://drive.google.com/file/d/1JUkMhHSfgIYCjfDNKZUMYZt6L5I7Ra6G/view?usp=sharing
DFRC: https://drive.google.com/file/d/17vDaff9LAsVo_1OfsgWAIYqJtqR8aHlm/view?usp=sharing
The training branch used is
https://github.com/vondele/nnue-pytorch/commits/lossScan4
A PR to the main trainer repo will be made later. This contains a revised loss function, now computing the loss from the score based on the win rate model, which is a more accurate representation than what we had before. Scaling constants are tweaked there as well.
ppigazzini [Mon, 13 Jun 2022 20:08:01 +0000 (22:08 +0200)]
Restore NDKv21 for GitHub Actions
GitHub updated the versions of NDK installed on the Actions runners
breaking the ARM tests.
Restore the NDKv21 using the GitHub suggested mitigation, see:
https://github.com/actions/virtual-environments/issues/5595
xoto10 [Thu, 19 May 2022 07:51:40 +0000 (08:51 +0100)]
Adjust scale param higher
xoto10's scaleopt tune resulted in a yellow LTC, but the main parameter shift looked almost exactly like the tune's rate-reduction schedule,
so further increases of that param were tried. Joint work by xoto10 and dubslow.
This patch provides command line flags `--help` and `--license` as well as the corresponding `help` and `license` commands.
```
$ ./stockfish --help
Stockfish 200522 by the Stockfish developers (see AUTHORS file)
Stockfish is a powerful chess engine and free software licensed under the GNU GPLv3.
Stockfish is normally used with a separate graphical user interface (GUI).
Stockfish implements the universal chess interface (UCI) to exchange information.
For further information see https://github.com/official-stockfish/Stockfish#readme
or the corresponding README.md and Copying.txt files distributed with this program.
```
The idea is to provide a minimal help that links to the README.md file,
not replicating information that is already available elsewhere.
We use this opportunity to explicitly report the license as well.
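A minimal sketch of how the commands and flags can share one handler (the helper names are hypothetical, not the committed implementation):

```
#include <iostream>
#include <string>

// Both the interactive command and the corresponding CLI flag route to
// the same output.
bool handle_meta_command(const std::string& token) {
    if (token == "help" || token == "--help") {
        std::cout << "Stockfish is a powerful chess engine ... see README.md\n";
        return true;
    }
    if (token == "license" || token == "--license") {
        std::cout << "Licensed under the GNU GPLv3, see Copying.txt\n";
        return true;
    }
    return false;
}
```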
and should have nearly no influence at STC, as depth 27 is rarely reached.
It was noticed that initializing the threshold with MAX_PLY had an adverse effect,
possibly because the first move is sensitive to this.
Tomasz Sobczyk [Fri, 13 May 2022 15:26:50 +0000 (17:26 +0200)]
Update NNUE architecture to SFNNv5. Update network to nn-3c0aa92af1da.nnue.
Architecture changes:
Duplicated activation after the 1024->15 layer with squared crelu (so 15->15*2), as proposed by vondele.
Trainer changes:
Added bias to L1 factorization, which was previously missing (no measurable improvement but at least neutral in principle)
For retraining linearly reduce lambda parameter from 1.0 at epoch 0 to 0.75 at epoch 800.
Reduce max_skipping_rate from 15 to 10 (compared to vondele's outstanding PR).
Note: This network was trained with a ~0.8% error in quantization regarding the newly added activation function.
This will be fixed in the released trainer version. Expect a trainer PR tomorrow.
Note: The inference implementation cuts a corner to merge results from two activation functions.
This could possibly be resolved more cleanly in the future. An AVX2 implementation is likely not necessary, but the NEON one is missing.
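A self-contained sketch of the duplicated activation (the requantization factor is an assumption): each of the 15 outputs contributes twice to the next layer, once clipped and once squared.

```
#include <algorithm>
#include <array>
#include <cstdint>

// 15 raw outputs -> 30 activated inputs for the next layer:
// out[i]    = clipped ReLU
// out[i+15] = squared clipped ReLU, rescaled back into the 8-bit range
std::array<std::uint8_t, 30> dual_activation(const std::array<int, 15>& raw) {
    std::array<std::uint8_t, 30> out{};
    for (int i = 0; i < 15; ++i) {
        int c = std::clamp(raw[i], 0, 127);
        out[i]      = static_cast<std::uint8_t>(c);
        out[i + 15] = static_cast<std::uint8_t>(c * c / 128);
    }
    return out;
}
```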
Train a net using training data with a
heavier weight on positions having 16 pieces on the board. More specifically,
with a relative weight of `i * (32-i)/(16 * 16)+1` (where i is the number of pieces on the board); a worked check of the curve follows below.
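The weight peaks at 2.0 for 16 pieces and falls toward 1.0 at the extremes:

```
#include <cstdio>

int main() {
    for (int i : {2, 8, 16, 24, 30})  // pieces on the board
        std::printf("pieces=%2d  weight=%.3f\n", i, i * (32 - i) / 256.0 + 1);
    // pieces= 2 -> 1.234, 8 -> 1.750, 16 -> 2.000, 24 -> 1.750, 30 -> 1.234
}
```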
This is done with the trainer branch https://github.com/glinscott/nnue-pytorch/pull/173
The command used is:
```
python train.py $datafile $datafile $restarttype $restartfile --gpus 1 --threads 4 --num-workers 12 --random-fen-skipping=3 --batch-size 16384 --progress_bar_refresh_rate 300 --smart-fen-skipping --features=HalfKAv2_hm^ --lambda=1.00 --max_epochs=$epochs --seed $RANDOM --default_root_dir exp/run_$i
```
The datafile is T60T70wIsRightFarseerT60T74T75T76.binpack, the restart is from the master net.
A new major release of Stockfish is now available at https://stockfishchess.org
Stockfish 15 continues to push the boundaries of chess, providing unrivalled
analysis and playing strength. In our testing, Stockfish 15 is ahead of
Stockfish 14 by 36 Elo points and wins nine times more game pairs than it
loses[1].
Improvements to the engine have made it possible for Stockfish to end up
victorious in tournaments at all sorts of time controls ranging from bullet to
classical and even at Fischer random chess[2]. At CCC, Stockfish won all of
the latest tournaments: CCC 16 Bullet, Blitz and Rapid, CCC 960 championship,
and the CCC 17 Rapid. At TCEC, Stockfish won the Season 21, Cup 9, FRC 4 and
in the current Season 22 superfinal, at the time of writing, has won 16 game
pairs and not yet lost a single one.
This progress is the result of a dedicated team of developers that comes up
with new ideas and improvements. For Stockfish 15, we tested nearly 13000
different changes and retained the best 200. These include the fourth
generation of our NNUE network architecture, as well as various search
improvements. To perform these tests, contributors provide CPU time for
testing, and in the last year, they have collectively played roughly a
billion chess games. In the last few years, our distributed testing
framework, Fishtest, has been operated superbly and has been developed and
improved extensively. This work by Pasquale Pigazzini, Tom Vijlbrief, Michel
Van den Bergh, and various other developers[3] is an essential part of the
success of the Stockfish project.
Indeed, the Stockfish project builds on a thriving community of enthusiasts
to offer a free and open-source chess engine that is robust, widely
available, and very strong. We invite our chess fans to join the Fishtest
testing framework and programmers to contribute to the project[4].
This patch lessens the Late Move Reduction at PV nodes with low depth. Previously, the effect of depth on LMR was independent of nodeType. The idea behind this patch is that at PV nodes, LMR at low depth will miss out on potential alpha-raising moves.
This patch enforces that NNUE evaluation is used for endgame positions at shallow depth (depth <= 9).
Classic evaluation will still be used for high imbalance positions when the depth is high or there are many pieces.
Topologist [Mon, 28 Mar 2022 09:50:08 +0000 (11:50 +0200)]
Play more positional in endgames
This patch chooses the delta value (which skews the nnue evaluation between positional and materialistic)
depending on the material: if the material is low, delta will be higher and the evaluation is shifted
toward the positional value; if the material is high, the evaluation will be shifted toward the psqt value.
I don't think slightly negative values of delta should be a concern.
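A hedged sketch of such a blend (constants and names are illustrative, not the committed values): delta is positive with little material on the board, shifting weight to the positional term, and goes slightly negative with full material.

```
// Illustrative blend: with low nonPawnMaterial delta is positive and the
// positional term dominates; with full material delta is slightly
// negative and the psqt term gets more weight.
int blended(int psqt, int positional, int nonPawnMaterial) {
    int delta = (8000 - nonPawnMaterial) / 400;   // hypothetical scaling
    return ((1024 - delta) * psqt + (1024 + delta) * positional) / 2048;
}
```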
Michael Chaly [Mon, 28 Mar 2022 11:15:56 +0000 (14:15 +0300)]
In movepicker increase priority for moves that evade a capture
This idea is a mix of Koivisto's threat-history idea and a heuristic that
was simplified away some time ago in LMR - decreasing reduction for moves that evade a capture.
Instead of doing so in LMR, this patch does it in the movepicker: it
calculates the squares that are attacked by the different piece types and the pieces located
on these squares, and boosts the weight of moves that make these pieces land on a square that is not under threat.
The boost is greater for pieces with bigger material values.
Special thanks to the Koivisto and Seer authors for explaining the ideas behind threat history to me.
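A sketch of the scoring boost (the bonus values are placeholders, not tuned constants): `threatenedPieces` collects our pieces attacked by cheaper enemy pieces, and a quiet move taking such a piece to a safe square is ordered earlier, more so for more valuable pieces.

```
#include <cstdint>

using Bitboard = std::uint64_t;

// Placeholder bonuses; real values would come from tuning.
int evasion_bonus(Bitboard fromBB, Bitboard toBB, Bitboard threatenedPieces,
                  Bitboard threatenedByRook, Bitboard threatenedByMinor,
                  Bitboard threatenedByPawn, int pieceType /* 2=N,3=B,4=R,5=Q */) {
    if (!(threatenedPieces & fromBB))
        return 0;                                          // piece is not attacked
    if (pieceType == 5 && !(toBB & threatenedByRook))      // queen escapes rook/minor/pawn
        return 50000;
    if (pieceType == 4 && !(toBB & threatenedByMinor))     // rook escapes minor/pawn
        return 25000;
    if (!(toBB & threatenedByPawn))                        // minor escapes pawn
        return 15000;
    return 0;
}
```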
This patch replaces `pos.capture_or_promotion()` with `pos.capture()`
and comes after a few attempts with elo-gaining bounds, two of which
failed yellow at LTC
(https://tests.stockfishchess.org/tests/view/622f8f0cc9e950cbfc237024
and
https://tests.stockfishchess.org/tests/view/62319a8bb3b498ba71a6b2dc).
Via the ttPv flag, an implicit tree of current and former PV nodes is maintained. Previously, this tree was grown or shrunk at the leaves depending on the search results; now the shrinking step has been removed.
As the frequency of ttPv nodes decreases with depth, the observed scaling behavior of the tests (STC barely passed, but LTC scales well) was expected.
Michael Chaly [Tue, 8 Mar 2022 07:56:07 +0000 (10:56 +0300)]
Decrease reductions in Lmr for some Pv nodes
This patch makes us reduce less in LMR at PV nodes in case of alpha being far away from the static evaluation of the position.
The idea is that, if this is the case, the position is probably pretty complex, so we can't be sure how reliable LMR is and should reduce less.
This patch (partially) sorts captures in analogy to quiet moves. All
three movepickers are affected, hence `depth` is added as an argument
to probcut's.
mstembera [Thu, 24 Feb 2022 02:19:36 +0000 (18:19 -0800)]
Clean up and simplify some nnue code.
Remove some unnecessary code and its execution during inference. Also, the change on line 49 in nnue_architecture.h results in a more efficient SIMD code path through ClippedReLU::propagate().
Michael Chaly [Sat, 19 Feb 2022 15:24:11 +0000 (18:24 +0300)]
Adjust usage of LMR for 2nd move in move ordering
Current master prohibits the usage of LMR for the 2nd move at the rootNode. This patch disables LMR for the 2nd move not only at the rootNode but also at the first PvNode that is a reply to the rootNode.
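In terms of the LMR gate, the condition can be sketched as follows (a sketch following the description, where ply 0 is the root and ply 1 its PV reply):

```
// Reduce only when moveCount exceeds the threshold; the threshold is one
// higher at the root and at the first-ply PV node, shielding move 2 there.
bool lmrAllowed = depth >= 3 && moveCount > 1 + (PvNode && ply <= 1);
```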
ppigazzini [Sun, 6 Feb 2022 18:20:30 +0000 (19:20 +0100)]
Add ARM NDK to Github Actions matrix
- set the variable only for the required tests, to keep the yml file simple
- use NDK 21.x until the Stockfish static build problem with NDK 23.x is fixed
- set the test for armv7, armv7-neon, armv8 builds:
- use the armv7a-linux-androideabi21-clang++ compiler for armv7 and armv7-neon
- enforce a static build
- silence the warning for the unused compilation flag "-pie" with
the static build, otherwise the GitHub workflow stops
- use qemu to bench the build and get the signature
Many thanks to @pschneider1968, who did all the hard work with the NDK :)
Michael Chaly [Thu, 17 Feb 2022 07:54:07 +0000 (10:54 +0300)]
Tune search at very long time control
This patch is the result of tuning done by user @candirufish after 150k games.
Since the tuned values were really interesting and touched heuristics
that are known for their non-linear scaling, I decided to run a limited-games
LTC match, even though the STC test was really bad (which was expected).
After seeing the results of the LTC match, I also ran a VLTC (very long
time control) SPRT test, which passed.
The main difference is in extensions: this patch allows much more
singular/double extensions, both in terms of allowing them at lower
depths and with lesser margins.
Michael Chaly [Sat, 12 Feb 2022 17:08:45 +0000 (20:08 +0300)]
Big search tuning (version 2)
One more tuning - this one includes the newly introduced heuristics and
some other parameters that were not included in the previous one. This is
the result of 400k games at 20+0.2, applied "as is". Tuning is continuing,
since there is probably a lot more Elo to gain.
The most important architectural changes are the following:
* 1024x2 [activated] neurons are pairwise, elementwise multiplied (not quite pairwise due to implementation details, see diagram), which introduces a non-linearity that exhibits similar benefits to previously tested sigmoid activation (quantmoid4), while being slightly faster.
* The following layer has therefore 2x less inputs, which we compensate by having 2 more outputs. It is possible that reducing the number of outputs might be beneficial (as we had it as low as 8 before). The layer is now 1024->16.
* The 16 outputs are split into 15 and 1. The 1-wide output is added to the network output (after some necessary scaling due to quantization differences). The 15-wide is activated and follows the usual path through a set of linear layers. The additional 1-wide output is at least neutral, but has shown a slightly positive trend in training compared to networks without it (all 16 outputs through the usual path), and allows possibly an additional stage of lazy evaluation to be introduced in the future.
Additionally, the inference code was rewritten and no longer uses a recursive implementation. This was necessitated by the splitting of the 16-wide intermediate result into two, which was impossible to do with the old implementation with ugly hacks. This is hopefully overall for the better.
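A self-contained sketch of the multiply step in plain C++ (the quantization scaling is an assumption): each 1024-wide perspective yields 512 products, and the two perspectives concatenate into the 1024 inputs of the 1024->16 layer.

```
#include <algorithm>
#include <array>
#include <cstdint>

// One perspective: clip both halves of the 1024 accumulator values and
// multiply them pairwise, giving 512 non-linear features.
std::array<std::uint8_t, 512> pairwise_mul(const std::array<std::int16_t, 1024>& acc) {
    std::array<std::uint8_t, 512> out{};
    for (int i = 0; i < 512; ++i) {
        int a = std::clamp<int>(acc[i],       0, 127);
        int b = std::clamp<int>(acc[i + 512], 0, 127);
        out[i] = static_cast<std::uint8_t>(a * b / 128);  // back to 8-bit range
    }
    return out;
}
```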
First session:
The first session was training a network from scratch (random initialization). The exact trainer used was slightly different (older) from the one used in the second session, but it should not have a measurable effect. The purpose of this session is to establish a strong network base for the second session. Small deviations in strength do not harm the learnability in the second session.
The training was done using the following command:
Every 20th net was saved and its playing strength measured against some baseline at 25k nodes per move with pure NNUE evaluation (modified binary). The exact setup is not important as long as it's consistent. The purpose is to sift good candidates from bad ones.
The dataset can be found https://drive.google.com/file/d/1UQdZN_LWQ265spwTBwDKo0t1WjSJKvWY/view
Second session:
The second training session was done starting from the best network (as determined by strength testing) from the first session. It is important that it's resumed from a .pt model and NOT a .ckpt model. The conversion can be performed directly using serialize.py
The LR schedule was modified to use gamma=0.995 instead of gamma=0.992 and LR=4.375e-4 instead of LR=8.75e-4 to flatten the LR curve and allow for longer training. The training was then running for 800 epochs instead of 400 (though it's possibly mostly noise after around epoch 600).
The training was done using the following command:
In particular note that we now use lambda=1.0 instead of lambda=0.8 (previous nets), because tests show that WDL-skipping introduced by vondele performs better with lambda=1.0. Nets were being saved every 20th epoch. In total 16 runs were made with these settings and the best nets chosen according to playing strength at 25k nodes per move with pure NNUE evaluation - these are the 4 nets that have been put on fishtest.
The dataset can be found either at ftp://ftp.chessdb.cn/pub/sopel/data_sf/T60T70wIsRightFarseerT60T74T75T76.binpack in its entirety (download might be painfully slow because hosted in China) or can be assembled in the following way:
Get the https://github.com/official-stockfish/Stockfish/blob/5640ad48ae5881223b868362c1cbeb042947f7b4/script/interleave_binpacks.py script.
Download T60T70wIsRightFarseer.binpack https://drive.google.com/file/d/1_sQoWBl31WAxNXma2v45004CIVltytP8/view
Download farseerT74.binpack http://trainingdata.farseer.org/T74-May13-End.7z
Download farseerT75.binpack http://trainingdata.farseer.org/T75-June3rd-End.7z
Download farseerT76.binpack http://trainingdata.farseer.org/T76-Nov10th-End.7z
Run python3 interleave_binpacks.py T60T70wIsRightFarseer.binpack farseerT74.binpack farseerT75.binpack farseerT76.binpack T60T70wIsRightFarseerT60T74T75T76.binpack
with some hand polishing on top of it, which includes:
a) correcting the trend sigmoid - for some reason the original tuning resulted in it being negative. This heuristic has been proven to be worth some Elo for years, so the reversed sign is probably a random artefact;
b) removing changes to continuation-history-based pruning - this heuristic has historically been really good at producing green STCs and then failing miserably at LTC whenever we tried to make it more strict. The original tuning was done at short time control and thus made it more strict - which doesn't scale to longer time controls;
c) removing changes to improvement - not really intended :).
Michael Chaly [Mon, 7 Feb 2022 10:32:21 +0000 (13:32 +0300)]
Do less depth reduction in null move pruning for complex positions
This patch makes us reduce depth less in null move pruning if the complexity is high enough.
Thus, null move pruning now depends in two distinct ways on complexity,
while being the only search heuristic that exploits complexity so far.
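A sketch of how complexity can feed the reduction (the threshold and base formula are illustrative):

```
#include <algorithm>

// Null-move depth reduction: a usual depth/eval-based formula, reduced
// by one when the position is complex enough.
int null_move_R(int depth, int evalMinusBeta, int complexity) {
    int R = 3 + depth / 3 + std::min(evalMinusBeta / 256, 3);
    if (complexity > 800)  // hypothetical threshold: complex position
        R -= 1;
    return R;
}
```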
Michael Chaly [Sat, 5 Feb 2022 01:03:02 +0000 (04:03 +0300)]
Reintroduce razoring
Razoring was simplified away some years ago; this patch reintroduces it in a slightly different form.
Now, at low depths, if eval is far below alpha, we check whether qsearch can push it above alpha - and if it can't, we return a fail low.
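A hedged sketch of the shape of the new razoring step (the margin constants and depth limit are illustrative, not the committed values):

```
// At shallow depth, if static eval is far below alpha, verify with a
// null-window qsearch; if even qsearch stays below alpha, fail low.
if (depth <= 7 && eval < alpha - 350 - 250 * depth * depth) {
    Value value = qsearch(pos, ss, alpha - 1, alpha);
    if (value < alpha)
        return value;
}
```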
Michael Chaly [Fri, 4 Feb 2022 19:42:41 +0000 (22:42 +0300)]
Introduce movecount pruning for quiet check evasions in qsearch
The idea of this patch is that we usually don't consider quiet check evasions to be "good" ones, preferring capture-based evasions instead. So it makes sense to think that if, in qsearch, two quiet check evasions have failed to produce anything good, the third and later ones won't be good either.
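As a sketch inside the qsearch move loop (Stockfish-like names; the cutoff count of two follows the description above, the mate guard is an assumption):

```
// Prune the 3rd and later quiet check evasions, provided we already have
// some non-mated best value to fall back on.
if (inCheck && quietCheckEvasions > 1 && !pos.capture(move)
    && bestValue > VALUE_MATED_IN_MAX_PLY)
    break;

quietCheckEvasions += !pos.capture(move) && inCheck;
```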
Michael Chaly [Sat, 29 Jan 2022 03:39:40 +0000 (06:39 +0300)]
Do stats updates after LMR for captures
Since captures in LMR use the continuation histories of the corresponding quiet moves, it makes sense to update these histories when such a capture passes LMR, by analogy with the existing logic for quiet moves.
pschneider1968 [Fri, 21 Jan 2022 13:11:53 +0000 (14:11 +0100)]
Fix Makefile for Android NDK cross-compile
For cross-compiling to Android on Windows, the Makefile needs some tweaks.
Tested with Android NDK 23.1.7779620 and 21.4.7075529, using
Windows 10 with clean MSYS2 environment (i.e. no MINGW/GCC/Clang
toolchain in PATH) and Fedora 35, with build target:
build ARCH=armv8 COMP=ndk
The resulting binary runs fine inside Droidfish on my Samsung
Galaxy Note20 Ultra and Samsung Galaxy Tab S7+
Other builds tested to exclude regressions: MINGW64/Clang64 build
on Windows; MINGW64 cross build, native Clang and GCC builds on Fedora.
Wiki docs: https://github.com/glinscott/fishtest/wiki/Cross-compiling-Stockfish-for-Android-on-Windows-and-Linux