Miguel Lahoz [Mon, 27 Mar 2023 16:06:24 +0000 (00:06 +0800)]
Clean up repetitive declarations for see_ge
The occupied bitboard is only used in one place and is otherwise thrown away.
To simplify use, the see_ge function can instead be overloaded,
removing the repetitive declarations of the occupied bitboard.
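A minimal sketch of the resulting pattern (types reduced for illustration; the real declarations live in position.h):
```
#include <cstdint>

using Bitboard = uint64_t;
enum Move : int {};
enum Value : int { VALUE_ZERO = 0 };

struct Position {
    // Full version: callers that need the post-exchange occupancy pass
    // their own bitboard, which see_ge fills in during the exchanges.
    bool see_ge(Move m, Bitboard& occupied, Value threshold = VALUE_ZERO) const;

    // Convenience overload: most call sites throw 'occupied' away, so
    // declare it internally instead of repeating it at every call site.
    bool see_ge(Move m, Value threshold = VALUE_ZERO) const {
        Bitboard occupied;
        return see_ge(m, occupied, threshold);
    }
};
```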
Created by retraining the master net with these modifications:
* New filtering methods for existing data from T80 sep+oct2022, T79 apr2022, T78 jun+jul+aug+sep2022, T77 dec2021
* Adding new filtered data from T80 aug2022 and T78 apr+may2022
* Increasing early-fen-skipping from 28 to 30
The v3 filtering used for data from T77 dec2021 differs from v2 filtering in that:
* To improve binpack compression, early-game positions with ply <= 28 were skipped during training by setting position scores to VALUE_NONE (32002) instead of removing them entirely, which keeps the game sequences contiguous
* Only bestmove captures at d6pv2 search were skipped, not 2nd bestmove captures
* Binpack compression was repaired for the remaining positions by effectively replacing bestmoves with "played moves" to maintain contiguous sequences of positions in the training game data
After improving binpack compression, the T77 dec2021 data size was reduced from 95G to 19G.
The v6 filtering used for data from T80augsepoctT79aprT78aprtosep 2022 differs from v2 in that:
* All positions with only one legal move were removed
* Tighter score differences at d6pv2 search were used to remove more positions with only one good move than before
* d6pv2 search was not used to remove positions where the best 2 moves were captures
MinetaS [Wed, 22 Mar 2023 07:50:55 +0000 (16:50 +0900)]
Remove non_pawn_material in NNUE::evaluate
After "Use NNUE complexity in search, retune related parameters" commit,
the effect of non-pawn material adjustment has been nearly diminished.
This patch removes pos.non_pawn_material as a simplification, which
passed non-regression tests with both STC and LTC.
Michael Chaly [Fri, 24 Mar 2023 04:45:22 +0000 (07:45 +0300)]
Simplify statScore initialization
This patch simplifies the initialization of statScore to "always set it to 0" instead of setting it to 0 two plies deeper.
The previous scheme partly existed because LMR used the previous statScore, a usage that was simplified away long ago, so it makes sense to simplify the initialization as well.
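A minimal sketch of the change (Stack reduced to the one field involved; the real code is in search.cpp):
```
struct Stack { int statScore; };

void on_node_entry(Stack* ss) {
    // before: each visited node pre-zeroed the slot two plies deeper:
    //     (ss + 2)->statScore = 0;
    // after: a node simply starts with its own statScore at zero:
    ss->statScore = 0;
}
```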
FauziAkram [Tue, 21 Mar 2023 20:58:25 +0000 (23:58 +0300)]
Update Elo estimates for terms in search
Sets Elo estimates for some search terms that had none before.
All tests were run at 10+0.1 (STC), 25000 games (same as #4294).
Book used: UHO_XXL_+0.90_+1.19.epd
Values are rounded to the nearest non-negative integer.
pb00067 [Mon, 20 Mar 2023 07:56:44 +0000 (08:56 +0100)]
Verified SEE pruning for capturing and checking moves.
The patch analyzes the board after SEE exchanges that concluded with a recapture by
the opponent:
if the opponent's Queen/Rook/King ends up under attack after the exchanges, we
consider the move sharp and don't prune it.
Important note:
By accident I forgot to adjust 'occupied' when the king takes part in
the exchanges.
As a result of this, a move is also considered sharp when the opponent's king
can apparently evade the check by recapturing.
Surprisingly, this seems to contribute to the patch's strength.
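A hedged sketch of the check (simplified; assumes the helpers from position.h and bitboard.h):
```
#include "bitboard.h"
#include "position.h"

// After see_ge() has filled 'occupied' with the post-exchange occupancy,
// consider the move sharp if a heavy enemy piece or the enemy king is
// attacked on the resulting board.
static bool sharp_after_exchanges(const Position& pos, Bitboard occupied) {
    Color us = pos.side_to_move();
    Bitboard targets = (pos.pieces(~us, QUEEN, ROOK) | pos.pieces(~us, KING))
                     & occupied;
    while (targets)
    {
        Square s = pop_lsb(targets);
        if (pos.attackers_to(s, occupied) & pos.pieces(us) & occupied)
            return true;   // don't prune: opponent heavy piece/king en prise
    }
    return false;
}
```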
pb00067 [Mon, 13 Mar 2023 17:32:40 +0000 (18:32 +0100)]
Remove 'si' StateInfo variable/parameter.
Since st is a member of Position, we don't need to pass it separately as
a parameter.
While there, also remove some lines in pos_is_ok where
a copy of StateInfo was made using the default copy constructor and
its correctness then verified with a memcmp.
There is no point in doing that.
The clang 16 release will remove the -fexperimental-new-pass-manager
flag (see https://github.com/llvm/llvm-project/commit/69b2b7282e92a1b576b7bd26f3b16716a5027e8e).
Thus, the commit adapts the Makefile to use this flag only for older
clang versions.
Michael Chaly [Mon, 13 Mar 2023 12:57:58 +0000 (15:57 +0300)]
Do more singular extensions
This patch continues the trend of the last VLTC tuning - as measured by dubslow,
most of its gain came from lowering the margin in the calculation of singularBeta.
This patch is a manual adjustment on top of it - it lowers the multiplier
of depth in the calculation of singularBeta even further, from 2/3 to 1.5/2.5.
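In integer arithmetic this might look as follows (a sketch; the condition selecting between the two multipliers is assumed):
```
// Expressing 1.5/2.5 in halves keeps the margin integral: the depth
// multiplier becomes 3/2 or 5/2 depending on the node type.
int singular_beta(int ttValue, int depth, bool wideMargin) {
    return ttValue - (3 + 2 * wideMargin) * depth / 2;   // 1.5x or 2.5x depth
}
```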
Dubslow [Fri, 10 Mar 2023 00:33:13 +0000 (18:33 -0600)]
More negative extensions on nonsingular nonpv nodes.
Following up the previous gainer in this nonsingular-node
section of code. Credit shared with @FauziAkram for realizing
this nonsingular-node code had some potential, and @XInTheDark
for reminding us that !PvNodes handle extensions/reductions
better than Pv nodes.
Michael Chaly [Mon, 6 Mar 2023 22:54:20 +0000 (01:54 +0300)]
Do more negative extensions
This patch negatively extends the transposition table move if the singular search failed and the tt value is not greater than alpha.
The logic is close to what we had before the recent simplification of negative extensions, but uses an OR condition instead of an AND condition.
https://github.com/official-stockfish/Stockfish/commit/5c75c1c2fbb7bb4f0bf7c44fb855c415b788cbf7
introduced a capture_stage() function, but TB usage needs a pure capture() function.
disservin [Sat, 4 Mar 2023 15:34:34 +0000 (16:34 +0100)]
Add wiki to artifacts
Snapshot the wiki (https://github.com/official-stockfish/stockfish/wiki) as part of the generated artifacts.
This will allow future releases to include the wiki pages as a form of documentation.
Michael Chaly [Fri, 24 Feb 2023 09:09:45 +0000 (12:09 +0300)]
Fix duplicated moves generation in movepicker
In some cases the movepicker returned moves more than once, which led
to them being searched more than once. This bug was possible because of how
we handle queen promotions - they are generated as captures but are not
included in the position function which checks whether a move is a capture. Thus if
any refutation (killer or countermove) was a queen promotion, it was
searched twice - once as a capture and once as a refutation.
This patch affects various things, namely stats assignments for queen promotions
and for other moves when the best move is a queen promotion,
as well as some heuristics in search and qsearch.
With this patch every queen promotion is now considered a capture.
After this patch the number of duplicated moves found during a normal depth-13 bench run is 0.
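A hedged sketch of the resulting classification (in master this lives in Position::capture_stage(), the helper referenced by the capture_stage() entry above; simplified here):
```
#include "position.h"

// For movepicker staging, every queen promotion is treated like a
// capture, so a refutation that is a queen promotion is no longer
// yielded a second time.
bool capture_stage(const Position& pos, Move m) {
    return pos.capture(m) || promotion_type(m) == QUEEN;
}
```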
Created by retraining the master net with modifications to the previous best dataset:
* Improving T80 oct+nov 2022 endgame lambda accuracy by rescoring with 12-16tb of syzygy 7p tablebases
* Filtering T78 jun+jul+aug 2022 with d6pv2 search to remove positions with bestmove captures or one good move
* Adding T80 sep 2022 data, rescored with 16tb of 7p tablebases, unfiltered
Trained with max-epoch 900, end-lambda 0.7, and early-fen-skipping 28.
Dubslow [Wed, 22 Feb 2023 11:45:43 +0000 (05:45 -0600)]
Late counter bonus: boost underestimated moves
The idea here is very intuitive: since we've just proven that the move is good, if it previously had poor stats we boost those stats more than we otherwise would.
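A hypothetical sketch of the shape of the idea (names and scaling are illustrative, not master's exact code):
```
// If the move that just failed high had a poor (negative) history score,
// it was underestimated: correct its stats upward more aggressively.
int counter_bonus(int baseBonus, int historyScore) {
    return historyScore < 0 ? baseBonus + baseBonus / 2   // boosted
                            : baseBonus;                  // usual bonus
}
```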
Call the recently added hint function for NNUE accumulator update after a failed probcut search.
In this case we have already searched at least some captures and the tt move which, however, were not sufficient for a cutoff.
So it seems we have a greater chance that the full search will also have no cutoff, and hence all moves will have to be searched.
pb00067 [Sun, 26 Feb 2023 08:59:35 +0000 (09:59 +0100)]
Use common_parent_position hint also at PVNodes TT hits.
Credits to Stefan Geschwentner (locutus2) for showing that the hint
is useful at PvNodes. In contrast to his test,
this version avoids using the hint when in check.
I believe in-check positions aren't good candidates for the hint
because:
- evasion moves are rather few, so an in-check position has far fewer children
than a normal position
- if the king has to move, the NNUE eval can't use incremental updates,
so the child nodes have to do a full refresh anyway.
Michael Chaly [Fri, 24 Feb 2023 15:25:24 +0000 (18:25 +0300)]
Search tuning at very long time control
This patch is the result of a tuning session of approximately 100k games at 120+1.2.
Biggest changes are in extensions, stat bonus and depth reduction for nodes without a tt move.
Linmiao Xu [Tue, 21 Feb 2023 16:17:59 +0000 (11:17 -0500)]
Reintroduce nnue pawn scaling with lower lazy thresholds
Params found with the nevergrad TBPSA optimizer via nevergrad4sf modified to:
* use SPRT LLR with fishtest STC elo gainer bounds [0, 2] as the objective function
* increase the game batch size after each new optimal point is found
The params were the optimal point after TBPSA iteration 7 and 160 nevergrad evaluations with:
* initial batch size of 96 games per evaluation
* batch size increase of 64 games after each iteration
* a budget of 512 evaluations
* TC: fixed 1.5 million nodes per move, no time limit
nevergrad4sf enables optimizing stockfish params with TBPSA:
https://github.com/vondele/nevergrad4sf
Using pentanomial game results with smaller game batch sizes was inspired by:
Use of SPRT LLR calculated from pentanomial game results as the objective function was an experiment at maximizing the information from game batches to reduce the computational cost for TBPSA to converge on good parameters.
For the exact code used to find the params:
https://github.com/linrock/tuning-fork
This patch introduces `hint_common_parent_position()` to signal that potentially several child nodes will require an NNUE eval. By populating explicitly the accumulator, these subsequent evaluations can be performed more efficiently.
This was based on the observation that calculating the evaluation in an excluded move position yielded a significant Elo gain, even though the evaluation itself was already available (work by pb00067).
Sopel wrote the code to perform just the accumulator update. This PR is based on a cleaned-up version of that code.
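A minimal sketch of the intended use (call-site shape assumed; the declaration is the one search.cpp uses):
```
#include "evaluate.h"
#include "position.h"

// When several children of this node are expected to need an NNUE eval,
// populate the accumulator for the parent once so each child can update
// it incrementally instead of refreshing from scratch.
void before_expanding(const Position& pos) {
    Eval::NNUE::hint_common_parent_position(pos);
}
```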
The sdot instruction computes (and accumulates) a signed dot product,
which is quite handy for Stockfish's NNUE code. The instruction is
optional for Armv8.2 and Armv8.3, and mandatory for Armv8.4 and above.
The commit adds a new 'arm-dotprod' architecture with enabled dot
product support. It also enables dot product support for the existing
'apple-silicon' architecture, which is at least Armv8.5.
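For illustration, the instruction is reachable from C++ via the ACLE intrinsic (a minimal sketch; assumes a compiler targeting e.g. -march=armv8.2-a+dotprod):
```
#include <arm_neon.h>

// acc[i] += a[4i]*b[4i] + a[4i+1]*b[4i+1] + a[4i+2]*b[4i+2] + a[4i+3]*b[4i+3]
// for each of the four 32-bit accumulator lanes.
int32x4_t sdot_accumulate(int32x4_t acc, int8x16_t a, int8x16_t b) {
    return vdotq_s32(acc, a, b);
}
```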
The following local speed test was performed on an Apple M1 with
ARCH=apple-silicon. I had to remove CPU pinning from the benchmark
script. However, the results were still consistent: Checking both
binaries against themselves reported a speedup of +0.0000 and +0.0005,
respectively.
```
Result of 100 runs
==================
base (...ish.037ef3e1) = 1917997 +/- 7152
test (...fish.dotprod) = 2159682 +/- 9066
diff = +241684 +/- 2923
```
MinetaS [Mon, 13 Feb 2023 02:54:59 +0000 (11:54 +0900)]
Fix overflow in add_dpbusd_epi32x2
This patch fixes a 16-bit overflow in the *_add_dpbusd_epi32x2 functions
that can be triggered in rare cases depending on the NNUE weights.
While the fix leads to some slowdown on the affected architectures
(most notably avx2), it is simpler than some of the other
options discussed in
https://github.com/official-stockfish/Stockfish/pull/4394
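For context, a hedged sketch of how such a 16-bit overflow can arise in an x2 variant (lane shapes assumed, not the exact master code):
```
#include <immintrin.h>

// Each maddubs product saturates within int16, but summing two such
// products in 16-bit lanes can still overflow before the widening madd:
__m256i dot_x2_overflowing(__m256i acc, __m256i a0, __m256i b0,
                           __m256i a1, __m256i b1) {
    __m256i p0 = _mm256_maddubs_epi16(a0, b0);
    __m256i p1 = _mm256_maddubs_epi16(a1, b1);
    __m256i s  = _mm256_add_epi16(p0, p1);          // 16-bit wraparound here
    return _mm256_add_epi32(acc, _mm256_madd_epi16(s, _mm256_set1_epi16(1)));
}
```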
Created by retraining the master net on a dataset composed of:
* Most of the previous best dataset filtered to remove positions likely having only one good move
* Adding training data from Leela T77 dec2021 rescored with 16tb of 7-piece tablebases
Trained with end lambda 0.7 and max epoch 900. Positions with ply <= 28 were removed from most of the previous best dataset before training began. A new nnue-pytorch trainer param for skipping early plies was used to skip plies <= 24 in the unfiltered and additional Leela T77 parts of the dataset.
The depth6 multipv2 search filtering method is the same as the one used for filtering recent best datasets, with a lower eval difference threshold to remove slightly more positions than before. These parts of the dataset were filtered:
* 96% of T60T70wIsRightFarseerT60T74T75T76.binpack
* 99% of dfrc_n5000.binpack
* T80 oct + nov 2022 data, no positions with castling flags, rescored with ~600gb 7p tablebases
* T79 apr + may 2022 data, rescored with 12tb 7p tablebases
* T60 nov + dec 2021 data, rescored with 12tb 7p tablebases
These parts of the dataset were not filtered. Positions with ply <= 24 were skipped during training:
* T78 aug + sep 2022 data, rescored with 12tb 7p tablebases
* 84% of T77 dec 2021 data, rescored with 16tb 7p tablebases
The code and exact evaluation thresholds used for data filtering can be found at:
https://github.com/linrock/Stockfish/tree/tools-filter-multipv2-eval-diff-t2/src/filter
The exact training data used can be found at:
https://robotmoon.com/nnue-training-data/
Local elo at 25k nodes per move:
nn-epoch859.nnue : 3.5 +/- 1.2
Michael Chaly [Sat, 4 Feb 2023 20:46:44 +0000 (23:46 +0300)]
Cleanup and reorder in qsearch
This patch is a simplification / code normalisation in qsearch.
Adds steps in comments the same way we have in search;
makes a separate "pruning" stage instead of heuristics being randomly spread over the qsearch code;
reorders pruning heuristics from the least taxing ones to the more taxing ones;
removes the repeated check for best value not being mated, using one check instead - thus removing some lines of code;
moves prefetch and move setup after pruning - it makes no sense to do them if the move will get pruned.
pb00067 [Fri, 3 Feb 2023 16:57:19 +0000 (17:57 +0100)]
Improve excluded move logic
PR consists of 2 improvements for nodes with an excludedMove:
1. Remove xoring the posKey with make_key(excludedMove)
Since we never call tte->save anymore with an excludedMove,
the only remaining purpose of the xoring was to avoid a TT hit.
Nevertheless, on a normal bench run this produced ~25 false positives
(key collisions).
To avoid that, we now forbid early TT cutoffs with an excludedMove.
Maybe these accesses to the TT with a xored key caused useless misses
in the CPU caches (L1, L2 ...).
Now that the probe uses the same key as the enclosing search,
it should hit the CPU cache.
2. Don't probe tablebases with an excludedMove.
This can't be tested on fishtest, but it's obvious that
tablebases don't deliver any information about suboptimal moves.
Side note:
Very surprisingly, it looks like we cannot use static evals from the
TT, since they differ slightly over time due to changing optimism.
Attempts to use static evals from the TT lost about 13 Elo.
This is something to investigate.
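A hedged sketch of the rule from point 1 (condition shape simplified):
```
// Probe with the regular position key; just never take an early TT
// cutoff while an excluded move is set, instead of xoring the key.
bool allow_early_tt_cutoff(bool pvNode, bool ttHit, bool hasExcludedMove,
                           int tteDepth, int depth) {
    return !pvNode && ttHit && !hasExcludedMove && tteDepth >= depth;
}
```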
Michael Chaly [Thu, 2 Feb 2023 10:05:54 +0000 (13:05 +0300)]
Do less SEE pruning in qsearch
Current master prunes all moves with negative SEE values in qsearch.
This patch sets a constant negative threshold, thus allowing some moves with negative SEE values to be searched.
The value of the threshold is fairly arbitrary and can be tweaked - making it a function of depth could also be tried.
Original idea by the author of the Alexandria engine.
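A hedged sketch of the change (the threshold constant below is a placeholder, not the tested value):
```
#include "position.h"

// Prune only moves losing clearly more material than a small margin,
// instead of pruning everything with a negative SEE.
bool qsearch_see_prune(const Position& pos, Move move) {
    constexpr Value seeThreshold = Value(-90);   // placeholder constant
    return !pos.see_ge(move, seeThreshold);      // master pruned at VALUE_ZERO
}
```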
Update the WLD model with about 400M positions extracted from recent LTC games after the net updates.
This ensures that the 50% win rate is again at a 1.0 eval.
MinetaS [Thu, 19 Jan 2023 00:49:42 +0000 (09:49 +0900)]
Remove maxNextDepth
This patch allows the full PV search to have double extensions as well when
extension == 1 && doDeeperSearch && doEvenDeeperSearch && !doShallowerSearch
is true, which occurs extremely rarely.
Dubslow [Sun, 15 Jan 2023 10:08:33 +0000 (04:08 -0600)]
Remove `previousDepth` in favor of `completedDepth + 2`
Beyond the simplification, this could be considered a bugfix from a certain point of view.
However, the effect is very subtle and essentially impossible for users to notice. 5372f81cc8 added about 2 Elo at LTC, but only for the second and later `go` commands; now, with
this patch, the first `go` command will also benefit from that gain. Games under time
controls are unaffected (as per the tests).
disservin [Mon, 23 Jan 2023 18:32:26 +0000 (19:32 +0100)]
Fixed UCI TB win values
This patch results in search values for a TB win/loss being reported in a way that does not change with normalization, i.e. they will be consistent over time.
A value of 200.00 pawns is now reported upon entering a TB-won position. Values smaller than 200.00 relate to the distance in plies from the root to the probed position,
with 1 cp being 1 ply of distance.
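As arithmetic (a sketch, not the literal UCI code):
```
// UCI centipawns reported for a TB win probed 'plies' from the root:
// 200.00 pawns at the root, minus one centipawn per ply of distance.
int tb_win_cp(int plies) {
    return 20000 - plies;   // e.g. 19995 (= 199.95) five plies from root
}
```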
Michael Chaly [Wed, 25 Jan 2023 05:12:40 +0000 (08:12 +0300)]
Rebalance usage of history heuristics in pruning
This patch has multiple effects:
* history heuristics sum in futility pruning now can't exceed some negative value so futility pruning for moves with negative histories should become slightly less aggressive;
* history heuristics are now used in SEE pruning for quiet moves;
Created by retraining the master net with Leela T78 data from Aug+Sep 2022 added to the previous best dataset. Trained with end lambda 0.7 and started with max epoch 800. All positions with ply <= 28 were skipped:
Around epoch 750, training was manually paused and max epoch increased to 950 before resuming. The additional Leela training data from T78 was prepared in the same way as the previous best dataset.
The exact training data used can be found at:
https://robotmoon.com/nnue-training-data/
While the local elo ratings during this experiment were much lower than in recent master nets, several later epochs had a consistent elo above zero, and this was hypothesized to represent potential strength at slower time controls.
Local elo at 25k nodes per move
leela95-dfrc96-filt-only-T80octnov-T60novdecT78augsepT79aprmay-12tb7p-sk28-lambda7
nn-epoch819.nnue : 0.4 +/- 1.1 (nn-bc24c101ada0.nnue)
nn-epoch799.nnue : 0.3 +/- 1.2
nn-epoch759.nnue : 0.3 +/- 1.1
nn-epoch839.nnue : 0.2 +/- 1.4
Stephen Touset [Mon, 16 Jan 2023 22:25:47 +0000 (14:25 -0800)]
Remove precomputed SquareBB
Bit-shifting is a single instruction, and should be faster than an array lookup
on supported architectures. Besides (ever so slightly) speeding up the
conversion of a square into a bitboard, we may see minor general performance
improvements due to preserving more of the CPU's existing cache.
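The replacement is essentially a one-liner (a sketch; the real helper is square_bb() in bitboard.h):
```
#include "types.h"   // Bitboard, Square

// Convert a square index to a single-bit bitboard with a shift instead
// of a SquareBB[] table lookup.
constexpr Bitboard square_bb(Square s) {
    return Bitboard(1) << s;
}
```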
The old parameterization (https://github.com/official-stockfish/Stockfish/pull/2225/files) has now become quite inaccurate.
This updates the formula based on updated results with master. The formula is based on a fit of the Elo results for games
played between master at various skill levels, and various versions of the Stash engine, which have been ranked at CCRL.
Skill 0..19 now covers CCRL Blitz Elo from 1320 to 3190, approximately.
Indeed, the Elo of Stash in this analysis is only accurate to within +-100 Elo of CCRL,
probably because it depends quite a bit on the opponent pool.
To obtain a skill level for a given Elo number, the above data is fit as a 3rd
degree polynomial Skill(Elo). A quick test confirms the correspondence to the above table:
```
Score of master-elo-2721 vs stash-bot-v21.0: 51 - 16 - 19 [0.703] 86
Elo difference: 150.1 +/- 70.2, LOS: 100.0 %, DrawRatio: 22.1 %
```
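A hedged sketch of the mapping (the polynomial coefficients below are placeholders, not the actual fitted values):
```
#include <algorithm>

double skill_from_elo(int elo) {
    // Normalize over the covered CCRL Blitz range 1320..3190
    double e = double(elo - 1320) / (3190 - 1320);
    // Horner evaluation of the fitted 3rd-degree polynomial Skill(Elo);
    // coefficients are placeholders
    double level = ((37.2 * e - 40.8) * e + 22.3) * e - 0.3;
    return std::clamp(level, 0.0, 19.0);
}
```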
Fix asm modifiers in add_dpbusd_epi32x2 implementations
The accumulator should be an earlyclobber because it is written before
all input operands are read. Otherwise, the asm code computes a wrong
result if the accumulator shares a register with one of the other input
operands (which happens if we pass in the same expression for the
accumulator and the operand).
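A generic AArch64 illustration of the rule (not the master asm):
```
// Computes 2*in + 1. The first instruction writes %0 before %1 is read,
// so the output constraint must be the earlyclobber "=&r"; plain "=r"
// would allow %0 and %1 to share a register, and the early write to %0
// would corrupt the input.
long twice_plus_one(long in) {
    long out;
    __asm__("mov  %0, #1\n\t"
            "add  %0, %0, %1\n\t"
            "add  %0, %0, %1"
            : "=&r"(out)
            : "r"(in));
    return out;
}
```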
Created by retraining the master net on a dataset composed of:
* The Leela-dfrc_n5000.binpack dataset filtered with depth6 multipv2 search to remove positions with only one good move, in addition to removing positions where either of the two best moves are captures
* The same Leela T80 oct+nov 2022 training data used in recent best datasets
* Additional Leela training data from T60 nov+dec 2021 and T79 apr+may 2022
Trained with end lambda 0.7 and started with max epoch 800. All positions with ply <= 28 were skipped:
Around epoch 780, training was manually paused and max epoch increased to 920 before resuming.
During depth6 multipv2 data filtering, positions were considered to have only one good move if the score of the best move was significantly better than the 2nd best move in a way that changes the outcome of the game:
* the best move leads to a significant advantage while the 2nd best move equalizes or loses
* the best move is about equal while the 2nd best move loses
The modified stockfish branch and exact score thresholds used for filtering are at:
https://github.com/linrock/Stockfish/tree/tools-filter-multipv2-eval-diff/src/filter
About 95% of the Leela portion and 96% of the DFRC portion of the Leela-dfrc_n5000.binpack dataset was filtered. Unfiltered parts of the dataset were left out.
The additional Leela training data from T60 nov+dec 2021 and T79 apr+may 2022 was WDL-rescored with about 12TB of syzygy 7-piece tablebases where the material difference is less than around 6 pawns. Best moves were exported to .plain data files during data conversion with the lc0 rescorer.
The exact training data can be found at:
https://robotmoon.com/nnue-training-data/
Local elo at 25k nodes per move
experiment_leela95-dfrc96-mpv-eval-fonly-T80octnov-T79aprmayT60novdec-12tb7p-sk28-lambda7
run_0/nn-epoch899.nnue : 3.8 +/- 1.6
Removed sprintf(), which generated a warning for security reasons.
Replace NULL with nullptr
Replace typedef with using
Do not inherit from std::vector. Use composition instead.
optimize mutex-unlocking
Warn if a global function has no previous declaration
If a global function has no previous declaration, either the declaration
is missing in the corresponding header file or the function should be
declared static. Static functions are local to the translation unit,
which allows the compiler to apply some optimizations earlier (when
compiling the translation unit rather than during link-time
optimization).
The commit enables the warning for gcc, clang, and mingw. It also fixes
the reported warnings by declaring the functions static or by adding a
header file (benchmark.h).
mstembera [Wed, 28 Dec 2022 00:44:32 +0000 (16:44 -0800)]
Fix stack initialization
This fixes a bug where on line 278 the Stack::staticEval values are
initialized to 0. However, VALUE_NONE is defined to be 32002, so
this is a bug in master. It probably matters due to the calculation
of improvement, where the staticEval of plies prior to the root can be accessed.
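A hedged sketch of the fix (stack shape reduced; loop bounds assumed):
```
#include "types.h"   // Value, VALUE_NONE

struct Stack { Value staticEval; /* ... */ };

void init_stack_tail(Stack* ss) {
    // Slots behind the root must start as VALUE_NONE (32002), the
    // "no eval yet" marker, not 0, which is a legitimate score; the
    // improvement calculation may read staticEval from before the root.
    for (int i = 7; i > 0; --i)
        (ss - i)->staticEval = VALUE_NONE;
}
```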
This is a later epoch (epoch 859) from the same experiment run that trained yesterday's master net nn-60fa44e376d9.nnue (epoch 779). The experiment was manually paused around epoch 790 and unpaused with max epoch increased to 900 mainly to get more local elo data without letting the GPU idle.
Local elo vs. nn-335a9b2d8a80.nnue at 25k nodes per move:
experiment_leela93-dfrc99-filt-only-T80-oct-nov-skip28
run_0/nn-epoch779.nnue (nn-60fa44e376d9.nnue) : 5.0 +/- 1.2
run_0/nn-epoch859.nnue (nn-a3dc078bafc7.nnue) : 5.6 +/- 1.6
Created by retraining the master net on the previous best dataset with additional filtering. No new data was added.
More of the Leela-dfrc_n5000.binpack part of the dataset was pre-filtered with depth6 multipv2 search to remove bestmove captures. About 93% of the previous Leela/SF data and 99% of the SF dfrc data was filtered. Unfiltered parts of the dataset were left out. The new Leela T80 oct+nov data is the same as before. All early game positions with ply count <= 28 were skipped during training by modifying the training data loader in nnue-pytorch.
Trained in a similar way as recent master nets, with a different nnue-pytorch branch for early ply skipping:
In both modified methods, the variable 'result' is checked to detect
whether the probe operation failed. However, the variable is not
initialized on all paths, so the check might test an uninitialized
value.
A test position (with TB) is given by:
position fen 3K1k2/R7/8/8/8/8/8/R6Q w - - 0 1 moves a1b1 f8g8 b1a1 g8f8 a1b1 f8g8 b1a1
This is now fixed by always initializing the variable.
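A hedged sketch of the pattern (method context simplified; types from syzygy/tbprobe.h):
```
#include "syzygy/tbprobe.h"

bool probe_ok(Position& pos) {
    // Initialize before any path can skip the assignment, so the failure
    // check never tests an uninitialized value.
    Tablebases::ProbeState result = Tablebases::FAIL;  // was: uninitialized
    Tablebases::WDLScore wdl = Tablebases::probe_wdl(pos, &result);
    (void)wdl;
    return result != Tablebases::FAIL;
}
```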
Created by retraining the master net with a combination of:
the previous best dataset (Leela-dfrc_n5000.binpack), with about half the dataset filtered using depth6 multipv2 search to throw away positions where either of the 2 best moves are captures
Leela T80 Oct and Nov training data rescored with best moves, adding ~9.5 billion positions
Trained effectively the same way as the previous master net:
Local testing at a fixed 25k nodes:
experiments/experiment_leela-dfrc-filtered-T80-oct-nov/training/run_0/nn-epoch779.nnue
localElo: run_0/nn-epoch779.nnue : 4.7 +/- 3.1
The new Leela T80 part of the dataset was prepared by downloading test80 training data from all of Oct 2022 and Nov 2022, rescoring with syzygy 6-piece tablebases and ~600 GB of 7-piece tablebases, saving best moves to exported .plain files, removing all positions with castling flags, then converting to binpacks and using interleave_binpacks.py to merge them together. Scripts used in this data conversion process are available at:
https://github.com/linrock/lc0-data-converter
Filtering binpack data using depth6 multipv2 search was done by modifying transform.cpp in the tools branch:
https://github.com/linrock/Stockfish/tree/tools-filter-multipv2-no-rescore
Links for downloading the training data (total size: 338 GB) are available at:
https://robotmoon.com/nnue-training-data/
MinetaS [Tue, 20 Dec 2022 07:01:05 +0000 (16:01 +0900)]
Fix a dependency bug
Instead of allowing .depend only for specific build-related targets, filter
out non-build-related targets (e.g. help, clean) so that other targets can
execute the .depend target normally.
mstembera [Tue, 13 Dec 2022 07:22:02 +0000 (23:22 -0800)]
Don't reset increaseDepth back to true after it has been set to false
Resetting increaseDepth back to true each time on the very next iteration was not intended, so this is a bug fix as well as a simplification.
See more discussion in #2482. Thanks to xoto10.
Michael Chaly [Sat, 17 Dec 2022 09:48:03 +0000 (12:48 +0300)]
Reintroduce doEvenDeeperSearch
This patch is basically the same as a reverted patch
but now has some guarding against search being stuck - the same
way as we do with double extensions. This should help with
search explosions - albeit slowly, they should eventually be resolved.
Alfredo Menezes [Fri, 9 Dec 2022 15:11:43 +0000 (12:11 -0300)]
Extend all moves at low depth if ttMove is doubly extended
If the ttMove is doubly extended, we allow a depth growth for the remaining moves.
The idea is to get a more realistic score comparison, because of the depth
difference. We take some care to avoid this extension at high depths,
in order to avoid the cost, since the search result is supposed
to be more accurate in this case.