git.sesse.net Git - stockfish/log

Use popcount intrinsic with Intel compiler

It seems that icc used our fallback version of popcount.
Now use intrinsics.

icc version 16.0.2 (gcc version 5.3.0 compatibility)
bmi2 compile
uname -r 4.5.1-1-ARCH

20xbench gives a nice speedup
./stockfish-icc-master 2161515 +- 34462
./stockfish-icc-sse42 2260857 +- 50349
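
As an illustration of the difference between a software fallback and an intrinsic, here is a minimal sketch. It is not the code from this commit: the function names are hypothetical, and it assumes the compiler defines __GNUC__ and provides __builtin_popcountll (true for gcc and clang, and assumed here for icc in gcc compatibility mode).

```cpp
#include <cassert>
#include <cstdint>

// Portable fallback (hypothetical name): clears the lowest set bit on each
// iteration, so the loop runs once per set bit.
inline int popcount_fallback(std::uint64_t b) {
    int count = 0;
    for (; b; b &= b - 1)
        ++count;
    return count;
}

// Uses the compiler builtin when available, otherwise the fallback.
inline int popcount(std::uint64_t b) {
#if defined(__GNUC__)
    const int n = __builtin_popcountll(b);
    assert(n == popcount_fallback(b)); // debug-only cross-check
    return n;
#else
    return popcount_fallback(b);
#endif
}
```

Whether the builtin becomes a single popcnt instruction still depends on the target flags passed to the compiler.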

Remove useless -mbmi flag in Makefile

I could not find any documentation suggesting that prepending -mbmi to -mbmi2 is necessary or gives any benefit.
Instead, at
https://gcc.gnu.org/onlinedocs/gcc/x86-Built-in-Functions.html#x86-Built-in-Functions

The following built-in functions are available when -mbmi is used. All of them generate the machine instruction that is part of the name.
unsigned int __builtin_ia32_bextr_u32(unsigned int, unsigned int);
unsigned long long __builtin_ia32_bextr_u64 (unsigned long long, unsigned long long);

The following built-in functions are available when -mbmi2 is used. All of them generate the machine instruction that is part of the name.
unsigned int _bzhi_u32 (unsigned int, unsigned int)
unsigned int _pdep_u32 (unsigned int, unsigned int)
unsigned int _pext_u32 (unsigned int, unsigned int)
unsigned long long _bzhi_u64 (unsigned long long, unsigned long long)
unsigned long long _pdep_u64 (unsigned long long, unsigned long long)
unsigned long long _pext_u64 (unsigned long long, unsigned long long)

and at
https://gcc.gnu.org/ml/gcc/2014-02/msg00204.html

( "... The real optimization comes from being able to use pext
(parallel bit extract), which can implement several bextr expressions in
parallel.")

Apart from that, we don't pass all of -msse -msse2 -msse3 -msse4.2 etc., just -msse3 (or -msse4.2).

As for speed, the difference is within noise level: this pull request is actually a reversal of mcostalba#198, in which prepending -mbmi to -mbmi2 was claimed to be 0.3% faster, whereas here removing -mbmi gives a 0.4% speed gain.
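
To make "parallel bit extract" concrete, the following is a portable software model of what the pext instruction computes. It is only a sketch for illustration: pext_soft is a hypothetical name, and real code built with -mbmi2 would call _pext_u64 instead.

```cpp
#include <cstdint>

// Software model of PEXT: gathers the bits of `value` selected by `mask`
// into the low bits of the result, preserving their relative order.
inline std::uint64_t pext_soft(std::uint64_t value, std::uint64_t mask) {
    std::uint64_t result = 0;
    for (std::uint64_t out = 1; mask; out <<= 1) {
        const std::uint64_t lowest = mask & (~mask + 1); // lowest set mask bit
        if (value & lowest)
            result |= out;
        mask &= mask - 1; // clear that mask bit
    }
    return result;
}
```

For example, extracting the high nibble of 0xB2 with mask 0xF0 yields 0xB.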

Isolated pawn simplification

STC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 117822 W: 21697 L: 21744 D: 74381

LTC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 92307 W: 12330 L: 12305 D: 67672

Bench: 8813983

Resolves #659

Use FMHs to assist with LMR formula.

STC:
LLR: 2.99 (-2.94,2.94) [0.00,5.00]
Total: 52232 W: 9654 L: 9304 D: 33274

LTC:
LLR: 2.97 (-2.94,2.94) [0.00,5.00]
Total: 115988 W: 15550 L: 15049 D: 85389

Bench: 7890808

Resolves #651

Use -O3 for all compilers (including ICC)

There seems to be no benefit from using -fast over -O3 with icc.
So use -O3 everywhere.

No functional change

Resolves #652

Remove some pointless micro-optimizations

Seems to give around 1% speed-up for CPUs with popcnt support.
Seems to give a very minor speed-up for CPUs without popcnt.

No functional change

Resolves #646

Fix incorrect draw detection

In this position we should have a draw by repetition:

position fen rnbqkbnr/2pppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1 moves g1f3 g8f6 f3g1
go infinite

But the latest patch broke it.

Actually we had two(!) very subtle bugs. The first is that Position::set()
clears the passed state, and in particular its 'previous' member, so
that on passing setupStates the 'previous' pointer was reset.

The second bug is even more subtle: SetupStates was based on std::vector
as its container, but when a vector grows, std::vector copies all its contents
to a new location, invalidating all references to its entries. Because
all StateInfo records are linked by the 'previous' pointer, this made pointers
go stale upon adding more elements to setupStates. So revert to using a
std::deque, which ensures references are preserved when pushing back new
elements.
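
The second bug can be sketched in isolation. The State struct and helper names below are hypothetical stand-ins, not the real StateInfo. The point is only that std::deque::push_back leaves pointers to existing elements valid, whereas a std::vector may reallocate and leave them dangling.

```cpp
#include <cassert>
#include <deque>

// Simplified stand-in for a StateInfo record; only the link matters here.
struct State {
    int ply = 0;
    State* previous = nullptr;
};

// Appends n linked states. Pushing at the back of a deque never invalidates
// pointers or references to existing elements, so the chain stays valid.
inline void push_states(std::deque<State>& states, int n) {
    assert(n >= 0);
    for (int i = 0; i < n; ++i) {
        State* prev = states.empty() ? nullptr : &states.back();
        states.emplace_back();
        states.back().ply = prev ? prev->ply + 1 : 0;
        states.back().previous = prev;
    }
}

// Walks the 'previous' links from the newest state and counts the records.
inline int chain_length(const std::deque<State>& states) {
    int length = 0;
    for (const State* s = states.empty() ? nullptr : &states.back(); s; s = s->previous)
        ++length;
    return length;
}
```

With std::vector in place of std::deque, the same loop would have undefined behaviour as soon as a push triggered a reallocation.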

No functional change.

Add a second level of follow-up moves

STC:
LLR: 2.95 (-2.94,2.94) [0.00,5.00]
Total: 6438 W: 1229 L: 1077 D: 4132

LTC:
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 4000 W: 605 L: 473 D: 2922

bench: 7378965

Resolves #636

StateInfo is usually allocated on the stack by search()

And passed to do_move(); this ensures maximum efficiency and
speed and at the same time an unlimited number of moves.

The drawback is that to handle Position init we need to
reserve a StateInfo inside Position itself and use it at
init time and when copying from another Position.

After lazy SMP we no longer need this gimmick, so we can
get rid of this special case and always pass an external
StateInfo to the Position object.

Also rewrote and simplified the Position constructors.

Verified it does not regress with a 3-thread SMP test:
ELO: -0.00 +-12.7 (95%) LOS: 50.0%
Total: 1000 W: 173 L: 173 D: 654

No functional change.

Fix last search info carried over to mate position

When starting search in a mate or stalemate position, Stockfish does not
even care to reinitialize and start worker threads. However after search
all threads are checked for the best move.

This can lead to bestmove and info being carried over from the last
search.

Example session:

    setoption name threads value 7
    go movetime 4000
    position startpos moves f2f3 e7e5 g2g4 d8h4
    go movetime 4000

Actual output is like (almost always):

    [...]
    bestmove e2e4
    info depth 0 score mate 0
    info depth 20 seldepth 29 multipv 1 score cp 28 [...] pv e2e4
    bestmove e2e4

Expected output / output after fix:

    [...]
    bestmove e2e4 ponder e7e6
    info depth 0 score mate 0
    bestmove (none)

Resolves #623

Hide global visibility when not needed

Also move PieceValue definition to psqt.cpp,
where it is initialized.

Fix a warning in popcount16() with Intel compiler

No functional change.

Fix Travis CI

Broken after "32-bit/64-bit Makefile fix" commit.

Ubuntu "Precise" 12.04.5 supports multilib only up to
g++ 4.6, which is not enough to compile Stockfish.

So move to Ubuntu 14.04.4 LTS (Trusty Tahr)

No functional change.

Small passed pawn simplification

STC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 21993 W: 4197 L: 4078 D: 13718

LTC:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 67213 W: 9135 L: 9077 D: 49001

Bench: 7482426

Resolves #622

Undefended King Ring

There was already a penalty for squares only defended by King (undefended)

This test records a penalty for completely undefended squares in the so-called extended king-ring
(so if we exclude squares defended by a Kg8 for example, we only look at h6 g6 and f6)

We also exclude squares occupied by opponent pieces in this computation,
based on the following results

Was yellow at STC
LLR: -2.97 (-2.94,2.94) [0.00,5.00]
Total: 112499 W: 20649 L: 20293 D: 71557

and passed LTC
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 36805 W: 5100 L: 4857 D: 26848

Bench: 8430233

Resolves: #619

Backward simplification

On top of the usual conditions
a) some opponent in front (but no lever)
b) some neighbours (in front) (but no neighbour behind or same rank)
c) < rank_5

to find out if a pawn is backward we look at the squares in front of this pawn to reach the same rank as the next neighbour.

In current master, a pawn is backward if any of those squares is controlled by an enemy pawn on an adjacent file

In this version, a pawn is ALSO backward if any of those squares is occupied by an enemy pawn.

STC:
http://tests.stockfishchess.org/tests/view/56fe7efd0ebc59301a3541f1
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 19051 W: 3557 L: 3433 D: 12061

LTC:
http://tests.stockfishchess.org/tests/view/56febc2d0ebc59301a354209
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 40810 W: 5619 L: 5526 D: 29665

Bench: 7525245

Resolves #614

Simplify popcnt

Also a speedup (about 1%) on 64-bit w/o hardware popcnt

Retire Max15 and Full template parameters
(Contributed by Marco Costalba)

Now that we have just SW and HW versions, use
template default parameter to get rid of explicit
template parameters.

Retire bitcount.h and move the only defined
function to bitboard.h

No functional change

Resolves #620

32-bit/64-bit Makefile fix

Counterintuitively, make build ARCH=x86-32 does NOT produce a 32-bit compile
when running a 64-bit OS. Nor would ARCH=x86-64 produce a 64-bit compile when
running a 32-bit OS (assuming it compiled w/o errors).

No functional change

Resolves #621

A combo patch of two tuning patches

STC:
LLR: 2.96 (-2.94,2.94) [0.00,4.00]
Total: 14223 W: 2700 L: 2494 D: 9029

LTC:
LLR: 2.96 (-2.94,2.94) [0.00,4.00]
Total: 66294 W: 9065 L: 8739 D: 48490

Bench: 7607385

Resolves #612

Guard against UB in lsb/msb

lsb(b) and msb(b) are undefined when b == 0. This can lead to subtle bugs, where
the resulting code behaves differently on different configurations:
- It can be the home grown software LSB/MSB
- It can be the compiler generated software LSB/MSB (when using compiler
  intrinsics without the right compiler flags to allow compiler to use hardware
  LSB/MSB). Which of course depends on the compiler.
- It can be hardware LSB/MSB generated by the compiler.
- Not to mention that hardware LSB/MSB can return different value on different
  hardware when b == 0.
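
A minimal sketch of such a guard follows, assuming it takes the form of a debug assertion on the precondition. The loops are a portable stand-in chosen for illustration, not the intrinsic-based code used in the engine.

```cpp
#include <cassert>
#include <cstdint>

// Index of the least significant set bit. The precondition b != 0 is checked
// in debug builds, so a zero argument fails loudly instead of returning a
// configuration-dependent value.
inline int lsb(std::uint64_t b) {
    assert(b != 0);
    int index = 0;
    while (!(b & 1)) {
        b >>= 1;
        ++index;
    }
    return index;
}

// Index of the most significant set bit, with the same precondition.
inline int msb(std::uint64_t b) {
    assert(b != 0);
    int index = 0;
    while (b >>= 1)
        ++index;
    return index;
}
```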

No functional change

Resolves #610

Rewrite bsfq management

Use compiler intrinsics when possible to
avoid writing platform specific asm code.

Tested on Windows 7 with MSVC 2013 and mingw 4.8.3 (32 and 64 bit)
and on Linux Mint with g++ 4.8.4 and clang 3.4 (32 and 64 bit).

No functional change

Resolves #609

Bonus for loose enemies

STC:
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 30504 W: 5743 L: 5485 D: 19276

LTC:
LLR: 2.97 (-2.94,2.94) [0.00,5.00]
Total: 11936 W: 1651 L: 1493 D: 8792

Bench: 8880041

Resolves #606

Raise endgame passed pawn and material values

STC:
LLR: 2.95 (-2.94,2.94) [0.00,4.00]
Total: 136149 W: 25213 L: 24588 D: 86348

LTC:
LLR: 2.95 (-2.94,2.94) [0.00,4.00]
Total: 54637 W: 7533 L: 7238 D: 39866

Bench: 8546808

Resolves #608

Simplify pawns King Safety calculation

STC
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 130209 W: 23516 L: 23581 D: 83112

LTC
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 33541 W: 4563 L: 4460 D: 24518

Bench: 8644370

Resolves #604

A small simplification in movepick.h

No functional change

Resolves #597

Simplify Safe Checks

STC:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 11796 W: 2211 L: 2074 D: 7511

LTC:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 14324 W: 1935 L: 1806 D: 10583

Bench: 8075202

Resolves #600

Assorted cleanup of latest commits

No functional change.

Resolves #601

Add followup moves history for move ordering

STC:
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 7955 W: 1538 L: 1378 D: 5039

LTC:
LLR: 2.95 (-2.94,2.94) [0.00,5.00]
Total: 5323 W: 778 L: 642 D: 3903

Bench: 8261839

Resolves #599

Passed pawn bonus simplification

STC: (yellow)

LLR: -2.96 (-2.94,2.94) [0.00,4.00]
Total: 86114 W: 16063 L: 15921 D: 54130

LTC:

LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 14347 W: 2025 L: 1896 D: 10426

Bench: 8576437

Resolves #595

Tweak initiative formula

Give more weight to the pawns number and
the vertical king distance in evaluate_initiative()

Passed STC:
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 26729 W: 5067 L: 4825 D: 16837

and LTC:
LLR: 2.95 (-2.94,2.94) [0.00,5.00]
Total: 60480 W: 8338 L: 8016 D: 44126

Bench: 8295162

Resolves #594

Clean up depth reduction calculation

Might also be a slight speed up

No functional change

Resolves #593

Pass endgame value to evaluate_scale_factor()

No functional change

Resolves #592

Simplify Reduction Formula

Formula now only contains one coefficient. Making it much easier to tune.

STC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 187443 W: 34858 L: 35028 D: 117557

LTC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 88329 W: 11982 L: 11953 D: 64394

Bench: 7521394

Resolves #591

Revert "Remove slowMover"

This reverts commit 77fa960f8923ca83ba0391835d50f4230ac6a345.

Resolves #590

Remove slowMover

Removes slowMover and one parameter from the move_importance function.

STC:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 77023 W: 14456 L: 14433 D: 48134

LTC:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 37175 W: 5190 L: 5092 D: 26893

Resolves #589

History Stat Formula Simplification

STC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 67476 W: 12561 L: 12521 D: 42394

LTC:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 111923 W: 15147 L: 15149 D: 81627

Bench: 8430465

Resolves #588

Fix futility pruning bug

PredictedDepth can be negative, causing the futility_margin to be negative.
It will be very difficult to tweak moveCount pruning and reduction formula, as they are tuned to prevent this behavior.

No functional change

Resolves #587

Remove Weights

Removed remaining redundant weights for pawn structure,
passed pawns, space and king safety by redistributing them
into individual evaluation terms.

STC:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 15173 W: 2790 L: 2659 D: 9724

LTC:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 43433 W: 5936 L: 5846 D: 31651

Bench: 7156237

Resolves #586

Document HalfDensityMap

No functional change.

Resolves #584

Time management simplification

10+0.1:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 41963 W: 7967 L: 7883 D: 26113

60+0.6:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 132314 W: 17939 L: 17969 D: 96406

Resolves #580

rotating symmetric patterns with increasing skipsize

STC:
LLR: 2.95 (-2.94,2.94) [0.00,5.00] sprt @ 5+0.1 th 21
Total: 7068 W: 1121 L: 975 D: 4972

LTC:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00] sprt @ 12+0.12 th 21
Total: 26691 W: 3594 L: 3481 D: 19616

No functional change with a single thread

Resolves #574

Do not probe syzygy bases when castling is possible

Almost no functional change. Bench is unchanged.

Resolves #230
Resolves #573

Retire RootNode template

There is no reason to compile 3 different copies of search(). PV nodes are on
the cold path, and PvNode is a template parameter, so there is no cost in
computing:

const bool RootNode = PvNode && (ss-1)->ply == 0;

And this simplifies code a tiny bit as well.
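
The idea can be sketched as follows. Stack, NodeType and is_root are simplified, hypothetical stand-ins. Only the expression for RootNode is taken from the text above.

```cpp
#include <cassert>

// Simplified stand-in for the search stack entry; only ply is modelled.
struct Stack {
    int ply;
};

enum NodeType { NonPV, PV };

// PvNode is still a compile-time constant, so for NonPV instantiations the
// whole expression folds to false and costs nothing at run time.
template <NodeType NT>
bool is_root(const Stack* ss) {
    assert(ss != nullptr);
    const bool PvNode = NT == PV;
    const bool RootNode = PvNode && (ss - 1)->ply == 0;
    return RootNode;
}
```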

Speed impact is negligible on my machine (i7-3770k, linux 4.2, gcc 5.2):

            nps   +/-
test    2378605  3118
master  2383128  2793
diff      -4523  2746

Bench: 7751425

No functional change.

Resolves #568

Depth margin parameter-tweak in TT-save

Verified that it is an improvement with multiple threads:

LLR: 2.95 (-2.94,2.94) [0.00,4.00] sprt @ 30+0.3 th 3
Total: 14817 W: 2103 L: 1915 D: 10799

LLR: 2.96 (-2.94,2.94) [0.00,4.00] sprt @ 15+0.15 th 7
Total: 10264 W: 1498 L: 1321 D: 7445

Verified that it is not a significant regression with a single thread:

LLR: 2.96 (-2.94,2.94) [-4.00,0.00] sprt @ 60+0.6 th 1
Total: 23975 W: 3294 L: 3210 D: 17471

Resolves #575

Remove redundant -std=c++0x flag

This flag is functionally identical to '-std=c++11' flag which
is part of standard flags.

No functional change

Resolves #571

Makefile: Allow specifying compiler executable

No functional change

Resolves #570

Rewrite time formula

Time management is really too complex. Our aim is
to simplify it, but for the time being at least rewrite
it in an understandable way.

No functional change.

Assorted English grammar changes

No functional change

Resolves #567

Adjust reductions based on history and cmh tables

STC:
LLR: 4.06 (-2.94,2.94) [0.00,5.00]
Total: 149395 W: 28029 L: 27208 D: 94158

LTC:
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 9628 W: 1368 L: 1217 D: 7043

bench: 8076724

Resolves #565

Update comments in LMR step

No functional change

Resolves #564

Tune time management for LTC

60+0.6:
LLR: 2.96 (-2.94,2.94) [0.00,4.00]
Total: 102533 W: 14270 L: 13842 D: 74421

Resolves #558

Retire CenterBind

And compensate in the PSQT.

STC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 27714 W: 5161 L: 5052 D: 17501

LTC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 36354 W: 5008 L: 4909 D: 26437

Bench: 8603285

Resolves #556

Fine tuning of unsupported pawn penalty

Adjust the unsupported pawn penalty when the pawn is supporting 2 pawns
(for example g7 in f6-g7-h6)

Passed STC
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 23833 W: 4384 L: 4158 D: 15291

Passed LTC
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 42711 W: 5918 L: 5655 D: 31138

Bench: 8390233

Resolves #549

Adjust time used for move based on previous score

Use less time if the evaluation is not worse than for the previous move, and even less time if in addition no fail low was encountered in the current iteration.

STC: 10+0.1
ELO: 5.37 +-2.9 (95%) LOS: 100.0%
Total: 20000 W: 3832 L: 3523 D: 12645

STC: 10+0.1
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 17527 W: 3334 L: 3132 D: 11061

LTC: 60+0.6
LLR: 2.95 (-2.94,2.94) [0.00,5.00]
Total: 28233 W: 3939 L: 3725 D: 20569

LTC: 60+0.6
ELO: 2.43 +-1.4 (95%) LOS: 100.0%
Total: 60000 W: 8266 L: 7847 D: 43887

LTC: 60+0.06
LLR: 2.95 (-2.94,2.94) [-1.00,3.00]
Total: 38932 W: 5408 L: 5207 D: 28317
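
For readers unfamiliar with the ELO and LOS figures, here is a sketch of how they can be derived from the W/L/D counts. It assumes the usual logistic Elo model and a normal approximation for LOS. The function names are hypothetical, and this is not necessarily the exact code used by the testing framework.

```cpp
#include <cassert>
#include <cmath>

// Elo difference implied by a match score, using the logistic model.
inline double elo_from_wld(double wins, double losses, double draws) {
    const double games = wins + losses + draws;
    assert(games > 0);
    const double score = (wins + 0.5 * draws) / games;
    assert(score > 0 && score < 1);
    return 400.0 * std::log10(score / (1.0 - score));
}

// Likelihood of superiority under a normal approximation; draws cancel out.
inline double los_from_wl(double wins, double losses) {
    assert(wins + losses > 0);
    return 0.5 * (1.0 + std::erf((wins - losses) / std::sqrt(2.0 * (wins + losses))));
}
```

With W: 3832 L: 3523 D: 12645 the first function gives roughly 5.37, in line with the figure reported for the first STC run.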

Resolves #547

Restore development version

Stockfish 7

Bench: 8355485

No functional change

Update AUTHORS and copyright notice

No functional change

Resolves #555

Update Copyright year

No functional change.

Resolves #554

Stockfish 7 Beta 2

Bench: 8355485

No functional change

Correct Pawn Trace Score + Code Clean up

No functional change

Resolves #542

Fix assert with very high score position

In case of a very high material score, we can
overflow VALUE_INFINITE.

This patch fixes an assert with:

position fen 7k/QQQQR3/2B5/4KN1Q/3QQ3/8/8/4R3 b - - 0 1
go depth 1

No functional change.

Resolves #546

Stockfish 7 Beta 1

Bench: 8355485

No functional change

Move some globals into main thread scope

Make it explicit that those variables are not globals, but
are used only by main thread. I think it is a sensible
clarification because easy move is already tricky enough
and current patch makes the involved actors explicit.

No functional change.

Resolves #537

Revert "Fix compiling of 32 bit binary on 64-bit Windows"

This reverts commit 1e8836d921b3

Broken compile on mingw under Windows:

Config:
debug: 'yes'
optimize: 'yes'
arch: 'i386'
bits: '32'
prefetch: 'yes'
bsfq: 'no'
popcnt: 'no'
sse: 'yes'
pext: 'no'

Flags:
CXX: i686-w64-mingw32-c++
CXXFLAGS: -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11  -Wextra -Wshadow -g -O3 -msse
LDFLAGS:  -static

Testing config sanity. If this fails, try 'make help' ...

mingw32-make[1]: Leaving directory 'C:/stockfish/src'
c:/MinGw/bin/mingw32-make ARCH=x86-32 COMP=mingw all
mingw32-make[1]: Entering directory 'C:/stockfish/src'
sh: C:\Program: No such file or directory
i686-w64-mingw32-c++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11  -Wextra -Wshadow -g -O3 -msse   -c -o benchmark.o benchmark.cpp
<builtin>: recipe for target 'benchmark.o' failed
process_begin: CreateProcess(NULL, i686-w64-mingw32-c++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -Wextra -Wshadow -g -O3 -msse -c -o benchmark.o benchmark.cpp, ...) failed.
make (e=2): The system cannot find the file specified.

mingw32-make[1]: *** [benchmark.o] Error 2
mingw32-make[1]: Leaving directory 'C:/stockfish/src'
makefile:401: recipe for target 'build' failed
mingw32-make: *** [build] Error 2

No functional change.

Fix compiling of 32 bit binary on 64-bit Windows

Two versions of mingw-w64 (targeting Win64 and Win32)
can be installed on Windows too.

No functional change

Resolves #532

Remove another unnecessary Search::Stack field

No functional change

Resolves #535

New mobility bonus

Tuned the global mobility factor for each piece, as well as some +- delta.

The master mobility factor was {266,334} and tuning gave
{267, 362} +S(-2,-2) for the Knight
{249, 328} +S( 0,-2) for the Bishop
{298, 353} +S(1,1) for the Rook
{265, 358} +S(2,-1) for the Queen

Passed STC
LLR: 2.95 (-2.94,2.94) [0.00,4.00]
Total: 49402 W: 9367 L: 9037 D: 30998

and LTC
LLR: 2.97 (-2.94,2.94) [0.00,5.00]
Total: 26831 W: 3871 L: 3658 D: 19302

Bench: 8355485

Resolves #536

Remove killer move conditions from LMR

STC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 8459 W: 1619 L: 1477 D: 5363

LTC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 32239 W: 4404 L: 4299 D: 23536

Bench: 7597031

Resolves #534

Remove unused field SearchStack::ttMove

No functional change

Resolves #533

Distinct iteration paths for Lazy SMP threads

STC 5+0.1, threads 7
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 6026 W: 1047 L: 901 D: 4078

LTC: 20+0.2, threads 7
LLR: 2.95 (-2.94,2.94) [0.00,5.00]
Total: 19739 W: 2910 L: 2721 D: 14108

STC 5+0.1, threads 20
LLR: 2.95 (-2.94,2.94) [0.00,5.00]
Total: 2493 W: 462 L: 331 D: 1700

LTC 30+0.3, threads 20
ELO: 8.86 +-3.7 (95%) LOS: 100.0%
Total: 8000 W: 1076 L: 872 D: 6052

Bench: 8012530

Resolves #525

Fix easy move bug in SMP mode

Fix a bug where we could stop the search after only 10% of time used due to a matching easy move but later switch to a different move that was never pre-screened as easy due to SMP thread select.

STC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 27227 W: 4910 L: 4800 D: 17517

LTC:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 40368 W: 5826 L: 5733 D: 28809

Resolves #521

Threats retuned

STC:

LLR: 2.96 (-2.94,2.94) [0.00,4.00]
Total: 45239 W: 8913 L: 8591 D: 27735

LTC:

LLR: 2.95 (-2.94,2.94) [0.00,4.00]
Total: 21046 W: 3200 L: 2989 D: 14857

Bench: 8012530

Resolves #526

Simplify time management and fix 'ponder on' bug

Simplify time management code by removing hard stops for unchanging first root moves.
Search is now stopped earlier, at the end of an iteration, if it did not have fail-lows at root.

This simplification also fixes a pondering bug. The Ponder flag was true by default
and cutechess-cli doesn't change it to false even though no pondering is possible.
Fix the issue by setting the default value of the 'Ponder' flag to false.

10+0.1:
ELO: 3.51 +-3.0 (95%) LOS: 99.0%
Total: 20000 W: 3898 L: 3696 D: 12406

40+0.4:
ELO: 1.39 +-2.7 (95%) LOS: 84.7%
Total: 20000 W: 3104 L: 3024 D: 13872

60+0.06:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 37231 W: 5333 L: 5236 D: 26662

Stopped run at 100+1:
LLR: 1.09 (-2.94,2.94) [-3.00,1.00]
Total: 37253 W: 4862 L: 4856 D: 27535

Resolves #523
Fixes #510

Fix MultiPv and Skill in SMP.

7 threads, 5+0.1:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 55460 W: 9665 L: 9601 D: 36194

No functional change in normal playing mode

New Tuned Weights

More accurate evaluation weights

Performed better at STC

LLR: 1.32 (-2.94,2.94) [0.00,4.00]
Total: 190043 W: 37433 L: 36675 D: 115935

Passed LTC

LLR: 2.95 (-2.94,2.94) [0.00,4.00]
Total: 30157 W: 4540 L: 4303 D: 21314

Bench: 9264977

Resolves #515

Simplify outpost code

Also inline definitions of SpaceMask and CenterBindMask.

Verified from assembly that the compiler computes the values
at compile time, so it is also theoretically faster.

While there, factor out scale factor evaluation.

No functional change.

Proper Makefile for cross compiling 64 or 32 bit PGO + LTO + static Windows binaries under Linux.

No functional change

Resolves #511

Introduce new Threats weights = {350, 256}

Raise the midgame threats weight by 37%.

Passed STC:
LLR: 2.95 (-2.94,2.94) [0.00,4.00]
Total: 8165 W: 1675 L: 1487 D: 5003

and LTC:
LLR: 2.95 (-2.94,2.94) [0.00,4.00]
Total: 28181 W: 4141 L: 3912 D: 20128

Bench: 7824961

Resolves #512

Revert "Allow cross compilation of Windows binaries on a Linux system"

This reverts commit 388630ae285b3f9f0c8ee4f30e754bde6688c57c.

Confuses fishtest build system

Allow cross compilation of Windows binaries on a Linux system

that are PGO, LTO, and statically linked.
Credit: pasquale....@gmail.com

No functional change

Resolves #505

Clean up RootMove less operator

This is used by std::stable_sort() to sort moves from highest score to lowest score.

1) The comment is incorrect since highest to lowest means descending.
2) It's more natural to implement a less operator using another less operator rather than a greater operator.
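
A sketch of point 2), with a simplified, hypothetical RootMove that carries only a score and a tag for observing sort stability:

```cpp
#include <algorithm>
#include <vector>

struct RootMove {
    int score;
    int id; // hypothetical tag, used only to observe stability

    // "Less" here means "sorts earlier", so a higher score compares as less.
    // Implemented with the built-in < on the swapped operands.
    bool operator<(const RootMove& other) const { return other.score < score; }
};

// Sorts from highest to lowest score; equal scores keep their relative order.
inline void sort_descending(std::vector<RootMove>& moves) {
    std::stable_sort(moves.begin(), moves.end());
}
```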

No functional change.

Resolves #504

Fix TT comment and static_assert()

Comment is based on a misunderstanding of what unaligned memory access is. Here
is an article that explains it very clearly:
https://www.kernel.org/doc/Documentation/unaligned-memory-access.txt

No matter how we define TTEntry or TTCluster, there will never be any unaligned
memory access. This is because the compiler knows the alignment rules, and does
the necessary adjustments to make sure unaligned memory access does not occur.

The issue being addressed here has nothing to do with unaligned memory access. It
is about cache performance. In order to achieve best cache performance:
- we prefetch the cacheline as soon as possible.
- we ensure that TT clusters do not spread across two cachelines. If they did,
we would need to prefetch 2 cachelines, which could hurt cache performance.

Therefore the true conditions to achieve this are:
1/ start address of TT is cache line aligned. void TranspositionTable::resize()
enforces this.
2/ TT cluster size should *divide* the cache line size. Currently, we pack 2
clusters per cache lines. It used to be 1 before "TT sardines". Does not matter
what the ratio is, all we want is to fit an integer number of clusters per cache
line.
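
Condition 2/ can be expressed as a compile-time check. The layout below is a hypothetical illustration (field names, entry size and the 64-byte cache line are assumptions), not the actual TTEntry definition.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t CacheLineSize = 64; // assumed; typical on x86

// Hypothetical entry: 10 bytes on common ABIs (2-byte alignment, no padding).
struct Entry {
    std::uint16_t key16;
    std::uint16_t move16;
    std::int16_t  value16;
    std::int16_t  eval16;
    std::uint8_t  genBound8;
    std::int8_t   depth8;
};

// Hypothetical cluster: 3 * 10 + 2 = 32 bytes on common ABIs.
struct Cluster {
    Entry entry[3];
    char  padding[2];
};

// The real requirement: an integer number of clusters per cache line.
static_assert(CacheLineSize % sizeof(Cluster) == 0,
              "Cluster size must divide the cache line size");

constexpr std::size_t clusters_per_line() {
    return CacheLineSize / sizeof(Cluster);
}
```

Note that the check is a divisibility test, not an equality test, which is what allows 1, 2 or more clusters per line.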

No functional change.

Resolves #506

Rewrite how threads are spawned

Instead of creating a running std::thread and
returning, wait in the Thread c'tor until the native
thread of execution goes to sleep in idle_loop().

In this way we can simplify how search is started,
because when the main thread is idle we are sure that
all other threads are idle too, in any case, even
at thread creation and startup.
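
The pattern can be sketched with a hypothetical Worker class. This is a simplified illustration of waiting in the constructor until the new thread has parked itself, not the actual Thread class.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

class Worker {
public:
    // Starts the native thread, then blocks until it is parked in idle_loop().
    Worker() : thread_(&Worker::idle_loop, this) {
        std::unique_lock<std::mutex> lk(mutex_);
        cv_.wait(lk, [this] { return idle_; });
    }

    ~Worker() {
        {
            std::lock_guard<std::mutex> lk(mutex_);
            exit_ = true;
        }
        cv_.notify_all();
        thread_.join();
    }

    bool is_idle() {
        std::lock_guard<std::mutex> lk(mutex_);
        return idle_;
    }

private:
    void idle_loop() {
        std::unique_lock<std::mutex> lk(mutex_);
        idle_ = true;
        cv_.notify_all();                       // wake the constructor
        cv_.wait(lk, [this] { return exit_; }); // sleep until told to exit
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    bool idle_ = false;
    bool exit_ = false;
    std::thread thread_; // declared last so the other members exist before it starts
};
```

Once the constructor returns, the caller can rely on the thread being idle without any further handshake.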

After lazy smp went in, we can simplify and rewrite
a lot of logic that is no longer needed. This is
hopefully the final big cleanup.

Tested for no regression at 5+0.1 with 3 threads:
LLR: 2.95 (-2.94,2.94) [-5.00,0.00]
Total: 17411 W: 3248 L: 3198 D: 10965

No functional change.

History Pruning: Don't prune the main killer move.

Also increased pruned depth to 4.

STC:
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 23380 W: 4581 L: 4350 D: 14449

LTC:
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 28934 W: 4329 L: 4105 D: 20500

Bench: 8369743

Resolves #498

Do not conceal the invocation of the benchmark program

It is better to be able to see what arguments it is being called with.

No functional change

Resolves #497

Bonus for reachable outpost

Give a bonus for outpost squares which are in reach of a bishop or knight.

STC:
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 22725 W: 4570 L: 4339 D: 13816

LTC:
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 15019 W: 2333 L: 2157 D: 10529

Bench: 8503181

Resolves #495

Retire ThreadBase

Now that we no longer have TimerThread, there is
no need for this long class hierarchy.

Also assorted reformatting while there.

To verify no regression, passed at STC with 7 threads:
LLR: 2.97 (-2.94,2.94) [-5.00,0.00]
Total: 30990 W: 4945 L: 4942 D: 21103

No functional change.

Fix broken UCI 'wait for stop'

When we reach the maximum depth, we can finish the
search without Signals.stop being raised. However, if
we are pondering or in an infinite search, the UCI
protocol states that we shouldn't print the best move
before the GUI sends a "stop" or "ponderhit" command.

It was broken by lazy smp. Fix it by moving the stopping
of the threads after waiting for GUI.

No functional change.

Avoid friend

operator<<(os, pos) does not need to access any private members of pos.

No functional change.

Resolves #492

Ensure that rootDepth < DEPTH_MAX

Indeed, if we use a depth >= DEPTH_MAX, we start having negative depth in the
TT (due to int8_t cast).
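
The effect of the cast can be shown in isolation. stored_depth is a hypothetical helper. Before C++20 the result of an out-of-range narrowing conversion is implementation-defined, but mainstream compilers wrap modulo 256 as assumed here.

```cpp
#include <cstdint>

// Models storing a depth in an 8-bit signed field and reading it back.
inline int stored_depth(int depth) {
    return static_cast<std::int8_t>(depth);
}
```

Under that assumption, a depth of 127 survives the round trip, while 128 comes back as -128.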

No functional change in single thread mode

Resolves #490

Get rid of timer thread

Unfortunately std::condition_variable::wait_for()
is not accurate in the general case and the timer thread
can wake up tens or even hundreds of
milliseconds after the time has elapsed. CPU load, process
priorities, number of concurrent threads, even from
other processes, will have an effect upon it.

Even official documentation says: "This function may
block for longer than timeout_duration due to scheduling
or resource contention delays."

So retire timer and use a polling scheme based on a
local thread counter that counts search() calls and
a small trick to keep polling frequency constant,
independently from the number of threads.
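
A counter-based polling scheme can be sketched as below. TimePoller and its interval parameter are hypothetical, and the sketch does not attempt to reproduce the trick that keeps the polling frequency independent of the number of threads.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical poller: reads the clock only once every `interval` calls.
class TimePoller {
public:
    using Clock = std::chrono::steady_clock;

    TimePoller(int interval, Clock::time_point deadline)
        : interval_(interval), deadline_(deadline) {
        assert(interval > 0);
    }

    // Call once per search() invocation; returns true once time is up.
    bool tick() {
        if (++calls_ < interval_)
            return expired_;
        calls_ = 0;
        ++clock_reads_;
        expired_ = expired_ || Clock::now() >= deadline_;
        return expired_;
    }

    int clock_reads() const { return clock_reads_; }

private:
    int interval_;
    Clock::time_point deadline_;
    int calls_ = 0;
    int clock_reads_ = 0;
    bool expired_ = false;
};
```

The cost per call is a single increment and compare, and no wake-up depends on the scheduler.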

Tested for no regression at very fast TC 2+0.05 th 7:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 32969 W: 6720 L: 6620 D: 19629

TC 2+0.05 th 1:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 7765 W: 1917 L: 1765 D: 4083

And at STC TC, both single thread
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 15587 W: 3036 L: 2905 D: 9646

And with 7 threads
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 8149 W: 1367 L: 1227 D: 5555

bench: 8639247

Pick bestmove from the deepest thread.

STC:
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 26930 W: 4441 L: 4214 D: 18275

LTC:
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 7783 W: 1017 L: 876 D: 5890

No functional change in single thread mode

Resolves #485

Assorted trivia in search.cpp

The only interesting change is the moving of
stack[MAX_PLY+4] back to its original position
in id_loop (now renamed Thread::search).

No functional change.

New History Bonus Formula

bonus = d^2 + d - 1
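
Written as code, assuming d denotes the search depth as an integer (history_bonus is a hypothetical name):

```cpp
// bonus = d^2 + d - 1, with d assumed to be the integer search depth.
constexpr int history_bonus(int d) {
    return d * d + d - 1;
}
```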

Bench: 8639247

Resolves #484

Reduce variation in rootDepth between different threads

Reduce the variation in Root Depth between different threads. This
prevents threads from searching at a depth much higher than Main Thread.

Performed well at STC 24 Threads:
ELO: 3.44 +-3.8 (95%) LOS: 96.1%
Total: 10000 W: 1627 L: 1528 D: 6845

And LTC 24 Threads
LLR: 1.43 (-2.94,2.94) [0.00,4.00]
Total: 3804 W: 500 L: 420 D: 2884
ELO : +7.31
p-value: 73.16%

Passed no regression at STC 3 Threads:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 40457 W: 7148 L: 7060 D: 26249

And LTC 3 Threads:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 17704 W: 2489 L: 2364 D: 12851

Raising a pull request early as 24 Thread tests are very expensive and
this is clearly a positive gain at high thread counts and high time
controls. The change is a small parameter tweak with no additional
logic.

No functional change for single thread mode.

Resolves #481

Some code and comment cleanup

- Remove all references to split points
- Some grammar and spelling fixes

No Functional change

Resolves #478

Use atomics instead of volatile

Rely on well defined behaviour for message passing, instead of volatile. Three
versions have been tested, to make sure this wouldn't cause a slowdown on any
platform.

v1: Sequentially consistent atomics

No measurable regression, despite the extra memory barriers on x86. Even with 15
threads and extreme time pressure, both acting as a magnifying glass:

threads=15, tc=2+0.02
ELO: 2.59 +-3.4 (95%) LOS: 93.3%
Total: 18132 W: 4113 L: 3978 D: 10041

threads=7, tc=2+0.02
ELO: -1.64 +-3.6 (95%) LOS: 18.8%
Total: 16914 W: 4053 L: 4133 D: 8728

v2: Acquire/Release semantics

This version generates no extra barriers for x86 (on the hot path). As expected,
no regression either, under the same conditions:

threads=15, tc=2+0.02
ELO: 2.85 +-3.3 (95%) LOS: 95.4%
Total: 19661 W: 4640 L: 4479 D: 10542

threads=7, tc=2+0.02
ELO: 0.23 +-3.5 (95%) LOS: 55.1%
Total: 18108 W: 4326 L: 4314 D: 9468

As suggested by Joona, another test at LTC:

threads=15, tc=20+0.05
ELO: 0.64 +-2.6 (95%) LOS: 68.3%
Total: 20000 W: 3053 L: 3016 D: 13931

v3: Final version: SeqCst/Relaxed

threads=15, tc=10+0.1
ELO: 0.87 +-3.9 (95%) LOS: 67.1%
Total: 9541 W: 1478 L: 1454 D: 6609
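
A minimal sketch of a stop flag using std::atomic instead of volatile. The split between a sequentially consistent store and a relaxed load is an assumption about what "SeqCst/Relaxed" refers to. The struct and function names are hypothetical.

```cpp
#include <atomic>

struct Signals {
    std::atomic<bool> stop{false};
};

// Rare write from the controlling thread: sequentially consistent store.
inline void request_stop(Signals& s) {
    s.stop.store(true, std::memory_order_seq_cst);
}

// Frequent read on the search hot path: a relaxed load needs no barrier.
inline bool should_stop(const Signals& s) {
    return s.stop.load(std::memory_order_relaxed);
}
```

Unlike volatile, both operations have well defined behaviour when performed concurrently from different threads.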

Resolves #474

KRPPKRP endgame: Simplify ugly switch statement

No functional change

Resolves #470

Cleanup history stats

And other assorted trivia.

No functional change.

Simplify threats

Using fewer parameters and less code to compute Threats.
Also includes a few spacing edits.

Run as a simplification.

Passed STC 10+0.1
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 18879 W: 3725 L: 3600 D: 11554

Passed LTC 60+0.4
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 74116 W: 11001 L: 10958 D: 52157

bench: 8004751

History pruning

Prune moves with negative History
and CMH scores at low depth.

STC:
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 24182 W: 4672 L: 4439 D: 15071

LTC:
LLR: 2.97 (-2.94,2.94) [0.00,5.00]
Total: 12579 W: 1959 L: 1792 D: 8828

bench: 8907701