Marco Costalba [Thu, 19 Feb 2015 09:08:29 +0000 (10:08 +0100)]
Add a couple of asserts to late join
Document and clarify that we cannot rejoin on ourselves
and that we never late join if we are master and all
slaves have finished, inded in this case we exit idle_loop.
Marco Costalba [Thu, 19 Feb 2015 08:51:17 +0000 (09:51 +0100)]
Remove useless condition in late join
In case of Threads.size() == 2 we have that sp->allSlavesSearching
is always false (because we have finished our search), bestSp is
always NULL and we never late join, so there is no need to special
case here.
Tested with dbg_hit_on(sp && sp->allSlavesSearching) and
verified it never fires.
Marco Costalba [Tue, 17 Feb 2015 09:10:58 +0000 (10:10 +0100)]
Compute SplitPoint::spLevel on the fly
And retire a redundant field. This is important also
from a concept point of view becuase we want to keep
SMP structures as simple as possible with the only
strictly necessary data.
Verified with
dbg_hit_on(sp->spLevel != level)
that the values are 100% the same out of more 50K samples.
Joona Kiiski [Sat, 14 Feb 2015 20:46:00 +0000 (20:46 +0000)]
Improve smp performance for high number of threads
Balance threads between split points.
There are huge differences between different machines and autopurging makes it very difficult to measure the improvement in fishtest, but the following was recorded for 16 threads at 15+0.05:
For Bravone (1000 games): 0 ELO
For Glinscott (1000 games): +20 ELO
For bKingUs (1000 games): +50 ELO
For fastGM (1500 games): +50 ELO
The change was regression for no one, and a big improvement for some, so it should be fine to commit it.
Also for 8 threads at 15+0.05 we measured a statistically significant improvement:
ELO: 6.19 +-3.9 (95%) LOS: 99.9%
Total: 10325 W: 1824 L: 1640 D: 6861
Finally it was verified that there was no (significant) regression for
mstembera [Tue, 3 Feb 2015 03:09:37 +0000 (11:09 +0800)]
Profile build options
I went through all the individual compile options that differ between
-fprofile-generate/-fprofile-use and -fprofile-arcs/-fbranch-probabilities
and distilled the speed difference down to only turning off
-fno-peel-loops and -fno-tracer. Using this we still get the full speedup
(maybe a bit more because other optimizations stay on) and it's also much cleaner
because we can get rid of the "@rm -f ucioption.gc*" hack for all versions of gcc.
Marco Costalba [Sun, 25 Jan 2015 18:22:43 +0000 (19:22 +0100)]
Simplify skill level and reduce ELO
This patch has two positive effects:
- Retire a hackish formula and leave
just a natural, simple and plain one.
- Reduce strenght at very low level, but
don't impact medium/high levels.
Actually even at level 0, SF is still too
strong for many beginners (this was reported
many times for instance on Droidfish user
comments on Google Play).
Test on fishtest shows that ELO drop is around
170 ELO at level 0 (good!), 130 ELO at level 1
and smoothly reduces (as expected) until level
10 where the drop is just of 8 ELO.
Joona Kiiski [Sun, 25 Jan 2015 22:03:57 +0000 (22:03 +0000)]
Stockfish 6 Release Candidate 3
- Fix a skill level problem: Don't allow move pruning at root node
- Revert "Fix profile build for gcc on Mac OSX". Results for a faster binary in x86-64.
- Fix a MSVC warning
Marco Costalba [Tue, 20 Jan 2015 21:17:22 +0000 (22:17 +0100)]
Don't use _pext_u64() directly
This intrinsic to call BMI2 PEXT instruction is
defined in immintrin.h. This header should be
included only when USE_PEXT is defined, otherwise
we define _pext_u64 as 0 forcing a nop.
But under some mingw platforms, even if we don't
include the header, immintrin.h gets included
anyhow through an include chain that starts with
STL <algorithm> header. So we end up both defining
_pext_u64 function and at the same time defining
_pext_u64 as 0 leading to a compile error.
The correct solution is of not using _pext_u64 directly.
This patch fixes a compile error with some mingw64
package when compiling with x86-64.
Marco Costalba [Tue, 20 Jan 2015 08:13:30 +0000 (09:13 +0100)]
Try hard to retrieve a ponder move
In case we stop the search during a fail-high
it is possible we return to GUI without a ponder
move. This patch try harder to find a ponder move
retrieving it from TT. This is important in games
played with 'ponder on'.
Marco Costalba [Sat, 3 Jan 2015 09:51:38 +0000 (10:51 +0100)]
Assorted work in uci.cpp
- Change UCI::value() signature
This function should only return the value,
lowerbound and upperbound info is up to the
caller because it requires external knowledge,
out of the scope of this little helper.
- Retire 'key' command
It is not an UCI command and is absolutely
useless: never used.
All king safety related terms (shelterweakness, stormdanger,
attackunits, ..) was tuned together. Additionally for attack units a
finer granularity (factor 4) is used.
ShelterWeakness and Stormdanger array are now indexed additionally by
file pair (a/h,b/g,c/f,d/e). The special case of king blocking a pawn
is incorporated in the StormDanger array. Finally the 93 parameters
are tuned by SPSA on LTC.
mstembera [Thu, 18 Dec 2014 19:56:00 +0000 (03:56 +0800)]
Change profile-build options to produce 1% to 2% faster executables.
The "@rm ucioption.gc*" line is necessary to avoid a gcc 4.7.x bug.
Confirmed for gcc 4.7.4, 4.8.1, and 4.9.1
Suggested by Kiran Panditrao on fishcooking forum.
https://groups.google.com/forum/?fromgroups=#!topic/fishcooking/AY8gN53nG18
Marco Costalba [Sat, 13 Dec 2014 08:27:39 +0000 (09:27 +0100)]
Coding style in TT code
In particular seems more natural to return
bool and TTEntry on the same line, actually
we should pass and return them as a pair,
but due to limitations of C++ and not wanting
to use std::pair this can be an acceptable
compromise.
Ernesto Gatti [Mon, 8 Dec 2014 00:10:57 +0000 (08:10 +0800)]
Simpler PRNG and faster magics search
This patch replaces RKISS by a simpler and faster PRNG, xorshift64* proposed
by S. Vigna (2014). It is extremely simple, has a large enough period for
Stockfish's needs (2^64), requires no warming-up (allowing such code to be
removed), and offers slightly better randomness than MT19937.
The patch also simplifies how init_magics() searches for magics:
- Old logic: seed the PRNG always with the same seed,
then use optimized bit rotations to tailor the RNG sequence per rank.
- New logic: seed the PRNG with an optimized seed per rank.
This has two advantages:
1. Less code and less computation to perform during magics search (not ROTL).
2. More choices for random sequence tuning. The old logic only let us choose
from 4096 bit rotation pairs. With the new one, we can look for the best seeds
among 2^64 values. Indeed, the set of seeds[][] provided in the patch reduces
the effort needed to find the magics:
64-bit SF:
Old logic -> 5,783,789 rand64() calls needed to find the magics
New logic -> 4,420,086 calls
32-bit SF:
Old logic -> 2,175,518 calls
New logic -> 1,895,955 calls
In the 64-bit case, init_magics() take 25 ms less to complete (Intel Core i5).
Finally, when playing with strength handicap, non-determinism is achieved
by setting the seed of the static RNG only once. Afterwards, there is no need
to skip output values.
The bench only changes because the Zobrist keys are now different (since they
are random numbers straight out of the PRNG).
The RNG seed has been carefully chosen so that the
resulting Zobrist keys are particularly well-behaved:
1. All triplets of XORed keys are unique, implying that it
would take at least 7 keys to find a 64-bit collision
(test suggested by ceebo)
2. All pairs of XORed keys are unique modulo 2^32
3. The cardinality of { (key1 ^ key2) >> 48 } is as close
as possible to the maximum (65536)
Point 2 aims at ensuring a good distribution among the bits
that determine an TT entry's cluster, likewise point 3
among the bits that form the TT entry's key16 inside a
cluster.
Details:
Bitset card(key1^key2)
------ ---------------
RKISS
key16 64894 = 99.020% of theoretical maximum
low18 180117 = 99.293%
low32 305362 = 99.997%
Marco Costalba [Sun, 30 Nov 2014 13:59:09 +0000 (14:59 +0100)]
Explicitly pass RootMoves to TB probes
Currently Search::RootMoves is accessed and even
modified by TB probing functions in a hidden
and sneaky way.
This is bad practice and makes the code tricky.
Instead explicily pass the vector as function
argument so to clarify that the vector is modified
inside the functions.