git.sesse.net Git - stockfish/log

Movepicker: take move's loop out of switch statement

This not only cleans up the code but gives another
speed boost of 1.8%

From revision 595a90dfd0 we have increased pgo compiled binary
speed of a whopping +5.2% without any functional change !!

This is really awsome considering that we have also
cut line count by 25 lines.

Sometime we spend days for getting an extra 1% from move
generation while instead the biggest optimizations come
from anonymous and apparently dull parts of the code.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Revert "null move reorder" series

Does not seem to improve on the standard, latest results
from Joona after 2040 games are negative:

Orig - Mod: 454 - 424 - 1162

And is more or less the same I got few days ago.

So revert for now.

Verified same functionality of 595a90dfd

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Convert handling of tt moves and killers to standard form

Use the same way of loop along the move list used for
the others move kinds so to be consistent in get_next_move()

And a bit of the usual clean up too, but just a bit.

It is even a bit (+0.3%) faster now. ;-)

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Try null move before captures

Always after TT move but before captures.

This seems a better setup against version before this
patch.

After 999 games at 1+0

Mod - Orig +252 =527 -220 +11 ELO

Unfortunatly it does not seems to improve on the standard
version, with null move outside of movepicker (595a90df) with
the latest speed-up patches added in.

After 999 games at 1+0

Mod - Standard +244 =506 -249 -2 ELO

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Use pointers instead of array indices in MovePicker

This avoids calculating the array entry position
at each access and gives another boost of almost 1%.

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Change the flow in wich moves are generated and picked

In MovePicker we get the next move with pick_move_from_list(),
then check if the return value is equal to MOVE_NONE and
in this case we update the state to the new phase.

This patch reorders the flow so that now from pick_move_from_list()
renamed get_next_move() we directly call go_next_phase() to
generate and sort the next bunch of moves when there are no more
move to try. This avoids to always check for pick_move_from_list()
returned value and the flow is more linear and natural.

Also use a local variable instead of a pointer dereferencing in a
time critical switch statement in get_next_move()

With this patch alone we have an incredible speed up of 3.2% !!!

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Disable again null move at depth == OnePly

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Use special null move technique in low depth.

Try good captures before null move when depth < 3 * OnePly.
Use this kind of null move also in Depth == OnePly.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Use nullMove only through MovePicker.

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Add Null move support to MovePicker.

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Create useNullMove local variable

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Clean killers handling in movepicker

Original patch from Joona with added optimizations
by me.

Great cleanup of MovePicker with speed improvment of 1%

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Micro-optimze extension()

Explicitly write the conditions for pawn to 7th
and passed pawn instead of wrapping in redundant
helpers.

Also retire the now unused move_is_pawn_push_to_7th()
and the never used move_was_passed_pawn_push() and
move_is_deep_pawn_push()

Function extension() is so time critical that this
simple patch speeds up the pgo compile of 0.5% and
it is also more clear what actually happens there.

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Merge branch 'master' of git-Stockfish@free2.projectlocker.com:sf

Remove a local variable from pop_1st_bit()

Remove the 'b' uint32_t local variable.
Optimized assembly is more or less the same
(one 'mov' instruction less), but now it is
written in a way more similar to the final assembly
flow so it should be easier for compiler to optimize.

Also guarantee that BitTable[] is always aligned.

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Poly ampli+bias values after 73831 games

Verified correct against tuning branch.

After 999 games at 1+0

Mod vs Orig +257 =510 -232 51.20% +9 ELO

Very small increase but an increase anyway !

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Added a few new targets to the Makefile for OS X with icpc.

The following new targets were added:
   * osx-icc32: 32-bit x86 compiled with icpc.
   * osx-icc64: 64-bit x86 compiled with icpc.
   * osx-icc32-profile: 32-bit x86 compiled with icpc and pgo.
   * osx-icc64-profile: 64-bit x86 compiled with icpc and pgo.

Fix some asserts raised by is_ok()

There were two asserts.

The first was raised because is_ok() was called at the
beginning of do_castle_move() and this is wrong after
the last code reformatting because at that point the state
is already modified by the caller do_move().

The second, raised by debugIncrementalEval, was due to a
rounding error in compute_value() that occurs because
TempoValueEndgame was updated in an odd number by patch

"Merge Joona Kiiski evaluation tweaks" (3ed603cd) of 13/3/2009

This line in compute_value() is the guilty one:

result += (side_to_move() == WHITE)? TempoValue / 2 : -TempoValue / 2;

The fix is to increment TempoValueEndgame so to be even.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Fixed incorrect material key update when making promotion moves.

More use of memset() in Position::clear()

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Little do_move() micro optimizations

Also a few remaining style touches.

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Better clarify how pieceList[] and index[] work

Rearrange the code a bit to be more self-documenting.

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Unify patch series summary

This patch seems bigger then what actually is.

It just moves some code around and adds a bit of coding style fixes
to do_move() and undo_move() so to have uniformity of naming in both
functions.

The diffstat for the whole patch series is

239 insertions(+), 426 deletions(-)

And final MSVC pgo build is even a bit faster:

Before 448.051 nodes/sec

After 453.810 nodes/sec (+1.3%)

No functional change (tested on more then 100M of nodes)

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Unify undo_ep_move(m)

Integrate undo_ep_move in undo_move() this reduces line count
and code readibility.

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Unify undo_promotion_move()

Integrate do_ep_move in undo_move() this reduces line count
and code readibility.

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Unify do_promotion_move()

Integrate do_promotion_move() in do_move() this reduces line count
and code readibility.

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Unify do_ep_move()

Integrate do_ep_move in do_move() this reduces line count
and code readibility.

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

L1/L2 friendly PhaseTable[]

In Movepicker c'tor we access during initialization one of
MainSearchPhaseIndex..QsearchWithoutChecksPhaseIndex globals.

Postpone definition of PhaseTable[] just after them so that
when PhaseTable[] will be accessed later in get_next_move()
it will be already present in L1/L2.

It works like an implicit prefetching of PhaseTable[].

Also shrink PhaseTable[] to fit an L1 cache line of 16 bytes
using uint8_t instead of int.

This apparentely innocuous patch gives an astonish speed
up of 1.6% under MSVC 2010 beta, pgo optimized !

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Use optimized pop_1st_bit() under Windows 64 with icc

Intel compiler can handle this code even under Windows.

So lift the costrain.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Better naming and document some endgame functions

In particular the generic scaling functions.

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Finally fix prefetch on Linux

It was due to a missing -msse compiler option !

Without this option the CPU silently discards
prefetcht2 instructions during execution.

Also added a (gcc documented) hack to prevent Intel
compiler to optimize away the prefetches.

Special thanks to Heinz for testing and suggesting
improvments. And for Jim for testing icc on Windows.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Reuse 5 slots instead of 4

But this time with the guarantee of an always aligned
access so that prefetching is not adversely impacted.

On Joona PC
1+0, 64Mb hash:

Orig - Mod: 174 - 237 - 359

Instead after 1000 games at 1+0 with 128MB hash size
we are at + 1 ELO (just 4 games of difference).

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Double prefetch on Windows

After fixing the cpu frequency with RightMark tool I was
able to test speed all the different prefetch combinations.

Here the results:

OS Windows Vista 32bit, MSVC compile
CPU Intecl Core 2 Duo T5220 1.55 GHz
bench on depth 12, 1 thread, 26552844 nodes searched
results in nodes/sec

no-prefetch
402486, 402005, 402767, 401439, 403060

single prefetch (aligned 64)
410145, 409159, 408078, 410443, 409652

double prefetch (aligned 64) 0+32
414739, 411238, 413937, 414641, 413834

double prefetch (aligned 64) 0+64
413537, 414337, 413537, 414842, 414240

And now also some crazy stuff:

single prefetch (aligned 128)
410145, 407395, 406230, 410050, 409949

double prefetch (aligned 64) 0+0
409753, 410044, 409456

single prefetch (aligned 64) +32
408379, 408272, 406809

single prefetch (aligned 64) +64
408279, 409059, 407395

So it seems the best is a double prefetch at the addres + 32 or +64,
I will choose the second one because it seems more natural to me.

It is still a mystery why it doesn't work under Linux :-(

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Avoid Intel compiler optimizes away prefetching

Without this hack Intel compiler happily optimizes
away the gcc builtin call.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Use aligned prefetch address

Prefetch always form a chache line boundary. It seems
that if prefetch address is not cache line aligned then
performance is adversely impacted.

Hopefully we will resuse that 32 bits of padding for something
useful in the future.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Remove old BishopPairBonus constants

Now that we have poly imbalance these ones
are no more used.

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Enable prefetch also for gcc

This fix a compile error under Linux with gcc when
there aren't the intel dev libraries.

Also simplify the previous patch moving TT definition
from search.cpp to tt.cpp so to avoid using passing a
pointer to TT to the current position.

Finally simplify do_move(), now we miss a prefetch in the
rare case of setting an en-passant square but code is
much cleaner and performance penalty is almost zero.

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Try to prefetch as soon as position key is ready

Move prefetching code inside do_move() so to allow a
very early prefetching and to put as many instructions
as possible between prefetching and following retrieve().

With this patch retrieve() times are cutted of another 25%

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Add TT prefetching support

TT.retrieve() is the most time consuming function
because almost always involves a very slow RAM access.

TT table is so big that is never cached. This patch
prefetches TT data just after a move is done, so that
subsequent TT.retrieve will be very fast.

Profiling with VTune shows that TT:retrieve() times are
almost cutted in half !

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Use 5 TTEntry slots instead of 4

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Use 32 bit key in TT

Shrink key to 32 bits instead of 64. To still avoid
collisions use the high 32 bits of position key as the
TT key and the low 32 bits to retrieve the correct
cluster index in the table.

With this patch size og TTentry shrinks to 96 bits instead
of 128 and the cluster of 4 TTEntry sums to 48 bytes instead
of 64.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Makefile: added 'make strip' target

Binaries are always built with symbol table in to easy
debugging and profiling.

It is now possible to run:

make strip

To remove symbol table from the compiled binary. This
could be useful to prepare the release version.

Patch by Heinz van Saanen.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Let LMR at root be independent of MultiPV value

Current formula enable LMR when

i + MultiPV >= LMRPVMoves

It means that, for instance, if MultiPV == 1 then LMR
will be started to be considered at move i = LMRPVMoves - 1,
while if MultiPV == 3 then it will start before,
at move i = LMRPVMoves - 3.

With this patch the formula becomes

i >= MultiPV + LMRPVMoves - 2

So that LMR will always start after LMRPVMoves - 1 moves
from the last PV move.

No functional change when MultiPV == 1

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Speed up polynomial material imbalance loop

Access pos.piece_count() only once and avoid some
branches in the inner loop.

Profiling with VTune shows a 20% speed improvement in
get_material_info(), and it is also a bit more cleaned
up this way ;-)

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

There is no need to special case KNNK ending

It is always draw, so use the corresponding proper
evaluation function.

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Move halfOpenFiles[] calculation out of a loop

And put it in an already existing one so to
optimze a bit.

Also additional cleanups and code shuffles
all around the place.

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Compile without DEBUG flag by default

And build also symbol table. It can easily stripped
after .exe is done and it is necessary for profiling.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Revert material balance values after 100000 games

After Joona's direct testing with ~2000 games it seems
values after 100.000 games does not give any advantage,
so revert for now.

Score of Stockfish_0 vs Stockfish_15: 491 - 392 - 1102
Score of Stockfish_0 vs Stockfish_40: 461 - 439 - 1076
Score of Stockfish_0 vs Stockfish_65: 442 - 518 - 1018 (13 elo)
Score of Stockfish_0 vs Stockfish_100: 504 - 502 - 984

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Do not adjust Minimum Split Depth automatically

Currently minimum split depth is set automatically to 6
when number of CPUs is more than 4. I believe this is a bad
idea since for example my quad (4CPU with hyperthreading) is
detected as 8CPU computer. I've manually lowered down the number
of Threads, but so far I have played all games with Minimum
Split Depth set to 6!

Since 4CPU computers with hyperthreading are quite common and
8 CPU computers extremely rear (I expect we can get a direct
jump to 16 or 32 cores), this automatic adjusting is likely
to do more harm than good. Add a note in Readme.txt, so that
those rear 8CPU owners can manually tweak the "Minimum Split
Depth" parameter

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Polished Makefile for *nix

Greately improved Makefile from Heinz van Saanen

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Supply the "upperbound" and "lowerbound" parameters in UCI search
output when the score is outside the root window.

Fixed a bug in PV extraction from the transposition table: The
previous used move_is_legal to verify that the move from the TT
was legal, and the old version of move_is_legal only works when
the side to move is not in check. Fixed this by adding a separate,
slower version of move_is_legal which works even when the side to
move is in check.

Moved the code for extracting the PV from the TT to tt.cpp, where
it belongs.

Added a new function build_pv(), which extends a PV by walking
down the transposition table.

When the search was stopped before a fail high at the root was
resolved, Stockfish would often print a very short PV, sometimes
consisting of just a single move. This was not only a little
user-unfriendly, but also harmed the strength a little in
ponder-on games: Single-move PVs mean that there is no ponder
move to search.

It is perhaps worth considering to remove the pv[][] array
entirely, and always build the entire PV from the transposition
table. This would simplify the source code somewhat and probably
make the program infinitesimally faster, at the expense of
sometimes getting shorter PVs or PVs with rubbish moves near
the end.

Initial work towards adjustable playing strength.

Added the UCI_LimitStrength and the UCI_Elo options, with an Elo
range of 2100-2900. When UCI_LimitStrength is enabled, the number
of threads is set to 1, and the search speed is slowed down according
to the chosen Elo level.

Todo:

1. Implement Elo levels below 2100 by blundering on purpose and/or
   crippling the evaluation.
2. Automatically calibrate the maximum Elo by measuring the CPU speed
   during program initialization, perhaps by doing some bitboard
   computations and measuring the time taken.

No functional change when UCI_LimitStrength is false (the default).

Added LMR at the root.

After 2000 games at 1+0

Mod vs Orig +534 =1033 -433 52.525% 1050.5/2000 +18 ELO

Remove useless mate value special handling in null search

After 1200 games (1CPU), time control 1+0:

Mod vs Orig: +331 =564 -277 +16 ELO

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Yet another small touch to endgame functions handling

It is like a never finished painting. Everyday a little touch
more.

But this time it is very little ;-)

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Remove unused members in Application class

Also rearrange a bit the remining methods.

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Fix a spurious extra space

This morning it seems there is nothing better to do...

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Micro optimize extension() in search.cpp

Small micro-optimization in this very
time critical function.

Use bitwise 'or' instead of logic 'or' to avoid branches
in the assembly and use the result to skip an handful of checks.

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Polynomial material balance after 100.000 games

Verified it is equivalent to the tuning branch results
with parameter values sampled after 100.000 games.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Revert Makefile changes

Some unwanted changes to Makefile slept in in patch
"Introduced the UCI_AnalyseMode option".

Revert them. No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Simplify king shelter cache handling

This is more similar to how get_material_info() and
get_pawn_info() work and also removes some clutter from
evaluate_king().

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Delay costly SEE call during captures ordering in MovePicker

When ordering moves we push all captures with negative SEE values
to badCaptures[] array during the scoring phase.

This patch delays the costly SEE call up to when the move has been
picked up in pick_move_from_list(), this way we save some SEE calls
in case we get a cutoff.

It seems we have a speed gain of about 1-1.5 % in terms of nodes/sec
and profiling seems to confirm the small but real speed increase.

Idea from Pablo Vazquez on talkchess.com
http://www.talkchess.com/forum/viewtopic.php?t=29018&start=20

It would be a no functional change but actually it is not because
now sorting set is different and so std::sort(), that is not a
stable sort, does not guarantees the order of same scored moves to
remain the same as before.

After 952 games at 1+0 we are below error bar, almost equal just
6 games of difference (+2 ELO)

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Microptimization in do_evaluate()

Do not call count_1s_max_15() if not necessary, as is
not in the common case (>95%).

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Use do_move_bb() helpers when doing a castle

Small cleanup.

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Add Tord's polynomial material balance

Use a polynomial weighted evaluation to calculate
material value.

This is far more flexible and elegant then applying
a series of single euristic rules as before.

Also correct a design issue in which we returned two
values, one for middle game and one for endgame, while
instead, because game phase is a function of board
material itself, only one value should be calculated and
used both for mid and end game.

Verified it is equivalent to the tuning branch results with
parameter values sampled after 40.000 games.

After 999 games at 1+0

Mod vs Orig +277 =482 -240 51.85% 518.0/999 +13 ELO

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Rename int32 in int32_t

To use the same naming rule of the other types and
to be compatible with inttypes.h, used under Linux.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Correctly set mateThreat in search()

We do not accept null search returned mate values,
but we always do a full search in those cases.

So the variable mateThreat that is set only if null move
search returns a mate value is always false.

Restore the functionality of mateThreat moving the
assignement where it can be triggered.

After 999 games at 1+0

Mod vs Orig +253 =517 -229 51.20% +8 ELO

Bug reported by xiaozhi

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Use increased LMR horizont also in PV search

Tord says that using a lower horizon at PV nodes
looks strange and inconsistent with the general
philosophy of our search (i.e. always being more
conservative at PV nodes). So set LMR at 3 also
on search_pv().

Test result after 601 games seems to confirm this.

Mod vs Orig +156 =318 -127 52.41% 315.0/601 +17 ELO

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Reintroduce null move dynamic reduction

Test extension of LMR horizon to 3 plies alone, without
touching null move search. To keep the patch minimal we still
don't change LMR horizon in PV search. This will be the object
of the next patch.

Result seems good after 998 games:

Mod vs Orig +252/=518/-228 51.20% 511.0/998 +8 ELO

So dynamic null move reduction seems a bit stronger then
fixed reduction even with LMR horizon set to 3.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Use increased LMR horizont only after a null move

Revert to LMR horizont of 2 plies. Only if parent move
is a null move increase to 3 so to avoid the bad combination
of null move reduction + LMR reduction. This is a more
aggressive patch then previous one, but it seems we are
going in the wromg direction.

After 531 games result is not good:

Mod vs Orig +123/=265/-143 48.12% 255.5/531 -13 ELO

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Combine increased LMR horizont and fixed null move reduction

Set null move reduction to R=4, but increase the LMR horizon
to 3 plies. The two tweaks are related and should compensate
the combined effect of null move + LMR reduction at shallow
depths.

Idea from Tord.

After 999 games at 1+0

Mod vs Orig +251 =522 -225 51.30% + 9 ELO

On Tord iMac Core 2 Duo 2.8 GHz, one thread,
Mac OS X 10.6, at 1+0 time control we have:

Mod vs Orig 994-1006 -1.4 ELO

But Orig version is pgo compiled and Mod is not.
The PGO compiled version is about 8% faster, which
corresponds to about 7 Elo points. This means that
results are reasonably consistent.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Introduced the UCI_AnalyseMode option, and made the evaluation function
symmetrical in analyse mode.

No functional change when playing games.

Fix two compile errors in new endgame code

Code that compiles cleanly under MSVC triggers one
compile error (correct) under Intel C++ and two(!)
under gcc.

The first is the same complained by Intel, but the second
is an interesting corner case of C++ standard (there are many)
that is correctly spotted only by gcc.

Both MSVC and Intel pass this silently, probably to avoid
breaking people code.

Now we are fully C++ compliant ;-)

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Move constant bitboard arrays from header to cpp file

This avoid to duplicate storage allocation for every file
where they are used.

Note that simple numeric constant can remain in header because
are automatically folded by the compiler.

Patch suggested by Tord.

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Remove even more redundancy in endgame functions handling

Push on the templatization even more to chip out some code
and take the opportunity to show some neat template trick ;-)

Ok. I would say we can stop here now....it is quickly becoming
a style exercise but we are not boost developers so give it a stop.

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Removed an incorrect assert() statement in search.cpp, which asserted that
a static eval cached in the transposition table would always equal the static
eval of the current position. This is in general not true, because the cached
value could be from a previous search with different evaluation parameter
settings, or from a search from the opposite side (Stockfish's evaluation
function is assymmetric by default).

Simplify endgame functions handling

We really don't need to have global endgame functions. We can
allocate them on the heap at initialization time and store the
corresponding pointer directly in the functions maps. To avoid
leaks we just need to remember to deallocate them in map d'tor.

These functions are always created in couple, one for each color,
so remove a lot of redundant hard coded info and just use the minimum
required: the type and the corresponding named string.

This greatly simplifies the code and also it is less error prone,
now is much simpler to add a new endgame specialized function: just
add the corresponding enum in endgame.h and the obvious add_xx()
call in EndgameFunctions c'tor, and of course, the most important part,
the EvaluationFunction<xxx>::apply() specialization in endgame.cpp

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Renamed the variable 'looseOnTime' to 'loseOnTime', because I'm a pedant.

No functional change.

Remove "Last seconds noise" filtering UCI option

This feature makes sense during development, but
It doesn't seem to make sense for normal users.

Also fix a possible race where the GUI adjudicates
the game a fraction of second before the engine sets
looseOnTime flag so that it will bogusly waits until
it ran out of time at the beginning of the next new game.

The fix is to always reset looseOnTime at the beginning
of a new game.

Race condition spotted by Tord.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Introduce SERIALIZE_MOVES_D() macro and use it for pawn moves

This is another moves serialization macro but this time
focused on pawn moves where the 'from' square is given as
a delta from the 'to' square.

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Micro optimize pawn moves generation

It is very rare we have pawns on 7(2) rank, so we
can skip the promotion handling stuff in most cases.

With this patch pawn moves generation is almost 20% faster.

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Introduce see_sign() and use it to shortcut full see()

Mostly of times we are interested only in the sign of SEE,
namely if a capture is negative or not.

If the capturing piece is smaller then the captured one we
already know SEE cannot be negative and this information
is enough most of the times. And of course it is much
faster to detect then a full SEE.

Note that in case see_sign() is negative then the returned
value is exactly the see() value, this is very important,
especially for ordering capturing moves.

With this patch the calls to the costly see() are reduced
of almost 30%.

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Move some global variables to local scope in search.cpp

Some variables were global due to some old and now removed code,
but now can be moved in local scope.

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Joona tweaks of Weights and limits

Verification test give unusless result

After 999 games at 1+0
Mod vs Orig +250 =503 -246 50.20% +1 ELO

So we are well below our radar level. Neverthless
there are 100.000 games on Joona QUAD that we could
take in account and that shows that this tweak perhaps
has something good in it, altough very little.

Verification tests shows should not be a regression, at
least not a big one even in the worst case, so apply the
change anyway and keep the finger crossed ;-)

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Small tidy up of previous patch

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Strip whitespace from beginning of string sent to set_option_value().

It turned out that the input sent to set_option_value() when it is called by
set_option() in uci.cpp always started with at least one whitespace. In most
cases, this is not a problem, because the majority of UCI options have numeric
values. It did, however, cause a problem for UCI options with non-numerical
values, like options of type CHECK and COMBO. In particular, changing the
value of an option of type CHECK didn't work, because the comparisons with
"true" and "false" would always return false. This means that the "Ponder"
and "UCI_Chess960" options haven't been working for a while.

Revert last tweaks
Tests show no improvment, so revert for now.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Joona tweaks of tempos and misc parameters

Unfortunatly this tweak does not give good results.

After 894 games at 1+0 we have:

Mod vs Orig +205/-236/=453 48.27% -12 ELO !!

Perhaps we should test again, but in the mean time
we are going to revert this.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Restore development versioning and LSN filtering

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Fix generation of check blocking promotion

A promotion move is not considered a possible evasion as it could be.

Bug introduced by patch

Convert also generate_pawn_blocking_evasions() to new API (7/5/2009)

Bug spotted by Kenny Dail.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Stockfish 1.4

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Small Makefile tweaks

Set gcc as default compiler on Linux, also compile
with symbols stripped to shrink binary file.

Original patch by Heinz van Saanen.

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Fix bitcount.h compile warnings under Intel compiler

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Check Intel compiler before MSVC in bitcount.h

Predefined macro __INTEL_COMPILER is defined only for Intel,
while _MSC_VER is defined for both Intel C++ and MSVC.

So rearrange ifdefs to take in account this and test __INTEL_COMPILER
first and only if not defined check _MSC_VER for MSVC.

Patch suggested by Joona.

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Add support for saving timing file during benchmark

Add a new argument to bench to specify the name of the
file where timing information will be saved for each
benchmark session.

This argument is optional, if not specified file will
not be created.

Original patch by Heinz van Saanen

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Disable POPCNT support per default

This is mainly intended to allow 64 bit compiles on any
system and avoid to crash when the binary, compiled on a
box where POPCNT is not supported, is run on a Core i7
system or similar CPU.

What could happen is that when compiled in a standard 64 bit
system, because the correct headers for the POPCNT intrinsic
are not found, the compiler creates dummy bit count functions
instead, these are never called at runtime on the machine where
Stockfish has been compiled. But if we run the same binary on a
Core i7 system, because POPCNT is detected at run time, the dummy
bitcount functions will be called giving false results that will
crash the application.

Note that would be possible to fallback on software bit count in
these cases, but this is even more subtle because POPCNT path is not
optimized so that we have an application working but at sub-optimal
speed, so better to crash, at least user is loudly warned that there
is something wrong.

If, instead, Stockfish is compiled on a Core i7 system with POPCNT
enabled, then if the PGO compile has been done properly, the same binary
will run at optimal speed _both_ on the Core i7 machine and on any other
64 bit standard machine. This is the ideal mode for binary distribution.

Finally this patch disables bsfq support under Windows, because it seems
inline assembly is not supported both by MSVC and by Intel Windows version.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>

Do not compile POPCNT if NO_POPCNT is defined

Also rename DISABLE_POPCNT_SUPPORT in NO_POPCNT and simplify a bit
the macro logic.

Always define a __popcnt64()or _mm_popcnt_u64() template, if the proper
function with the same name is defined in the intrinsics header, then it
will be choosen as first otherwise we fall back on the dummy template
that is never called at runtime anyway because cpu_has_popcnt() returns
false.

This fixes the compile error reported by Jim.

No functional change.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>