Douglas Modern Chess, Season 2 Final

So I figured out the flaw in my scoring system. Because it uses the length of the game as part of the calculation, an engine that manages to get early draws (perhaps by threefold repetition) can game the system and get a higher score.

So I modified the system so that drawn games result in no score for either side. Also, computer chess often has adjudicated decisions to avoid tiresome endgames, which results in abnormally short games, further complicating the use of game length as a determinant.
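To see why short draws distort a length-based score, here is a toy sketch. The formula is entirely hypothetical (the actual calculation is not given in this post): each result is simply divided by the number of moves, so quicker results weigh more. The names `toy_score` and `draws_count`, and all the game lengths, are made up for illustration.

```python
# Hypothetical toy formula, NOT the actual scoring system: each result is
# divided by the game length, so shorter games weigh more.
RESULT_VALUE = {"win": 1.0, "draw": 0.5, "loss": -1.0}

def toy_score(games, draws_count=True):
    """games: list of (result, number_of_moves) tuples for one engine."""
    total = 0.0
    for result, moves in games:
        if result == "draw" and not draws_count:
            continue  # revised rule: a drawn game scores nothing
        total += 100.0 * RESULT_VALUE[result] / moves
    return total

# An engine that forces quick draws by repetition ...
quick_drawer = [("draw", 20)] * 6
# ... versus one that wins twice and draws four long games.
real_winner = [("win", 80)] * 2 + [("draw", 120)] * 4

# With draws counted, the quick drawer comes out on top.
print(toy_score(quick_drawer), toy_score(real_winner))
# With draws ignored, only the decisive games matter.
print(toy_score(quick_drawer, draws_count=False),
      toy_score(real_winner, draws_count=False))
```

Under this toy formula the engine with six quick draws outscores the engine with two wins, and dropping the draws reverses the order.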

So the revised scoring for the semifinal was:

Raubfisch X41d3._sl         : 4.71
Stockfish 11                : 0.9
Zeus 4.1.7 M                : 0
Raubfisch_ME262_GTZ20d3._sl : -5.01

This does show a rather dramatic difference between the two Raubfisch variants, as well as between winner and second place. So I ran the final between the top two above, 10 games, time control 30 minutes plus 30 seconds a move.

The results were disappointing: of the 10 games, 9 were drawn, and those that I saw were rather boring, so I’m not going to post them. I will post the only one which had a result, and that was a mate as well. So Raubfisch X41d3._sl is crowned the winner with a score of 5.5 to 4.5 by conventional scoring.
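For reference, the conventional score is just one point per win and half a point per draw. A minimal sketch (the function name `conventional_score` is only illustrative):

```python
def conventional_score(wins, draws):
    """One point per win, half a point per draw, nothing for a loss."""
    return wins + 0.5 * draws

# Final: 10 games, 9 drawn, one decisive game.
winner = conventional_score(wins=1, draws=9)
loser = conventional_score(wins=0, draws=9)
print(winner, loser)  # 5.5 4.5
```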

My scoring was

Raubfisch X41d3._sl : 2.09
Stockfish 11        : -2.26

and the points system awarded

Engine                    Points  Percentage
Raubfisch X41d3._sl     : 274   : 54.8 %
Stockfish 11            : 224   : 44.8 %

The SuperFinals at TCEC usually have a lot more games, because most of them are drawn, which is very tedious.

Herewith the winning game. Stockfish was outplayed somewhere in the middle. The trapped bishop around move 48 led to disastrous loss of material.


Douglas Modern Chess, Season 2 Semifinal

Herewith the results of the semifinal, which was between Stockfish and three derivatives. I was actually expecting the Raubfisch variants to come out on top, based on their performance in the heats, but it was not to be. Perhaps the longer time control affected things.

The results revealed a flaw in my own scoring system, which was supposed to prevent confusion about results, as shown below. First and fourth are mostly clear, but the 2nd and 3rd spots are more problematic. So I need to rethink my scoring before deciding who makes it into the finals.

Summary:

Games: 12; Draws: 9, DrawPercentage: 75 %
Whitewins: 0; Blackwins: 3, Draws: 9

Longer time controls and stronger engines mean more draws. Curious that white was unable to win.

Conventional scoring:

Raubfisch X41d3._sl         : 4
Zeus 4.1.7 M                : 3
Stockfish 11                : 3
Raubfisch_ME262_GTZ20d3._sl : 2

Results table:

Engine                 Win     Draw    Lose
Raubfisch X41d3._sl    2 [0/2] 4 [3/1] 0 [0/0]
Stockfish 11           1 [0/1] 4 [2/2] 1 [1/0]
Zeus 4.1.7 M           0 [0/0] 6 [3/3] 0 [0/0]
Raubfisch_ME262_GTZ2   0 [0/0] 4 [1/3] 2 [2/0]
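The summary figures can be re-derived from the results table. The sketch below assumes the bracketed pairs mean [as white/as black], which is not stated explicitly but agrees with the white and black win counts in the summary. The names `results` and `summarise` are illustrative.

```python
# Semifinal results: (wins, draws, losses), each split as (white, black).
# Assumption: the bracketed pairs in the table mean [as white/as black].
results = {
    "Raubfisch X41d3._sl":         ((0, 2), (3, 1), (0, 0)),
    "Stockfish 11":                ((0, 1), (2, 2), (1, 0)),
    "Zeus 4.1.7 M":                ((0, 0), (3, 3), (0, 0)),
    "Raubfisch_ME262_GTZ20d3._sl": ((0, 0), (1, 3), (2, 0)),
}

def summarise(results):
    white_wins = sum(wins[0] for wins, _, _ in results.values())
    black_wins = sum(wins[1] for wins, _, _ in results.values())
    # Each drawn game is listed twice, once per participant.
    draws = sum(sum(d) for _, d, _ in results.values()) // 2
    games = white_wins + black_wins + draws
    return {
        "games": games,
        "draws": draws,
        "draw_percentage": 100 * draws / games,
        "white_wins": white_wins,
        "black_wins": black_wins,
    }

print(summarise(results))
```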

My scoring system, which takes black/white and number of moves into account:

Zeus 4.1.7 M                : 14.82
Raubfisch X41d3._sl         : 11.61
Raubfisch_ME262_GTZ20d3._sl : 6.07
Stockfish 11                : 5.94

The problem with these scores is that an engine that failed to win, despite never losing, should not rank higher than an engine that did win (twice) as well as never losing. Hence I need to rethink.

My points-based scoring system, which takes black/white into account:

Engine                        Points  Percentage
Raubfisch X41d3._sl         : 202   : 67.33 %
Stockfish 11                : 152   : 50.67 %
Zeus 4.1.7 M                : 150   : 50 %
Raubfisch_ME262_GTZ20d3._sl : 102   : 34 %
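The point values themselves are not spelled out here, but the tables are consistent with the values below: a win or draw as black is worth slightly more than as white, a loss scores nothing, and the percentage is taken out of 50 points per game. Treat these numbers as inferred from the published figures, not as a stated rule. The names `POINTS`, `points` and `percentage` are illustrative.

```python
# Point values inferred from the published tables (an assumption, not a
# stated rule): results as black are worth slightly more than as white.
POINTS = {
    ("win", "white"): 48, ("win", "black"): 52,
    ("draw", "white"): 24, ("draw", "black"): 26,
}
MAX_PER_GAME = 50  # percentages appear to be points / (50 * games)

def points(wins, draws):
    """wins and draws are (as_white, as_black) pairs; losses score 0."""
    return (wins[0] * POINTS[("win", "white")]
            + wins[1] * POINTS[("win", "black")]
            + draws[0] * POINTS[("draw", "white")]
            + draws[1] * POINTS[("draw", "black")])

def percentage(pts, games):
    return round(100 * pts / (MAX_PER_GAME * games), 2)

# Raubfisch X41d3._sl in the semifinal: 2 wins [0/2], 4 draws [3/1].
pts = points(wins=(0, 2), draws=(3, 1))
print(pts, percentage(pts, games=6))  # 202 67.33
```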

The points-based scores are better: they agree with the conventional ranking and also separate Stockfish from Zeus.

Cutechess scoring:

Rank Name                         Elo +/- Games Score Draws
1    Raubfisch X41d3._sl          120 162 6     66.7% 66.7%
2    Zeus 4.1.7 M                   0   0 6     50.0% 100.0%
3    Stockfish 11                   0 173 6     50.0% 66.7%
4    Raubfisch_ME262_GTZ20d3._sl -120 162 6     33.3% 66.7%

Cutechess also appears to rank Zeus above Stockfish, but it may just be sorting alphabetically based on score, without taking anything else into account.

So you can see the different scoring systems produce conflicting results, which I need to resolve before running the final.

Here are the games themselves. Time control was 20 minutes plus 20 seconds per move.

The only mate was between Raubfisch X41d3._sl and Zeus 4.1.7 M; the rest were decided by adjudication.


Douglas Modern Chess, Season 2

Now that Season 18 is mostly done over at https://tcec-chess.com/, I thought I’d run another round of the Modern Chess, using updated versions of the engines.

I did try to add LC0 and SugarNN, but could not get them to function properly. It may be because my GPU is rather old, or some back-end dependency was not available.

I did some preliminary testing, to give some newcomers a chance to qualify. The following engines did not survive. Ethereal was a surprise, given how well it did compared to Fire on TCEC.

Ethereal 12
Defenchess 2.3 dev
Godel 7.0
Raven 0.8
zct-032500-64-ja

The following engines did make the cut. In essence it’s a bit of a Stockfish bloodfest, since apart from Stockfish itself, there are three derivatives.

Stockfish 11 (not the latest available, but latest official release)
Raubfisch_ME262_GTZ20d3._sl (based on Stockfish)
Raubfisch X41d3._sl (based on Stockfish)
Zeus 4.1.7 M (based on Stockfish)
Komodo 11
xiphos-0.6-linux-sse  (not updated since Season 1)
Fire_7.1_x64 (not updated since Season 1)
rofChade 2.203

The results from Heat 1 will follow.

Season 1 Final

For the first Douglas Modern chess final, I decided on a 10-game match between the two contenders, with time control set at 30 minutes plus 15 seconds per move. I was worried that, sans opening books, they would simply play the same games over and over, but it turns out that they’re smarter than that.

There were more draws this round, given that the engines were as equal in strength as I could get, without the neural network ones. I have not succeeded in getting those (e.g. Lc0, AllieStein, Stoofvlees, etc.) to work yet.

Of the ten games, Stockfish won half (as both black and white) and drew the rest. So we declare Stockfish the official champion.

The games in general were longer; some hit the 50-move rule, and there were those annoying shuffle endings as well. Anyway, here are the numbers and the games.

Engine               Win     Draw    Lose
stockfish 111119 64  5 [3/2] 5 [2/3] 0 [0/0]
xiphos-0.6-linux-sse 0 [0/0] 5 [3/2] 5 [2/3]

Conventional scoring:

stockfish 111119 64  : 7.5
xiphos-0.6-linux-sse : 2.5

Games: 10; Draws: 5, DrawPercentage: 50 %
Whitewins: 3; Blackwins: 2, Draws: 5

Cute Chess scoring:

Score of stockfish 111119 64 vs xiphos-0.6-linux-sse: 5 - 0 - 5 [0.750]
Elo difference: 190.8 +/- 162.0, LOS: 98.7 %, DrawRatio: 50.0 %
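The Elo difference and LOS figures above match the usual textbook formulas: a logistic Elo model for the score fraction, and a normal approximation over the decisive games for the likelihood of superiority. The sketch below assumes those formulas. It reproduces the two numbers but is not taken from the Cute Chess source, and the function names are illustrative.

```python
import math

def elo_difference(score):
    """Elo difference implied by a score fraction (logistic model)."""
    return -400 * math.log10(1 / score - 1)

def likelihood_of_superiority(wins, losses):
    """LOS from decisive games only, using the normal approximation."""
    return 0.5 * (1 + math.erf((wins - losses) / math.sqrt(2 * (wins + losses))))

# Stockfish vs xiphos: 5 wins, 0 losses, 5 draws.
score = (5 + 0.5 * 5) / 10
print(round(elo_difference(score), 1))                     # 190.8
print(round(100 * likelihood_of_superiority(5, 0), 1))     # 98.7
```

A score of exactly 100 % has no finite value under this formula, which is presumably why a perfect score shows up as inf in the ranking tables.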

10 of 10 games finished.

Points

stockfish 111119 64  : 374 : 74.8 %
xiphos-0.6-linux-sse : 124 : 24.8 %

Here are the games themselves. The “Mate of the Match” award goes to the 4th game.

I’m going to take a break from this now, though may post results on Github or write a paper, depending on how things go with work. Must finish the Giza papers.


Season 1 Semi finals

This was a double-round-robin between the top three CPU engines.

Engine               Win     Draw    Lose
xiphos-0.6-linux-sse 1 [0/1] 6 [4/2] 1 [0/1]
Ethereal 20191110    0 [0/0] 3 [1/2] 5 [3/2]
stockfish 111119 64  5 [3/2] 3 [1/2] 0 [0/0]

Conventional scoring:

stockfish 111119 64  : 6.5
xiphos-0.6-linux-sse : 4
Ethereal 20191110    : 1.5

Games: 12; Draws: 6, DrawPercentage: 50 %
Whitewins: 3; Blackwins: 3, Draws: 6

Cute Chess scoring:

Rank Name                          Elo     +/-   Games   Score   Draws
   1 stockfish 111119 64           255     286       8   81.3%   37.5%
   2 xiphos-0.6-linux-sse            0     125       8   50.0%   75.0%
   3 Ethereal 20191110            -255     286       8   18.8%   37.5%

12 of 12 games finished.

Points

stockfish 111119 64  : 324 : 81 %
xiphos-0.6-linux-sse : 200 : 50 %
Ethereal 20191110    : 76  : 19 %

So Ethereal 20191110 drops out, and the other two go through to the final, which will be the longer game format.

Here are the games themselves. The “Mate of the Match” award goes to the Ethereal 20191110 vs xiphos-0.6-linux-sse game (second one below).


Season 1 Round 4 Heat 1

This was supposed to be a double round, but I had to restart it because I selected the wrong version of Ethereal. And on the restart, I forgot to make it a double round. Curiously, when I stopped it, Stockfish had a less-than-perfect score. So on the one hand, it had some losses or draws, and on the other hand, at least the engines are not simply playing the same game over and over. I will check the end boards when all is done and look for duplicates. So far there have not been any.
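One way the duplicate check could be done is to compare the final positions of all games. The sketch below assumes each game's end board is available as a FEN string and compares only the piece placement. The function name and the sample positions are made up for illustration.

```python
from collections import defaultdict

def find_duplicate_endings(final_positions):
    """Map each repeated final position to the game numbers that reached it.

    final_positions: list of FEN strings, one per game, in game order.
    """
    seen = defaultdict(list)
    for game_number, fen in enumerate(final_positions, start=1):
        # Compare only the piece placement (first FEN field), so the
        # move counters do not hide an identical board.
        placement = fen.split()[0]
        seen[placement].append(game_number)
    return {pos: games for pos, games in seen.items() if len(games) > 1}

# Made-up end positions, for illustration only.
endings = [
    "8/8/8/4k3/8/4K3/4P3/8 w - - 0 60",
    "8/8/8/8/5k2/8/5K2/6R1 b - - 3 71",
    "8/8/8/4k3/8/4K3/4P3/8 b - - 12 95",
]
print(find_duplicate_endings(endings))
# {'8/8/8/4k3/8/4K3/4P3/8': [1, 3]}
```

An identical end board does not prove the whole game was repeated, so any hits would still need a look at the move lists.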

Anyway, here are the results.

Engine               Win     Draw    Lose
Ethereal 20191110    1 [1/0] 3 [1/2] 2 [1/1]
stockfish 111119 64  6 [3/3] 0 [0/0] 0 [0/0]
Defenchess_2.2       0 [0/0] 2 [2/0] 4 [1/3]
xiphos-0.6-linux-sse 1 [1/0] 3 [1/2] 2 [1/1]

Conventional scoring:

stockfish 111119 64  : 6
Ethereal 20191110    : 2.5
xiphos-0.6-linux-sse : 2.5
Defenchess_2.2       : 1

Games: 12; Draws: 4, DrawPercentage: 33.33 %
Whitewins: 5; Blackwins: 3, Draws: 4

Cute Chess scoring:

Rank Name                          Elo     +/-   Games   Score   Draws
   1 stockfish 111119 64           inf     nan       6  100.0%    0.0%
   2 xiphos-0.6-linux-sse          -58     226       6   41.7%   50.0%
   3 Ethereal 20191110             -58     226       6   41.7%   50.0%
   4 Defenchess_2.2               -280     nan       6   16.7%   33.3%

12 of 12 games finished.

Points

stockfish 111119 64  : 300 : 100 %
Ethereal 20191110    : 124 : 41.33 %
xiphos-0.6-linux-sse : 124 : 41.33 %
Defenchess_2.2       :  48 : 16 %

So Defenchess_2.2 drops out, and the other three go through to the next round.
Here are the games themselves. The “Mate of the Match” award goes to the Ethereal 20191110 vs stockfish 111119 64 match (first one below).


Season 1 Round 3 Heat 2

Another somewhat unexpected result … I was expecting critter to finish last.

Engine                 Win     Draw    Lose
critter-16a-64bit      2 [2/0] 0 [0/0] 4 [1/3]
xiphos-0.6-linux-sse   2 [1/1] 2 [2/0] 2 [0/2]
laser                  1 [1/0] 1 [0/1] 4 [2/2]
stockfish 111119 64    5 [3/2] 1 [0/1] 0 [0/0]

Games: 12; Draws: 2, DrawPercentage: 16.67 %
Whitewins: 7; Blackwins: 3, Draws: 2

Conventional scoring:

stockfish 111119 64  : 5.5
xiphos-0.6-linux-sse : 3
critter-16a-64bit    : 2
laser                : 1.5

Cute Chess scoring:

Rank Name                          Elo     +/-   Games   Score   Draws
   1 stockfish 111119 64           417     nan       6   91.7%   16.7%
   2 xiphos-0.6-linux-sse            0     271       6   50.0%   33.3%
   3 critter-16a-64bit            -120     nan       6   33.3%    0.0%
   4 laser                        -191     nan       6   25.0%   16.7%

12 of 12 games finished.

Points

stockfish 111119 64  : 274 : 91.33 %
xiphos-0.6-linux-sse : 148 : 49.33 %
critter-16a-64bit    : 96  : 32 %
laser                : 74  : 24.67 %

So critter-16a-64bit and laser drop out, and the other two go through to the final round robin.

The line-up for the final round robin will be:

Defenchess_2.2
Ethereal 20191110
stockfish 111119 64
xiphos-0.6-linux-sse

This may produce more draws than the previous rounds.

Here are the games themselves. The “Mate of the Match” award goes to the xiphos-0.6-linux-sse vs critter-16a-64bit game (second one below).


Season 1 Round 3 Heat 1

This result was the first one that surprised me … I was not expecting Defenchess to come out on top. In fact I thought it was going to be eliminated.

Engine            Win     Draw    Lose
Defenchess_2.2    3 [2/1] 2 [0/2] 1 [1/0]
andscacs          1 [1/0] 3 [2/1] 2 [0/2]
Ethereal 20191110 2 [0/2] 3 [2/1] 1 [1/0]
Fire_7.1_x64      1 [1/0] 2 [1/1] 3 [1/2]

Conventional scoring:

Defenchess_2.2    : 4
Ethereal 20191110 : 3.5
andscacs          : 2.5
Fire_7.1_x64      : 2

Games: 12; Draws: 5, DrawPercentage: 41.67 %
Whitewins: 4; Blackwins: 3, Draws: 5

Cute Chess scoring:

Rank Name              Elo +/- Games Score Draws
   1 Defenchess_2.2    120 333 6     66.7% 33.3%
   2 Ethereal 20191110  58 226 6     58.3% 50.0%
   3 andscacs          -58 226 6     41.7% 50.0%
   4 Fire_7.1_x64     -120 333 6     33.3% 33.3%

12 of 12 games finished.

Points:

Defenchess_2.2    : 200 : 66.67 %
Ethereal 20191110 : 178 : 59.33 %
andscacs          : 122 : 40.67 %
Fire_7.1_x64      :  98 : 32.67 %

So andscacs and Fire_7.1_x64 drop out, and the other two go through to the next round.
Here are the games themselves. The “Mate of the Match” award goes to the andscacs vs Fire_7.1_x64 game (last one below).


Line-up for round 3

The line-up for round three will be these 8 engines, in two heats. I re-sorted the list alphabetically (ASCII-wise) to avoid the overload of strong engines in one heat. We may lose two from heat 1 and one from heat 2.

Defenchess_2.2
Ethereal
Fire_7.1_x64
andscacs

critter-16a-64bit
laser
stockfish 111119 64
xiphos-0.6-linux-sse
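The ASCII-wise ordering is what Python's default string sort produces, since it compares code points and uppercase letters come before lowercase ones. A small sketch that reproduces the two heats above (the variable names and the heat size of 4 are illustrative):

```python
engines = [
    "andscacs", "critter-16a-64bit", "Defenchess_2.2", "Ethereal",
    "Fire_7.1_x64", "laser", "stockfish 111119 64", "xiphos-0.6-linux-sse",
]

# sorted() compares strings by code point, so "Fire..." sorts before
# "andscacs" because uppercase letters precede lowercase ones in ASCII.
ordered = sorted(engines)

heat_size = 4
heats = [ordered[i:i + heat_size] for i in range(0, len(ordered), heat_size)]

for number, heat in enumerate(heats, start=1):
    print(f"Heat {number}: {', '.join(heat)}")
```

A case-insensitive sort (for example `sorted(engines, key=str.lower)`) would put andscacs and critter first and change both heats.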

 

Season 1 Round 2 Heat 3

The heavyweight round.

Engine               Win     Draw    Lose
laser                4 [2/2] 2 [1/1] 2 [1/1]
texel64              0 [0/0] 2 [1/1] 6 [3/3]
xiphos-0.6-linux-sse 3 [2/1] 4 [2/2] 1 [0/1]
arasanx-64           2 [1/1] 0 [0/0] 6 [3/3]
stockfish 111119 64  6 [3/3] 2 [1/1] 0 [0/0]

Conventional scoring:

stockfish 111119 64  : 7
laser                : 5
xiphos-0.6-linux-sse : 5
arasanx-64           : 2
texel64              : 1

Games: 20; Draws: 5, DrawPercentage: 25 %
Whitewins: 8; Blackwins: 7, Draws: 5

Cute Chess scoring:

Rank Name                Elo +/- Games Score Draws
   1 stockfish 111119 64 338 nan 8     87.5% 25.0%
   2 xiphos-0.6-linux-sse 89 190 8     62.5% 50.0%
   3 laser                89 261 8     62.5% 25.0%
   4 arasanx-64         -191 nan 8     25.0%  0.0%
   5 texel64            -338 nan 8     12.5% 25.0%

20 of 20 games finished.

Points

stockfish 111119 64  : 350 : 87.5 %
laser                : 250 : 62.5 %
xiphos-0.6-linux-sse : 248 : 62 %
arasanx-64           : 100 : 25 %
texel64              : 50  : 12.5 %

So arasanx-64 and texel64 drop out, and the other three go through to the next round.
Here are the games themselves. The “Mate of the Match” award goes to the xiphos-0.6-linux-sse vs arasanx-64 game.
