Rethinking chess scoring

There is a well-established scoring system used for chess tournaments, which allocates 1 point for a win, zero for a loss, and ½ point each for a draw.

The problem with this system is that, because chess is a turn-based game, white, who moves first, has a slight advantage, usually pegged at around 52%:48% against black. The current scoring system does not take this advantage into account.

On balance, white should win and black should lose, so a draw is a victory of sorts for black.

There was an alternative scoring system that allocated 3 points for a black win, 2 points for a white win, and 1 point each for a draw, but the weighting in those numbers is wrong: a 3:2 ratio values a black win 50% more highly than a white win, far more than a 52:48 edge justifies.

So I propose an alternative system, which is easy to understand and implement, and which accounts for the odds in favour of white.

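For a win, white would get 48 points. If black wins, he gets 52 points. In case of a draw, white gets 24 points and black gets 26 points. Losers get no points. In other words, each side's reward for a win is the other side's share of the 52:48 split, and a draw is worth half a win.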
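The whole rule fits in a small lookup. The sketch below is a minimal Python illustration under the 52:48 assumption stated above; `WIN_POINTS` and `game_points` are hypothetical names, not part of any existing tool.

```python
# Hypothetical helper: points awarded for one game under the proposed system.
# A win is worth the opponent's share of the assumed 52:48 split; a draw is worth half a win.
WIN_POINTS = {"white": 48, "black": 52}


def game_points(colour: str, result: str) -> int:
    """Return the points for one game, given the player's colour and result."""
    if colour not in WIN_POINTS:
        raise ValueError(f"unknown colour: {colour!r}")
    if result == "win":
        return WIN_POINTS[colour]
    if result == "draw":
        return WIN_POINTS[colour] // 2  # 24 for white, 26 for black
    if result == "loss":
        return 0
    raise ValueError(f"unknown result: {result!r}")


for colour in ("white", "black"):
    print(colour, [game_points(colour, r) for r in ("win", "draw", "loss")])
```

Running it prints `white [48, 24, 0]` followed by `black [52, 26, 0]`.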

I am actually pondering a more complex version of this paradigm, but it’s not finalised yet.

So let’s take a look at some examples, firstly a small tournament I ran on my PC with the Douglas Modern start position. The final results were as follows, with the figures in square brackets splitting each total into [as white/as black]:

Engine                          Win         Draw            Lose
stockfish                       13 [7/6]    3 [1/2]         0 [0/0]
Ethereal                        6 [3/3]     6 [4/2]         4 [1/3]
laser                           6 [4/2]     6 [1/5]         4 [3/1]
andscacs                        4 [2/2]     7 [4/3]         5 [2/3]
xiphos-sse                      4 [3/1]     9 [4/5]         3 [1/2]
Fire_7.1_x64                    3 [2/1]     10 [5/5]        3 [1/2]
komodo-10-linux                 3 [1/2]     10 [5/5]        3 [2/1]
rofChade                        2 [0/2]     8 [5/3]         6 [3/3]
spike_1.2                       0 [0/0]     3 [2/1]         13 [6/7]

I’ve sorted the table by number of wins, and in the case of a tie, by number of wins as black.
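The ordering rule amounts to a two-key sort. This is a hypothetical sketch: the list layout is only an illustration, with each engine's wins as white and as black copied from the table above. Applied strictly, the rule puts komodo-10-linux (two wins as black) just ahead of Fire_7.1_x64 (one win as black).

```python
# Illustrative layout: (engine, wins as white, wins as black), copied from the results table.
wins = [
    ("stockfish", 7, 6),
    ("Ethereal", 3, 3),
    ("laser", 4, 2),
    ("andscacs", 2, 2),
    ("xiphos-sse", 3, 1),
    ("Fire_7.1_x64", 2, 1),
    ("komodo-10-linux", 1, 2),
    ("rofChade", 0, 2),
    ("spike_1.2", 0, 0),
]

# Primary key: total wins; tie-break: wins as black. Both descending.
ranked = sorted(wins, key=lambda row: (row[1] + row[2], row[2]), reverse=True)

for name, as_white, as_black in ranked:
    print(f"{name:<16} {as_white + as_black:>2} [{as_white}/{as_black}]")
```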

Cute Chess scored that tournament as follows:

Rank Name                          Elo     +/-   Games   Score   Draws
   1 stockfish                     394     nan      16   90.6%   18.8%
   2 Ethereal                       44     143      16   56.3%   37.5%
   3 laser                          44     143      16   56.3%   37.5%
   4 xiphos-sse                     22     117      16   53.1%   56.3%
   5 komodo-10-linux                 0     107      16   50.0%   62.5%
   6 Fire_7.1_x64                    0     107      16   50.0%   62.5%
   7 andscacs                      -22     134      16   46.9%   43.8%
   8 rofChade                      -89     126      16   37.5%   50.0%
   9 spike_1.2                    -394     nan      16    9.4%   18.8%

This ranking is based on the conventional scores, which were:

stockfish           : 14.5
laser               : 9
Ethereal            : 9
xiphos-sse          : 8.5
komodo-10-linux     : 8
Fire_7.1_x64        : 8
andscacs            : 7.5
rofChade            : 6
spike_1.2           : 1.5
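These conventional totals follow directly from the Win/Draw/Lose columns. A minimal Python sketch, where the dictionary layout and the `conventional_score` name are illustrative assumptions:

```python
# Conventional scoring: 1 point for a win, 1/2 for a draw, 0 for a loss.
# Illustrative layout: engine -> (wins, draws, losses), totals copied from the results table.
results = {
    "stockfish": (13, 3, 0),
    "Ethereal": (6, 6, 4),
    "laser": (6, 6, 4),
    "andscacs": (4, 7, 5),
    "xiphos-sse": (4, 9, 3),
    "Fire_7.1_x64": (3, 10, 3),
    "komodo-10-linux": (3, 10, 3),
    "rofChade": (2, 8, 6),
    "spike_1.2": (0, 3, 13),
}


def conventional_score(wins: int, draws: int, losses: int) -> float:
    # Losses contribute nothing; the parameter is kept so each row unpacks directly.
    return wins + draws / 2


scores = {name: conventional_score(*wdl) for name, wdl in results.items()}
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<20}: {score:g}")
```

Python's sort is stable, so tied engines (laser/Ethereal, komodo/Fire) keep the order they have in the dictionary.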

In contrast, if we apply the proposed points system, we get:

stockfish               : 724
laser                   : 450
Ethereal                : 448
xiphos-sse              : 422
komodo-10-linux         : 402
Fire_7.1_x64            : 398
andscacs                : 374
rofChade                : 302
spike_1.2               : 74
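Each of these totals is a weighted sum of the bracketed [white/black] figures in the results table. The Python sketch below reproduces them; the data layout and the `proposed_score` name are illustrative assumptions. Losses score nothing, so they are left out.

```python
# Points per result and colour under the proposed system.
WIN_WHITE, WIN_BLACK = 48, 52
DRAW_WHITE, DRAW_BLACK = 24, 26

# Illustrative layout: engine -> ((wins as white, wins as black), (draws as white, draws as black)).
splits = {
    "stockfish": ((7, 6), (1, 2)),
    "Ethereal": ((3, 3), (4, 2)),
    "laser": ((4, 2), (1, 5)),
    "andscacs": ((2, 2), (4, 3)),
    "xiphos-sse": ((3, 1), (4, 5)),
    "Fire_7.1_x64": ((2, 1), (5, 5)),
    "komodo-10-linux": ((1, 2), (5, 5)),
    "rofChade": ((0, 2), (5, 3)),
    "spike_1.2": ((0, 0), (2, 1)),
}


def proposed_score(wins: tuple, draws: tuple) -> int:
    """Weighted sum of wins and draws, split by colour."""
    (wins_white, wins_black), (draws_white, draws_black) = wins, draws
    return (wins_white * WIN_WHITE + wins_black * WIN_BLACK
            + draws_white * DRAW_WHITE + draws_black * DRAW_BLACK)


totals = {name: proposed_score(w, d) for name, (w, d) in splits.items()}
for name, points in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<24}: {points}")
```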

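This highlights the subtle differences between laser (450) and Ethereal (448), and between komodo (402) and Fire (398), pairs that were level on 9 and 8 points respectively under conventional scoring.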

Let us now examine a larger sample: the controversial Season 16 Premier League results over at TCEC. This tournament was controversial because the Leela fanbois were expecting a showdown between Stockfish and Leela. However, due to issues with a third-party DLL file, Stockfish died twice against Allie, which let Allie gain 2 extra points and so reach the final against Stockfish.

Engine                          Win         Draw            Lose
Stockfish 190826                14 [11/3]   25 [10/15]      3 [0/3]
AllieStein v0.5-dev_            14 [7/7]    24 [14/10]      4 [0/4]
LCZero v0.22.0-nT40B            9 [7/2]     33 [14/19]      0 [0/0]
Komodo 2381.00                  4 [4/0]     36 [17/19]      2 [0/2]
Stoofvlees II a12               3 [2/1]     32 [17/15]      7 [2/5]
Houdini 6.03                    3 [3/0]     31 [15/16]      8 [3/5]
KomodoMCTS 2381.00              1 [1/0]     31 [15/16]      10 [5/5]
ScorpioNN v3.0.1-n_m            0 [0/0]     28 [18/10]      14 [3/11]

By conventional scoring, we get these results:

Stockfish 190826                : 26.5
AllieStein v0.5-dev_1359f44-n10 : 26
LCZero v0.22.0-nT40B.4-160      : 25.5
Komodo 2381.00                  : 22
Stoofvlees II a12               : 19
Houdini 6.03                    : 18.5
KomodoMCTS 2381.00              : 16.5
ScorpioNN v3.0.1-n_maddex_INT8  : 14

This produced the SuFi (Superfinal) between Stockfish and Allie, which Stockfish won.

However, if we use my proposed points system, the results look like this:

Stockfish 190826                : 1314
AllieStein v0.5-dev_1359f44-n10 : 1296
LCZero v0.22.0-nT40B.4-160      : 1270
Komodo 2381.00                  : 1094
Stoofvlees II a12               : 946 
Houdini 6.03                    : 920 
KomodoMCTS 2381.00              : 824 
ScorpioNN v3.0.1-n_maddex_INT8  : 692

In conventional scoring, there is a 0.5-point gap between each of the top 3. In the points system, the differences are clearer: there are 18 points between 1st and 2nd, and 26 points between 2nd and 3rd.
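The gaps can be checked by scoring the top three under both systems. This Python sketch copies the [white/black] wins and draws from the TCEC table; `conventional` and `proposed` are hypothetical helper names.

```python
# Top three of TCEC Season 16: ((wins as white, wins as black), (draws as white, draws as black)).
top3 = {
    "Stockfish 190826": ((11, 3), (10, 15)),
    "AllieStein v0.5-dev_1359f44-n10": ((7, 7), (14, 10)),
    "LCZero v0.22.0-nT40B.4-160": ((7, 2), (14, 19)),
}


def conventional(wins, draws):
    # 1 point per win, 1/2 per draw, regardless of colour.
    return sum(wins) + sum(draws) / 2


def proposed(wins, draws):
    # 48/52 for a win and 24/26 for a draw, as white/black.
    return 48 * wins[0] + 52 * wins[1] + 24 * draws[0] + 26 * draws[1]


gaps = {}
for system in (conventional, proposed):
    scores = [system(*top3[name]) for name in top3]
    # Difference between each place and the next one down.
    gaps[system.__name__] = [a - b for a, b in zip(scores, scores[1:])]
    print(system.__name__, scores, gaps[system.__name__])
```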

We could also “standardise” these scores to produce a percentage, in a way similar to Cute Chess. If we divide by the number of games, we get a score out of 50 (with equal numbers of games as white and black, a perfect score averages (48 + 52) / 2 = 50 points per game), which we can simply double to get a score out of 100. This could be used to compare results across tournaments, regardless of the number of games in each tournament. For example, the above becomes:

Stockfish 190826                : 1314  : 62.57
AllieStein v0.5-dev_1359f44-n10 : 1296  : 61.71
LCZero v0.22.0-nT40B.4-160      : 1270  : 60.48
Komodo 2381.00                  : 1094  : 52.10
Stoofvlees II a12               : 946   : 45.05
Houdini 6.03                    : 920   : 43.81
KomodoMCTS 2381.00              : 824   : 39.24
ScorpioNN v3.0.1-n_maddex_INT8  : 692   : 32.95
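The standardisation is just "double the points per game". A minimal Python sketch, assuming (as in the TCEC table) that every engine played 42 games, i.e. wins + draws + losses; `standardised` is a hypothetical name.

```python
GAMES = 42  # each engine's wins + draws + losses in the Season 16 table


def standardised(points: int, games: int) -> float:
    # Average points per game is out of 50 when colours are balanced,
    # so doubling it gives a score out of 100, rounded to two decimals.
    return round(2 * points / games, 2)


totals = {
    "Stockfish 190826": 1314,
    "AllieStein v0.5-dev_1359f44-n10": 1296,
    "LCZero v0.22.0-nT40B.4-160": 1270,
    "Komodo 2381.00": 1094,
    "Stoofvlees II a12": 946,
    "Houdini 6.03": 920,
    "KomodoMCTS 2381.00": 824,
    "ScorpioNN v3.0.1-n_maddex_INT8": 692,
}

percentages = {name: standardised(pts, GAMES) for name, pts in totals.items()}
for name, pts in totals.items():
    print(f"{name:<32}: {pts:<5} : {percentages[name]:.2f}")
```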

Comments welcome as always.
