There is a well-established scoring system used for chess tournaments, which allocates 1 point for a win, zero for a loss, and ½ point each for a draw.
The problem with this system is that chess is a turn-based game, so white, who moves first, has a slight advantage, usually pegged at around 52%:48% against black. The current scoring system does not take this advantage into account.
On balance, white should win, and black should lose, so if it ends as a draw, that is a victory of sorts for black.
There was an alternative scoring system that allocated 3 points for a black win, 2 points for a white win, and 1 point each for a draw, but the weighted distribution for those numbers is wrong.
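One rough way to see the mismatch, assuming the win rewards are supposed to mirror white's 52:48 edge, is to look at how the combined win reward is split between the colours. This is only a sketch of that reading; the helper name `black_share` is made up for the illustration.

```python
# Share of the combined win reward that goes to a black win.
# Assumption: a "fair" weighting would mirror white's 52:48 edge,
# i.e. a black win should carry about 52% of the combined reward.
def black_share(white_win_points, black_win_points):
    return black_win_points / (white_win_points + black_win_points)

EXPECTED_BLACK_SHARE = 0.52  # mirror image of the 52%:48% figure above

# 3-2-1 system: a black win earns 3, a white win earns 2.
share = black_share(2, 3)
print(share)                         # 0.6
print(share > EXPECTED_BLACK_SHARE)  # True: black wins are over-rewarded
```

On this reading the 3-2-1 system weights a black win at 60:40, well beyond the 52:48 it is meant to compensate for.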
So I propose an alternative system, which is easy to understand and implement, and which accounts for the odds in favour of white.
For a win, white would get 48 points. If black wins, he gets 52 points. In case of a draw, white gets 24 points, and black gets 26 points. Losers get no points.
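The rule above is small enough to write down as a lookup table. The following is a minimal sketch; the names `POINTS`, `score_game` and `score_games` are my own choices for the illustration and do not come from any chess tool.

```python
# Points per game under the proposed system, keyed by (colour, result).
POINTS = {
    ("white", "win"): 48,
    ("black", "win"): 52,
    ("white", "draw"): 24,
    ("black", "draw"): 26,
    ("white", "loss"): 0,
    ("black", "loss"): 0,
}

def score_game(colour, result):
    """Points one player earns from a single game."""
    return POINTS[(colour, result)]

def score_games(games):
    """Total points over a list of (colour, result) pairs."""
    return sum(score_game(colour, result) for colour, result in games)

print(score_game("black", "draw"))  # 26
# A win as white, a draw as black and a loss as black: 48 + 26 + 0
print(score_games([("white", "win"), ("black", "draw"), ("black", "loss")]))  # 74
```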
I am actually pondering a more complex version of this paradigm, but it’s not finalised yet.
So let’s take a look at some examples, firstly a small tournament I ran on my PC with the Douglas Modern start position. The final results were:
Engine            Win       Draw      Lose
stockfish         13 [7/6]  3 [1/2]   0 [0/0]
Ethereal          6 [3/3]   6 [4/2]   4 [1/3]
laser             6 [4/2]   6 [1/5]   4 [3/1]
andscacs          4 [2/2]   7 [4/3]   5 [2/3]
xiphos-sse        4 [3/1]   9 [4/5]   3 [1/2]
Fire_7.1_x64      3 [2/1]   10 [5/5]  3 [1/2]
komodo-10-linux   3 [1/2]   10 [5/5]  3 [2/1]
rofChade          2 [0/2]   8 [5/3]   6 [3/3]
spike_1.2         0 [0/0]   3 [2/1]   13 [6/7]
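I’ve sorted the table by number of wins, and in the case of a tie, by number of wins as black. The figures in square brackets split each count into games as white and games as black, i.e. [white/black].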
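The ordering rule can be sketched as a two-part sort key. This is only an illustration on three rows from the table, deliberately shuffled; the tuple layout is an assumption made for the example.

```python
# (engine, total wins, wins as black), taken from the table above.
rows = [
    ("laser", 6, 2),
    ("stockfish", 13, 6),
    ("Ethereal", 6, 3),
]

# Sort by total wins first, then by wins as black, both descending.
ranked = sorted(rows, key=lambda row: (row[1], row[2]), reverse=True)

for name, wins, black_wins in ranked:
    print(name, wins, black_wins)
# stockfish 13 6
# Ethereal 6 3
# laser 6 2
```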
CuteChess scored that tournament as follows:
Rank  Name             Elo   +/-  Games  Score  Draws
1     stockfish        394   nan  16     90.6%  18.8%
2     Ethereal         44    143  16     56.3%  37.5%
3     laser            44    143  16     56.3%  37.5%
4     xiphos-sse       22    117  16     53.1%  56.3%
5     komodo-10-linux  0     107  16     50.0%  62.5%
6     Fire_7.1_x64     0     107  16     50.0%  62.5%
7     andscacs         -22   134  16     46.9%  43.8%
8     rofChade         -89   126  16     37.5%  50.0%
9     spike_1.2        -394  nan  16     9.4%   18.8%
That ranking is based on the conventional scoring, which was:
stockfish : 14.5
laser : 9
Ethereal : 9
xiphos-sse : 8.5
komodo-10-linux : 8
Fire_7.1_x64 : 8
andscacs : 7.5
rofChade : 6
spike_1.2 : 1.5
In contrast, if we apply the proposed points scoring, then we get:
stockfish : 724
laser : 450
Ethereal : 448
xiphos-sse : 422
komodo-10-linux : 402
Fire_7.1_x64 : 398
andscacs : 374
rofChade : 302
spike_1.2 : 74
This highlights the subtle differences between laser and Ethereal, and between komodo and Fire.
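The laser and Ethereal split can be reproduced from the win and draw counts in the first table. The sketch below assumes the bracketed figures are [white/black]; the function names `conventional` and `proposed` are just labels for this example.

```python
# Each count is a (as_white, as_black) pair read from the results table.
def conventional(wins, draws):
    """1 point per win, half a point per draw, colour ignored."""
    return sum(wins) + 0.5 * sum(draws)

def proposed(wins, draws):
    """48/52 for a white/black win, 24/26 for a white/black draw."""
    return 48 * wins[0] + 52 * wins[1] + 24 * draws[0] + 26 * draws[1]

results = {
    "laser": {"wins": (4, 2), "draws": (1, 5)},
    "Ethereal": {"wins": (3, 3), "draws": (4, 2)},
}

for name, r in results.items():
    print(name, conventional(r["wins"], r["draws"]), proposed(r["wins"], r["draws"]))
# laser 9.0 450
# Ethereal 9.0 448
```

Both engines tie on 9 under conventional scoring. Under the proposed points, Ethereal's extra win as black is outweighed by laser's extra draws as black, which leaves laser two points ahead.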
Let us now examine a larger sample, the controversial Season 16 Premier League results over at TCEC. This tournament was controversial because the Leela fanbois were expecting a showdown between Stockfish and Leela. However, due to issues with a 3rd-party DLL file, Stockfish died twice against Allie, which let Allie gain 2 extra points and thus end up in the final against Stockfish.
Engine                Win        Draw        Lose
Stockfish 190826      14 [11/3]  25 [10/15]  3 [0/3]
AllieStein v0.5-dev_  14 [7/7]   24 [14/10]  4 [0/4]
LCZero v0.22.0-nT40B  9 [7/2]    33 [14/19]  0 [0/0]
Komodo 2381.00        4 [4/0]    36 [17/19]  2 [0/2]
Stoofvlees II a12     3 [2/1]    32 [17/15]  7 [2/5]
Houdini 6.03          3 [3/0]    31 [15/16]  8 [3/5]
KomodoMCTS 2381.00    1 [1/0]    31 [15/16]  10 [5/5]
ScorpioNN v3.0.1-n_m  0 [0/0]    28 [18/10]  14 [3/11]
By conventional scoring, we get these results:
Stockfish 190826 : 26.5
AllieStein v0.5-dev_1359f44-n10 : 26
LCZero v0.22.0-nT40B.4-160 : 25.5
Komodo 2381.00 : 22
Stoofvlees II a12 : 19
Houdini 6.03 : 18.5
KomodoMCTS 2381.00 : 16.5
ScorpioNN v3.0.1-n_maddex_INT8 : 14
That produced the SuFi between Stockfish and Allie, which Stockfish won.
However, if we used my proposed points system, then the results look like this:
Stockfish 190826 : 1314
AllieStein v0.5-dev_1359f44-n10 : 1296
LCZero v0.22.0-nT40B.4-160 : 1270
Komodo 2381.00 : 1094
Stoofvlees II a12 : 946
Houdini 6.03 : 920
KomodoMCTS 2381.00 : 824
ScorpioNN v3.0.1-n_maddex_INT8 : 692
In conventional scoring, there is a 0.5 gap between each of the top 3. In the points system, the difference is clearer: there are 18 points between 1st and 2nd, and 26 points between 2nd and 3rd.
We could also “standardise” these scores to produce a percentage in a way similar to CuteChess. If we divide by the number of games, we get a score out of 50 (assuming each engine plays as many games as white as it does as black), which we can simply double to get a score out of 100. This could be used to compare results across tournaments, regardless of the number of games in each tournament. For example, with 42 games per engine, the above becomes:
Stockfish 190826 : 1314 : 62.57
AllieStein v0.5-dev_1359f44-n10 : 1296 : 61.71
LCZero v0.22.0-nT40B.4-160 : 1270 : 60.48
Komodo 2381.00 : 1094 : 52.1
Stoofvlees II a12 : 946 : 45.05
Houdini 6.03 : 920 : 43.81
KomodoMCTS 2381.00 : 824 : 39.24
ScorpioNN v3.0.1-n_maddex_INT8 : 692 : 32.95
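The standardised column is just twice the average points per game. A minimal sketch, assuming 42 games per engine (the sum of each row's win, draw and loss counts in the TCEC table) and using a made-up helper name `standardise`:

```python
def standardise(points, games):
    """Scale a points total to a score out of 100, rounded to 2 places."""
    return round(2 * points / games, 2)

GAMES = 42  # wins + draws + losses for every engine in the TCEC table

print(standardise(1314, GAMES))  # 62.57
print(standardise(1296, GAMES))  # 61.71
print(standardise(692, GAMES))   # 32.95
```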
Comments welcome as always.