Evaluation Relevance Reduction

Preliminary note:
This article was originally written in German. The English translation is also the author's own, and its quality cannot be guaranteed.

Update from December 2023:
New input field: 'minimum evaluation > 0 for White win = 100% for calculating the WDL analyses #(.##)':
Empirically, this value should be around 2.35 when the Stockfish engine is used. From this evaluation upwards, positions are statistically treated as 100% won, and from its negative counterpart downwards as 100% lost. Almost all WDL analyses in the programme are based on this value and gain significantly in precision from it.


Notes to the form:

Not all 13 input fields have to be filled in. If the programme lacks required information, various error messages are displayed in flashing red.


Displaying the results requires permission to execute JavaScript code in the browser.






Load parameters | Clear fields | Save parameters

Inputs
Move evaluations and engine WDL data
high move evaluation
suboptimal move evaluation (worse than the high move evaluation)

Parameters of the User Evaluation Relevance Reduction
probabilistic game result of 0.75: evaluation > 0
probabilistic game result above 0.75: evaluation and game result above 0.75






Results:

Engine WDL statistics data
evaluation | colour | source | win % | draw % | loss % | half-move ⇐ -N ⇔ 0 ⇔ N ⇒

Evaluation comparison user/engine ERR
probabilistic game result = 0.75 | probabilistic game result = ?
evaluation relevance = 0.50 | evaluation relevance = ?
User ERR | Engine WDL ERR | Engine WDL ERR ∅ | User ERR | Engine WDL ERR | Engine WDL ERR ∅

Evaluation differences
absolute evaluation difference
relevant evaluation difference with User ERR
relevant evaluation difference with Engine WDL ERR

Probabilistic game results
high evaluation: User ERR | Engine WDL ERR
suboptimal evaluation: User ERR | Engine WDL ERR
from White's point of view
from Black's point of view

Optimum rate of the suboptimal move evaluation
User ERR
Engine WDL ERR

Move evaluation symbols (‼ ! !? ?! ? ??) and thresholds
extensive scheme: 1/7 1/7 1/7 1/14 1/14 1/7 1/7 1/7
colour | high evaluation: User ERR, Engine WDL ERR | suboptimal evaluation: User ERR, Engine WDL ERR
move evaluation symbol
threshold ! ⇒ ‼
threshold !? ⇒ !
threshold ./. ⇒ !?
threshold ?! ⇐ ./.
threshold ? ⇐ ?!
threshold ?? ⇐ ?

Move evaluation symbols (‼ ! !? ?! ? ??) and thresholds
restrictive scheme: 1/8 1/8 1/8 1/8 1/8 1/8 1/8 1/8
colour | high evaluation: User ERR, Engine WDL ERR | suboptimal evaluation: User ERR, Engine WDL ERR
move evaluation symbol
threshold ! ⇒ ‼
threshold !? ⇒ !
threshold ./. ⇒ !?
threshold ?! ⇐ ./.
threshold ? ⇐ ?!
threshold ?? ⇐ ?

Position evaluation symbols and thresholds for User/Engine WDL ERR
threshold adjustment to identical position evaluation sectors
9 sectors: 1/9 1/9 1/9 1/9 1/9 1/9 1/9 1/9 1/9
position evaluation symbols
colour | high evaluation: User ERR, Engine WDL ERR | suboptimal evaluation: User ERR, Engine WDL ERR
for evaluation(s); White/Black irrelevant
thresholds
User ERR | Engine WDL ERR
clear/extreme advantage for White (+– ⇒ ++–)
moderate/clear advantage for White (± ⇒ +–)
slight/moderate advantage for White (⩲ ⇒ ±)
equality/slight advantage for White (= ⇒ ⩲)
equality/slight advantage for Black (⩱ ⇐ =)
slight/moderate advantage for Black (∓ ⇐ ⩱)
moderate/clear advantage for Black (–+ ⇐ ∓)
clear/extreme advantage for Black (––+ ⇐ –+)

Position evaluation symbols and thresholds for User/Engine WDL ERR
threshold adjustment to identical position evaluation sectors
7 sectors: 1/7 1/7 1/7 1/7 1/7 1/7 1/7
position evaluation symbols
colour | high evaluation: User ERR, Engine WDL ERR | suboptimal evaluation: User ERR, Engine WDL ERR
for evaluation(s); White/Black irrelevant
thresholds
User ERR | Engine WDL ERR
moderate/clear advantage for White (± ⇒ +–)
slight/moderate advantage for White (⩲ ⇒ ±)
equality/slight advantage for White (= ⇒ ⩲)
equality/slight advantage for Black (⩱ ⇐ =)
slight/moderate advantage for Black (∓ ⇐ ⩱)
moderate/clear advantage for Black (–+ ⇐ ∓)

Position evaluation symbols and thresholds
for User ERR: threshold adjustment to probabilistic game results
for Engine WDL ERR: threshold adjustment to both evaluations
9 sectors
position evaluation symbols
colour | high evaluation: User ERR, Engine WDL ERR | suboptimal evaluation: User ERR, Engine WDL ERR
for evaluation(s); White/Black irrelevant
thresholds
User ERR | Engine WDL ERR
clear/extreme advantage for White (+– ⇒ ++–)
moderate/clear advantage for White (± ⇒ +–)
slight/moderate advantage for White (⩲ ⇒ ±)
equality/slight advantage for White (= ⇒ ⩲)
equality/slight advantage for Black (⩱ ⇐ =)
slight/moderate advantage for Black (∓ ⇐ ⩱)
moderate/clear advantage for Black (–+ ⇐ ∓)
clear/extreme advantage for Black (––+ ⇐ –+)

Position evaluation symbols and thresholds
for User ERR: threshold adjustment to probabilistic game results
for Engine WDL ERR: threshold adjustment to both evaluations
7 sectors
position evaluation symbols
colour | high evaluation: User ERR, Engine WDL ERR | suboptimal evaluation: User ERR, Engine WDL ERR
for evaluation(s); White/Black irrelevant
thresholds
User ERR | Engine WDL ERR
moderate/clear advantage for White (± ⇒ +–)
slight/moderate advantage for White (⩲ ⇒ ±)
equality/slight advantage for White (= ⇒ ⩲)
equality/slight advantage for Black (⩱ ⇐ =)
slight/moderate advantage for Black (∓ ⇐ ⩱)
moderate/clear advantage for Black (–+ ⇐ ∓)

Extreme evaluations in the positive/negative range
User ERR irrelevance start evaluations
Engine WDL ERR start evaluations with evaluation relevance = 0


Graph of the User ERR:

Flot 0.8.3 – Copyright © 2007 - 2014 IOLA and Ole Laursen

Graph of the Engine WDL ERR:


Move and position evaluations
together with NAG and Informator symbols


Chess players usually assess moves and positions on the board with symbols such as the following:

brilliant move (‼) - NAG $3,
impressive move (!) - NAG $1,
attractive move (!?) - NAG $5,
questionable move (?!) - NAG $6,
weak move (?) - NAG $2,
miserable move (??) - NAG $4,

balanced position or draw (=) - NAG $10,
slight advantage for White (⩲ or +/=) - NAG $14,
slight advantage for Black (⩱ or =/+) - NAG $15,
moderate advantage for White (± or +/-) - NAG $16,
moderate advantage for Black (∓ or -/+) - NAG $17,
clear advantage for White (+-) - NAG $18,
clear advantage for Black (-+) - NAG $19,
extreme advantage for White (++-) - NAG $20,
extreme advantage for Black (--+) - NAG $21.

In addition, the unclear position (∝) - NAG $13 should be mentioned. Strictly speaking it does not belong here, because it merely states that a position evaluation is (supposedly) not possible.

Reservation: the descriptions given above for these symbols are the author's own and of course not binding. More information can be found here.

By the way, 'NAG' stands for 'Numeric Annotation Glyph'.

Such move and position assessments are quite practical: they take up little space and reveal an evaluation range at a glance. The only question is how such evaluations come about. By rule of thumb? Or is there a more accurate way? It would be real progress if they were defined by chess program evaluations in pawn units, with which chess engines numerically express positional imbalances, that is, positional advantages or disadvantages. But where are such definitions to come from? From which engine evaluation onwards can one speak, for example, of a slight advantage for White: from 0.10 pawn units or from 0.20, leaving aside the tendency of individual engines to overstate or understate their evaluations? And how can an objective scale be found here?

In the further course of this article, several mathematically derived proposals with corresponding formulas are presented. Before that, however, various statistical and mathematical foundations have to be laid.

Blunder relevance or, more politely, evaluation relevance


Chess programs usually rate positions in hundredths of a pawn unit. Comparing the variations that the number cruncher outputs for a position reveals the evaluation difference, that is, the margin of error, between the best variation and an inferior one.

But how relevant are faulty moves and their evaluations really? An example: in a lost position, after losing the queen without compensation, a player gives away another piece for no reason. The chess program will acknowledge this mishap with a much higher evaluation in favour of the opponent. But how relevant is such a difference between the new and the previous position evaluation in a game that is practically lost already? Objectively, that is, leaving aside subjective errors by the opponent, not at all! In all probability the bungler would not have been able to save the game with best play on both sides even without the latest faulty move.

To take it to the extreme: what is the threshold at which a game can objectively be considered won or lost? It depends. One could say ironically: the bigger the bungler, the higher the threshold. The higher the evaluation, the more one can count on the advantage no longer being squandered, and here today's chess programs deserve more confidence than Homo sapiens. And if you are dealing with a potential patzer, you should not throw in the towel prematurely in an apparently lost position, as Kasparov did in the second match against Deep Blue in 1997.

Computer chess statistics


So what to do? One takes leave of human bungler chess and turns to the strongest chess engines. In principle, two paths are open:

Engine WDL ERR:

Since mid-2020, Stockfish has provided win/draw/loss ('WDL') statistics alongside the actual evaluations. In the words of the Stockfish development team:

'UCI_ShowWDL
If enabled, show approximate WDL statistics as part of the engine output. These WDL numbers model expected game outcomes for a given evaluation and game ply for engine self-play at fishtest LTC conditions (60+0.6s per game).'

These WDL statistics or probabilities reflect the course of the game insofar as they take the half-move (ply) of the evaluated position into account. The formulas on which they are based can be found in the Stockfish program code ('win rate model'). The use of these statistics need not be limited to game analyses made with Stockfish, because this engine is the ultimate in positional analysis and therefore sets the standard for evaluation.

The highlight of the engine WDL statistics is not only that the evaluation relevances and differences, optimum rates, probabilistic game results, and move and position evaluation symbols including thresholds discussed in this article can be derived from them in much the same way as in the User ERR presented below. The average values resulting from them (cf. 'evaluation comparison user/engine ERR', line 3, columns 3 and 6 in the programme) can also provide valuable clues for adjusting the parameters of the User ERR.

By the way, here is a little programme trick: entering '0' (zero) in the two move evaluation fields and in the 'half-move' field clears the programme's internal memory for these two average values, which are otherwise retained when the parameters are loaded and saved.

Experiments after the Stockfish update of 22 June 2023 suggest that there is an absolute win/loss probability of 100% at Stockfish evaluations of approximately ±2.35. This value was used as the absolute relevance limit when calculating the WDL values in the form above. Values beyond it play no role for relevant WDL evaluation differences, WDL move evaluation symbols, WDL position evaluation symbols, etc. Furthermore, it was found experimentally that for evaluations of about 0.98 the probabilistic game result for White is about 0.75, and for evaluations of about 1.18 it is about 0.875.

One small limitation should not go unmentioned. When the win, draw and loss percentages for an evaluation are determined automatically from the half-move entered, the results differ slightly from those calculated by Stockfish itself. The Stockfish code of the function 'win_rate_model' produces bizarre results when used outside of Stockfish: the variable 'v' there, which is related to the evaluation, must first be multiplied by an unknown factor. A few comparisons of the percentages coming from the 'win_rate_model' code with the values produced directly by Stockfish suggest that this factor should be around 328. This number 328 appears explicitly as 'NormalizeToPawnValue' elsewhere in the Stockfish code. With the calculation parameters of the latest 'win_rate_model' update of 22 June 2023, the results can be reasonably reconciled with the values produced by the Stockfish engine if the factor is raised to 330.3.
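The scaling issue can be illustrated with a minimal Python sketch. It assumes that the win rate model has the logistic form 1000 / (1 + exp((a - v) / b)). The function name `wdl_per_mille` and the parameter values a = 328 and b = 60 in the usage lines are illustrative assumptions, not the coefficients of any particular Stockfish version (in Stockfish, a and b depend on the half-move).

```python
import math

def wdl_per_mille(evaluation, a, b, scale=328.0):
    """Win/draw/loss in per mille for an evaluation given in pawn units.

    Assumption: logistic win rate model. `a` and `b` are the model
    parameters for the half-move in question (hypothetical inputs here).
    `scale` converts pawn units into the model's internal variable v.
    """
    v = evaluation * scale
    win = 1000.0 / (1.0 + math.exp((a - v) / b))
    loss = 1000.0 / (1.0 + math.exp((a + v) / b))
    draw = 1000.0 - win - loss
    return win, draw, loss

# Illustrative parameters only.
scaled = wdl_per_mille(1.0, a=328.0, b=60.0)               # v = 328
unscaled = wdl_per_mille(1.0, a=328.0, b=60.0, scale=1.0)  # v = 1
```

With these illustrative parameters the scaled call puts the win share at one half, while the unscaled call yields a win share of well under one percent for the same evaluation, which is the kind of bizarre result described above.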

User ERR:

The traditional variant presented in this article is to analyse engine games by asking at what evaluation these programmes won their games, or failed to. The most meaningful games can probably be found on the Internet in the 'Superfinals' of the 'TCEC' ('Top Chess Engine Championship'). The reasons: long thinking times, the opponents were in each case apparently the two best chess engines, and all position evaluations can be followed move by move.

From a statistical point of view, is there a kind of 'point of no return', an evaluation (apart from a concrete mate announcement, of course) from which victory is beyond doubt and liquidation into a draw can no longer be considered? Theoretically, no. The following table shows that in various TCEC Superfinals chess engines failed to convert evaluations of up to 5.01 into victories. And nobody can say where the absolute limit for such evaluation errors lies, assuming best play in the following moves, since nobody can determine this limit with an infinite number of test games.

Even if such outliers happen very rarely, they rule out equating any evaluation (even one of 5.01, as shown) with a win or a loss. In other words, there is no evaluation-based 'point of no return'.

Now the question arises at which evaluations particular average game results are located. Of particular interest are evaluations at which, once reached, the average result of all games concerned amounts to 0.75 (from White's point of view). Such a value results from an equal number of wins and draws, or from a number of losses and three times as many wins. For the sake of completeness, losses are mentioned here too, although they rarely occur once this evaluation has been reached.

For clarification, here are first of all the results of the Superfinals from Season 9 onwards in tabular form, together with the FIDE Candidates' Tournament 2018, evaluated by Stockfish 8 with a thinking time of 30 seconds as found on 'www.chessbomb.com'.

tournament | analysis engine | wins | evaluation e=0.75 at average game result 0.75(:0.25) | maximum evaluation e>0.75 without win | average game result at maximum evaluation e>0.75 without win | alternative pair of values: evaluation e>0.75 / average game result ≅ 0.875
9 Stockfish 16 1.75 0.62
10 Houdini 15 2.00 0.66
12 Stockfish 29 1.48 0.52
13 Stockfish 16 2.79 1.14
14 Stockfish 10 2.42 1.45
FIDE Candidates' Tournament 2018 | Stockfish 8 | 20 | 0.67 | 16.68 | 0.9762 | 2.39 / 0.8800
Superfinal TCEC No. 16 | Stockfish 19092522 | 14 | 1.24 | 3.33 | 0.9667 | 1.65 / 0.8684
Superfinal TCEC No. 16 | AllieStein v0.5-dev_7b41f8c-n11 | 5 | 3.96 | 8.18 | 0.9167 | 8.03 / 0.8571
Superfinal TCEC No. 17 | LCZero v0.24-sv-t60-3010 | 17 | 1.34 | 5.01 | 0.9722 | 1.89 / 0.8810
Superfinal TCEC No. 17 | Stockfish 20200407DC | 12 | 1.49 | 2.76 | 0.9615 | 1.89 / 0.8750
Superfinal TCEC No. 18 | Stockfish 202006170741 | 23 | 0.87 | 3.74 | 0.9792 | 1.41 / 0.8710
Superfinal TCEC No. 18 | LCZero v0.25.1-svjio-t60-3972-mlh | 16 | 0.69 | 2.12 | 0.9706 | 1.57 / 0.8636


The above analysis is explained below using the example of Superfinal No. 17 and the winning engine LCZero v0.24-sv-t60-3010:

LCZero won 17 games, so 83 games ended in draws or losses for LCZero. Among these 83 games, one now looks for the 17th highest evaluation that LCZero reported in its own favour, that is, a positive evaluation that could not be converted into a win. So one takes the 17 highest such evaluations, and the lowest of them is 1.26. Hence there are 17 draws or losses in each of which an evaluation of at least 1.26 was reached. In other words: LCZero reached an evaluation of at least 1.26 in 34 games, of which 17 were won (1-0) and 17 ended in a draw or loss.
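The selection step described above can be sketched in a few lines of Python. The function name `threshold_from_games` and the toy list of peak evaluations are hypothetical and serve only to show the counting rule; they are not TCEC data.

```python
def threshold_from_games(wins, non_win_peaks):
    """Return the lowest of the `wins` highest peak evaluations
    that the engine reached in games it did not win."""
    if wins < 1 or wins > len(non_win_peaks):
        raise ValueError("need at least as many non-won games as wins")
    ranked = sorted(non_win_peaks, reverse=True)
    return ranked[wins - 1]

# Hypothetical toy data: peak evaluations in 6 games that were not won.
peaks = [0.10, 1.40, 0.75, 2.30, 0.20, 0.95]
threshold = threshold_from_games(3, peaks)  # 3 wins -> 3rd highest peak
```

In the LCZero example the same rule is applied with 17 wins to the 83 games that were not won.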

But these numbers contain a small complication: LCZero had to concede defeat in the 16th game, although it had already reported an evaluation of 1.89, which lies above the previously determined threshold of 1.26. Because of this '0' result, the actual results do not yield an average game result of 0.75, because the average amounts to

$$\frac{(17 \cdot 1) + (16 \cdot 0.5) + (1 \cdot 0)}{34} = 0.7353$$


instead of 0.75. If the real numbers are stubborn, mathematics must intervene. Between 0.5 and 0.75 on the y-axis, the average game result is a linear function of the evaluation:

$$\text{average game result} = \frac{(2 \cdot e_{=0.75}) + \text{evaluation}}{4 \cdot e_{=0.75}}$$


What is sought is the ominous evaluation at a game result of 0.75 (abbreviated 'e=0.75'). So the formula must be rearranged:

$$e_{=0.75} = \frac{\text{evaluation}}{(4 \cdot \text{average game result}) - 2}$$


In the present LCZero case, therefore, it is to be calculated:

$$e_{=0.75} = \frac{1.26}{(4 \cdot 0.7353) - 2} = 1.3387$$


The result, which rounds to the 1.34 listed in the table, lies slightly above the raw threshold of 1.26, as was to be expected.
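The two calculation steps for the LCZero case can be retraced with a short Python sketch. The function names are illustrative. The rearranged formula is only meaningful for average game results above 0.5 and up to 0.75, where the linear relation is assumed to hold.

```python
def average_game_result(wins, draws, losses):
    """Average result from the engine's point of view (win = 1, draw = 0.5)."""
    games = wins + draws + losses
    return (wins * 1.0 + draws * 0.5 + losses * 0.0) / games

def e_075(evaluation, average_result):
    """Rearranged linear formula: the evaluation at which the
    average game result would amount to exactly 0.75."""
    return evaluation / (4.0 * average_result - 2.0)

avg = average_game_result(17, 16, 1)   # 34 games at or above 1.26
corrected = e_075(1.26, avg)           # close to the 1.34 in the table
```

If the average already equals 0.75, the formula returns the input evaluation unchanged, which is a quick plausibility check.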

In September 2017 the engine Houdini 6 was released. You can read the following on this website:

'The evaluations have again been calibrated to correlate directly with the win expectancy in the position. A +1.00 pawn advantage gives a 75% chance of winning the game against an equal opponent at blitz time control. At +1.50 the engine will win 90% of the time, and at +2.50 about 99% of the time. To win nearly 50% of the time, you need and advantage of about +0.60 pawn.'

Houdini kept its word. In the TCEC Superfinal Season 10 against Komodo, Houdini scored 15 wins, and among the 15 draws or losses in which Houdini showed its highest evaluations, the minimum evaluation was 0.57. An almost precise landing.

The above table allows the cautious conclusion that the Stockfish versions used since TCEC Superfinal 13 give significantly higher evaluations than their predecessors. One thing should not be overlooked when interpreting these results: Stockfish 10 was given a 'contempt' of 0.24 (Stockfish 9: 0.20), which raises the respective evaluations. It therefore seems obvious to subtract this contempt margin from the evaluation thresholds listed in the table for one's own analysis purposes. One tip, however: analyses with Stockfish should only be carried out with 'contempt' switched off in order not to drive up the evaluations artificially.

Finally, it should be noted that the TCEC website has recently been extended with win and draw probabilities and locates the e=0.75 for the engine Stockfish at around 1.56 (Superfinal 17) or even 1.91 (Superfinal 18). In view of the table above, these are quite plausible values. It is problematic, however, that only percentages for 'W' (win?) and 'D' (draw? = 100% minus the 'W' percentage) are given there, while the loss probability is swept under the carpet. The TCEC e=0.75 of 1.56 assumed above necessarily rests on the assumption that 'D' also includes the probability of loss.

Mathematical evaluation relevance reduction


Let us note: as the evaluation moves from 0.00 towards infinity (∞), its relevance decreases continuously. Starting at 100% for an evaluation of 0.00, it passes 50% at the e=0.75 evaluation (the TCEC value of 1.56 is assumed below as an example) and ends at 0% at infinity.

An example: the evaluation of the best move is 2.00. Now a mishap happens: a faulty move that loses a piece, with an evaluation of -3.00. The absolute evaluation difference is -5.00. How relevant is this loss of a piece? Obviously less than -5.00.

In detail:
between the evaluations 2.00 and 1.56 the relevance is below 50% and grows continuously;
at 1.56 it should amount to 50%, for this is the mean value between 100% and 0%; furthermore, the probabilistic game result of 0.75 at the evaluation 1.56 is the mean value between 0.5 at the 0.00 evaluation and 1 at a maximum engine evaluation;
at 0.00 the relevance reaches its maximum value of 100%;
at -1.56 it is back at 50% and
at -3.00 it ends with a value well below 50%.

The sum of these percentages would now be of interest. It can be calculated, but this is somewhat complicated. Mathematically minded readers will long since have recognized that this rise and fall must be expressed by a mathematical function for which the following applies: the further one moves away from the y-axis on either side, the smaller the ordinates become, i.e. the evaluation relevance values at these points along the x-axis, until they finally approach the x-axis asymptotically at infinity on both sides. The x-axis thus represents the (engine) evaluations, the y-axis the evaluation relevance values.

At this point the first version of this article proposed an exponential function of the general form f(x) = a^(x*b). Such exponential functions have the advantage that they always pass through the point P(0;1) and approach the x-axis at (positive) infinity. Their disadvantage is that only 2 points can be fixed, the point P(0;1) already mentioned and the point P(e=0.75;0.5). However, a further defining point P(e>0.75;r>0.75) is urgently needed for better precision, for example to capture the highest TCEC engine evaluations without a win and the corresponding game results, which lie far above 0.75.
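The limitation of the single exponential can be made concrete with a small Python sketch. It assumes a = 0.5 and b = 1/e=0.75, one possible choice that satisfies both fixed points, and it mirrors the curve for negative evaluations via the absolute value. The function name and the sample values 2 and 3 are illustrative assumptions.

```python
def exponential_relevance(x, e_075):
    """Single exponential through P(0;1) and P(e_075;0.5),
    i.e. f(x) = a^(x*b) with a = 0.5 and b = 1/e_075."""
    return 0.5 ** (abs(x) / e_075)

# With e=0.75 = 2, the value at an evaluation of 3 is already determined:
forced = exponential_relevance(3.0, 2.0)
```

Once the two points are fixed, the value at any further evaluation follows automatically (here roughly 0.35), so a freely chosen third point such as a relevance of 0.3 at the evaluation 3 cannot be met.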

Solution: 3 equations for 3 negative and 3 positive sectors along the x-axis (x stands for engine evaluation):

1st positive and negative sector:

$$y_{Rel} = 1 - \frac{|x|}{2 \cdot e_{=0.75}}$$
linear equation with 𝔻 = {x | -e=0.75 ≤ x ≤ e=0.75}
 


2nd positive and negative sector:

$$y_{Rel} = -\frac{2 \cdot e_{=0.75} \cdot r_{>0.75} - 2 \cdot |x| \cdot r_{>0.75} - e_{>0.75} + |x|}{2 \cdot (e_{>0.75} - e_{=0.75})}$$
linear equation with 𝔻 = {x | -e>0.75 ≤ x ≤ -e=0.75 or e=0.75 ≤ x ≤ e>0.75}
 


3rd positive and negative sector:

$$y_{Rel} = r_{>0.75}^{|x| / e_{>0.75}}$$
exponential equation with 𝔻 = {x | -∞ < x ≤ -e>0.75 or e>0.75 ≤ x < ∞}
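The three sector equations can be combined into one function, as in the following Python sketch. The parameter names `e_eq`, `e_gt` and `r_gt` are illustrative stand-ins for e=0.75, e>0.75 and r>0.75.

```python
def relevance(x, e_eq, e_gt, r_gt):
    """Evaluation relevance y_Rel for an engine evaluation x.

    e_eq: e=0.75, e_gt: e>0.75, r_gt: r>0.75 (relevance at e>0.75).
    """
    ax = abs(x)
    if ax <= e_eq:                        # 1st sector, linear
        return 1.0 - ax / (2.0 * e_eq)
    if ax <= e_gt:                        # 2nd sector, linear
        num = 2.0 * e_eq * r_gt - 2.0 * ax * r_gt - e_gt + ax
        return -num / (2.0 * (e_gt - e_eq))
    return r_gt ** (ax / e_gt)            # 3rd sector, exponential
```

A quick check of the sector boundaries: with e=0.75 = 2, e>0.75 = 3 and r>0.75 = 0.3, the function returns 1 at 0, 0.5 at ±2 and 0.3 at ±3, so the three pieces join without a jump.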
 


The evaluation relevance functions are now set. But how is the really relevant evaluation difference calculated over a certain distance on the x-axis, for example between 2.00 and -3.00? The evaluation relevance function only returns the y-value at a single point on the x-axis. The answer is as ingenious as it is simple: by integration. The sum of all values between the x-axis and the function curve, i.e. the area there, between the best evaluation (for example 2.00) and the inferior evaluation (for example -3.00) is the definite integral of this function, and this is the relevant evaluation difference.
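The idea of the area under the curve can be tried out numerically before any antiderivative is involved. The following Python sketch uses a plain midpoint rule. The helper name, the step count and the choice of the evaluations 1.00 and -1.00 (both inside the 1st sector for e=0.75 = 1.56) are illustrative assumptions.

```python
def relevant_difference_numeric(relevance, low, high, steps=2000):
    """Midpoint-rule approximation of the area under `relevance`
    between the evaluations `low` and `high`."""
    width = (high - low) / steps
    return sum(relevance(low + (i + 0.5) * width) for i in range(steps)) * width

e_075 = 1.56  # TCEC example value used in the text

def first_sector(x):
    """1st-sector relevance, valid only for |x| <= e_075."""
    return 1.0 - abs(x) / (2.0 * e_075)

area = relevant_difference_numeric(first_sector, -1.0, 1.0)
```

The absolute evaluation difference here is 2.00, while the area, i.e. the relevant evaluation difference, comes out at roughly 1.68.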

To calculate the integral, the antiderivatives of the evaluation relevance functions are required. These are as follows:

1st positive and negative sector:

$$y_{Int} = x - \frac{x \cdot |x|}{4 \cdot e_{=0.75}}$$
quadratic equation with 𝔻 = {x | -e=0.75 ≤ x ≤ e=0.75}
 


2nd positive and negative sector:

$$y_{Int} = -\frac{x \cdot (4 \cdot e_{=0.75} \cdot r_{>0.75} - 2 \cdot |x| \cdot r_{>0.75} - 2 \cdot e_{>0.75} + |x|)}{4 \cdot (e_{>0.75} - e_{=0.75})}$$
quadratic equation with 𝔻 = {x | -e>0.75 ≤ x ≤ -e=0.75 or e=0.75 ≤ x ≤ e>0.75}
 


3rd positive sector:

$$y_{Int} = \frac{e_{>0.75} \cdot r_{>0.75}^{x / e_{>0.75}}}{\ln(r_{>0.75})}$$
exponential equation with 𝔻 = {x | e>0.75 ≤ x < ∞}
 


3rd negative sector:

$$y_{Int} = -\frac{e_{>0.75}}{r_{>0.75}^{x / e_{>0.75}} \cdot \ln(r_{>0.75})}$$
exponential equation with 𝔻 = {x | -∞ < x ≤ -e>0.75}
 


Note regarding the equations above that the computer program Maxima uses the notation log(x) instead of the usual ln(x) for the natural logarithm. The same applies to JavaScript ('Math.log()'). If the above equations with 'ln' are to be used in such programs, 'ln' has to be replaced by 'log'.

If you experiment with the interactive form above, you will soon notice that with extreme evaluations the relevant evaluation difference hardly changes when even more extreme evaluations are entered. Example for White:
high evaluation = 15
suboptimal evaluation = 0
e=0.75 = 2
e>0.75 = 3
probabilistic game result at e>0.75 = 0.85 (corresponds to a r>0.75 = 0.3)
result of the relevant evaluation difference = 2.64

If the high evaluation is increased to 18, the relevant evaluation difference amounts to 2.65. And a high evaluation of 1000 results in a relevant evaluation difference of 2.65. The same results occur if the suboptimal evaluation is -15, -18 or -1000 and the high evaluation amounts to 0.
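The example values can be reproduced with the antiderivatives given above. The following Python sketch splits the interval at the sector boundaries, because each antiderivative is only valid within its own sector. The function and parameter names are illustrative, and this is a sketch under the article's formulas, not the code of the interactive form.

```python
import math

def _antiderivative(x, probe, e_eq, e_gt, r_gt):
    """Antiderivative of the sector that contains `probe`, evaluated at x."""
    ap = abs(probe)
    if ap <= e_eq:                                    # 1st sector
        return x - x * abs(x) / (4.0 * e_eq)
    if ap <= e_gt:                                    # 2nd sector
        inner = 4.0 * e_eq * r_gt - 2.0 * abs(x) * r_gt - 2.0 * e_gt + abs(x)
        return -x * inner / (4.0 * (e_gt - e_eq))
    if probe > 0:                                     # 3rd positive sector
        return e_gt * r_gt ** (x / e_gt) / math.log(r_gt)
    return -e_gt / (r_gt ** (x / e_gt) * math.log(r_gt))  # 3rd negative sector

def relevant_difference(high, low, e_eq, e_gt, r_gt):
    """Definite integral of the evaluation relevance from `low` to `high`."""
    bounds = [-e_gt, -e_eq, e_eq, e_gt]
    points = [low] + [b for b in bounds if low < b < high] + [high]
    total = 0.0
    for left, right in zip(points, points[1:]):
        mid = (left + right) / 2.0    # identifies the sector of this piece
        total += (_antiderivative(right, mid, e_eq, e_gt, r_gt)
                  - _antiderivative(left, mid, e_eq, e_gt, r_gt))
    return total

for high in (15.0, 18.0, 1000.0):
    print(high, round(relevant_difference(high, 0.0, 2.0, 3.0, 0.3), 2))
```

The pieces for the three sectors between 0 and 15 contribute about 1.50, 0.40 and 0.74, which adds up to the 2.64 stated in the example.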

The relevant evaluation differences are rounded to 2 decimal places in the form. If you now want to calculate the high or suboptimal evaluation (hereafter called 'irrelevance start evaluation') beyond which any further increase or decrease towards infinity changes the relevant evaluation difference (rounded to 2 decimal places) by at most 0.01, and does so with a probability of at most 50%, you need the following formula:

$$-\frac{e_{>0.75} \cdot r_{>0.75}^{\text{irrelevance start evaluation} / e_{>0.75}}}{\ln(r_{>0.75})} = 0.005$$


solved for the irrelevance start evaluation, taking into account both the positive and the negative result ('±'):

$$\text{irrelevance start evaluation} = \pm \frac{e_{>0.75} \cdot \ln\left(-\frac{\ln(r_{>0.75})}{200 \cdot e_{>0.75}}\right)}{\ln(r_{>0.75})}$$


The result with the above parameters amounts to ±15.477.

The formula shows that the irrelevance start evaluation is independent of the 2nd evaluation (0 in the above case) and of e=0.75 (2 in the above case).
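The solved formula translates directly into Python. In the sketch below the function name and the `tolerance` parameter (0.005, i.e. 1/200) are illustrative. As noted above, neither the second evaluation nor e=0.75 appears among the arguments.

```python
import math

def irrelevance_start(e_gt, r_gt, tolerance=0.005):
    """Positive irrelevance start evaluation for e>0.75 = e_gt and
    r>0.75 = r_gt; the negative one is the same value with a minus sign."""
    return e_gt * math.log(-tolerance * math.log(r_gt) / e_gt) / math.log(r_gt)

start = irrelevance_start(3.0, 0.3)  # parameters of the example above
```

For the example parameters this gives approximately 15.477, in line with the result stated above.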

This formula applies when the irrelevance start evaluation is located in the 3rd positive or negative sector, which is normally the case. With unusual values of e>0.75 and r>0.75, the irrelevance start evaluation slips into the 2nd positive or negative sector, so that far more complicated formulas are required. This happens when the following applies:

$$e_{>0.75} < -\frac{0.005 \cdot \ln(r_{>0.75})}{r_{>0.75}}$$


For example, if the probabilistic game result at e>0.75 is 0.99 and e>0.75 < 0.978, or if the probabilistic game result at e>0.75 is 0.875 and e>0.75 < 0.0277. Highly unrealistic!
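The two limits quoted above can be checked with a short Python sketch. The function names are illustrative. The conversion r>0.75 = 2 · (1 - probabilistic game result) follows from the relation between relevance and game result for positive evaluations.

```python
import math

def second_sector_limit(r_gt, tolerance=0.005):
    """Value of e>0.75 below which the irrelevance start evaluation
    slips into the 2nd sector."""
    return -tolerance * math.log(r_gt) / r_gt

def relevance_from_game_result(pgr):
    """r>0.75 for a probabilistic game result above 0.5."""
    return 2.0 * (1.0 - pgr)

limit_099 = second_sector_limit(relevance_from_game_result(0.99))
limit_0875 = second_sector_limit(relevance_from_game_result(0.875))
```

Both limits are far below realistic values of e>0.75, which are above 1 in every row of the table.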

The evaluation relevance reduction and all the delicacies mentioned in this article (automatic move and position evaluation symbols as well as the probabilistic game results) have been implemented
in the ScpcPGN program, available free of charge on this website
and in the AquaPGN program (latest update 12th August 2020), available free of charge on this website.

Probabilistic game results


Why 'probabilistic' game results? Because they are derived from an engine evaluation and other parameters and therefore make a stochastic statement about the presumed average game outcome. The situation was different in the discussion of the TCEC results, where only 'average' game results were mentioned, because game material was available there from which the actual average game results could be calculated.

The probabilistic game result is always presented here from White's point of view. If White wins, the result is 1-0, if Black wins 0-1, and in the case of a draw ½-½. Taking the first number in each case gives the probabilistic game result used here.

It can be derived directly from the evaluation relevance:

for positive evaluations:
probabilistic game result = 1 - (evaluation relevance / 2);

for negative evaluations:
probabilistic game result = evaluation relevance / 2.

An engine evaluation of exactly 0.00 with an evaluation relevance of 1.00 results in a probabilistic game result of 0.50, i.e. a presumed draw. A probabilistic game result of approximately 1.00 would be an almost certain win for White, and one of approximately 0.00 an almost certain win for Black. Mathematically, 1.00 and 0.00 are never reached exactly. And an engine evaluation of exactly e=0.75 leads to a result of 0.75, i.e. a value that lies exactly between a win for White and a draw. The results are therefore easier to interpret from White's point of view.
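The two conversion rules can be written as a minimal Python function. The name `probabilistic_game_result` is illustrative, and the relevance is passed in as a value that has already been determined for the evaluation.

```python
def probabilistic_game_result(evaluation, relevance):
    """Probabilistic game result from White's point of view."""
    if evaluation >= 0:
        return 1.0 - relevance / 2.0
    return relevance / 2.0

draw_like = probabilistic_game_result(0.00, 1.00)    # presumed draw
white_075 = probabilistic_game_result(1.56, 0.50)    # at e=0.75
black_075 = probabilistic_game_result(-1.56, 0.50)   # mirror image
```

The three calls cover the cases discussed above: 0.50 at an evaluation of 0.00, 0.75 at e=0.75 and, mirrored for Black, 0.25 at -e=0.75.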

Clarification: the probabilistic game result is in no way equivalent to a win probability.

Many people make this mistake. For example, the programme Nibbler confuses what is in reality the probabilistic game result with the 'Winrate': in the position after 1. e4, for example, this 'Winrate' exceeds 50%, while the actual win probability in the 'WDL' display is only a modest 15%. The programme author apparently has not noticed this.

Put succinctly:

$$\text{win probability} = \text{probabilistic game result} - \frac{\text{draw probability}}{2}$$
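The difference between the two quantities is easy to see in a short Python sketch. The WDL triple used below (15% wins, 72% draws, 13% losses) is a hypothetical example chosen only to resemble the situation described for Nibbler; it is not taken from any engine output.

```python
def game_result_from_wdl(win, draw, loss):
    """Probabilistic game result: wins count fully, draws count half."""
    assert abs(win + draw + loss - 1.0) < 1e-9
    return win + draw / 2.0

def win_probability(game_result, draw):
    """Win probability recovered from the probabilistic game result."""
    return game_result - draw / 2.0

pgr = game_result_from_wdl(0.15, 0.72, 0.13)   # hypothetical WDL values
win = win_probability(pgr, 0.72)
```

With these hypothetical numbers the probabilistic game result is 0.51, above 50%, although the win probability is only 0.15.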


For the sake of completeness, here are the game result equations as well:

1st positive and negative sector:

$$y_{PGR} = \frac{1}{2} + \frac{x}{4 \cdot e_{=0.75}}$$
linear equation with 𝔻 = {x | -e=0.75 ≤ x ≤ e=0.75}
 


2nd positive sector:

$$y_{PGR} = \frac{2 \cdot e_{=0.75} \cdot r_{>0.75} - 2 \cdot x \cdot r_{>0.75} + 3 \cdot e_{>0.75} - 4 \cdot e_{=0.75} + x}{4 \cdot (e_{>0.75} - e_{=0.75})}$$
linear equation with 𝔻 = {x | e=0.75 ≤ x ≤ e>0.75}
 


2nd negative sector:

$$y_{PGR} = -\frac{2 \cdot e_{=0.75} \cdot r_{>0.75} + 2 \cdot x \cdot r_{>0.75} - e_{>0.75} - x}{4 \cdot (e_{>0.75} - e_{=0.75})}$$
linear equation with 𝔻 = {x | -e>0.75 ≤ x ≤ -e=0.75}
 


3rd positive sector:

$$y_{PGR} = -\frac{r_{>0.75}^{x / e_{>0.75}} - 2}{2}$$
exponential equation with 𝔻 = {x | e>0.75 ≤ x < ∞}

 


3rd negative sector:

$$y_{PGR} = \frac{1}{2 \cdot r_{>0.75}^{x / e_{>0.75}}}$$
exponential equation with 𝔻 = {x | -∞ < x ≤ -e>0.75}
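As a plausibility check, the equation for the 2nd positive sector can be derived from the evaluation relevance of that sector with the rule 'probabilistic game result = 1 - (evaluation relevance / 2)'. The following LaTeX lines are a worked sketch of that step for x ≥ 0, where |x| = x:

```latex
\begin{align*}
y_{PGR} &= 1 - \frac{y_{Rel}}{2}
         = 1 + \frac{2 \cdot e_{=0.75} \cdot r_{>0.75} - 2 \cdot x \cdot r_{>0.75} - e_{>0.75} + x}
                    {4 \cdot (e_{>0.75} - e_{=0.75})} \\
        &= \frac{4 \cdot e_{>0.75} - 4 \cdot e_{=0.75} + 2 \cdot e_{=0.75} \cdot r_{>0.75}
                 - 2 \cdot x \cdot r_{>0.75} - e_{>0.75} + x}
                {4 \cdot (e_{>0.75} - e_{=0.75})} \\
        &= \frac{2 \cdot e_{=0.75} \cdot r_{>0.75} - 2 \cdot x \cdot r_{>0.75}
                 + 3 \cdot e_{>0.75} - 4 \cdot e_{=0.75} + x}
                {4 \cdot (e_{>0.75} - e_{=0.75})}
\end{align*}
```

At x = e=0.75 this expression evaluates to 0.75, and at x = e>0.75 to 1 - r>0.75 / 2, so it joins the neighbouring sectors without a jump.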
 


Of course, the probabilistic game results can also be found in the interactive form.

How you should not do it though:

Sune Fischer and Pradu Kannan have examined the mathematical relation between 'winning probability W and the pawn advantage P' in the article 'Pawn Advantage, Win Percentage, and Elo'.

Whether 'winning probability' really means the real (lower) 'winning probability' or perhaps only the (higher, since draws are taken into account) probabilistic game result can be deduced from another passage of the article:

'When applying the condition that the win probability is 0.5 if there is no pawn advantage …'

If 'the win probability is 0.5' and the 'pawn advantage' is zero, the loss probability would necessarily also have to be 0.5 for the position to count as balanced. But where are the draws then, which at a win probability of 50% should themselves come close to this mark, leaving only a low loss probability?! It seems that the authors' knowledge of chess is quite limited. This nonsense must therefore be corrected: the authors are not referring to the 'win probability' but to the probabilistic game result discussed in this article, which includes draws and losses. This is how the calculation works: a probabilistic game result of 0.5 is equivalent to an evaluation (or, if you like, a 'pawn advantage') of 0.00.

'Data was taken from a collection of 405,460 computer games in PGN format. Whenever exactly 5 plys in a game had gone by without captures, the game result was accumulated twice in a table indexed by the material configuration. … Only data pertaining to the material configuration was taken. This was considered reasonable because the material configuration is the most important quantity that affects the result of a game.'

It is to be assumed that 'material configuration' means the material balance, i.e. the difference between the two sides' piece values, because it is stated elsewhere:

'For each material configuration, a pawn value was computed using conventional pawn-normalized material ratios that are close to those used in strong chess programs (P=1, N=4, B=4.1, R=6, Q=12).'

Apart from the fact that these piece values seem quite generous, the material balance is very coarse compared to the evaluations of chess engines, which are based on much more subtle criteria and, last but not least, on considerable search depths. But all this would still be bearable if the relation between win probability and material balance presented by the authors were rigorous. Instead, an ominous parameter 'K' appears in their final formula:

$W = \frac{1}{1 + 10^{-P/K}}$ or $y_{PGR} = \frac{1}{1 + 10^{-x/K}}$


And they estimate this parameter 'K' at '4' – roughly.

If you solve this formula for K, you get:

$K = \frac{\ln(10) \cdot P}{\ln\left(-\frac{W}{W - 1}\right)}$ or $K = \frac{\ln(10) \cdot x}{\ln\left(-\frac{y_{PGR}}{y_{PGR} - 1}\right)}$


And if you insert into this formula, for example, the Ps and Ws determined above for the winning engines of TCEC 17 (LCZero) and 18 (Stockfish), you get very different Ks between 1.7 and 3.2.

Conversely, a K of 4 with a probabilistic game result of 0.75 would result in an evaluation of 1.91, which is not very realistic according to the above table values. This assessment is confirmed by the following test: Determine within the Stockfish WDL calculation the evaluations for different half-moves, each with a probabilistic game result of 0.75. One obtains
in half-move 1 an evaluation of 1.50,
in half-move 10 an evaluation of 1.40,
in half-move 100 an evaluation of 1.15
and never an evaluation of 1.91.
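The two directions of this calculation can be checked with a short Python sketch. The function names are chosen here for illustration; the inputs are the K of 4 from the cited article and the three half-move evaluations listed above, each at a probabilistic game result of 0.75.

```python
import math


def k_from(evaluation: float, result: float) -> float:
    """K = ln(10) * x / ln(-y / (y - 1)), the formula solved for K."""
    return math.log(10) * evaluation / math.log(-result / (result - 1))


def evaluation_from(k: float, result: float) -> float:
    """Inverse direction: the evaluation x that yields the given result for a fixed K."""
    return k * math.log10(result / (1 - result))


# K = 4 at a probabilistic game result of 0.75
print(round(evaluation_from(4, 0.75), 2))  # 1.91

# K implied by the Stockfish WDL evaluations quoted above
for half_move, evaluation in [(1, 1.50), (10, 1.40), (100, 1.15)]:
    print(half_move, round(k_from(evaluation, 0.75), 2))
```

None of the three half-move evaluations leads back to a K of 4, and the implied K changes with the half-move number.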

Obviously, it is illusory to try to force the desired relation mathematically into a single sigmoid function with only one parameter ('K'). In contrast, the form 'Interactive Evaluation Relevance Reduction' presented at the beginning of this article calculates the probabilistic game results in the user ERR with a total of 5 formulas and 3 parameters, and in the Stockfish WDL with very accurate win, draw and loss probabilities. Precision instead of simplification!

Concretisation of the move evaluation sectors


It may seem tasteless to derive the move evaluation symbols more or less automatically from engine evaluations in what follows, as they are often chosen on the basis of a deeper understanding of the position and are not oriented towards engine evaluations. Example: in a position there is quite clearly only one reasonable move, which every child can find, and all other moves would be miserable. It would be more than stupid to award this one move the quality mark '‼'. Or a little more subtle: in a lost position, a move that is objectively weak, i.e. theoretically refutable, sets a trap that holds the chance of a revival. This is a typical "interesting move (!?) - NAG $5", which should perhaps not be marked with "?" or the like. Nevertheless, in many cases it can certainly make sense to determine such move evaluation symbols from a comparison of the engine evaluations of two alternative moves, especially if there is no opportunity to examine a position more carefully, for example in automatic game analyses.

Grandmaster Robert Hübner's intention cannot be followed in this way. In the English-language Wikipedia he is quoted as follows:

'German grandmaster Robert Hübner prefers an even more specific and restrained use of move evaluation symbols: "I have attached question marks to the moves which change a winning position into a drawn game, or a drawn position into a losing one, according to my judgment; a move which changes a winning game into a losing one deserves two question marks ..."'

Uncertain assessments such as 'winning position', 'drawn game', 'drawn position' or 'losing one' do not become more suitable for programming by the addition 'according to my judgment'.

The starting point for the classification of the move evaluation symbol is, on the one hand, the move actually played and, on the other hand, the best alternative move for bad moves or the second-best alternative move for good moves. For these two moves, as explained above, the relevance-reduced evaluation difference has to be determined, and this in turn has to be translated into the move evaluation symbol. For this purpose, the definite integral over the entire evaluation range from -∞ to +∞ is divided not into 6, but into 7 or 8 sectors of equal size. There are not only the 6 sectors to which a move evaluation symbol is assigned, but also the neutral sector of a move that is approximately equal to the best or second-best move. Half of this neutral sector lies in the positive evaluation direction and half in the negative. One can either use a neutral sector with the same integral size as the remaining sectors, or a neutral sector twice as large, consisting of 2 sectors of the usual integral size, one for each evaluation direction. This totals either 7 or 8 equal integral areas (in the latter variant, 2 of them for the neutral sector).

Mind you: We are talking here about integral sectors and sizes respectively in the sense of definite integrals, i.e. the relevant evaluation differences, not to be confused with the absolute differences between 2 move evaluations on the x-axis. For a given relevant evaluation difference, the latter are quite different, depending on where the move evaluations are located on the x-axis. The further they move away from the y-axis, i.e. from the move evaluation 0.00, the more their distance to each other increases with a given relevant evaluation difference.

Mathematically it is even possible, based on e=0.75, e>0.75, r>0.75 and a given move evaluation, to calculate the limit value of a new move evaluation at which a move would earn any given move evaluation symbol. This is hard to understand, therefore an example: given is a faulty move by White with an evaluation of -0.30, an e=0.75 of 1.50, an e>0.75 of 3.00 and a probabilistic game result at e>0.75 of 0.875. From which evaluation onwards would an alternative good move by White, compared to this weak and at the same time next-best move, deserve the move evaluation symbol '‼'? Depending on the scheme used, the answer will be for example 1.52 or 1.62.

Of course, such move evaluation symbols only come into effect if correspondingly high definite integrals (pardon: relevant evaluation differences) are available at all. A correct move by White with an engine evaluation of 100.00 will hardly earn a '!?', '!' or '‼', even if the second-best move scores only 10.00. This positive evaluation difference is simply irrelevant and is therefore reflected in a relevant evaluation difference of almost 0.00. A won position can usually not be spoiled by second-best moves. That is precisely the effect of the evaluation relevance reduction.

How large should these relevant evaluation differences for the move evaluation symbols now be? Possibly with the exception of the neutral sector, the entire integral area could be divided into equal parts, or the subdivision could be aligned in such a way that a brilliant move can already be declared if it exceeds the win draw balance while the next-best move has to make do with an evaluation of 0.00. The first alternative uses the move evaluation symbols more sparingly, the second is more generous.

Extensive move evaluation symbol scheme
1/7 1/7 1/7 1/14 1/14 1/7 1/7 1/7:


Here the relevant evaluation difference between the initial evaluation and the threshold for reaching the move evaluation symbol is

brilliant move (‼) - 1/14 + 1/7 + 1/7 = 5/14 of the total integral towards a better evaluation,
impressive move (!) - 1/14 + 1/7 = 3/14 of the total integral towards a better evaluation,
attractive move (!?) - 1/14 of the total integral towards a better evaluation,
questionable move (?!) - 1/14 of the total integral towards a suboptimal evaluation,
weak move (?) - 1/14 + 1/7 = 3/14 of the total integral towards a suboptimal evaluation and
miserable move (??) - 1/14 + 1/7 + 1/7 = 5/14 of the total integral towards a suboptimal evaluation.

From this, the thresholds of the move evaluations for White and Black can now be calculated with formulas that are not shown here but can be inspected in the page's JavaScript code via a browser inspector.

Generally, move evaluation symbols are assigned more generously here than in the following scheme '1/8 1/8 1/8 1/8 1/8 1/8 1/8 1/8'.

Restrictive move evaluation symbol scheme
1/8 1/8 1/8 1/8 1/8 1/8 1/8 1/8:


Here the relevant evaluation difference between the initial evaluation and the threshold for reaching the move evaluation symbol is

brilliant move (‼) - 1/8 + 1/8 + 1/8 = 3/8 of the total integral towards a better evaluation,
impressive move (!) - 1/8 + 1/8 = 1/4 of the total integral towards a better evaluation,
attractive move (!?) - 1/8 of the total integral towards a better evaluation,
questionable move (?!) - 1/8 of the total integral towards a suboptimal evaluation,
weak move (?) - 1/8 + 1/8 = 1/4 of the total integral towards a suboptimal evaluation and
miserable move (??) - 1/8 + 1/8 + 1/8 = 3/8 of the total integral towards a suboptimal evaluation.

Generally, move evaluation symbols are assigned less generously here than in the previous scheme '1/7 1/7 1/7 1/14 1/14 1/7 1/7 1/7'.
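The cumulative shares of both schemes can be reproduced with exact fractions. The following Python sketch uses illustrative names; it only adds up the sector sizes from the neutral half-sector outwards, in one evaluation direction.

```python
from fractions import Fraction
from itertools import accumulate


def thresholds(neutral_half: Fraction, sector: Fraction) -> list:
    """Shares of the total integral needed to reach the 1st, 2nd and 3rd symbol level."""
    return list(accumulate([neutral_half, sector, sector]))


# Extensive scheme: neutral sector 1/14 per direction, then sectors of 1/7
extensive = thresholds(Fraction(1, 14), Fraction(1, 7))
# Restrictive scheme: all sectors 1/8
restrictive = thresholds(Fraction(1, 8), Fraction(1, 8))

for good, bad, ext, res in zip(["!?", "!", "‼"], ["?!", "?", "??"], extensive, restrictive):
    print(f"{good:>2} / {bad:<2}  extensive: {ext}   restrictive: {res}")
```

At every symbol level the extensive scheme requires a smaller share of the total integral than the restrictive one, which is why it awards symbols more generously.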

In the interactive form, the thresholds between the symbols in both scheme tables are listed, as far as algebra allows it, i.e. as far as the margin of relevant evaluation difference remaining after the initial evaluation allows an award. If not, the character string '-----' is output.

Optimum rate:

In the results under the form 'Interactive Evaluation Relevance Reduction' you will also find the 'optimum rate of the suboptimal move evaluation'. This contains the precise numerical expression for the move evaluation symbol of the suboptimal move evaluation (nothing, ?!, ?, ??).

It is calculated as follows:

1 - (relevant evaluation difference / total integral)

The total integral is the definite integral over the entire x-axis with the evaluations from -∞ to +∞.

So the optimum rate is regularly below 100% and reaches the optimum of 100% only exceptionally with 2 move evaluations without relevant evaluation difference.
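As a sketch, the optimum rate and the symbol it stands for can be computed as follows in Python. The names are illustrative, the thresholds are those of the extensive scheme above, the input values are hypothetical, and the decision that a share exactly on a threshold already earns the symbol is an assumption made here, not something stated by the form.

```python
from fractions import Fraction

# Extensive scheme, shares of the total integral towards a suboptimal evaluation
EXTENSIVE = [(Fraction(5, 14), "??"), (Fraction(3, 14), "?"), (Fraction(1, 14), "?!")]


def optimum_rate(relevant_difference: float, total_integral: float) -> float:
    """1 - (relevant evaluation difference / total integral)."""
    return 1 - relevant_difference / total_integral


def symbol_for(relevant_difference: float, total_integral: float, scheme=EXTENSIVE) -> str:
    """Symbol of the suboptimal move; an empty string means no symbol."""
    share = relevant_difference / total_integral
    for threshold, symbol in scheme:
        if share >= threshold:  # assumption: the threshold itself earns the symbol
            return symbol
    return ""


# Hypothetical input: relevant evaluation difference 1.0, total integral 4.0
print(f"{optimum_rate(1.0, 4.0):.0%}", symbol_for(1.0, 4.0))
```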

Concretisation of the position evaluation sectors


The 9 evaluation sectors listed at the beginning of the article can now be described in more detail using the mathematical foundations developed above. Four evaluation sectors each are positive and negative. The balanced position covers minimal advantages for White and Black around the value zero. The sector of minimal advantage for White or for Black each makes up 50% of the total balanced sector.

9 position evaluation sectors with threshold adjustment at the probabilistic game results:

Here an assumption is made that is not mandatory, but very plausible: the end of the sector 'moderate advantage for White' and the beginning of the sector 'clear advantage for White' should coincide exactly with the evaluation e=0.75, for which the probabilistic game result amounts to 0.75. Conversely for Black: the end of the sector 'moderate advantage for Black' and the beginning of the sector 'clear advantage for Black' should coincide exactly with the evaluation -e=0.75, for which the probabilistic game result amounts to 0.25 from White's point of view. This basic assumption implies that a slight or moderate advantage probabilistically represents a tendency to draw, and a clear or extreme advantage probabilistically represents a tendency to win.

Further assumption: The end of the sector 'clear advantage for White' and the beginning of the sector 'extreme advantage for White' should exactly coincide with the evaluation e>0.75. Conversely for Black: The end of the sector 'clear advantage for Black' and the beginning of the sector 'extreme advantage for Black' should coincide exactly with the evaluation -e>0.75.

When using this scheme, it would probably be useful to adjust the probabilistic game result at e>0.75 to 0.875, so that it lies exactly in the middle between 0.75 and 1.00.

Now some mathematics again:

The task now is to quantify these individual advantage sectors. For example, if one compares a white move with an overwhelming advantage of 100.00 to a patzer move leading to a draw (0.00), the absolute evaluation difference is 100.00, but the relevant evaluation difference is only the practically complete definite integral of all functions in the exclusively positive range of the x-axis (which in turn is identical to the definite integral in the exclusively negative range of the x-axis).

By the way, the mathematical formula for the complete integral from -∞ to +∞ is:

$\frac{2 \cdot e_{>0.75} \cdot r_{>0.75} \cdot \ln(r_{>0.75}) - 2 \cdot e_{=0.75} \cdot r_{>0.75} \cdot \ln(r_{>0.75}) + e_{>0.75} \cdot \ln(r_{>0.75}) + 2 \cdot e_{=0.75} \cdot \ln(r_{>0.75}) - 4 \cdot e_{>0.75} \cdot r_{>0.75}}{2 \cdot \ln(r_{>0.75})}$
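The closed form can be cross-checked numerically. The following Python sketch evaluates it and compares it with a sector-by-sector sum. The sum rests on an assumption that is not spelled out in this section: that the integrated relevance function equals 2 * (1 - probabilistic game result) for x ≥ 0 and is mirrored for x < 0. Function and argument names are illustrative.

```python
import math


def total_integral(e75: float, e_gt: float, r_gt: float) -> float:
    """Closed form for the integral from -infinity to +infinity (0 < r_gt < 1)."""
    ln_r = math.log(r_gt)
    num = (2 * e_gt * r_gt * ln_r - 2 * e75 * r_gt * ln_r
           + e_gt * ln_r + 2 * e75 * ln_r - 4 * e_gt * r_gt)
    return num / (2 * ln_r)


def total_integral_by_sectors(e75: float, e_gt: float, r_gt: float) -> float:
    """Cross-check under the ASSUMPTION relevance = 2 * (1 - PGR) for x >= 0."""
    first = 0.75 * e75                         # relevance falls linearly from 1 to 0.5
    second = (e_gt - e75) * (0.5 + r_gt) / 2   # trapezoid, relevance from 0.5 to r_gt
    third = -e_gt * r_gt / math.log(r_gt)      # integral of r_gt**(x/e_gt) from e_gt to infinity
    return 2 * (first + second + third)        # both halves of the x-axis are equal


# Example parameters (illustrative): e=0.75 = 1.50, e>0.75 = 3.00, r>0.75 = 0.25
print(total_integral(1.5, 3.0, 0.25))
print(total_integral_by_sectors(1.5, 3.0, 0.25))
```

Both routes give the same value for these parameters, which supports the reading of the formula under the stated assumption.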


Next thought experiment: If one would now compare a white move with an advantage of e=0.75 exactly at the border between moderate and clear advantage with a patzer move that leads to a draw (0.00), the absolute evaluation difference would be e=0.75, but the relevant evaluation difference would only be the complete definite integral in the 1st positive sector of the x-axis. As mathematical formula: 0.75 * e=0.75.

If one now sets out to quantify the definite integrals between x = 0 and the beginning of the slight advantage, between the latter and the beginning of the moderate advantage, and between the latter and the beginning of the clear advantage, in each case for White/Black, the integral value of 0.75 * e=0.75 has to be divided into 3 sectors:

20% = 0.15 * e=0.75 for the sector balanced position from 0.00,
40% = 0.30 * e=0.75 for the sector slight advantage for White/Black and
40% = 0.30 * e=0.75 for the sector moderate advantage for White/Black.
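The three shares above can be converted into evaluation thresholds. The following Python sketch is not the formula set of the interactive form, which may differ in detail. It assumes that the relevance in the first sector is 1 - x / (2 * e=0.75), which reproduces the integral value of 0.75 * e=0.75 up to x = e=0.75; the function name and the example value e=0.75 = 1.50 are illustrative.

```python
import math


def first_sector_threshold(share: float, e75: float) -> float:
    """Evaluation x in [0, e75] up to which the relevance integral from 0
    amounts to share * e75 (share between 0 and 0.75)."""
    # ASSUMED relevance in the first sector: 1 - x / (2 * e75).
    # Its integral from 0 to x is x - x**2 / (4 * e75); solving
    # x - x**2 / (4 * e75) = share * e75 for x gives the line below.
    return 2 * e75 * (1 - math.sqrt(1 - share))


e75 = 1.50
boundaries = [
    ("balanced -> slight advantage", 0.15),
    ("slight -> moderate advantage", 0.15 + 0.30),
    ("moderate -> clear advantage", 0.15 + 0.30 + 0.30),
]
for label, cumulative_share in boundaries:
    print(f"{label}: {first_sector_threshold(cumulative_share, e75):.2f}")
```

The last boundary comes out at e=0.75 itself, as required by the threshold adjustment; for Black the thresholds are mirrored with a negative sign.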

From this, the thresholds of the position evaluations for White and Black can now be calculated with formulas that are not shown here but can be inspected in the page's JavaScript code via a browser inspector.

7 position evaluation sectors with threshold adjustment at the probabilistic game results:

'Extreme advantage for White (+--) or Black (-++) - NAG $20/$21' may not be everyone's cup of tea. For such readers, here is a repetition of the previous proposal, but this time with only 7 evaluation sectors without extremes.

Here, the end of the sector 'slight advantage for White' and the beginning of the sector 'moderate advantage for White' coincide exactly with e=0.75, for which the probabilistic game result amounts to 0.75, and the end of the sector 'moderate advantage for White' and the beginning of the sector 'clear advantage for White' coincide exactly with e>0.75. Conversely for Black: The end of the sector 'slight advantage for Black' and the beginning of the sector 'moderate advantage for Black' coincide exactly with -e=0.75, for which the probabilistic game result amounts to 0.25 from the white point of view, and the end of the sector 'moderate advantage for Black' and the beginning of the sector 'clear advantage for Black' coincide exactly with the evaluation -e>0.75. This basic assumption is accompanied by the fact that slight or moderate advantage probabilistically represents a tendency to draw and clear advantage probabilistically represents a tendency to win.

If one sets out here to quantify the definite integrals between x = 0 and the beginning of the slight advantage, and between the latter and the beginning of the moderate advantage, in each case for White/Black, the integral value of 0.75 * e=0.75 has to be divided into 2 sectors:

1/3 = 0.25 * e=0.75 for the sector balanced position from 0.00 and
2/3 = 0.50 * e=0.75 for the sector slight advantage for White/Black.

9 position evaluation sectors with threshold adjustment at identical evaluation sectors 1/9 1/9 1/9 1/9 1/18 1/18 1/9 1/9 1/9 1/9 of the total integral:

If one discards the above guideline of threshold adjustment at the probabilistic game results and again prefers 4.5 positive or negative position evaluation sectors, this time however of equal size, the evaluation areas turn out as shares of the total integral as follows:

1/18 for the sector balanced position from 0.00,
1/9 for the sector slight advantage for White/Black,
1/9 for the sector moderate advantage for White/Black,
1/9 for the sector clear advantage for White/Black and
1/9 for the sector extreme advantage for White/Black.

7 position evaluation sectors with threshold adjustment at identical evaluation sectors 1/7 1/7 1/7 1/14 1/14 1/7 1/7 1/7 of the total integral:

If one discards the above guideline of threshold adjustment at the probabilistic game results and is also not a friend of 4.5 positive or negative position evaluation sectors with extremes, this scheme with sectors of equal quantity remains:

1/14 for the sector balanced position from 0.00,
1/7 for the sector slight advantage for White/Black,
1/7 for the sector moderate advantage for White/Black and
1/7 for the sector clear advantage for White/Black.

The interactive form lists the position evaluation symbols and the limit values between the symbols, the latter in a separate line for each of the 4 schemes.

A tip by the way: if the well-disposed reader would like to use the position evaluation symbols but cannot get hold of them, the following link to the AqChessUnicode font could be helpful. This font is, by the way, also bundled with the chess GUI Aquarium.

And those who would not be averse to entering these special chess characters directly from the keyboard when commenting in texts, and who use a Windows operating system, may want to read about the 'Keyboard layout for chess annotation with special symbols in Windows programs via AutoHotkey'.

The human factor


An evaluation of around 1.50 pawn units for achieving an average game result of 0.75 applies to largely optimal chess play, as practised by the best chess engines nowadays, but not necessarily to human chess players, not even to grandmasters, who also play bullshit far too often and should therefore theoretically have to make do with a clearly higher e=0.75. The reason for this would be their tendency to make mistakes, which lets them draw or even lose games that were believed to be already won. One objection to this, however, is that this measured value would be pushed down again by the mistakes of their opponents of the genus Homo sapiens, because those mistakes often lead to wins that were not necessarily inevitable; for good chess engines such positions under pressure might have been defensible. In this way, many games that should actually have been drawn despite temporarily high evaluations could be statistically counted among the wins without pushing the e=0.75 up, or, vice versa, even lowering it, since with every additional win a lower evaluation on the waiting list moves up to become the new e=0.75. In this respect, suboptimal chess play would be upgraded by the opponent's suboptimal play. Which of these two effects of the chess-playing Homo sapiens on the e=0.75 prevails is uncertain.

If chess grandmasters still had the guts to face the best chess engines, their true e=0.75 could probably not be determined either. After all, when would they have a clear advantage in such games or even carry off wins? Maybe in extreme handicap games? These could be used to test how many pawns would have to be taken away from the computer opponent in the starting position for the master to wangle wins and draws on a significant scale and get off scot-free. Or how a given opening would have to be constructed to release the chess engine into a questionable position. In this way the grandmasterly e=0.75 could be determined after all. But since contemporary chess luminaries have long been avoiding such comparisons more and more in order to escape disgrace, such a question hardly arises any more.

Since such game material from matches between man and machine is hardly available, there remains currently, and presumably for all eternity, only the half-baked possibility of evaluating games between humans. One should always keep in mind that the resulting scores are diluted by the dubious playing style of the opponent. Forget it.

No sooner said than done, by analysing the 144 world championship games between Karpov and Kasparov in the years 1984 to 1990. The very last game is left out of consideration, since Kasparov agreed a draw with Karpov there despite a clear advantage, although the win, as chess slang has it, was only a question of technique. A draw was enough for him to win the world championship title. All games were analysed superficially by Stockfish with a short thinking time and an average depth of just over 20 half-moves.

To make a long story short: Kasparov won 21 times, Karpov 19 times. The 21 and 19 highest evaluations respectively in drawn games were between 3.67 and 1.00 for Kasparov and between 7.80 and 1.04 for Karpov. If you like, you can read from this a win draw balance of at least 1.00 …

In 5 games, the game was still thrown away despite a positive evaluation of at least 1.26. Kasparov even botched the 18th game of the 1986 world championship match despite a clear 3.67!

Excursus: "Draw range"


At this point the term "draw range", which haunts discussions from time to time like a ghost, should be scrutinized a little. It wrongly suggests that it coincides with the evaluation range "balanced position or draw (=) - NAG $10". To the reader's chagrin, however, a rather different understanding of this term comes to light.

Variant 1:

"Houdini insists on Rxc6 and at depth 25 gives an evaluation of 0.76, which probably does not exceed the draw range yet." (thread "Endspielkönnen gefragt" by Joe Boden, dated 2013-02-09 13:03)

"Therefore one believes with Houdini that a (won) endgame is still in the draw range when it shows +0.80..." (Schachfeld).

This suggests that a low position evaluation by a chess engine allows a statement about a drawn outcome of the game. Clearly, every win starts small, namely with a minimal advantage, maybe even after the first move. And if one sets the chess engine to the first moves of a game won in this way and finds that the game by no means started with an initial advantage of significantly more than +0.80, one might start to think long and hard. And the argumentative counter-attack that later blunders caused the disaster won't work if the patzer is called Stockfish, for example, and has an Elo of approximately 3500. Take to heart the games Stockfish lost at TCEC. There you will find a lot of games that ended in disaster for this engine despite a negative evaluation within the "draw range" of about -0.76 or -0.80, although it is not known for handling its positions within the alleged "draw range" negligently. Who else but Stockfish should be able to hold such positions to a draw?

Variant 2:

"If during a game no side has a winning advantage, it is also said that "the game is within the draw range"." (Wikipedia).

"Draw range
Scope for a position evaluation, which will lead in the end with the best possible play on both sides to a draw. In the example, White is worse, but is still in the draw range, because he can prevent the pawn from promotion with his king. But if he had the idea to play 1.Kh1, e.g. hoping for 1...f2 and stalemate, he would have left the draw range and Black could now force the victory with best play, namely by 1...Kg4 including gain of the opposition. Whether the starting position of the chess game is in the draw range, or whether perhaps White could force the victory, is too complex to be answered." (www.schwachspieler.de).

Here, the term "draw range" is associated with an ominous "scope for a position evaluation" in the case of a draw that can be forced by certain moves with best play and can apparently be proven. In connection with a demonstrable draw, however, even uttering the word "range" is a sign of distorted logic. The draw is 0.00, nothing else. In this case a chess program would have to deliver not only a position evaluation of 0.00, but also one or more drawing variations that are compelling according to the rules of logic or according to endgame tablebases. This only works in special positions, especially in all positions with at most 7 men, which are completely analysed; all others are simply so complex that one has to be content with a position evaluation between zero and checkmate without being able to draw any compelling conclusions about the outcome of the game. And if a chess program showed a rubbish evaluation differing from 0.00 in a genuinely drawn position, the program would have a code problem, and this would not justify the illogical term "draw range".

If, as usual, a draw is not provable, one should certainly not use the term "draw range" to lead the reader to believe in would-be knowledge that one cannot have in view of the complexity of a chess game. Then only statistics and probability (the actual topic of this article) govern all considerations about the outcome of the game, and opening databases with win, draw and loss rates for one and the same position can tell a tale about it.

Contact: mail@konrod.info






 
