as if chess moves could ever have an objective quality

i like totally unnecessary complications

2025-11-06

Spoilers, sort of, I guess, for The Tale of the Top-Tier Intellect. Also, Happy Birthday to Celene!

Tessa’s face screwed up in thought. “The reality that’s more complicated than the big straight global line of Elo scores might look like...
(…)
Down in actual reality there’s lots of small skill-difference arrows, not perfectly aligned, but lined up in mostly the same direction as the imaginary big Elo-difference arrow, weighed up across the sort of chess positions that probably arise when you and Mr. Neumann play in practice.” Tessa sighed performatively. “It really is a classic midwit trap, Mr. Humman, to be smart enough to spout out words about possible complications, until you’ve counterargued any truth you don’t want to hear. But not smart enough to know how to think through those complications, and see how the unpleasant truth is true anyways, after all the realistic details are taken into account.”
(…)
“The thing you’re not realizing, young lady, is that no matter how many fancy words you use, they won’t be as complicated as real reality, which is infinitely complicated.”

At this very moment—which, if you had the full theory of multiversal anthropics, would¹ seem like a normal sort of moment for this sort of thing to happen—Tessa was woken up out of the simulation of her Universe. It turned out that one of those living outside the Matrix (which was really more like an algorithm involving a complicated collection of many matrices, but they called it the Matrix) was interested in her discussion with Mr. Humman, and they had to start pulling people out anyways because the simulation was about to reach a particular common end state.

“It does seem to me, Ms. Tessa, that you have, on net, the better side of your argument with Mr. Humman,” began the Outsider, who had introduced itself as Forma. “Indeed, Mr. Humman strikes me as so foolish that I might think he were a character in some satire or ventfic, did I not know that he was a normal simulated being inside the Matrix. But this talk of assigning a small skill-difference arrow to each move, while perhaps a refinement of the standard Elo concept, doesn’t seem to really capture much of the real reality underlying Elo scores—though that is probably not, as Mr. Humman argued, primarily due to it not being as infinitely complex as the infinitely-complex reality. To point at one issue: what makes one move better than another, exactly?”

Tessa was no expert chess player, but she’d played enough chess online to know that move quality is often quantified by its centipawn loss, and that losing an average pawn with no compensation is usually enough for a move to be considered a mistake, rather than an inaccuracy. “I’m not sure,” Tessa had to admit, not currently having access to Gemini, “how exactly the calculation is done. But there are computer programs that measure the quality of a move in a consistent manner, and these are often used in chess analysis.”

Despite not being quite humanoid, Forma made a motion that you might compare to a slight shake of one’s head. “There are many such programs, and they do not always agree. In some sorts of positions, like those appearing in endgame tablebases or those where one player has a forced mating sequence, there might be a clear enough conclusion to produce total consensus. But in other positions, which make up the vast majority, slight disagreements can be common. And sometimes, though it is rarer, these disagreements can be substantial. Furthermore, the assessment of a move often depends on the amount of computing resources available to the chess engine—if you let it think for long enough, it may deduce some surprising fact about a variation, and decide to change its mind. Now, it happens to turn out that it probably won’t matter too much which chess algorithm you choose or precisely how long you let it think—at least if you’re only interested in comparing Mr. Neumann and Mr. Humman. But this did not have to be the case, and it’s not obvious why anyone skeptical of Elo wouldn’t, say, think that the right move in one game is the wrong move in another.”

Tessa frowned thoughtfully. “Okay, sure, especially in complicated midgame positions, maybe you can have scenarios where one engine thinks a move loses thirty centipawns, and another thinks it loses fifty. And maybe engines sometimes change their mind after thinking longer. But…” she paused, choosing her words carefully. “I think this is missing the forest for the trees. You could still, in theory, come up with a measure of move quality, even if you have to arbitrarily choose some imperfect measure, and then take some sort of weighted average of the difference in quality between Mr. Neumann and Mr. Humman’s moves over the space of chess positions. Sure, sometimes this sort of averaging will give results that aren’t conclusive, and if a player has only a very tiny edge on another then we might worry that a different measure of move quality might give the edge to the other player, but those cases will just be the players with very similar Elos—it would be ridiculous to look at something as messy as ‘chess player quality’ and expect to derive an actual total ordering, but a large gap in Elo still really does indicate something like a gap in skill.”

“Hahaha,” Forma said out loud, since the equivalent for laughter among its kind would be uninterpretable to Tessa. “So you did not notice your mistake, then. It is probably more or less as you say, if we are speaking of humans around Mr. Humman’s skill level—though I haven’t actually thoroughly checked. But you have not, I think, actually thought through the complications and determined how Mr. Humman is wrong, even if empirically you can tell that he must be.”

“You see, I do happen to think,” Forma continued, “that a move which is right in one game is often wrong in another. Among elite chess players, it’s very common to extensively study the favored opening lines of an upcoming opponent, or to steer away from the sorts of positions that your opponent tends to perform unusually well in.² Though you could perhaps attain Mr. Assi’s level in chess without worrying about this sort of thing, International Masters generally benefit substantially from doing so—unlike Mr. Humman, who probably ought to focus more on the fundamentals of chess strategy. And in the actual top tier of chess (among humans) it often goes beyond picking a favorite among many similarly good options. Maybe you wouldn’t ever expect to benefit from playing a blunder, but you might study and plan to use an opening line that gives up as much as 30 centipawns according to a chess engine, if you expect your opponent won’t be prepared for it. If you’re playing blitz and expect to handle a highly complicated position better than your opponent will under the time pressure, it isn’t unheard of to sacrifice half a pawn or more in computer-judged value—though that would be seen as a little… aggressive. The fundamental point is that, among the best human players, a very critical aspect of chess skill is not captured by your so-called ‘actual reality’. You’d likely capture the difference between Mr. Neumann and Mr. Humman nonetheless, and almost certainly the difference between Mr. Humman and Mr. Assi, but Elo scores still capture important truths of chess that this more refined metric does not.”

Tessa supposed that her clever explanation of how one chess player could still be better than another despite having different strengths and weaknesses was somewhat incomplete, and not only in ways that aren’t relevant to actual chess. But despite the nitpicking, she thought she could see how to save face—or at least, she hoped she could figure it out by the time she was done talking. “It seems to me like the fundamental complication you’re pointing at comes from how a weighted average over the space of chess moves fails to capture the fact that the players’ strategies will have an impact on which positions come up more often. But…” she trailed off for a moment, trying to remember precisely what she was originally thinking. “But when I described that more refined model of chess skill, I was mostly imagining the weighting focusing on the positions that might show up when Mr. Neumann and Mr. Humman in particular play each other, which at least partially accounts for the case where one player steers towards positions they can handle better than their opponent will.”

“That said,” she continued, having scored at least one point against her new argumentative opponent, “I will admit that this probably doesn’t totally address the issue. Even if we use a weighting that accounts for how engine-evaluated skill with a certain sort of position is a larger advantage when you’re better at steering into that position, and for how the positions that are likely to arise vary between pairs of players—this still doesn’t capture that the quality of a move is sometimes—or actually, always, in a sense—located in the impact it has on the weighting, which worries me more than when I was just imagining an eval bar. And, for that matter, the basic idea of comparing how well Mr. Neumann and Mr. Humman perform on each position seems sort of misguided now that I think about it more—what does it matter, if Mr. Neumann couldn’t play the Ethiopian³ quite as well as Mr. Humman can, so long as Mr. Neumann can comfortably play the Queen’s Gambit? It only matters that Mr. Neumann can play well against the Ethiopian.”

“Huh,” replied Forma. “I hadn’t actually realized your suggestion was that bad, while still imagining a less-specific weighted average. It was probably foolish of me to say that your metric likely captures the difference between Mr. Neumann and Mr. Humman, then. Though I expect Mr. Assi could play the Ethiopian not all that much worse than Mr. Humman does if he had reason to try it, especially if he had fifteen minutes to study the opening theory in advance.”

It was not immediately clear to Tessa how she ought to feel, about her conversation partner calling itself foolish for such a reason, but after a moment she decided she might as well be proud to have made a fool of Forma. It’s not like there’s anything shameful in acknowledging that an ex tempore guess turned out to be flawed.

“Why,” Tessa asked after a few moments, “did you wake me out of the simulation just to immediately have this conversation? Are you that interested in the philosophy of chess?”

“My society, especially the parts of it interested in complicated combinatorial games invented by simulated worlds and in the simulated culture surrounding those games, is inhabited predominately by—what’s the English word?—trolls,” explained Forma, ignoring Tessa’s question. “Most chess players you encounter here aren’t particularly trying to maximize their odds of winning, not even if you make sure to include anyone who only falls short of that due to not wanting to drift too far from some conception of Sportsmanship. We have tournaments in similar formats to yours, because we copied them, but the ‘winner’ doesn’t particularly receive status. We compete to implement the most interesting strategies. One of the most famous players taught themselves to play in a manner encoded by the value of π—they interpret each of the twenty possible opening moves as corresponding to an interval, [0,1) through [19,20). They always open with b1, because it is third lexicographically and so corresponds to the range [3.0,4.0), which contains π. Then, after their opponent responds, they divide up [3.0,4.0) into even smaller intervals depending on how many legal moves they have, and continue on in this fashion. We have far more working memory than humans, but that’s impressive even for us. And you don’t want to hear about our engines.”
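For the curious, here is a minimal sketch of the interval-subdivision scheme Forma describes. Several things in it are assumptions, not anything stated in the story: moves are written in UCI notation and ordered as plain strings, π is replaced by a rational approximation so the arithmetic stays exact, and `choose` and the 22 placeholder replies are hypothetical names invented for illustration. Under that assumed ordering, the interval [3, 4) happens to land on `b1c3`, a move of the b1 knight; a different notation would order the moves differently.

```python
from fractions import Fraction

# Rational stand-in for pi; exact fractions avoid float drift as intervals shrink.
PI = Fraction("3.14159265358979323846")

FILES = "abcdefgh"
# The 20 legal first moves in UCI notation, sorted as plain strings.
# Both the notation and the ordering are assumptions made for this sketch.
OPENING_MOVES = sorted(
    [f"{f}2{f}3" for f in FILES]
    + [f"{f}2{f}4" for f in FILES]
    + ["b1a3", "b1c3", "g1f3", "g1h3"]
)


def choose(x, lo, hi, moves):
    """Split [lo, hi) into len(moves) equal parts and pick the part containing x.

    Returns the chosen move together with the narrowed interval, which is
    what gets subdivided again on the player's next turn.
    """
    if not lo <= x < hi:
        raise ValueError("x must lie in [lo, hi)")
    width = Fraction(hi - lo) / len(moves)
    k = (x - lo) // width  # floor division of Fractions yields an int
    return moves[k], lo + k * width, lo + (k + 1) * width


# First move: twenty options over [0, 20), so each option owns a unit interval.
first, lo, hi = choose(PI, 0, 20, OPENING_MOVES)

# Second move: suppose (hypothetically) 22 legal moves are available.
replies = [f"m{i}" for i in range(22)]
second, lo2, hi2 = choose(PI, lo, hi, replies)
```

Each turn narrows the interval by a factor equal to the number of legal moves, so the player needs more and more digits of π as the game goes on.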

Tessa blinked. “…You just wanted to screw with me? I guess I can respect it.”

“What?” Forma asked. “No. I wanted to ask whether you think our skill—or maybe I should say, our winning propensity—can be quantified.”

After thinking about it for a moment, Tessa hadn’t the slightest idea.

My problem, really, is that I am too clever, and when I hear arguments like “the Bongcloud can sometimes be an effective psychological warfare technique”, I file them into my brain as an important notion about how chess strategy works. How could the depth of such tactics be encapsulated in a mere function from the set of board positions to an interval of reals, or to any other partially ordered set?

I suspect that there is some way you could in principle construct a species of aliens in order to produce a chess skill ladder nearly orthogonal to the one humans have. Or, maybe approaching orthogonality requires doing some really pathological things—like having aliens which aren’t even trying to win—but you can get something where the correlation is surprisingly small.

You might think that there’s grounding for an objective measure of move quality in terms of Zermelo’s theorem, even if only a three-valued one.⁴ I will certainly grant that a checkmate is objectively at least as good as any other move,⁵ that a move which forces your opponent to deliver stalemate is objectively better than resignation and cannot be improved upon unless you have enough material to deliver a helpmate, and that playing a move that’s part of a long forced mating sequence is no worse than any other option except insofar as you might screw up some later move.
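To make the three-valued picture concrete, here is a small sketch of how moves would be labeled if we somehow knew the game-theoretic value of every position. The names `Value`, `classify_move`, and `legal_transitions` are made up for illustration, and the values are taken from the point of view of the player making the move, both before and after it. The one real constraint is that a move can keep or lower your game-theoretic value but never raise it.

```python
from enum import IntEnum
from itertools import product


class Value(IntEnum):
    """Game-theoretic value of a position for the player about to move."""
    LOSS = -1
    DRAW = 0
    WIN = 1


def classify_move(before, after):
    """Label a move by the value before it and the value after it.

    Both values are from the mover's perspective. Under perfect play no
    move can improve the mover's value, so those transitions are rejected.
    """
    if after > before:
        raise ValueError("a move cannot raise the mover's game-theoretic value")
    return f"{before.name.lower()} -> {after.name.lower()}"


def legal_transitions():
    """Enumerate every (before, after) pair a single move can produce."""
    return [(b, a) for b, a in product(Value, repeat=2) if a <= b]


labels = [classify_move(b, a) for b, a in legal_transitions()]
```

There are exactly six such labels, and this measure calls a move "best" whenever the value is unchanged. It has nothing to say about which of two value-preserving moves is easier to play.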

But I also think it is possible for it to be a better idea to play a move that’s game-theoretically losing than to play a move that’s game-theoretically winning. Maybe the forced win is a long treacherous line that you’re probably going to screw up, and it starts with a queen sacrifice that means your opponent will get an easy win if you screw the line up. Maybe your opponent’s forced win involves strategies that no one would ever have a hope of seeing. The expected outcome of the game could easily just have almost nothing to do with what would happen under perfect play.
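A toy calculation shows how that can happen. Every number below is invented purely for illustration and is not an estimate of anything; `expected_score` is a hypothetical helper using the usual scoring of 1 for a win and ½ for a draw.

```python
def expected_score(p_win, p_draw):
    """Expected tournament score: 1 per win, 0.5 per draw, 0 per loss."""
    return p_win + 0.5 * p_draw


# Made-up probabilities for a fallible player against a fallible opponent.
# "Sound" move: game-theoretically winning, but the line is long and one slip loses.
sound_but_treacherous = expected_score(p_win=0.2, p_draw=0.1)

# "Unsound" move: game-theoretically losing, but the refutation is very hard to find.
unsound_but_practical = expected_score(p_win=0.6, p_draw=0.2)

better_in_practice = unsound_but_practical > sound_but_treacherous
```

With these made-up inputs the losing move scores higher in expectation. Whether real positions often look like this is exactly the open question.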

I would guess⁶ that cases where a losing move is better than a winning move are more the exception than the rule, at least in high-level play, but I am pretty sure they’re not a pathological exception—and I’m not sure that them being about the same quality is even very exceptional.⁷ And if you also consider moves from drawn positions to losses, I don’t think it’s even very uncommon for those to beat ones that keep the game drawish.

Humanity’s chess meta ended up the way it did for like, reasons, and of course Elo and engine position evaluations do basically work most of the time. I am confident that an ASI which obviously just trounces humanity is possible in principle, and even if I’m not too confident about exactly what conditions might bring such a thing about, I would be surprised (and probably rather relieved, if the argument isn't “something terrible happens before humanity gets far enough to make one”) to see a compelling reason to be confident that it won’t happen for at least several decades.

It’s just that I’m rather fond of perverse chess strategies and pathological metagames.

¹ (I arbitrarily posit for the sake of this blog post despite mostly thinking it unlikely.)

² Indeed, you could (in principle) memorize a line or set of lines that wins against any given chess engine, so long as it doesn’t randomize its move choices in order to make such a rigidly-tailored strategy impractical… or end up nondeterministic due to varying computational power and such.
In principle, it could avoid that trap even while deterministic by simply being good enough to draw against perfect play—but probably modern engines are nowhere near that good. Though even if they were, it would be hard to prove—you can’t exactly use process of elimination to show that it loses to none of the possible lines. For now, though, I expect you could eventually construct one, even if it’s really hard to do so.

³ Tessa had no idea what’s involved in the Ethiopian, not even whether it’s short for the Ethiopian opening, the Ethiopian variation, the Ethiopian defense, or some other term—she had only heard the curious fact that Mr. Humman didn’t play his favorite opening against Mr. Assi, which one of the two other Skewers residents Mr. Assi was playing had noticed by looking over at Mr. Humman’s board. (Not that this person knew many openings very well, but they had studied responses to the initial moves that Mr. Humman most often played against them, even though they only rarely practice openings.)

⁴ Or I guess positions are Wins, Draws, and Losses, but moves go from a winning position to another winning position, a win to a draw, a win to a loss, a draw to a draw, a draw to a loss, or a loss to a loss.
You could also say that winning faster is more winninger, and losing slower is less losinger, and that minmaxing on that is best. Minmaxing doesn’t actually get you your win as fast as possible or get you your loss as slow as possible unless your opponent is also minmaxing the same criterion, but it is nonetheless a thing you could say. (It’s not even an unreasonable thing to say, mate-in-n is totally a useful concept, I just don’t think it’s a perfectly satisfactory concept of objective move quality if you start applying it to like, midgames.)

⁵ Insofar as you’re not concerned about things like “psyching them out for the next game.”

⁶ It is only a guess. We simply cannot check which moves are which except in endgames—though you could try to stare at patterns in endgames and maybe make a better guess about the dynamics for midgames and openings.

⁷ Though it could be the case that in most positions that show up, the “reasonable” moves are mostly, if they are split, only split between winning and drawing or between drawing and losing. That would sort of make sense.