It’s a fundamental limitation of how LLMs work. They simply don’t understand how to follow a set of rules in the same way as a traditional computer/game is programmed.
Imagine you have only long-term memory that you can’t add to. You might get a few sentences of short-term memory before you’ve forgetting the context of the beginning of the conversation.
Then add on the fact that chess is very much a forward-thinking game and LLMs don’t stand a chance against other methods. It’s the classic case of “When all you have is a hammer, everything looks like a nail.” LLMs can be a great tool, but they can’t be your only tool.
I’ve been waiting for them to make this improvement since they were first introduced. Any day now…