Eval cache #992

lightvector · 2024-10-08T14:43:08Z

Experimental branch with an option (useEvalCache=true) that caches the evals of nodes with more than a configurable number of visits (evalCacheMinVisits) and keeps them for the lifetime of the running KataGo instance, biasing subsequent searches of the same positions so that they converge to the correct answer faster.

Right now there is no way to clear this cache other than restarting KataGo, and there is no way to save the cache to disk in order to reuse it on a subsequent run. These are things that could be added if this were made into a proper feature.
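As a rough illustration of the idea described above (not KataGo's actual implementation), the cache can be pictured as a map from a position hash to the eval of a sufficiently searched node. Everything in this sketch is an assumption made for illustration: the `EvalCache` class, the string hashes, the strict "more than `min_visits`" threshold, and especially the visit-weighted blend, since the thread does not say how the cached value biases a later search.

```python
from dataclasses import dataclass, field


@dataclass
class EvalCache:
    """Toy in-memory eval cache; entries live until the process exits."""

    min_visits: int  # plays the role of evalCacheMinVisits (assumed strict threshold)
    entries: dict = field(default_factory=dict)  # position hash -> (winrate, visits)

    def store(self, pos_hash, winrate, visits):
        # Only remember nodes that were searched deeply enough.
        if visits <= self.min_visits:
            return
        old = self.entries.get(pos_hash)
        # Keep whichever eval is backed by more visits.
        if old is None or visits > old[1]:
            self.entries[pos_hash] = (winrate, visits)

    def biased_eval(self, pos_hash, fresh_winrate, fresh_visits):
        # Hypothetical biasing rule: visit-weighted average of cached and fresh evals.
        cached = self.entries.get(pos_hash)
        if cached is None:
            return fresh_winrate
        cached_winrate, cached_visits = cached
        total = cached_visits + fresh_visits
        return (cached_winrate * cached_visits + fresh_winrate * fresh_visits) / total


cache = EvalCache(min_visits=100)
cache.store("pos-a", 0.30, 50)    # too few visits, ignored
cache.store("pos-a", 0.30, 300)   # remembered
blended = cache.biased_eval("pos-a", fresh_winrate=0.50, fresh_visits=100)
```

Because nothing is ever evicted or written to disk, a sketch like this shares the limitations noted above: the only way to clear it is to restart the process.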

kaorahi · 2024-10-12T16:35:38Z

This feature suggests a new style of whole-game analysis, at least for less powerful PCs. In the following example, doing "2sec scans three times" seems better than doing a "single 6sec scan".

Before getting into the main result, I'd like to mention the issue I'm focusing on. Here's black's winrate chart from a single 2sec scan for each move (please ignore the vertical bars).

The red and green segments show bad and good moves --- in other words, moves that lower or raise the player's winrate. Note that no green segments would appear if the AI were perfect. In this chart, you can see alternating red and green segments along a monotonically decreasing part of black's winrate. This makes it hard to tell which moves are bad and how bad they are. Generally, small green segments are OK, but long ones are problematic unless the move truly surpasses the AI.
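The red/green labeling described above can be sketched in a few lines. This is only an illustration under stated assumptions: `black_winrates[i]` is taken to be black's winrate after `i` moves, black is assumed to move first with strict alternation, and the function names are hypothetical. Summing the green lengths is one crude way to put a number on how noisy a scan is; that measure is not something proposed in this thread.

```python
def classify_moves(black_winrates, black_moves_first=True):
    """Label each move by its effect on the mover's own winrate.

    Returns a list of (label, gain) pairs, where label is
    'green' (raised), 'red' (lowered), or 'flat' (unchanged).
    """
    labels = []
    for i in range(len(black_winrates) - 1):
        delta_black = black_winrates[i + 1] - black_winrates[i]
        # Even-indexed moves belong to whoever moves first.
        black_to_move = (i % 2 == 0) == black_moves_first
        # White's gain is the negative of black's winrate change.
        gain = delta_black if black_to_move else -delta_black
        label = "green" if gain > 0 else "red" if gain < 0 else "flat"
        labels.append((label, gain))
    return labels


def total_green(black_winrates, black_moves_first=True):
    """Sum of all winrate gains on green moves (zero for a perfect evaluator)."""
    moves = classify_moves(black_winrates, black_moves_first)
    return sum(gain for label, gain in moves if label == "green")


# A steadily falling black winrate yields alternating red (black) and green (white) moves.
example = [0.60, 0.55, 0.50, 0.45]
example_labels = [label for label, _ in classify_moves(example)]
```

In the example, black's moves are red and white's replies are green, which is the alternating pattern the chart shows along a monotonically decreasing stretch.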

The next charts show the results from the "single 6sec scan" (left), "2sec scans three times" (center), and "2sec scans three times without eval cache" (right).

The center chart highlights the bad moves more clearly than the left chart. The right chart shows a similar tendency, but the center chart appears slightly clearer. Of course, KataGo was restarted for each chart, and the GUI's analysis cache was disabled during this experiment.

lightvector · 2024-10-15T00:44:56Z

@kaorahi - Very cool. Is this phenomenon consistent across multiple different games? It would be interesting to know whether there are other ways to objectively measure evaluation quality, and whether using the eval cache in this way also improves quality by those measurements.

kaorahi · 2024-10-16T11:59:34Z

Sorry, this is just a quick initial report, and I don't have time for serious experiments right now. I'll report more results when I get them.

lightvector added 6 commits September 15, 2024 11:28

More hint gen options, avg log policy

eee8cdd

Add friendly pass ok into book and graph hash

e2f926e

Update tests for fpok hash

8434664

More conservative setting of forceNonTerminal for graphhash purposes

b6d5285

Track graph hash on nodes plus forceNonTerminal hack

0fe296e

Experimental eval cache implementation

cc95761

lightvector changed the title ~~Eval cache rebased~~ Eval cache Oct 8, 2024
