Selection: Don't trust initial visits

Videodr0me edited this page Jun 13, 2018 · 4 revisions

The value head often flip-flops in capture sequences or sacrificial lines. In these cases a node might not get another visit for a long time (its Q is too low), even though the next visit would overturn the previous assessment. If a sufficiently larger NN absorbed all tactics, this would probably cease to be an issue, but it is unclear whether that is a realistic prospect. Also, an NN large enough to absorb all tactics would raise the question: why search at all?

In the meantime I tried some simple mods to address this. Basically, parent Q is used not only for 0-visit nodes when deciding which node to expand (FPU), but also for nodes with more visits. This can be described as a "prior", as "averaging", or as a weighting between parent Q and the successive values backpropagated from the node in question. The Q used for deciding which node to expand includes parent Q as a (virtual, additional) sample for nodes with n visits. The variants tested differ in which n this applies to.
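A minimal sketch of this blending, not the actual lc0 code: the function name `selection_q` and the parameter `limit` are made up for illustration, and it assumes that `parent_q` and `child_q` are expressed from the same side's perspective and that the parent sample counts as exactly one extra visit.

```python
from typing import Optional


def selection_q(parent_q: float, child_q: float, n: int,
                limit: Optional[int] = None) -> float:
    """Q used only for choosing which node to expand (hypothetical helper).

    parent_q: Q of the parent node
    child_q:  average of the values backpropagated through the child so far
    n:        number of visits the child has received
    limit:    apply the blend only while n < limit; None means "all n"
    """
    if n == 0:
        # Plain FPU: an unvisited node simply inherits the parent's Q.
        return parent_q
    if limit is None or n < limit:
        # Treat parent_q as one virtual additional sample next to the
        # n real samples that make up child_q.
        return (parent_q + n * child_q) / (n + 1)
    # Enough real visits: trust the child's own average.
    return child_q


if __name__ == "__main__":
    # One bad first visit (-0.6) under a decent parent (0.2).
    for visits in range(0, 4):
        print(visits,
              selection_q(0.2, -0.6, visits),           # "all n"
              selection_q(0.2, -0.6, visits, limit=3))  # "n < 3"
```

With this weighting, a single pessimistic first visit pulls the selection Q only halfway toward the new value, so the node is more likely to receive the second visit that might overturn the assessment. The influence of the parent sample shrinks as 1/(n+1).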

| Variant | Match result vs. standard Leela | Win | Elo | LOS |
|---|---|---|---|---|
| all n | +157 -192 =651 | 48.25% | -12.17 | 3.05% |
| n < 3 | +178 -177 =645 | 50.05% | 0.35 | 52.12% |
| n < 2 | +145 -211 =644 | 46.70% | -22.96 | 0.02% |

(games=1000 visits=1000)
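For reading the Win, Elo and LOS columns, here is a small sketch using the usual logistic Elo formula and the common normal-approximation formula for likelihood of superiority. It is an assumption that the match tool computed its figures this way; the function names are illustrative only.

```python
import math


def win_fraction(wins: int, losses: int, draws: int) -> float:
    """Score fraction, counting a draw as half a point."""
    return (wins + 0.5 * draws) / (wins + losses + draws)


def elo_diff(wins: int, losses: int, draws: int) -> float:
    """Elo difference implied by the score (logistic model).

    Undefined for a 0% or 100% score.
    """
    score = win_fraction(wins, losses, draws)
    return -400.0 * math.log10(1.0 / score - 1.0)


def los(wins: int, losses: int) -> float:
    """Likelihood of superiority; draws carry no information here."""
    return 0.5 * (1.0 + math.erf((wins - losses) /
                                 math.sqrt(2.0 * (wins + losses))))


if __name__ == "__main__":
    results = {
        "all n": (157, 192, 651),
        "n < 3": (178, 177, 645),
        "n < 2": (145, 211, 644),
    }
    for name, (w, l, d) in results.items():
        print(f"{name}: win={100 * win_fraction(w, l, d):.2f}% "
              f"elo={elo_diff(w, l, d):.2f} los={100 * los(w, l):.2f}%")
```

Under these formulas the computed values should land close to the reported ones, up to rounding. Only the "n < 2" variant is clearly distinguishable from standard Leela at this sample size, and it is worse; "n < 3" is indistinguishable from the baseline.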