In `regressor._optimize_coefficients` the following lines can be found:

```python
if network.count_parameters()[0] == 0:
    return
```

Note that `count_parameters()[0]` gives the number of trainable parameters of the max-alpha operations at that time.
Presumably this is meant to save computational effort when none of the currently relevant Operations (according to the alphas at that time) contain trainable parameters. However, it is also a deviation from the core DARTS algorithm, and a bad one in my eyes. With bad luck, either during training or even at initialization, it can now happen that the alpha of an Operation never increases because its coefficients are bad at that time, while those coefficients are never fitted because the alpha is so low; the two lock each other in place.

I did a little testing with some arbitrary equations on a small sample size, giving the following 95% confidence interval on the train loss:
Looking into what is basically the same run (same init, same dataset, everything seeded), with only the if statement above changed, one of three cases seems to occur:
1. No difference; there is always at least one Operation with trainable parameters and a high alpha:
2. The stochasticity of training kicks in, but there is no performance difference:
Possibly only one timestep without an update (at init), after which the runs deviate.
3. The run that keeps updating until the end consistently outperforms the other, e.g.:
tl;dr: The tiny saving in computational effort is not worth it, and the lines should be removed. A minimal toy reproduction of the deadlock is sketched below.
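To make the failure mode concrete, here is a minimal, self-contained toy sketch of the deadlock. This is not the repository's code: the data, the two candidate operations, the optimizers, and the `max_alpha_param_count` helper (which mimics what `count_parameters()[0]` is described to return) are all made up for illustration; only the guard's logic mirrors the quoted lines.

```python
import torch

# Toy data: the target is 3 * x, so the parametric operation below is the
# "right" choice, but it starts out with a bad coefficient.
x = torch.linspace(-1.0, 1.0, 64)
y = 3.0 * x

# Two candidate operations: a parameterless identity and c * x with a
# trainable coefficient c. The identity starts with the slightly higher
# alpha ("bad luck at initialization").
alpha = torch.tensor([0.1, 0.0], requires_grad=True)
coeff = torch.tensor([0.1], requires_grad=True)

def mixed(inp):
    # DARTS-style mixed operation: softmax(alpha)-weighted sum of candidates.
    w = torch.softmax(alpha, dim=0)
    return w[0] * inp + w[1] * coeff * inp

def max_alpha_param_count():
    # Stand-in for count_parameters()[0]: the number of trainable
    # parameters of the current max-alpha operation.
    return 0 if alpha.argmax().item() == 0 else coeff.numel()

opt_coeff = torch.optim.SGD([coeff], lr=0.5)
opt_alpha = torch.optim.SGD([alpha], lr=0.5)
USE_EARLY_RETURN = True  # flip to False to see the deadlock resolve

for _ in range(100):
    # Coefficient step, guarded by the early return under discussion.
    if not (USE_EARLY_RETURN and max_alpha_param_count() == 0):
        opt_coeff.zero_grad()
        torch.mean((mixed(x) - y) ** 2).backward()
        opt_coeff.step()
    # Alpha step (always runs).
    opt_alpha.zero_grad()
    torch.mean((mixed(x) - y) ** 2).backward()
    opt_alpha.step()

# With the guard: coeff stays at 0.1 and the identity keeps winning.
# Without it: coeff is fitted towards 3 and the parametric op takes over.
print("alphas:", torch.softmax(alpha, dim=0).tolist())
print("coeff: ", coeff.item())
```

With `USE_EARLY_RETURN = True` the printout shows the identity dominating and `coeff` frozen at its bad initial value; with `False`, `coeff` is fitted and the parametric operation's alpha recovers.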



