
Bug: Condition for omitting parameter updates #33

@lfrommelt

Description


In `regressor._optimize_coefficients` the following lines can be found:

    if network.count_parameters()[0] == 0:
        return

Note that `count_parameters()[0]` gives the actual number of trainable parameters of the max-alpha operations at that time.
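For context, here is a minimal sketch of how that guard sits inside the coefficient-optimization step; apart from `count_parameters`, all names and the inner loop are my assumptions, not the actual implementation:

    import torch

    def _optimize_coefficients(network, x, y, n_steps=10, lr=1e-2):
        # The guard under discussion: skip the whole coefficient update
        # whenever the currently dominant (max-alpha) operations are
        # parameter-free.
        if network.count_parameters()[0] == 0:
            return

        # Hypothetical inner loop: fit the Operations' coefficients on the
        # training data while the architecture weights (alphas) stay fixed.
        optimizer = torch.optim.Adam(network.coefficient_parameters(), lr=lr)
        for _ in range(n_steps):
            optimizer.zero_grad()
            loss = torch.nn.functional.mse_loss(network(x), y)
            loss.backward()
            optimizer.step()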

Presumably this is meant to save computational effort when all relevant Operations (relevant according to the alphas at the given time) contain no parameters. However, it is also a deviation from the core DARTS algorithm, and a bad one in my eyes. With bad luck, either during training or even at initialization, it can now happen that an Operation's alpha does not increase because its coefficients are bad at that time, while those coefficients are never fixed because the alpha is so bad. I did a little testing with some arbitrary equations and a small sample size, giving the following 95% confidence interval on the train loss:

[Image: 95% confidence interval of the train loss]
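To make the failure mode concrete, here is a toy sketch of the alternating DARTS-style outer loop; all names are assumed for illustration:

    for epoch in range(n_epochs):
        # Architecture step: alphas are updated based on how well each
        # Operation currently fits, i.e. based on its current coefficients.
        optimize_alphas(network, x_val, y_val)

        # Coefficient step: skipped entirely whenever the max-alpha
        # Operations happen to be parameter-free at this point. A parametric
        # Operation with bad coefficients then keeps a low alpha, and its
        # coefficients are never improved, so the two updates lock each
        # other out.
        if network.count_parameters()[0] == 0:
            continue
        optimize_coefficients(network, x_train, y_train)

Removing the guard simply means the coefficient step always runs, as in plain DARTS.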

Looking into basically the same run (same init, same dataset, everything seeded) and changing only the if statement above, one of three cases occurs:

1. No difference; there is always at least one Operation with trainable parameters and a high alpha:

[Image: case 1, both runs identical]

2. Stochasticity of training kicks in, but there is no performance difference:

Possibly only a single timestep without an update (at initialization), after which the runs deviate.

[Image: case 2, runs deviate but reach similar loss]

3. Updating until the end consistently outperforms, e.g.:

[Image: case 3, the run that updates until the end reaches lower loss]

tl;dr: The tiny saving in compute is not worth it, and the lines should be removed.
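Concretely, the proposed fix is to delete these two lines from `regressor._optimize_coefficients`, so the coefficient step always runs:

    -    if network.count_parameters()[0] == 0:
    -        return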
