In `regressor._optimize_coefficients` the following lines can be found:

```python
if network.count_parameters()[0] == 0:
    return
```

Note that `count_parameters()[0]` gives the number of trainable parameters of the max-alpha operations at that time.
Presumably this is meant to save computational effort when none of the currently relevant Operations (according to the alphas at that time) contain trainable parameters. However, it is also a deviation from the core DARTS algorithm, and a bad one in my eyes. With bad luck, either during training or even at initialization, it can now happen that the alpha of an Operation never increases because its coefficients are bad at that time, while those coefficients are never fitted because the alpha is so low; the two lock each other in place.

I did a little testing with some arbitrary equations on a small sample size, giving the following 95% confidence interval on the train loss:
Looking into what is basically the same run (same init, same dataset, everything seeded), with only the if statement above changed, one of three cases seems to occur:
1. No difference; there is always at least one Operation with trainable parameters and a high alpha:
2. The stochasticity of training kicks in, but there is no performance difference:
Possibly only one timestep without an update (at init), after which the runs deviate.
3. The run that keeps updating until the end consistently outperforms the other, e.g.:
tl;dr: The tiny saving in computational effort is not worth it, and the lines should be removed. A minimal toy reproduction of the deadlock is sketched below.
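To make the failure mode concrete, here is a minimal, self-contained toy sketch of the deadlock. This is not the repository's code: the data, the two candidate operations, the optimizers, and the `max_alpha_param_count` helper (which mimics what `count_parameters()[0]` is described to return) are all made up for illustration; only the guard's logic mirrors the quoted lines.

```python
import torch

# Toy data: the target is 3 * x, so the parametric operation below is the
# "right" choice, but it starts out with a bad coefficient.
x = torch.linspace(-1.0, 1.0, 64)
y = 3.0 * x

# Two candidate operations: a parameterless identity and c * x with a
# trainable coefficient c. The identity starts with the slightly higher
# alpha ("bad luck at initialization").
alpha = torch.tensor([0.1, 0.0], requires_grad=True)
coeff = torch.tensor([0.1], requires_grad=True)

def mixed(inp):
    # DARTS-style mixed operation: softmax(alpha)-weighted sum of candidates.
    w = torch.softmax(alpha, dim=0)
    return w[0] * inp + w[1] * coeff * inp

def max_alpha_param_count():
    # Stand-in for count_parameters()[0]: the number of trainable
    # parameters of the current max-alpha operation.
    return 0 if alpha.argmax().item() == 0 else coeff.numel()

opt_coeff = torch.optim.SGD([coeff], lr=0.5)
opt_alpha = torch.optim.SGD([alpha], lr=0.5)
USE_EARLY_RETURN = True  # flip to False to see the deadlock resolve

for _ in range(100):
    # Coefficient step, guarded by the early return under discussion.
    if not (USE_EARLY_RETURN and max_alpha_param_count() == 0):
        opt_coeff.zero_grad()
        torch.mean((mixed(x) - y) ** 2).backward()
        opt_coeff.step()
    # Alpha step (always runs).
    opt_alpha.zero_grad()
    torch.mean((mixed(x) - y) ** 2).backward()
    opt_alpha.step()

# With the guard: coeff stays at 0.1 and the identity keeps winning.
# Without it: coeff is fitted towards 3 and the parametric op takes over.
print("alphas:", torch.softmax(alpha, dim=0).tolist())
print("coeff: ", coeff.item())
```

With `USE_EARLY_RETURN = True` the printout shows the identity dominating and `coeff` frozen at its bad initial value; with `False`, `coeff` is fitted and the parametric operation's alpha recovers.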



