SGLCV fails with too few observations for CV #73

Open
jonas-hag opened this issue May 4, 2022 · 1 comment
Labels
bug Something isn't working

Comments

@jonas-hag

Describe the bug

When there are too few observations for the requested number of CV folds, SGLCV fails with an uninformative UnboundLocalError. This happens with groupyr 0.2.6; if I recall correctly, I didn't have the problem with 0.2.4.

Steps/Code to Reproduce

import numpy as np
import groupyr as gr

y = np.array([
    8.35686197e-01, 7.79143707e-01, 9.68885893e-01, 6.00364059e-01,
    8.90818433e-01, 4.50071502e-01, 5.50324868e-04, 3.23702083e-01,
    3.26413651e-01,
])
X = np.array([
    [0.95834536, 0.24640152, 0.91383425, 0.36952137],
    [0.18028435, 0.34682591, 0.43773007, 0.7074315],
    [0.54305304, 0.55150522, 0.03017366, 0.07321698],
    [0.49662785, 0.17114838, 0.61342598, 0.15094963],
    [0.66625233, 0.38015984, 0.51422898, 0.66124242],
    [0.95193769, 0.10298654, 0.03773045, 0.21904723],
    [0.34889582, 0.04983091, 0.13862843, 0.23390294],
    [0.05570983, 0.65507907, 0.74365214, 0.99539654],
    [0.01563651, 0.75173544, 0.56747472, 0.31385082],
])
l1_ratio = 0.0008299164840661392
groups = [np.array([0, 1]), np.array([2, 3])]

model = gr.SGLCV(
    l1_ratio=l1_ratio,
    groups=groups,
    scale_l2_by="group_length",
    cv=5,
    random_state=1234,
).fit(X=X, y=y)

Expected Results

A clear error message why it didn't work.

Actual Results

/path/to/lib/python3.8/site-packages/sklearn/metrics/_regression.py:796: UndefinedMetricWarning: R^2 score is not well-defined with less than two samples.
  warnings.warn(msg, UndefinedMetricWarning)
[the UndefinedMetricWarning is repeated several times]
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/path/to/lib/python3.8/site-packages/groupyr/sgl.py", line 1120, in fit
    self.l1_ratio_ = best_l1_ratio
UnboundLocalError: local variable 'best_l1_ratio' referenced before assignment

However, if I use fewer folds, it works:

model = gr.SGLCV(
    l1_ratio=l1_ratio,
    groups=groups,
    scale_l2_by="group_length",
    cv=3,
    random_state=1234,
).fit(X=X, y=y)
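
The difference between cv=5 and cv=3 shows up in the fold sizes. Below is a minimal sketch, assuming the integer cv ends up as a plain K-fold split (which is what scikit-learn does for regressors) and using the fold-size rule documented for scikit-learn's KFold. Both function names are hypothetical helpers written for this illustration; neither is part of groupyr or scikit-learn.

```python
def kfold_test_sizes(n_samples, n_splits):
    """Test-fold sizes following the rule documented for sklearn's KFold."""
    base, extra = divmod(n_samples, n_splits)
    # The first `extra` folds receive one additional sample.
    return [base + 1 if i < extra else base for i in range(n_splits)]


def check_fold_sizes(n_samples, n_splits, min_test_size=2):
    """Hypothetical guard: fail early with a readable message."""
    smallest = min(kfold_test_sizes(n_samples, n_splits))
    if smallest < min_test_size:
        raise ValueError(
            f"cv={n_splits} with {n_samples} samples gives a test fold of "
            f"size {smallest}; R^2 needs at least {min_test_size} samples."
        )


print(kfold_test_sizes(9, 5))  # [2, 2, 2, 2, 1]
print(kfold_test_sizes(9, 3))  # [3, 3, 3]
```

With 9 observations, cv=5 leaves one test fold with a single sample, while cv=3 gives three samples per fold. A check along the lines of check_fold_sizes, run before fitting, is one way such a case could be reported clearly.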

Comment

I think the error occurs because one fold contains only a single observation (9 observations split into 5 folds), which I guess makes the R^2 metric undefined and later leads to uncaught errors in the groupyr code. I'm not well versed in scikit-learn, so I don't know whether a fix belongs in scikit-learn or in groupyr. Either way, it would be nice to get an informative error message instead of an error caused by groupyr internals.
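
One plausible mechanism, sketched below in pure Python, is that an undefined score (NaN) for the single-observation fold propagates into the mean score, and a "keep the best so far" comparison then never succeeds. This is an assumption for illustration only: select_best_l1_ratio is a hypothetical stand-in and has not been checked against the actual code in groupyr's sgl.py, and the fold scores are made-up numbers.

```python
import math


def select_best_l1_ratio(l1_ratios, mean_scores):
    """Hypothetical selection loop; NOT groupyr's actual implementation."""
    best_score = -math.inf
    for l1_ratio, score in zip(l1_ratios, mean_scores):
        # Any comparison with NaN is False, so this branch is skipped
        # whenever the mean score is NaN.
        if score > best_score:
            best_score = score
            best_l1_ratio = l1_ratio
    # Raises UnboundLocalError if the branch above never ran.
    return best_l1_ratio


# Made-up fold scores: the last fold has one sample, so its R^2 is NaN.
fold_scores = [0.4, 0.1, 0.3, 0.2, math.nan]
mean_score = sum(fold_scores) / len(fold_scores)  # NaN propagates

try:
    select_best_l1_ratio([0.0008299164840661392], [mean_score])
except UnboundLocalError as err:
    print(type(err).__name__)  # UnboundLocalError
```

If something like this is what happens, either validating the fold sizes up front or checking for NaN scores before the selection step would allow a clearer error to be raised.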

Versions

groupyr 0.2.6
scikit-learn 1.0.2
scikit-optimize 0.9.0
jonas-hag added the bug label on May 4, 2022
@welcome

welcome bot commented May 4, 2022

👋 Thanks for opening your first issue here! We appreciate your help making groupyr better.
