
Rework validation for measurements and pending_experiments #456

Open · Scienfitz wants to merge 11 commits into main from feature/exp_input_validation
Conversation

@Scienfitz (Collaborator) commented Jan 3, 2025

Fixes #453

Notes re measurements validation

  • Two input validation utilities (one for parameters, one for targets) have been extracted; they incorporate parts of add_measurements and fuzzy_row_match. Additional validations for binary targets have been added. A sketch of the parameter utility follows this list.
  • As a result, fuzzy_row_match no longer performs any validation.
  • add_measurements now simply calls the utilities.
  • Tests for invalid parameter input have been extended.
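
A minimal sketch of what the extracted parameter utility could look like, assuming it lives in baybe/utils/validation.py (the file reviewed below); the exact signature, the handling of the tolerance flag, and the error wording are assumptions, not the PR's actual code:

```python
from collections.abc import Sequence

import pandas as pd

from baybe.parameters.base import Parameter


def validate_parameter_input(
    data: pd.DataFrame,
    parameters: Sequence[Parameter],
    numerical_measurements_must_be_within_tolerance: bool = False,
) -> None:
    """Assert that a dataframe provides valid values for all given parameters."""
    for parameter in parameters:
        if parameter.name not in data.columns:
            raise ValueError(
                f"The input is missing a column for parameter '{parameter.name}'."
            )
        if parameter.is_numerical and not numerical_measurements_must_be_within_tolerance:
            # Skip the range/tolerance check for numerical parameters when disabled.
            continue
        for value in data[parameter.name]:
            if not parameter.is_in_range(value):
                raise ValueError(
                    f"Value '{value}' is invalid for parameter '{parameter.name}'."
                )
```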

Notes re pending_experiments validation

  • The parameter input validation utility is now called as part of the Bayesian recommenders, producing proper error messages instead of cryptic and seemingly unrelated ones.
  • A recommender-based test for invalid values in pending_experiments has been added; a sketch follows this list.
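
A hedged sketch of what such a recommender-based test could look like; the fixture names, the parameter name, and the matched error message are assumptions:

```python
import pandas as pd
import pytest

from baybe.recommenders import BotorchRecommender


def test_invalid_pending_experiments(searchspace, objective, measurements):
    """Invalid values in pending_experiments should raise a descriptive error."""
    # A value that does not occur in the (hypothetical) search space:
    pending = pd.DataFrame({"Custom_Param": ["unknown_value"]})
    with pytest.raises(ValueError, match="invalid"):
        BotorchRecommender().recommend(
            batch_size=2,
            searchspace=searchspace,
            objective=objective,
            measurements=measurements,
            pending_experiments=pending,
        )
```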

Remaining problem:
The places where measurements and pending_experiments are validated are inconsistent. The former happens in the campaign and not in any recommender; the latter happens in the recommender but not in the campaign. One of the issues is that numerical_measurements_must_be_within_tolerance is not available to recommenders, but it is required for the parameter input validation.

Suggestion:
IMO we don't want to add more keywords to .recommend, hence the pending_experiments validation currently assumes numerical_measurements_must_be_within_tolerance=False and cannot be configured otherwise. To me, the simplest solution would be to get rid of the numerical_measurements_must_be_within_tolerance keyword entirely and make it an environment variable: we wouldn't have to worry about its availability anymore, and it wouldn't have to be added to more and more signatures. A sketch of the idea follows.
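
A minimal sketch of the environment variable idea; the variable name is hypothetical (following the existing BAYBE_ prefix convention), and the actual settings mechanism might look different:

```python
import os

# Hypothetical variable name; read once at import time so that validation
# utilities can use the module-level constant instead of a keyword argument.
NUMERICAL_MEASUREMENTS_MUST_BE_WITHIN_TOLERANCE = os.environ.get(
    "BAYBE_NUMERICAL_MEASUREMENTS_MUST_BE_WITHIN_TOLERANCE", "True"
).lower() in ("1", "true", "yes")
```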

@Scienfitz Scienfitz added the enhancement Expand / change existing functionality label Jan 3, 2025
@Scienfitz Scienfitz self-assigned this Jan 3, 2025
@AdrianSosic (Collaborator) left a comment

Hi @Scienfitz, thanks for the refactor 🏗️ Below are my comments.

(Review threads on baybe/utils/validation.py, tests/test_input_output.py, and tests/test_pending_experiments.py; resolved.)
@Scienfitz Scienfitz force-pushed the feature/exp_input_validation branch 3 times, most recently from 0a42492 to a412305 Compare January 6, 2025 12:36
@AVHopp (Collaborator) left a comment

LGTM

(Review threads on baybe/utils/dataframe.py and baybe/telemetry.py.)
@Scienfitz Scienfitz force-pushed the feature/exp_input_validation branch from 7056db6 to 969dea4 Compare January 14, 2025 13:20
@Scienfitz Scienfitz requested a review from AdrianSosic January 14, 2025 13:21
@Scienfitz (Collaborator, Author) commented Feb 6, 2025

After a pause, I reanalyzed the outstanding issues here:

  1. Validation should be added to both the campaign and the recommenders. This ensures proper validation of measurements and pending_experiments for users working with or without campaigns. There needs to be a mechanism by which a campaign can tell the recommender not to perform any validation; otherwise, validation would be repeated (simply accepting the repetition could also be an option).

  2. The flag numerical_measurements_must_be_within_tolerance is only available in .add_measurements, but for validating pending_experiments in a Campaign, it also needs to be available in .recommend.

Solution Proposal:

  • numerical_measurements_must_be_within_tolerance must become an attribute of the Campaign again. That way, it is available to both recommend and add_measurements. I would prefer this over including it in the recommend signature.
  • RecommenderProtocol.recommend should get an additional kwarg called numerical_measurements_must_be_within_tolerance of type bool | None. If it is True/False, the recommender validates measurements / pending_experiments with the respective flag. If it is None, validation is skipped, conveniently solving point 1. A sketch follows this list.
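
A hedged sketch of how the tri-state keyword could behave in a recommender, reusing the hypothetical validate_parameter_input utility sketched in the PR description above; the recommendation logic itself is omitted:

```python
import pandas as pd

from baybe.searchspace import SearchSpace


def recommend(
    batch_size: int,
    searchspace: SearchSpace,
    measurements: pd.DataFrame | None = None,
    pending_experiments: pd.DataFrame | None = None,
    numerical_measurements_must_be_within_tolerance: bool | None = None,
) -> None:
    """Sketch of the proposed tri-state validation keyword."""
    if numerical_measurements_must_be_within_tolerance is not None:
        # True/False: validate both inputs, forwarding the respective flag.
        for frame in (measurements, pending_experiments):
            if frame is not None:
                validate_parameter_input(  # hypothetical utility, sketched above
                    frame,
                    searchspace.parameters,
                    numerical_measurements_must_be_within_tolerance,
                )
    # None: skip validation, e.g. because the calling Campaign has already
    # validated the inputs, which avoids repeating the validation (point 1).
```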

Labels: enhancement (Expand / change existing functionality)
Successfully merging this pull request may close this issue:
  • #453: Pending experiments with CustomDiscreteParameter value not in search space gives cryptic error
3 participants