Skip to content

REF: Major refactor of 0.3.0 version #82

Merged
itellaetxe merged 14 commits intomainfrom
refactor_0.3.0
Apr 22, 2026
Merged

REF: Major refactor of 0.3.0 version #82
itellaetxe merged 14 commits intomainfrom
refactor_0.3.0

Conversation

@itellaetxe
Copy link
Copy Markdown
Contributor

Major refactor was needed to:

  • Clean up the messy code
  • Improve the modularity of the code
  • Speed things up (poetry is slow)
  • Pandas is slow and unintuitive compared to polars

Summary of changes

  • Removed Hyperopt dependency as the upstream repo was only giving problems with the deprecated pkg_resources
  • Simplifies model registration with the registry. Parameters and their ranges and their ranges are now added in a more robust way.
  • Migrated all pandas usage to polars.
  • Moved all I/O helpers and utils to a separate module.
  • Implemented better UI orchestration integrating it with the separated utils module.
  • Migrated poetry to uv.
  • Added python 3.12 as a supported version.
  • Adapted all tests and test management to the new uv system.

Hyperopt/HPSklearn usage was removed from the modeling flow. Dependency definitions were updated to drop hpsklearn-compneurobilbao. CLI/UI/help paths were aligned to the new registry-based model/scaler handling. Tests were updated where needed. Compatibility fix for RMSE computation was added so it works with the current sklearn API.
Added a new UI service layer: ui_services.py. Moved/refactored these concerns out of Interface into service helpers: Storage dict initialization. Feature dataframe slicing by subject/covariate/system. Runtime dimension updates from flags. Model/classifier construction from args. Rewired Interface in ui.py to delegate to those helpers. Added constructor defaults in Interface for runtime dimensions (naming, subject_types, covars, systems) to keep non-command test/use paths robust.
Moved reusable IO logic to ui/data.py. Refactored Interface data methods in ui.py to delegate to those helpers while keeping current behavior/messages.
Faster, easier, lighter
Adapted test to be a bit less of an edge case, leading to ambiguous orderings
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR is a broad 0.3.0 refactor focused on modernizing the Python stack (uv/setuptools), improving modularity, and migrating the codebase from pandas to polars while centralizing model/scaler registration.

Changes:

  • Migrates tabular operations from pandas to polars across core code and tests.
  • Introduces centralized registries for models/scalers/metrics and updates UI/CLI to query them.
  • Splits UI orchestration into dedicated ui.data (load/validate) and ui.services (construction/shaping) helpers, and updates tooling/CI for uv + Python 3.12.

Reviewed changes

Copilot reviewed 27 out of 28 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
tests/test_ageml/test_visualizer.py Updates fixture to support NaN removal for both pandas- and polars-like inputs.
tests/test_ageml/test_utils.py Switches test data to polars and tightens exception-type assertions.
tests/test_ageml/test_ui.py Migrates UI tests to polars, adds helper for editing cells, and updates registry-driven error strings.
tests/test_ageml/test_processing.py Updates processing tests for polars outputs and revises correlation expectations/assertions.
tests/test_ageml/test_modelling.py Aligns modelling tests with registry-based error messages and removal of hyperopt.
tests/test_ageml/test_commands.py Migrates CLI command tests to polars CSV writing.
src/ageml/visualizer.py Adds matplotlib compatibility for boxplot label argument changes.
src/ageml/utils.py Updates feature_extractor to support polars and exclude id from features.
src/ageml/ui/services.py New module for UI orchestration helpers (storage dicts, dataframe shaping, model/classifier builders).
src/ageml/ui/data.py New module for CSV loading plus schema/consistency validation in polars.
src/ageml/ui/init.py Refactors Interface to use new UI helper modules, polars dataframes, and registries.
src/ageml/registries.py Adds centralized registries for models/scalers/metrics with default registrations.
src/ageml/processing.py Migrates summary dataframe generation to polars.
src/ageml/modelling.py Removes hyperopt + class-level dicts; uses registries and keeps a single sklearn pipeline path.
src/ageml/messages.py Updates CLI help strings to use registries instead of AgeML.*_dict.
src/ageml/datasets/toy_features.csv Normalizes the first column header to index for CSV loaders.
src/ageml/datasets/toy_factors.csv Normalizes the first column header to index for CSV loaders.
src/ageml/datasets/toy_covar.csv Normalizes the first column header to index for CSV loaders.
src/ageml/datasets/toy_clinical.csv Normalizes index column and boolean casing for polars CSV parsing.
src/ageml/datasets/synthetic_data.py Migrates synthetic dataset generation/loading/saving from pandas to polars.
src/ageml/commands.py Reuses shared argument-parsing helpers for CLI parsing (no duplicated key=value parsing).
src/ageml/argument_parsing.py New module for shared parsing of named params + hyperparameter definitions.
pyproject.toml Migrates packaging from Poetry to setuptools and declares uv-style dependency groups.
noxfile.py Switches nox sessions to use uv sync / uv run.
docs/CONTRIBUTING.md Updates contributor workflow to uv and revises repository structure docs.
README.md Adds a uv-based developer quickstart section.
.gitignore Updates lockfile guidance comments for uv.
.github/workflows/lint_test_coverage.yml Adds Python 3.12 to CI matrix and replaces Poetry install with uv.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

with pytest.raises(ValueError) as exc_info:
processing.find_correlations(X, Y)
assert exc_info.type == ValueError
assert exc_info.type is ValueError
Comment on lines 86 to +99
with pytest.raises(ValueError) as exc_info:
processing.covariate_correction(X, np.array([1, 2, np.nan]).reshape(-1, 1))
assert exc_info.type == ValueError
assert exc_info.type is ValueError
with pytest.raises(ValueError) as exc_info:
processing.covariate_correction(np.array([2.0, np.nan]), Z)
assert exc_info.type == ValueError
assert exc_info.type is ValueError
with pytest.raises(ValueError) as exc_info:
processing.covariate_correction(X, Z, beta=np.array([2.0, np.nan]).reshape(-1, 1))
assert exc_info.type == ValueError
assert exc_info.type is ValueError

# Check ValueError raises with incompatible shapes
with pytest.raises(ValueError) as exc_info:
processing.covariate_correction(X, np.array([1, 2]))
assert exc_info.type == ValueError
assert exc_info.type is ValueError
Comment on lines 109 to 112
with pytest.raises(ValueError) as exc_info:
processing.CVMetricsHandler(task_type='asdf')
assert exc_info.type == ValueError
assert exc_info.type is ValueError
assert str(exc_info.value) == 'task_type must be either "regression" or "classification"'
Comment thread tests/test_ageml/test_processing.py
Comment thread docs/CONTRIBUTING.md Outdated
Comment thread src/ageml/ui/data.py Outdated
Comment thread src/ageml/processing.py
Comment on lines 214 to +234
@@ -222,11 +222,13 @@ def get_summary_dataframe(self) -> pd.DataFrame:
for split in ['train', 'test']:
for metric in metrics:
for stat, value in summary[split][metric].items():
if stat == '95ci' and isinstance(value, tuple):
value = str(value)
data.append({
'split': split,
'metric': metric,
'statistic': stat,
'value': value
})

return pd.DataFrame(data) No newline at end of file
return pl.DataFrame(data) No newline at end of file
itellaetxe and others added 4 commits April 8, 2026 15:28
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 28 out of 31 changed files in this pull request and generated 6 comments.

Comments suppressed due to low confidence (1)

src/ageml/ui/init.py:1543

  • df_factors is filtered using is_in(...), which preserves the original order of self.df_factors and may not match the row order of df_sub. Since factors_vs_deltas assumes factors/covariates/deltas are aligned row-wise, this can produce incorrect correlations. Align factors to df_sub by joining on id (preserving df_sub order) or sorting both frames by id before converting to NumPy.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/ageml/modelling.py
Comment on lines 116 to 120
if all(conditions):
hyperparam_types = AgeML.model_hyperparameter_types[self.model_type]
hyperparam_types = model_info['hyperparameter_types']
invalid_hyperparams = [param for param in self.hyperparameter_params.keys() if param not in hyperparam_types.keys()]
if len(invalid_hyperparams) > 0:
raise ValueError(f"Hyperparameter(s) {invalid_hyperparams} not available for the selected model '{self.model_type}'.")
Copy link

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parse_hyperparameter_params (and the user-facing error message) allows categorical values (e.g. param=a,b,c), but set_hyperparameter_grid relies on hyperparameter_params matching numeric ranges/types from the registry. As-is, categorical values will later cause an unpacking error when building the grid. Consider either rejecting non-numeric hyperparameter definitions up-front (with a clear error) or extending the grid builder to handle categorical lists.

Copilot uses AI. Check for mistakes.
with pytest.raises(ValueError) as exc_info:
processing.find_correlations(X, Y)
assert exc_info.type == ValueError
assert exc_info.type is ValueError
Copy link

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assert exc_info.type is ValueError line is inside the with pytest.raises(...) block after the call that raises, so it never executes. Unindent the assert (or remove it) so the test actually verifies the exception type.

Suggested change
assert exc_info.type is ValueError
assert exc_info.type is ValueError

Copilot uses AI. Check for mistakes.
Comment on lines 85 to 89
# Check ValueError raies with NaNs
with pytest.raises(ValueError) as exc_info:
processing.covariate_correction(X, np.array([1, 2, np.nan]).reshape(-1, 1))
assert exc_info.type == ValueError
assert exc_info.type is ValueError
with pytest.raises(ValueError) as exc_info:
Copy link

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assert exc_info.type is ValueError line is inside the with pytest.raises(...) block after the exception-raising call, so it is unreachable. Move the type assertion outside the with block (same issue repeats in the subsequent pytest.raises blocks in this test).

Copilot uses AI. Check for mistakes.
Comment on lines 109 to 112
with pytest.raises(ValueError) as exc_info:
processing.CVMetricsHandler(task_type='asdf')
assert exc_info.type == ValueError
assert exc_info.type is ValueError
assert str(exc_info.value) == 'task_type must be either "regression" or "classification"'
Copy link

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assert exc_info.type is ValueError is placed inside the with pytest.raises(...) block after the line that raises, so it will never run. Unindent it so the test actually checks the captured exception.

Copilot uses AI. Check for mistakes.
Comment thread src/ageml/messages.py
Comment on lines 25 to +28
"Model type and model parameters to use. First argument is the type and the following \n"
"arguments are input as keyword arguments into the model. They must be seperated by an '='.\n"
"Example: -m linear_reg fit_intercept=False normalize=True\n"
f"Available Types: {list(AgeML.model_dict.keys())} (Default: linear_reg)"
f"Available Types: {ModelRegistry.list_models()} (Default: linear_reg)"
Copy link

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in help text: "seperated" should be "separated".

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot apply changes based on this feedback

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied in 5d0eb69. The help-text typo in src/ageml/messages.py was corrected from seperated to separated.

Comment thread src/ageml/messages.py
Agent-Logs-Url: https://github.com/compneurobilbao/ageml/sessions/cb99170e-ce65-47a6-b67d-40b23c461091

Co-authored-by: itellaetxe <92918634+itellaetxe@users.noreply.github.com>
DEL: Removes trash '__main__.py' module
@itellaetxe itellaetxe merged commit ac14ffc into main Apr 22, 2026
4 checks passed
@itellaetxe itellaetxe deleted the refactor_0.3.0 branch April 22, 2026 11:13
@itellaetxe itellaetxe mentioned this pull request Apr 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants