REF: Major refactor of 0.3.0 version #82
Conversation
Hyperopt/HPSklearn usage was removed from the modeling flow. Dependency definitions were updated to drop hpsklearn-compneurobilbao. CLI/UI/help paths were aligned to the new registry-based model/scaler handling. Tests were updated where needed. Compatibility fix for RMSE computation was added so it works with the current sklearn API.
Added a new UI service layer: ui_services.py. Moved/refactored these concerns out of Interface into service helpers: Storage dict initialization. Feature dataframe slicing by subject/covariate/system. Runtime dimension updates from flags. Model/classifier construction from args. Rewired Interface in ui.py to delegate to those helpers. Added constructor defaults in Interface for runtime dimensions (naming, subject_types, covars, systems) to keep non-command test/use paths robust.
Moved reusable IO logic to ui/data.py. Refactored Interface data methods in ui.py to delegate to those helpers while keeping current behavior/messages.
Faster, easier, lighter
Adapted test to be a bit less of an edge case, leading to ambiguous orderings
There was a problem hiding this comment.
Pull request overview
This PR is a broad 0.3.0 refactor focused on modernizing the Python stack (uv/setuptools), improving modularity, and migrating the codebase from pandas to polars while centralizing model/scaler registration.
Changes:
- Migrates tabular operations from
pandastopolarsacross core code and tests. - Introduces centralized registries for models/scalers/metrics and updates UI/CLI to query them.
- Splits UI orchestration into dedicated
ui.data(load/validate) andui.services(construction/shaping) helpers, and updates tooling/CI for uv + Python 3.12.
Reviewed changes
Copilot reviewed 27 out of 28 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/test_ageml/test_visualizer.py | Updates fixture to support NaN removal for both pandas- and polars-like inputs. |
| tests/test_ageml/test_utils.py | Switches test data to polars and tightens exception-type assertions. |
| tests/test_ageml/test_ui.py | Migrates UI tests to polars, adds helper for editing cells, and updates registry-driven error strings. |
| tests/test_ageml/test_processing.py | Updates processing tests for polars outputs and revises correlation expectations/assertions. |
| tests/test_ageml/test_modelling.py | Aligns modelling tests with registry-based error messages and removal of hyperopt. |
| tests/test_ageml/test_commands.py | Migrates CLI command tests to polars CSV writing. |
| src/ageml/visualizer.py | Adds matplotlib compatibility for boxplot label argument changes. |
| src/ageml/utils.py | Updates feature_extractor to support polars and exclude id from features. |
| src/ageml/ui/services.py | New module for UI orchestration helpers (storage dicts, dataframe shaping, model/classifier builders). |
| src/ageml/ui/data.py | New module for CSV loading plus schema/consistency validation in polars. |
| src/ageml/ui/init.py | Refactors Interface to use new UI helper modules, polars dataframes, and registries. |
| src/ageml/registries.py | Adds centralized registries for models/scalers/metrics with default registrations. |
| src/ageml/processing.py | Migrates summary dataframe generation to polars. |
| src/ageml/modelling.py | Removes hyperopt + class-level dicts; uses registries and keeps a single sklearn pipeline path. |
| src/ageml/messages.py | Updates CLI help strings to use registries instead of AgeML.*_dict. |
| src/ageml/datasets/toy_features.csv | Normalizes the first column header to index for CSV loaders. |
| src/ageml/datasets/toy_factors.csv | Normalizes the first column header to index for CSV loaders. |
| src/ageml/datasets/toy_covar.csv | Normalizes the first column header to index for CSV loaders. |
| src/ageml/datasets/toy_clinical.csv | Normalizes index column and boolean casing for polars CSV parsing. |
| src/ageml/datasets/synthetic_data.py | Migrates synthetic dataset generation/loading/saving from pandas to polars. |
| src/ageml/commands.py | Reuses shared argument-parsing helpers for CLI parsing (no duplicated key=value parsing). |
| src/ageml/argument_parsing.py | New module for shared parsing of named params + hyperparameter definitions. |
| pyproject.toml | Migrates packaging from Poetry to setuptools and declares uv-style dependency groups. |
| noxfile.py | Switches nox sessions to use uv sync / uv run. |
| docs/CONTRIBUTING.md | Updates contributor workflow to uv and revises repository structure docs. |
| README.md | Adds a uv-based developer quickstart section. |
| .gitignore | Updates lockfile guidance comments for uv. |
| .github/workflows/lint_test_coverage.yml | Adds Python 3.12 to CI matrix and replaces Poetry install with uv. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| with pytest.raises(ValueError) as exc_info: | ||
| processing.find_correlations(X, Y) | ||
| assert exc_info.type == ValueError | ||
| assert exc_info.type is ValueError |
| with pytest.raises(ValueError) as exc_info: | ||
| processing.covariate_correction(X, np.array([1, 2, np.nan]).reshape(-1, 1)) | ||
| assert exc_info.type == ValueError | ||
| assert exc_info.type is ValueError | ||
| with pytest.raises(ValueError) as exc_info: | ||
| processing.covariate_correction(np.array([2.0, np.nan]), Z) | ||
| assert exc_info.type == ValueError | ||
| assert exc_info.type is ValueError | ||
| with pytest.raises(ValueError) as exc_info: | ||
| processing.covariate_correction(X, Z, beta=np.array([2.0, np.nan]).reshape(-1, 1)) | ||
| assert exc_info.type == ValueError | ||
| assert exc_info.type is ValueError | ||
|
|
||
| # Check ValueError raises with incompatible shapes | ||
| with pytest.raises(ValueError) as exc_info: | ||
| processing.covariate_correction(X, np.array([1, 2])) | ||
| assert exc_info.type == ValueError | ||
| assert exc_info.type is ValueError |
| with pytest.raises(ValueError) as exc_info: | ||
| processing.CVMetricsHandler(task_type='asdf') | ||
| assert exc_info.type == ValueError | ||
| assert exc_info.type is ValueError | ||
| assert str(exc_info.value) == 'task_type must be either "regression" or "classification"' |
| @@ -222,11 +222,13 @@ def get_summary_dataframe(self) -> pd.DataFrame: | |||
| for split in ['train', 'test']: | |||
| for metric in metrics: | |||
| for stat, value in summary[split][metric].items(): | |||
| if stat == '95ci' and isinstance(value, tuple): | |||
| value = str(value) | |||
| data.append({ | |||
| 'split': split, | |||
| 'metric': metric, | |||
| 'statistic': stat, | |||
| 'value': value | |||
| }) | |||
|
|
|||
| return pd.DataFrame(data) No newline at end of file | |||
| return pl.DataFrame(data) No newline at end of file | |||
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Minor typos also corrected
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 28 out of 31 changed files in this pull request and generated 6 comments.
Comments suppressed due to low confidence (1)
src/ageml/ui/init.py:1543
df_factorsis filtered usingis_in(...), which preserves the original order ofself.df_factorsand may not match the row order ofdf_sub. Sincefactors_vs_deltasassumes factors/covariates/deltas are aligned row-wise, this can produce incorrect correlations. Align factors todf_subby joining onid(preservingdf_suborder) or sorting both frames byidbefore converting to NumPy.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if all(conditions): | ||
| hyperparam_types = AgeML.model_hyperparameter_types[self.model_type] | ||
| hyperparam_types = model_info['hyperparameter_types'] | ||
| invalid_hyperparams = [param for param in self.hyperparameter_params.keys() if param not in hyperparam_types.keys()] | ||
| if len(invalid_hyperparams) > 0: | ||
| raise ValueError(f"Hyperparameter(s) {invalid_hyperparams} not available for the selected model '{self.model_type}'.") |
There was a problem hiding this comment.
parse_hyperparameter_params (and the user-facing error message) allows categorical values (e.g. param=a,b,c), but set_hyperparameter_grid relies on hyperparameter_params matching numeric ranges/types from the registry. As-is, categorical values will later cause an unpacking error when building the grid. Consider either rejecting non-numeric hyperparameter definitions up-front (with a clear error) or extending the grid builder to handle categorical lists.
| with pytest.raises(ValueError) as exc_info: | ||
| processing.find_correlations(X, Y) | ||
| assert exc_info.type == ValueError | ||
| assert exc_info.type is ValueError |
There was a problem hiding this comment.
The assert exc_info.type is ValueError line is inside the with pytest.raises(...) block after the call that raises, so it never executes. Unindent the assert (or remove it) so the test actually verifies the exception type.
| assert exc_info.type is ValueError | |
| assert exc_info.type is ValueError |
| # Check ValueError raies with NaNs | ||
| with pytest.raises(ValueError) as exc_info: | ||
| processing.covariate_correction(X, np.array([1, 2, np.nan]).reshape(-1, 1)) | ||
| assert exc_info.type == ValueError | ||
| assert exc_info.type is ValueError | ||
| with pytest.raises(ValueError) as exc_info: |
There was a problem hiding this comment.
The assert exc_info.type is ValueError line is inside the with pytest.raises(...) block after the exception-raising call, so it is unreachable. Move the type assertion outside the with block (same issue repeats in the subsequent pytest.raises blocks in this test).
| with pytest.raises(ValueError) as exc_info: | ||
| processing.CVMetricsHandler(task_type='asdf') | ||
| assert exc_info.type == ValueError | ||
| assert exc_info.type is ValueError | ||
| assert str(exc_info.value) == 'task_type must be either "regression" or "classification"' |
There was a problem hiding this comment.
The assert exc_info.type is ValueError is placed inside the with pytest.raises(...) block after the line that raises, so it will never run. Unindent it so the test actually checks the captured exception.
| "Model type and model parameters to use. First argument is the type and the following \n" | ||
| "arguments are input as keyword arguments into the model. They must be seperated by an '='.\n" | ||
| "Example: -m linear_reg fit_intercept=False normalize=True\n" | ||
| f"Available Types: {list(AgeML.model_dict.keys())} (Default: linear_reg)" | ||
| f"Available Types: {ModelRegistry.list_models()} (Default: linear_reg)" |
There was a problem hiding this comment.
Typo in help text: "seperated" should be "separated".
There was a problem hiding this comment.
@copilot apply changes based on this feedback
There was a problem hiding this comment.
Applied in 5d0eb69. The help-text typo in src/ageml/messages.py was corrected from seperated to separated.
Agent-Logs-Url: https://github.com/compneurobilbao/ageml/sessions/cb99170e-ce65-47a6-b67d-40b23c461091 Co-authored-by: itellaetxe <92918634+itellaetxe@users.noreply.github.com>
DEL: Removes trash '__main__.py' module
Major refactor was needed to:
Summary of changes
pkg_resourcesregistry. Parameters and their ranges and their ranges are now added in a more robust way.pandasusage topolars.utilsmodule.poetrytouv.uvsystem.