REF: Major refactor of 0.3.0 version by itellaetxe · Pull Request #82 · compneurobilbao/ageml

itellaetxe · 2026-03-16T15:00:23Z

Major refactor was needed to:

Clean up the messy code
Improve the modularity of the code
Speed things up (poetry is slow)
Replace pandas, which is slow and unintuitive compared to polars

Summary of changes

Removed the Hyperopt dependency, as the upstream repo kept causing problems through its use of the deprecated pkg_resources.
Simplified model registration with the registry. Parameters and their ranges are now added in a more robust way.
Migrated all pandas usage to polars.
Moved all I/O helpers and utils to a separate module.
Implemented better UI orchestration integrating it with the separated utils module.
Migrated poetry to uv.
Added python 3.12 as a supported version.
Adapted all tests and test management to the new uv system.
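
As a rough illustration of the registry-based model registration mentioned in the summary, a minimal sketch might look like the following. Only the name `ModelRegistry.list_models()` appears in this PR's diff; the `register` and `get` methods, the `ModelInfo` layout, the placeholder estimators, and the example ranges are assumptions made for illustration, not the actual ageml API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hyperparameter name -> (type, (low, high)); this layout is an assumption.
HyperparamSpec = Dict[str, Tuple[type, Tuple[float, float]]]


@dataclass(frozen=True)
class ModelInfo:
    """What the registry stores about one model (hypothetical layout)."""
    factory: Callable[..., Any]
    hyperparameters: HyperparamSpec = field(default_factory=dict)


class ModelRegistry:
    """Single source of truth that CLI, UI, and modelling code can query."""
    _models: Dict[str, ModelInfo] = {}

    @classmethod
    def register(cls, name: str, factory: Callable[..., Any],
                 hyperparameters: Optional[HyperparamSpec] = None) -> None:
        if name in cls._models:
            raise ValueError(f"Model '{name}' is already registered.")
        spec = dict(hyperparameters or {})
        # Validate ranges once, at registration time, instead of at use time.
        for param, (_ptype, (low, high)) in spec.items():
            if low > high:
                raise ValueError(f"Invalid range for '{param}': {low} > {high}.")
        cls._models[name] = ModelInfo(factory, spec)

    @classmethod
    def get(cls, name: str) -> ModelInfo:
        try:
            return cls._models[name]
        except KeyError:
            raise ValueError(
                f"Unknown model '{name}'. Available: {cls.list_models()}"
            ) from None

    @classmethod
    def list_models(cls) -> List[str]:
        return sorted(cls._models)


class PlaceholderLinear:
    """Stand-in estimator used only for this sketch."""
    def __init__(self, fit_intercept: bool = True):
        self.fit_intercept = fit_intercept


class PlaceholderRidge:
    """Stand-in estimator used only for this sketch."""
    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha


ModelRegistry.register("linear_reg", PlaceholderLinear)
ModelRegistry.register("ridge", PlaceholderRidge,
                       hyperparameters={"alpha": (float, (0.001, 10.0))})
```

With this shape, help strings and validation errors can both be built from `ModelRegistry.list_models()`, so adding a model means one `register` call rather than edits to several class-level dicts.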

Hyperopt/HPSklearn usage was removed from the modeling flow. Dependency definitions were updated to drop hpsklearn-compneurobilbao. CLI/UI/help paths were aligned to the new registry-based model/scaler handling. Tests were updated where needed. Compatibility fix for RMSE computation was added so it works with the current sklearn API.
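
The RMSE compatibility fix itself is not shown in this conversation. One plausible shape, offered only as a sketch: the `squared` argument of `sklearn.metrics.mean_squared_error` was deprecated in scikit-learn 1.4 and later removed, while `root_mean_squared_error` was added in 1.4. The helper name `rmse` and the pure-Python fallback below are assumptions, not the code from this PR.

```python
import math
from typing import Sequence

try:
    # Present in scikit-learn >= 1.4.
    from sklearn.metrics import root_mean_squared_error as _sk_rmse
except ImportError:
    # Older scikit-learn, or scikit-learn not installed at all.
    _sk_rmse = None


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error that never relies on `squared=False`."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long.")
    if _sk_rmse is not None:
        return float(_sk_rmse(y_true, y_pred))
    # Fallback: take the square root of the mean squared error by hand.
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)
```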

Added a new UI service layer: ui_services.py. Moved/refactored these concerns out of Interface into service helpers:

Storage dict initialization
Feature dataframe slicing by subject/covariate/system
Runtime dimension updates from flags
Model/classifier construction from args

Rewired Interface in ui.py to delegate to those helpers. Added constructor defaults in Interface for runtime dimensions (naming, subject_types, covars, systems) to keep non-command test/use paths robust.

Copilot

Pull request overview

This PR is a broad 0.3.0 refactor focused on modernizing the Python stack (uv/setuptools), improving modularity, and migrating the codebase from pandas to polars while centralizing model/scaler registration.

Changes:

Migrates tabular operations from pandas to polars across core code and tests.
Introduces centralized registries for models/scalers/metrics and updates UI/CLI to query them.
Splits UI orchestration into dedicated ui.data (load/validate) and ui.services (construction/shaping) helpers, and updates tooling/CI for uv + Python 3.12.

Reviewed changes

Copilot reviewed 27 out of 28 changed files in this pull request and generated 7 comments.

File	Description
tests/test_ageml/test_visualizer.py	Updates fixture to support NaN removal for both pandas- and polars-like inputs.
tests/test_ageml/test_utils.py	Switches test data to polars and tightens exception-type assertions.
tests/test_ageml/test_ui.py	Migrates UI tests to polars, adds helper for editing cells, and updates registry-driven error strings.
tests/test_ageml/test_processing.py	Updates processing tests for polars outputs and revises correlation expectations/assertions.
tests/test_ageml/test_modelling.py	Aligns modelling tests with registry-based error messages and removal of hyperopt.
tests/test_ageml/test_commands.py	Migrates CLI command tests to polars CSV writing.
src/ageml/visualizer.py	Adds matplotlib compatibility for `boxplot` label argument changes.
src/ageml/utils.py	Updates `feature_extractor` to support polars and exclude `id` from features.
src/ageml/ui/services.py	New module for UI orchestration helpers (storage dicts, dataframe shaping, model/classifier builders).
src/ageml/ui/data.py	New module for CSV loading plus schema/consistency validation in polars.
src/ageml/ui/__init__.py	Refactors Interface to use new UI helper modules, polars dataframes, and registries.
src/ageml/registries.py	Adds centralized registries for models/scalers/metrics with default registrations.
src/ageml/processing.py	Migrates summary dataframe generation to polars.
src/ageml/modelling.py	Removes hyperopt + class-level dicts; uses registries and keeps a single sklearn pipeline path.
src/ageml/messages.py	Updates CLI help strings to use registries instead of `AgeML.*_dict`.
src/ageml/datasets/toy_features.csv	Normalizes the first column header to `index` for CSV loaders.
src/ageml/datasets/toy_factors.csv	Normalizes the first column header to `index` for CSV loaders.
src/ageml/datasets/toy_covar.csv	Normalizes the first column header to `index` for CSV loaders.
src/ageml/datasets/toy_clinical.csv	Normalizes `index` column and boolean casing for polars CSV parsing.
src/ageml/datasets/synthetic_data.py	Migrates synthetic dataset generation/loading/saving from pandas to polars.
src/ageml/commands.py	Reuses shared argument-parsing helpers for CLI parsing (no duplicated key=value parsing).
src/ageml/argument_parsing.py	New module for shared parsing of named params + hyperparameter definitions.
pyproject.toml	Migrates packaging from Poetry to setuptools and declares uv-style dependency groups.
noxfile.py	Switches nox sessions to use `uv sync` / `uv run`.
docs/CONTRIBUTING.md	Updates contributor workflow to uv and revises repository structure docs.
README.md	Adds a uv-based developer quickstart section.
.gitignore	Updates lockfile guidance comments for uv.
.github/workflows/lint_test_coverage.yml	Adds Python 3.12 to CI matrix and replaces Poetry install with uv.
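
The shared key=value parsing described for src/ageml/argument_parsing.py is not shown in this conversation. A minimal sketch of how such a helper could work is given below; the function name `parse_named_params` and its exact behavior (literal evaluation with a string fallback, rejection of duplicate keys) are assumptions for illustration only.

```python
import ast
from typing import Any, Dict, List


def parse_named_params(tokens: List[str]) -> Dict[str, Any]:
    """Turn ['fit_intercept=False', 'alpha=0.5'] into a keyword dict."""
    params: Dict[str, Any] = {}
    for token in tokens:
        key, sep, raw = token.partition("=")
        if not sep or not key or not raw:
            raise ValueError(f"Expected key=value, got '{token}'.")
        if key in params:
            raise ValueError(f"Parameter '{key}' was given more than once.")
        try:
            # Handles bools, ints, floats, None, and quoted strings safely.
            params[key] = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            # Anything that is not a Python literal stays a plain string.
            params[key] = raw
    return params
```

Keeping this in one module means the CLI commands and the interactive UI reject malformed input with the same message.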

    with pytest.raises(ValueError) as exc_info:
        processing.find_correlations(X, Y)
-        assert exc_info.type == ValueError
+        assert exc_info.type is ValueError


    with pytest.raises(ValueError) as exc_info:
        processing.covariate_correction(X, np.array([1, 2, np.nan]).reshape(-1, 1))
-        assert exc_info.type == ValueError
+        assert exc_info.type is ValueError
    with pytest.raises(ValueError) as exc_info:
        processing.covariate_correction(np.array([2.0, np.nan]), Z)
-        assert exc_info.type == ValueError
+        assert exc_info.type is ValueError
    with pytest.raises(ValueError) as exc_info:
        processing.covariate_correction(X, Z, beta=np.array([2.0, np.nan]).reshape(-1, 1))
-        assert exc_info.type == ValueError
+        assert exc_info.type is ValueError

    # Check ValueError raises with incompatible shapes
    with pytest.raises(ValueError) as exc_info:
        processing.covariate_correction(X, np.array([1, 2]))
-        assert exc_info.type == ValueError
+        assert exc_info.type is ValueError


    with pytest.raises(ValueError) as exc_info:
        processing.CVMetricsHandler(task_type='asdf')
-        assert exc_info.type == ValueError
+        assert exc_info.type is ValueError
    assert str(exc_info.value) == 'task_type must be either "regression" or "classification"'


@@ -222,11 +222,13 @@ def get_summary_dataframe(self) -> pd.DataFrame:
        for split in ['train', 'test']:
            for metric in metrics:
                for stat, value in summary[split][metric].items():
+                    if stat == '95ci' and isinstance(value, tuple):
+                        value = str(value)
                    data.append({
                        'split': split,
                        'metric': metric,
                        'statistic': stat,
                        'value': value
                    })

-        return pd.DataFrame(data)
+        return pl.DataFrame(data)


Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot

Pull request overview

Copilot reviewed 28 out of 31 changed files in this pull request and generated 6 comments.

Comments suppressed due to low confidence (1)

src/ageml/ui/__init__.py:1543

df_factors is filtered using is_in(...), which preserves the original order of self.df_factors and may not match the row order of df_sub. Since factors_vs_deltas assumes factors/covariates/deltas are aligned row-wise, this can produce incorrect correlations. Align factors to df_sub by joining on id (preserving df_sub order) or sorting both frames by id before converting to NumPy.
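
The ordering problem described above can be sketched without any dataframe library. The example uses plain Python records in place of polars frames; the data and the helper names `filter_by_membership` and `align_to` are hypothetical and only illustrate why a membership filter is not the same as an id-based alignment.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

# Hypothetical rows standing in for df_factors and df_sub.
factors: List[Row] = [
    {"id": "sub-03", "factor": 3.0},
    {"id": "sub-01", "factor": 1.0},
    {"id": "sub-02", "factor": 2.0},
]
subset: List[Row] = [
    {"id": "sub-01", "delta": 0.1},
    {"id": "sub-03", "delta": 0.3},
]


def filter_by_membership(rows: List[Row], ids: List[str]) -> List[Row]:
    """Keeps matching rows in the order of `rows` (what an is_in filter does)."""
    wanted = set(ids)
    return [row for row in rows if row["id"] in wanted]


def align_to(rows: List[Row], ids: List[str]) -> List[Row]:
    """Returns one row per id, in the order of `ids` (what a join on id gives)."""
    by_id = {row["id"]: row for row in rows}
    missing = [i for i in ids if i not in by_id]
    if missing:
        raise KeyError(f"No factor rows for ids: {missing}")
    return [by_id[i] for i in ids]


subset_ids = [row["id"] for row in subset]
filtered = filter_by_membership(factors, subset_ids)  # order: sub-03, sub-01
aligned = align_to(factors, subset_ids)               # order: sub-01, sub-03
```

Pairing `filtered` row by row with `subset` would correlate the factor of sub-03 with the delta of sub-01, which is the silent misalignment the comment warns about.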

Copilot · 2026-04-22T10:13:06Z

        if all(conditions):
-            hyperparam_types = AgeML.model_hyperparameter_types[self.model_type]
+            hyperparam_types = model_info['hyperparameter_types']
            invalid_hyperparams = [param for param in self.hyperparameter_params.keys() if param not in hyperparam_types.keys()]
            if len(invalid_hyperparams) > 0:
                raise ValueError(f"Hyperparameter(s) {invalid_hyperparams} not available for the selected model '{self.model_type}'.")


parse_hyperparameter_params (and the user-facing error message) allows categorical values (e.g. param=a,b,c), but set_hyperparameter_grid relies on hyperparameter_params matching numeric ranges/types from the registry. As-is, categorical values will later cause an unpacking error when building the grid. Consider either rejecting non-numeric hyperparameter definitions up-front (with a clear error) or extending the grid builder to handle categorical lists.
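
The first option suggested above, rejecting non-numeric definitions up front, could be sketched as follows. The `low,high` input format and the helper name `parse_numeric_range` are assumptions made for this example; the actual `parse_hyperparameter_params` signature is not shown in this conversation.

```python
from typing import Tuple


def parse_numeric_range(name: str, raw: str) -> Tuple[float, float]:
    """Parse 'low,high' into a numeric range, failing early with a clear error."""
    parts = [part.strip() for part in raw.split(",")]
    if len(parts) != 2:
        raise ValueError(
            f"Hyperparameter '{name}' must be given as low,high; got '{raw}'."
        )
    try:
        low, high = float(parts[0]), float(parts[1])
    except ValueError:
        # Categorical input such as 'a,b' ends up here instead of failing
        # later with an unpacking error inside the grid builder.
        raise ValueError(
            f"Hyperparameter '{name}' only supports numeric ranges; got '{raw}'."
        ) from None
    if low > high:
        raise ValueError(f"Hyperparameter '{name}' has low > high: '{raw}'.")
    return low, high
```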

Copilot · 2026-04-22T10:13:07Z

    with pytest.raises(ValueError) as exc_info:
        processing.find_correlations(X, Y)
-        assert exc_info.type == ValueError
+        assert exc_info.type is ValueError


The assert exc_info.type is ValueError line is inside the with pytest.raises(...) block after the call that raises, so it never executes. Unindent the assert (or remove it) so the test actually verifies the exception type.

Suggested change

-        assert exc_info.type is ValueError
+    assert exc_info.type is ValueError
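
The control flow behind this comment can be demonstrated with the standard library alone. The sketch below uses `unittest.TestCase.assertRaises` as a stand-in for `pytest.raises` so that it runs without pytest installed; `raising_call` is a hypothetical placeholder for `processing.find_correlations(X, Y)`.

```python
import unittest


def raising_call() -> None:
    """Hypothetical stand-in for the function under test."""
    raise ValueError("bad input")


case = unittest.TestCase()
executed = []

with case.assertRaises(ValueError) as ctx:
    raising_call()
    # Unreachable: the exception above leaves the with block immediately.
    executed.append("inside the with block")

# Reachable: the context manager has captured the exception by now.
executed.append("after the with block")
assert type(ctx.exception) is ValueError
```

Only the second `append` runs, so any assertion placed after the raising call inside the block is dead code, whichever test framework provides the context manager.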

Copilot · 2026-04-22T10:13:07Z

    # Check ValueError raies with NaNs
    with pytest.raises(ValueError) as exc_info:
        processing.covariate_correction(X, np.array([1, 2, np.nan]).reshape(-1, 1))
-        assert exc_info.type == ValueError
+        assert exc_info.type is ValueError
    with pytest.raises(ValueError) as exc_info:


The assert exc_info.type is ValueError line is inside the with pytest.raises(...) block after the exception-raising call, so it is unreachable. Move the type assertion outside the with block (same issue repeats in the subsequent pytest.raises blocks in this test).

Copilot · 2026-04-22T10:13:07Z

    with pytest.raises(ValueError) as exc_info:
        processing.CVMetricsHandler(task_type='asdf')
-        assert exc_info.type == ValueError
+        assert exc_info.type is ValueError
    assert str(exc_info.value) == 'task_type must be either "regression" or "classification"'


The assert exc_info.type is ValueError is placed inside the with pytest.raises(...) block after the line that raises, so it will never run. Unindent it so the test actually checks the captured exception.

Copilot · 2026-04-22T10:13:08Z

    "Model type and model parameters to use. First argument is the type and the following \n"
    "arguments are input as keyword arguments into the model. They must be seperated by an '='.\n"
    "Example: -m linear_reg fit_intercept=False normalize=True\n"
-    f"Available Types: {list(AgeML.model_dict.keys())} (Default: linear_reg)"
+    f"Available Types: {ModelRegistry.list_models()} (Default: linear_reg)"


Typo in help text: "seperated" should be "separated".

@copilot apply changes based on this feedback

Applied in 5d0eb69. The help-text typo in src/ageml/messages.py was corrected from seperated to separated.

itellaetxe added 5 commits March 16, 2026 12:32

REF: Moved IO modules to helpers, delegated them

454f44b

Moved reusable IO logic to ui/data.py. Refactored Interface data methods in ui.py to delegate to those helpers while keeping current behavior/messages.

REF/DEP: Migrated pandas to polars for DataFrame management

82a9e28

REF: Migrated from poetry to uv

e18fe82

Faster, easier, lighter

This was referenced Mar 16, 2026

Hyperopt Libary causing problems #80

Closed

Extend new AgeML commands with ordering, feature handling, optimization, and refactoring #81

Open

TST: Fixed correlations test edge case

b4a8579

Adapted the test to be less of an edge case; the previous setup led to ambiguous orderings.

itellaetxe requested a review from Copilot March 17, 2026 12:20

Copilot started reviewing on behalf of itellaetxe March 17, 2026 12:20

Copilot AI reviewed Mar 17, 2026

itellaetxe and others added 4 commits April 8, 2026 15:28

Apply suggestions from code review

1b30aba

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

DOC: Updates input file specification to include 'id' as column for r…

a17c100

…ow tracking

REF/FIX: Continues refactoring towards polars

80a3743

Minor typos also corrected

DEP: uv.lock added substituting poetry.lock

495cdc4

itellaetxe requested a review from Copilot April 21, 2026 14:56

Copilot started reviewing on behalf of itellaetxe April 21, 2026 14:57

Copilot AI reviewed Apr 21, 2026

itellaetxe requested a review from Copilot April 22, 2026 10:05

Copilot started reviewing on behalf of itellaetxe April 22, 2026 10:05

This was referenced Apr 22, 2026

Unreliable indexing without participant ids #66

Closed

Log more precise command used #69

Closed

Copilot AI reviewed Apr 22, 2026

Copilot started work on behalf of itellaetxe April 22, 2026 10:15

fix: correct separated typos in CLI help text

5d0eb69

Agent-Logs-Url: https://github.com/compneurobilbao/ageml/sessions/cb99170e-ce65-47a6-b67d-40b23c461091 Co-authored-by: itellaetxe <92918634+itellaetxe@users.noreply.github.com>

Copilot finished work on behalf of itellaetxe April 22, 2026 10:23

Copilot started work on behalf of itellaetxe April 22, 2026 10:23

FIX: Ensure proper alignment of factors and ages in factor correlation

228a8b8

Copilot finished work on behalf of itellaetxe April 22, 2026 10:27

ENH: Adds given arguments to primary log to improve reproducibility

a672405

ENH: Adds AGEML_VERSION for reproducibility

35e57b9

DEL: Removes trash '__main__.py' module

itellaetxe merged commit ac14ffc into main Apr 22, 2026
4 checks passed

itellaetxe deleted the refactor_0.3.0 branch April 22, 2026 11:13

itellaetxe mentioned this pull request Apr 22, 2026

Data handling #34

Closed

itellaetxe merged 14 commits into main from refactor_0.3.0
