Autogluon palf #40

Open. Wants to merge 29 commits into base: autogluon

29 commits
a67faf4 Update python version, xgboost, autogluon, explainerdashboard, mlflow… (Ludecan, Jan 24, 2025)
c1d6779 Check the fit step for being a child class of `ModelStep` instead of … (Ludecan, Jan 24, 2025)
e4639c9 Implement AutoGluonModelStep and AutoGluonModel classes (Ludecan, Jan 24, 2025)
ba88cce Add quantile regression as a possible task (Ludecan, Jan 24, 2025)
4df5fa3 Register the AutoGluonModel model class (Ludecan, Jan 24, 2025)
b1dfa45 Add autogluon example config for the ames problem (Ludecan, Jan 24, 2025)
87f393b Add ames housing example data (Ludecan, Jan 24, 2025)
e51937e add ruff (diegomarvid, Sep 20, 2024)
c54685a fixing ruff issues (diegomarvid, Nov 1, 2024)
2431563 add dependencies for doc coverage (diegomarvid, Nov 1, 2024)
4c460fc fix tests (diegomarvid, Nov 1, 2024)
d8c5d73 update gh action (diegomarvid, Nov 1, 2024)
5638160 remove doc coverage (diegomarvid, Nov 1, 2024)
adeb81b add sebita's skip level docstrings (diegomarvid, Nov 1, 2024)
14ba8b5 Update pandas, scikit-learn and optuna-dashboard. Add ipywidgets for … (Ludecan, Jan 26, 2025)
927c1ad Add feature on readme (Ludecan, Jan 26, 2025)
4e957db Remove manual AutoGluonModel class, Marvel had already implemented it (Ludecan, Jan 26, 2025)
c6c2c56 Fix a bug in the encoder where integer columns could incorrectly resu… (Ludecan, Jan 26, 2025)
39b7f9b Update AutoGluon model class to support create and fit parameters (Ludecan, Jan 26, 2025)
90a55c6 Set test set size to 20%. The 10% size we had was causing instability… (Ludecan, Jan 26, 2025)
c2fac18 Update AutoGluon params (Ludecan, Jan 26, 2025)
2c5059c Update libs because of dependabot detected vulnerabilities (Ludecan, Jan 26, 2025)
b35dd9c Merge remote-tracking branch 'origin/main' into autogluon_palf (Ludecan, Jan 26, 2025)
840fdf6 Update ruff (Ludecan, Jan 26, 2025)
307c3a7 Simplify cells of the run_interactive script (Ludecan, Jan 26, 2025)
3f96ef9 Update github actions. Specify python version (Ludecan, Jan 26, 2025)
48ff95e Fix ruff errors (Ludecan, Jan 26, 2025)
28e335a Fix ruff format (Ludecan, Jan 26, 2025)
fcc3f52 Fix several pytests (Ludecan, Jan 26, 2025)

22 changes: 12 additions & 10 deletions .github/workflows/style-checks.yaml
@@ -7,10 +7,12 @@ jobs:
     runs-on: ubuntu-latest
     steps:
       - name: Checkout repo
-        uses: actions/checkout@v3
+        uses: actions/checkout@v4

       - name: Setup Python
-        uses: actions/setup-python@v4
+        uses: actions/setup-python@v5
+        with:
+          python-version: 3.11

       - name: Install Poetry
         uses: snok/install-poetry@v1
@@ -21,20 +23,20 @@ jobs:
       # Allow loading a cached venv created in a previous run if the lock file is identical
       - name: Load cached venv if it exists
         id: venv-cache
-        uses: actions/cache@v3
+        uses: actions/cache@v4
         with:
           path: .venv
           key: venv-${{ runner.os }}-${{ hashFiles('**/poetry.lock', '**/pyproject.toml') }}

       - name: Install dependencies
         if: steps.venv-cache.outputs.cache-hit != 'true'
-        run: poetry install --no-interaction
+        run: poetry install --no-interaction --only dev

-      - name: Check format with black
-        run: poetry run black --check .
+      - name: Check linting with Ruff
+        run: poetry run ruff check

-      - name: Check style with flake8
-        run: poetry run flake8 .
+      - name: Check format with Ruff
+        run: poetry run ruff format --check

-      - name: Check import sorting with isort
-        run: poetry run isort --check .
+      - name: Check docstring coverage
+        run: poetry run docstr-coverage ./**/*.py --fail-under 20 --verbose=2 --skip-file-doc
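
The three check steps in this workflow can also be run locally before pushing. The following is a minimal sketch, not part of the PR: `STYLE_CHECKS`, `_run_in_shell`, and `run_style_checks` are hypothetical names, and the commands are copied verbatim from the workflow steps. It assumes Poetry and the dev dependencies are already installed.

```python
import subprocess
from typing import Callable

# Commands copied verbatim from the workflow's check steps.
STYLE_CHECKS = {
    "lint": "poetry run ruff check",
    "format": "poetry run ruff format --check",
    "docstrings": (
        "poetry run docstr-coverage ./**/*.py "
        "--fail-under 20 --verbose=2 --skip-file-doc"
    ),
}


def _run_in_shell(command: str) -> int:
    """Run one command through the shell and return its exit code."""
    return subprocess.run(command, shell=True, check=False).returncode


def run_style_checks(runner: Callable[[str], int] = _run_in_shell) -> list[str]:
    """Run every check (without stopping early) and return the failed names."""
    failed = []
    for name, command in STYLE_CHECKS.items():
        if runner(command) != 0:
            failed.append(name)
    return failed
```

Calling `run_style_checks()` from the repository root runs all three commands and returns the names of the ones that exited with a non-zero status. The `runner` parameter exists only so the helper can be exercised without invoking the real tools.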
8 changes: 5 additions & 3 deletions .github/workflows/unit-testing.yaml
@@ -7,10 +7,12 @@ jobs:
     runs-on: ubuntu-latest
     steps:
       - name: Checkout repo
-        uses: actions/checkout@v3
+        uses: actions/checkout@v4

       - name: Setup Python
-        uses: actions/setup-python@v4
+        uses: actions/setup-python@v5
+        with:
+          python-version: 3.11

       - name: Install Poetry
         uses: snok/install-poetry@v1
@@ -21,7 +23,7 @@ jobs:
       # Allow loading a cached venv created in a previous run if the lock file is identical
       - name: Load cached venv if it exists
         id: venv-cache
-        uses: actions/cache@v3
+        uses: actions/cache@v4
         with:
           path: .venv
           key: venv-${{ runner.os }}-${{ hashFiles('**/poetry.lock', '**/pyproject.toml') }}
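
The cache step keys the virtual environment on the runner OS plus a hash of `poetry.lock` and `pyproject.toml`, so a cached `.venv` is reused only while those files are unchanged. The sketch below illustrates that idea in Python. It is an approximation for illustration only: `venv_cache_key` is a hypothetical helper, and GitHub's actual `hashFiles` implementation may combine the per-file digests differently.

```python
import hashlib
from pathlib import Path


def venv_cache_key(root: Path, runner_os: str = "Linux") -> str:
    """Build a key that changes whenever poetry.lock or pyproject.toml changes."""
    combined = hashlib.sha256()
    # Sort the matches so the key does not depend on directory traversal order.
    matches = sorted(
        path
        for pattern in ("poetry.lock", "pyproject.toml")
        for path in root.rglob(pattern)
    )
    for path in matches:
        # Hash each file, then fold the per-file digest into the combined hash.
        combined.update(hashlib.sha256(path.read_bytes()).digest())
    return f"venv-{runner_os}-{combined.hexdigest()}"
```

Two runs with byte-identical lock and project files produce the same key and can share the cached environment. Any dependency change produces a new key, and the `Install dependencies` step runs again.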
3 changes: 0 additions & 3 deletions .gitignore
@@ -8,7 +8,6 @@ __pycache__/

 *.joblib
 *.bin
-*.json
 *.pkl

 # Autogluon
@@ -18,8 +17,6 @@ AutogluonModels/
 mlruns/
 runs/

-examples/
-
 # Distribution / packaging
 .Python
 build/
8 changes: 3 additions & 5 deletions .vscode/extensions.json
@@ -1,9 +1,7 @@
 {
     "recommendations": [
         "ms-python.python",
-        "ms-python.isort",
-        "ms-python.flake8",
-        "ms-python.black-formatter",
-        "njpwerner.autodocstring",
+        "charliermarsh.ruff",
+        "njpwerner.autodocstring"
     ]
-}
+}
40 changes: 10 additions & 30 deletions .vscode/settings.json
@@ -1,16 +1,6 @@
{
//
// Set correct python path to venv's one
//
"python.defaultInterpreterPath": "${workspaceFolder}/.venv/bin/python",
//`
// Very optional: type checking. Remove the line if your project doesn't really use or respect
// type hints. You should give it a try, though. They're great.
//
"python.analysis.typeCheckingMode": "basic",
//
// Hide .venv from explorer and searchbar
//
"python.analysis.typeCheckingMode": "off",
"files.watcherExclude": {
"**/.venv/**": true,
"**/__pycache__/**": true
@@ -23,40 +13,30 @@
"**/.venv/": true,
"**/__pycache__/**": true
},
//
// Linting and formatting
//
"editor.formatOnSave": true,
"editor.codeActionsOnSave": {
"source.fixAll": "explicit",
"source.organizeImports": "explicit"
},
"black-formatter.importStrategy": "fromEnvironment",
"isort.importStrategy": "fromEnvironment",
"flake8.importStrategy": "fromEnvironment",
"isort.args": [
"--settings-path",
"${workspaceFolder}/pyproject.toml"
],
"flake8.args": [
"--config=${workspaceFolder}/.flake8"
],
"editor.rulers": [
100 // if changing line length, also do it in .flake8 and pyproject.toml's [tool.black] section
100
],
"editor.wordWrapColumn": 100,
"files.trimFinalNewlines": true,
"files.trimTrailingWhitespace": true,
//
// Jupyter
//
"jupyter.notebookFileRoot": "${workspaceFolder}",
"jupyter.interactiveWindow.textEditor.executeSelection": true,
// TODO: this setting is showing a deprecation warning. Maybe we should drop it?
"jupyter.generateSVGPlots": true,
"autoDocstring.docstringFormat": "numpy",
"python.testing.pytestArgs": [
"tests"
],
"python.testing.unittestEnabled": false,
"python.testing.pytestEnabled": true,
"[python]": {
"editor.defaultFormatter": "charliermarsh.ruff"
},
"ruff.organizeImports": true,
"ruff.fixAll": true,
"ruff.importStrategy": "fromEnvironment",
"ruff.lint.run": "onSave"
}
1 change: 1 addition & 0 deletions README.md
@@ -16,6 +16,7 @@ The key components of the pipeline include Pipeline Steps, which are predefined
 - Evaluation metrics calculation and reporting
 - Explainable AI (XAI) dashboard for model interpretability
 - Extensible architecture for adding custom pipeline steps
+- MLOps best practices for ensuring consistent results between training and serving

 ## Installation

4 changes: 2 additions & 2 deletions examples/ames_housing/configs/1_ames_housing_baseline.json
@@ -25,9 +25,9 @@
             {
                 "step_type": "TabularSplitStep",
                 "parameters": {
-                    "train_percentage": 0.7,
+                    "train_percentage": 0.6,
                     "validation_percentage": 0.2,
-                    "test_percentage": 0.1
+                    "test_percentage": 0.2
                 }
             },
             {
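
The new percentages give a 60/20/20 train/validation/test split. As a rough illustration of what such a split does, here is a minimal sketch in plain Python. It is not the project's `TabularSplitStep` implementation, which this diff does not show; `split_rows` and its `seed` parameter are hypothetical.

```python
import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def split_rows(
    rows: Sequence[T],
    train_percentage: float,
    validation_percentage: float,
    test_percentage: float,
    seed: int = 42,
) -> tuple[list[T], list[T], list[T]]:
    """Shuffle rows and cut them into train, validation, and test parts."""
    total = train_percentage + validation_percentage + test_percentage
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"percentages must sum to 1.0, got {total}")

    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility

    n_train = round(len(shuffled) * train_percentage)
    n_validation = round(len(shuffled) * validation_percentage)
    train = shuffled[:n_train]
    validation = shuffled[n_train : n_train + n_validation]
    test = shuffled[n_train + n_validation :]  # the remainder goes to test
    return train, validation, test


train, validation, test = split_rows(range(1000), 0.6, 0.2, 0.2)
print(len(train), len(validation), len(test))  # 600 200 200
```

Compared with the previous 0.7/0.2/0.1 setting, the test part doubles in size at the cost of training rows.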
4 changes: 2 additions & 2 deletions examples/ames_housing/configs/2_ames_housing_hp_tuning.json
@@ -21,9 +21,9 @@
             {
                 "step_type": "TabularSplitStep",
                 "parameters": {
-                    "train_percentage": 0.7,
+                    "train_percentage": 0.6,
                     "validation_percentage": 0.2,
-                    "test_percentage": 0.1
+                    "test_percentage": 0.2
                 }
             },
             {
16 changes: 8 additions & 8 deletions examples/ames_housing/configs/3_ames_housing_hp_tuned.json
@@ -25,9 +25,9 @@
             {
                 "step_type": "TabularSplitStep",
                 "parameters": {
-                    "train_percentage": 0.7,
+                    "train_percentage": 0.6,
                     "validation_percentage": 0.2,
-                    "test_percentage": 0.1
+                    "test_percentage": 0.2
                 }
             },
             {
@@ -45,12 +45,12 @@
                         "eval_metric": "rmse",
                         "tree_method": "hist",
                         "early_stopping_rounds": 20,
-                        "max_depth": 15,
-                        "eta": 0.08311222976823307,
-                        "n_estimators": 374,
-                        "min_child_weight": 6,
-                        "subsample": 0.5272883435658126,
-                        "colsample_bytree": 0.946222179438676
+                        "max_depth": 5,
+                        "eta": 0.15805002999964826,
+                        "n_estimators": 1019,
+                        "min_child_weight": 3,
+                        "subsample": 0.8807043595486204,
+                        "colsample_bytree": 0.8754815170751743
                     }
                 }
             },
70 changes: 70 additions & 0 deletions examples/ames_housing/configs/4_ames_housing_autogluon.json
@@ -0,0 +1,70 @@
+{
+    "pipeline": {
+        "name": "XGBoostTrainingPipeline",
+        "description": "Training pipeline for XGBoost models.",
+        "parameters": {
+            "save_data_path": "ames_housing.pkl",
+            "target": "SalePrice",
+            "task": "regression",
+            "tracking": {
+                "experiment": "ames_housing",
+                "run": "AutoGluon"
+            }
+        },
+        "steps": [
+            {
+                "step_type": "GenerateStep",
+                "parameters": {
+                    "train_path": "examples/ames_housing/data/train.csv",
+                    "predict_path": "examples/ames_housing/data/test.csv",
+                    "drop_columns": [
+                        "Id"
+                    ],
+                    "optimize_dtypes": true
+                }
+            },
+            {
+                "step_type": "TabularSplitStep",
+                "parameters": {
+                    "train_percentage": 0.6,
+                    "validation_percentage": 0.2,
+                    "test_percentage": 0.2
+                }
+            },
+            {
+                "step_type": "CleanStep"
+            },
+            {
+                "step_type": "EncodeStep"
+            },
+            {
+                "step_type": "AutoGluonModelStep",
+                "parameters": {
+                    "model_class": "AutoGluon",
+                    "autogluon_create_params": {
+                        "verbosity": 2
+                    },
+                    "autogluon_fit_params": {
+                        "presets": [
+                            "high_quality",
+                            "optimize_for_deployment"
+                        ],
+                        "save_bag_folds": true,
+                        "time_limit": 1800,
+                        "num_stack_levels": 1,
+                        "dynamic_stacking": false
+                    }
+                }
+            },
+            {
+                "step_type": "CalculateMetricsStep"
+            },
+            {
+                "step_type": "ExplainerDashboardStep",
+                "parameters": {
+                    "enable_step": false
+                }
+            }
+        ]
+    }
+}
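
The `AutoGluonModelStep` parameters in this config separate constructor arguments (`autogluon_create_params`) from training arguments (`autogluon_fit_params`). The PR's model class is not shown in this part of the diff, so the following is only a sketch of how such a split could be forwarded to AutoGluon's `TabularPredictor`. The function names `split_autogluon_params` and `fit_autogluon` are hypothetical. The keyword arguments are the ones listed in the config and are assumed to match the `TabularPredictor` constructor and `fit` signature of the AutoGluon version pinned in the project.

```python
import json

# The "parameters" object of the AutoGluonModelStep, copied from the config.
STEP_PARAMETERS = json.loads("""
{
    "model_class": "AutoGluon",
    "autogluon_create_params": {"verbosity": 2},
    "autogluon_fit_params": {
        "presets": ["high_quality", "optimize_for_deployment"],
        "save_bag_folds": true,
        "time_limit": 1800,
        "num_stack_levels": 1,
        "dynamic_stacking": false
    }
}
""")


def split_autogluon_params(parameters: dict) -> tuple[dict, dict]:
    """Return (constructor kwargs, fit kwargs), each defaulting to an empty dict."""
    create_params = dict(parameters.get("autogluon_create_params", {}))
    fit_params = dict(parameters.get("autogluon_fit_params", {}))
    return create_params, fit_params


def fit_autogluon(train_df, target: str, parameters: dict):
    """Train a TabularPredictor; requires the optional `autogluon` package."""
    # Imported lazily so the rest of this sketch runs without AutoGluon installed.
    from autogluon.tabular import TabularPredictor

    create_params, fit_params = split_autogluon_params(parameters)
    predictor = TabularPredictor(label=target, **create_params)
    return predictor.fit(train_df, **fit_params)


create_params, fit_params = split_autogluon_params(STEP_PARAMETERS)
print(create_params)  # {'verbosity': 2}
print(fit_params["time_limit"])  # 1800
```

With a pandas DataFrame that includes the target column, `fit_autogluon(train_df, "SalePrice", STEP_PARAMETERS)` would start training. AutoGluon interprets `time_limit` in seconds, so 1800 caps the run at roughly 30 minutes.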