From 0431c5c3b084cd080bb5476287c1eef8c5c18767 Mon Sep 17 00:00:00 2001
From: Diego Marvid
Date: Tue, 5 Mar 2024 17:46:52 +0000
Subject: [PATCH] add simple readme

---
 README.md | 172 +++++++++++++++++++++--------------------------------
 1 file changed, 68 insertions(+), 104 deletions(-)

diff --git a/README.md b/README.md
index 38ab043..2520d00 100644
--- a/README.md
+++ b/README.md
@@ -1,104 +1,68 @@
-# Repo Template
-
-Kick off a project with the right foot.
-
-A repository template for easily setting up a well behaved development environment for a smooth
-collaboration experience.
-
-This template takes care of setting up and configuring:
-
-- A **virtual environment**
-- **Formatting and linting** tools
-- Some shared default **VSCode settings**
-- A **Pull Request template**
-- A **GitHub Action** that runs formatting and linting checks
-
-Any of these configurations and features can be disabled/modified freely after set up if the team
-chooses to.
-
-Note: [pyenv](https://github.com/pyenv/pyenv#installation) and
-[poetry](https://python-poetry.org/docs/#installation) are used for setting up a virtual environment
-with the correct python version. Make sure both of those are installed correctly in your machine.
-
-# Usage
-
-1. Click the `Use this template` button at the top of this repo's home page to spawn a new repo
-   from this template.
-
-2. Clone the new repo to your local environment.
-
-3. Run `sh init.sh `.
-
-   Note that:
-
-   - the project's accepted python versions will be set to `^` - feel free
-     to change this manually in the `pyproject.toml` file after running the script.
-   - your project's source code should be placed in the newly-created folder with your project's
-     name, so that absolute imports (`from my_project.my_module import func`) work everywhere.
-
-4. Nuke this readme and the `init.sh` file.
-
-5. Add to git the changes made by the init script, such as the newly created `poetry.toml`,
-   `poetry.lock` and `.python-version` files.
-
-6. Commit and push your changes - your project is all set up.
-
-7. [Recommended] Set up the following in your GitHub project's `Settings` tab:
-   - Enable branch protection for the `main` branch in the `Branches` menu to prevent non-reviewed
-     pushes/merges to it.
-   - Enable `Automatically delete head branches` in the `General` tab for feature branches to be
-     cleaned up when merged.
-
-# For ongoing projects
-
-If you want to improve the current configs of an existing project, these files are the ones you'll
-probably want to steal some content from:
-
-- [VSCode settings](.vscode/settings.json)
-- [Flake8 config](.flake8)
-- [Black and iSort configs](pyproject.toml)
-- [Style check GitHub Action](.github/workflows/style-checks.yaml)
-
-Additionally, you might want to check the
-[project's source code is correctly installed via Poetry](https://stackoverflow.com/questions/66586856/how-can-i-make-my-project-available-in-the-poetry-environment)
-for intra-project imports to work as expected across the board.
-
-# For developers of this template
-
-To test new changes made to this template:
-
-1. Run the template in test mode with `test=true sh init.sh `,
-   which will not delete the [project_base/test.py](project_base/test.py) file from the source
-   directory.
-
-2. Use that file to check everything works as expected (see details in its docstring).
-
-3. Make sure not to version any of the files created by the script. `git reset --hard` + manually
-   deleting the created files not yet added to versioning works, for example.
-
-# Issues and suggestions
-
-Feel free to report issues or propose improvements to this template via GitHub issues or through the
-`#team-tech-meta` channel in Slack.
-
-# Can I use it without Poetry?
-
-This template currently sets up your virtual environment via poetry only.
-
-If you want to use a different dependency manager, you'll have to manually do the following:
-
-1. Remove the `.venv` environment and the `pyproject.toml` and `poetry.lock` files.
-2. Create a new environment with your dependency manager of choice.
-3. Install flake, black and isort as dev dependencies.
-4. Install the current project's source.
-5. Set the path to your new environment's python in the `python.pythonPath` and
-   `python.defaultInterpreterPath` in [vscode settings](.vscode/settings.json).
-
-Disclaimer: this has not been tested, additional steps may be needed.
-
-# Troubleshooting
-
-### pyenv not picking up correct python version from .python-version
-
-Make sure the `PYENV_VERSION` env var isn't set in your current shell
-(and if it is, run `unset PYENV_VERSION`).
+# Pipeline Library
+
+The purpose of this library is to make building ML pipelines as simple as possible. At the moment we support XGBoost models, but we are working on support for more models.
+
+Here is an example JSON configuration for running an XGBoost training pipeline with the library:
+
+```json
+{
+    "custom_steps_path": "examples/ocf/",
+    "save_path": "runs/xgboost_train.pkl",
+    "pipeline": {
+        "name": "XGBoostTrainingPipeline",
+        "description": "Training pipeline for XGBoost models.",
+        "steps": [
+            {
+                "step_type": "OCFGenerateStep",
+                "parameters": {
+                    "path": "examples/ocf/data/trainset_new.parquet"
+                }
+            },
+            {
+                "step_type": "OCFCleanStep",
+                "parameters": {}
+            },
+            {
+                "step_type": "TabularSplitStep",
+                "parameters": {
+                    "id_column": "ss_id",
+                    "train_percentage": 0.95
+                }
+            },
+            {
+                "step_type": "XGBoostFitModelStep",
+                "parameters": {
+                    "target": "average_power_kw",
+                    "drop_columns": [
+                        "ss_id"
+                    ],
+                    "xgb_params": {
+                        "max_depth": 12,
+                        "eta": 0.12410097733370863,
+                        "objective": "reg:squarederror",
+                        "eval_metric": "mae",
+                        "n_jobs": -1,
+                        "n_estimators": 2,
+                        "min_child_weight": 7,
+                        "subsample": 0.8057743223537057,
+                        "colsample_bytree": 0.6316852278944352
+                    },
+                    "save_model": true
+                }
+            }
+        ]
+    }
+}
+```
+
+Users can define custom steps to generate and clean their own data and use them in the pipeline. The pipeline can then be run with the following code:
+
+```python
+import logging
+
+from pipeline_lib.core import Pipeline
+
+logging.basicConfig(level=logging.INFO)
+
+Pipeline.from_json("config.json").run()  # path to a JSON config such as the example above
+```
\ No newline at end of file
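The README does not show how a JSON `steps` list becomes a sequence of executed steps. As a rough mental model only, the following self-contained Python sketch looks up each `step_type` in a registry, builds the step from its `parameters`, and runs the steps in order over a shared data dict. Everything in it (`Step`, `register_step`, `STEP_REGISTRY`, `run_pipeline`, `ToyGenerateStep`, `ToyCleanStep`) is hypothetical and is not `pipeline_lib`'s API; the library's real step interface and custom-step discovery may work differently.

```python
import json
from typing import Any, Dict

# Hypothetical registry mapping a "step_type" string to a step class.
# This is NOT pipeline_lib's mechanism; it only models the idea.
STEP_REGISTRY: Dict[str, type] = {}


def register_step(cls: type) -> type:
    """Class decorator: make a step available under its class name."""
    STEP_REGISTRY[cls.__name__] = cls
    return cls


class Step:
    """Toy base class: a step receives its config parameters and
    transforms a shared data dict."""

    def __init__(self, **parameters: Any) -> None:
        self.parameters = parameters

    def execute(self, data: Dict[str, Any]) -> Dict[str, Any]:
        raise NotImplementedError


@register_step
class ToyGenerateStep(Step):
    """Hypothetical stand-in for a user-defined data-generation step."""

    def execute(self, data: Dict[str, Any]) -> Dict[str, Any]:
        n_rows = self.parameters["n_rows"]
        data["rows"] = [
            {"ss_id": i, "average_power_kw": float(i)} for i in range(n_rows)
        ]
        return data


@register_step
class ToyCleanStep(Step):
    """Hypothetical stand-in for a user-defined cleaning step."""

    def execute(self, data: Dict[str, Any]) -> Dict[str, Any]:
        # Drop rows whose target value is not positive.
        data["rows"] = [r for r in data["rows"] if r["average_power_kw"] > 0]
        return data


def run_pipeline(config: Dict[str, Any]) -> Dict[str, Any]:
    """Instantiate each configured step by name and run them in order."""
    data: Dict[str, Any] = {}
    for spec in config["pipeline"]["steps"]:
        step_cls = STEP_REGISTRY[spec["step_type"]]
        step = step_cls(**spec.get("parameters", {}))
        data = step.execute(data)
    return data


# A reduced config with the same "pipeline" -> "steps" shape as the README
# example, but using the toy steps defined above.
config = json.loads(
    """
    {
        "pipeline": {
            "name": "ToyPipeline",
            "steps": [
                {"step_type": "ToyGenerateStep", "parameters": {"n_rows": 4}},
                {"step_type": "ToyCleanStep", "parameters": {}}
            ]
        }
    }
    """
)

result = run_pipeline(config)
print(len(result["rows"]))  # 3
```

Keying the registry by class name is one simple way to let user-defined steps be referenced by name from a config file; in this sketch the `data` dict stands in for any intermediate results handed from one step to the next.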