Commit aa670ec

add simple readme
1 parent 3e63edf commit aa670ec

1 file changed

+70
-104
lines changed

README.md

Lines changed: 70 additions & 104 deletions
@@ -1,104 +1,70 @@
1-
# Repo Template
2-
3-
Kick off a project with the right foot.
4-
5-
A repository template for easily setting up a well-behaved development environment for a smooth
6-
collaboration experience.
7-
8-
This template takes care of setting up and configuring:
9-
10-
- A **virtual environment**
11-
- **Formatting and linting** tools
12-
- Some shared default **VSCode settings**
13-
- A **Pull Request template**
14-
- A **GitHub Action** that runs formatting and linting checks
15-
16-
Any of these configurations and features can be disabled/modified freely after set up if the team
17-
chooses to.
18-
19-
Note: [pyenv](https://github.com/pyenv/pyenv#installation) and
20-
[poetry](https://python-poetry.org/docs/#installation) are used for setting up a virtual environment
21-
with the correct Python version. Make sure both of those are installed correctly on your machine.
22-
23-
# Usage
24-
25-
1. Click the `Use this template` button at the top of this repo's home page to spawn a new repo
26-
from this template.
27-
28-
2. Clone the new repo to your local environment.
29-
30-
3. Run `sh init.sh <your_project_name> <python version>`.
31-
32-
Note that:
33-
34-
- the project's accepted python versions will be set to `^<python version>` - feel free
35-
to change this manually in the `pyproject.toml` file after running the script.
36-
- your project's source code should be placed in the newly-created folder with your project's
37-
name, so that absolute imports (`from my_project.my_module import func`) work everywhere.
38-
39-
4. Nuke this readme and the `init.sh` file.
40-
41-
5. Add to git the changes made by the init script, such as the newly created `poetry.toml`,
42-
`poetry.lock` and `.python-version` files.
43-
44-
6. Commit and push your changes - your project is all set up.
45-
46-
7. [Recommended] Set up the following in your GitHub project's `Settings` tab:
47-
- Enable branch protection for the `main` branch in the `Branches` menu to prevent non-reviewed
48-
pushes/merges to it.
49-
- Enable `Automatically delete head branches` in the `General` tab for feature branches to be
50-
cleaned up when merged.
51-
52-
# For ongoing projects
53-
54-
If you want to improve the current configs of an existing project, these files are the ones you'll
55-
probably want to steal some content from:
56-
57-
- [VSCode settings](.vscode/settings.json)
58-
- [Flake8 config](.flake8)
59-
- [Black and iSort configs](pyproject.toml)
60-
- [Style check GitHub Action](.github/workflows/style-checks.yaml)
61-
62-
Additionally, you might want to check the
63-
[project's source code is correctly installed via Poetry](https://stackoverflow.com/questions/66586856/how-can-i-make-my-project-available-in-the-poetry-environment)
64-
for intra-project imports to work as expected across the board.
65-
66-
# For developers of this template
67-
68-
To test new changes made to this template:
69-
70-
1. Run the template in test mode with `test=true sh init.sh <your_project_name> <python version>`,
71-
which will not delete the [project_base/test.py](project_base/test.py) file from the source
72-
directory.
73-
74-
2. Use that file to check everything works as expected (see details in its docstring).
75-
76-
3. Make sure not to version any of the files created by the script. `git reset --hard` + manually
77-
deleting the created files not yet added to versioning works, for example.
78-
79-
# Issues and suggestions
80-
81-
Feel free to report issues or propose improvements to this template via GitHub issues or through the
82-
`#team-tech-meta` channel in Slack.
83-
84-
# Can I use it without Poetry?
85-
86-
This template currently sets up your virtual environment via poetry only.
87-
88-
If you want to use a different dependency manager, you'll have to manually do the following:
89-
90-
1. Remove the `.venv` environment and the `pyproject.toml` and `poetry.lock` files.
91-
2. Create a new environment with your dependency manager of choice.
92-
3. Install flake8, black and isort as dev dependencies.
93-
4. Install the current project's source.
94-
5. Set the path to your new environment's python in the `python.pythonPath` and
95-
`python.defaultInterpreterPath` in [vscode settings](.vscode/settings.json).
96-
97-
Disclaimer: this has not been tested, additional steps may be needed.
98-
99-
# Troubleshooting
100-
101-
### pyenv not picking up correct python version from .python-version
102-
103-
Make sure the `PYENV_VERSION` env var isn't set in your current shell
104-
(and if it is, run `unset PYENV_VERSION`).
1+
# Pipeline Library
2+
3+
The purpose of this library is to make creating ML pipelines as simple as possible. At the moment we support XGBoost models, but we are working to support more models.
4+
5+
The following example shows how to use the library to run an XGBoost pipeline.
6+
7+
We create a `train.json` file with the following content:
8+
```json
{
  "custom_steps_path": "examples/ocf/",
  "save_path": "runs/xgboost_train.pkl",
  "pipeline": {
    "name": "XGBoostTrainingPipeline",
    "description": "Training pipeline for XGBoost models.",
    "steps": [
      {
        "step_type": "OCFGenerateStep",
        "parameters": {
          "path": "examples/ocf/data/trainset_new.parquet"
        }
      },
      {
        "step_type": "OCFCleanStep",
        "parameters": {}
      },
      {
        "step_type": "TabularSplitStep",
        "parameters": {
          "id_column": "ss_id",
          "train_percentage": 0.95
        }
      },
      {
        "step_type": "XGBoostFitModelStep",
        "parameters": {
          "target": "average_power_kw",
          "drop_columns": [
            "ss_id"
          ],
          "xgb_params": {
            "max_depth": 12,
            "eta": 0.12410097733370863,
            "objective": "reg:squarederror",
            "eval_metric": "mae",
            "n_jobs": -1,
            "n_estimators": 2,
            "min_child_weight": 7,
            "subsample": 0.8057743223537057,
            "colsample_bytree": 0.6316852278944352
          },
          "save_model": true
        }
      }
    ]
  }
}
```
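
To make the layout of this config concrete, here is a small sketch that parses a trimmed copy of it with Python's standard `json` module and lists the steps in the order they are declared. It does not use the library itself: the helper name `list_step_types` is hypothetical, and the idea that steps run in the order they appear under `pipeline.steps` is an assumption rather than something the README states.

```python
import json

# Trimmed copy of the train.json shown above (xgb_params omitted for brevity).
CONFIG_TEXT = """
{
  "custom_steps_path": "examples/ocf/",
  "save_path": "runs/xgboost_train.pkl",
  "pipeline": {
    "name": "XGBoostTrainingPipeline",
    "steps": [
      {"step_type": "OCFGenerateStep",
       "parameters": {"path": "examples/ocf/data/trainset_new.parquet"}},
      {"step_type": "OCFCleanStep", "parameters": {}},
      {"step_type": "TabularSplitStep",
       "parameters": {"id_column": "ss_id", "train_percentage": 0.95}},
      {"step_type": "XGBoostFitModelStep",
       "parameters": {"target": "average_power_kw", "drop_columns": ["ss_id"]}}
    ]
  }
}
"""


def list_step_types(config: dict) -> list:
    """Hypothetical helper: return each step's step_type in declaration order."""
    return [step["step_type"] for step in config["pipeline"]["steps"]]


config = json.loads(CONFIG_TEXT)
step_types = list_step_types(config)
print(config["pipeline"]["name"], "->", step_types)
```

A quick check like this can catch a malformed file or a misspelled key before the config is handed to the pipeline.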
59+
60+
Users can define custom steps to generate and clean their own data and use them in the pipeline. We can then run the pipeline with the following code:
61+
```python
import logging

from pipeline_lib.core import Pipeline

# Emit INFO-level log messages to the console.
logging.basicConfig(level=logging.INFO)

# Build the pipeline described in train.json and run it.
Pipeline.from_json("train.json").run()
```
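
The README does not show the library's base class or how custom steps are discovered, so the following is only a sketch of the general idea: each config entry's `step_type` string is looked up in a registry of step classes, and the entry's `parameters` are passed to the constructor. Every name here (`Step`, `STEP_REGISTRY`, `register_step`, `build_steps`, `run_steps`, `GenerateNumbersStep`, `DropNegativesStep`) is hypothetical and not part of `pipeline_lib`.

```python
# Hypothetical registry mapping a "step_type" string to a step class.
STEP_REGISTRY = {}


def register_step(cls):
    """Class decorator: register a step class under its class name."""
    STEP_REGISTRY[cls.__name__] = cls
    return cls


class Step:
    """Hypothetical base class: a step receives a data dict and returns it."""

    def execute(self, data: dict) -> dict:
        raise NotImplementedError


@register_step
class GenerateNumbersStep(Step):
    """Toy 'generate' step: produce `count` integers starting at -2."""

    def __init__(self, count: int):
        self.count = count

    def execute(self, data: dict) -> dict:
        data["values"] = list(range(-2, self.count - 2))
        return data


@register_step
class DropNegativesStep(Step):
    """Toy 'clean' step: drop negative values."""

    def execute(self, data: dict) -> dict:
        data["values"] = [v for v in data["values"] if v >= 0]
        return data


def build_steps(step_configs: list) -> list:
    """Instantiate each step from its step_type and parameters."""
    return [STEP_REGISTRY[c["step_type"]](**c["parameters"]) for c in step_configs]


def run_steps(step_configs: list) -> dict:
    """Run the steps in order, threading one data dict through all of them."""
    data = {}
    for step in build_steps(step_configs):
        data = step.execute(data)
    return data


result = run_steps([
    {"step_type": "GenerateNumbersStep", "parameters": {"count": 5}},
    {"step_type": "DropNegativesStep", "parameters": {}},
])
print(result)
```

The step list passed to `run_steps` mirrors the `step_type`/`parameters` shape of the entries in `train.json`; how the real library resolves `custom_steps_path` may differ from this registry approach.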
