Week1+2 #1

Merged: 33 commits merged on Nov 25, 2024
33 commits

b2391e7: include data & setup uv (VanessaSchl, Nov 3, 2024)
de94ff6: add feature engineering (Nov 13, 2024)
4eeff1c: fix code error in data_processor file (VanessaSchl, Nov 13, 2024)
6e98d3d: adapt code to work with hotel reservations module (VanessaSchl, Nov 15, 2024)
a68fb5a: adapt target in project config and notebooks (VanessaSchl, Nov 15, 2024)
3aa9392: change model to xgboost & adapt results plot (VanessaSchl, Nov 19, 2024)
3f194c8: remove scaler step from model evaluation function (VanessaSchl, Nov 19, 2024)
234a6cc: add updated whl file to notebooks folder (VanessaSchl, Nov 19, 2024)
52b7dfb: add seaborn to project requirements (VanessaSchl, Nov 19, 2024)
4a95137: update model & config params (VanessaSchl, Nov 19, 2024)
87ba370: update module imports in notebooks (VanessaSchl, Nov 19, 2024)
b459321: update access to config params in notebooks (VanessaSchl, Nov 19, 2024)
c554f24: fix bug in notebooks & reformat code (VanessaSchl, Nov 19, 2024)
b5dbd0c: fix bug in spark dataframe drop function (VanessaSchl, Nov 19, 2024)
c7e247b: adapt access of project config variables (VanessaSchl, Nov 19, 2024)
1aa0b8a: adapt access of project config variables in preprocessing notebook (VanessaSchl, Nov 19, 2024)
4e7e555: add type annotations in hotel reservations module (VanessaSchl, Nov 19, 2024)
ba8ecea: adapt type annotations for features (VanessaSchl, Nov 19, 2024)
1cf50e8: ensure that pandas get_dummies receives right input format (VanessaSchl, Nov 19, 2024)
76f1ac3: adapt project config file (VanessaSchl, Nov 19, 2024)
4b62fc4: fix bug in data processor (VanessaSchl, Nov 19, 2024)
c264855: adapt output of preprocessing fit function (VanessaSchl, Nov 20, 2024)
80aad60: adapt model evaluation function (VanessaSchl, Nov 20, 2024)
106c874: remove spark session from data processor class init (VanessaSchl, Nov 20, 2024)
a210c64: fix bug in feature table name (VanessaSchl, Nov 20, 2024)
7a4ed76: insert booking id into feature table (VanessaSchl, Nov 20, 2024)
bb18e6d: adapt features to look up in feature table (VanessaSchl, Nov 20, 2024)
a85ad37: replace target name in feature table (VanessaSchl, Nov 20, 2024)
1635e4f: add input bindings to feature function (VanessaSchl, Nov 20, 2024)
e6eb316: add indentation for model registration (VanessaSchl, Nov 20, 2024)
6b7aaa7: manually set signature for logged model (VanessaSchl, Nov 21, 2024)
9c6afac: reformat code (VanessaSchl, Nov 21, 2024)
19c069c: fix errors detected in pre-commit hook (VanessaSchl, Nov 21, 2024)

.github/CODEOWNERS (2 changes: 1 addition & 1 deletion)
@@ -1 +1 @@
-* @end-to-end-mlops-databricks/teachers @VanessaSchl
+* @end-to-end-mlops-databricks/teachers @VanessaSchl

.github/workflows/ci.yml (50 changes: 25 additions & 25 deletions)
@@ -1,25 +1,25 @@
-name: CI
-
-on:
-  pull_request:
-    branches:
-      - main
-
-jobs:
-  build_and_test:
-    runs-on: ubuntu-latest
-    steps:
-      - uses: actions/checkout@v4
-
-      - name: Install uv
-        uses: astral-sh/setup-uv@v3
-
-      - name: Set up Python
-        run: uv python install 3.11
-
-      - name: Install the dependencies
-        run: uv sync
-
-      - name: Run pre-commit checks
-        run: |
-          pre-commit run --all-files
+name: CI
+
+on:
+  pull_request:
+    branches:
+      - main
+
+jobs:
+  build_and_test:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v3
+
+      - name: Set up Python
+        run: uv python install 3.11
+
+      - name: Install the dependencies
+        run: uv sync
+
+      - name: Run pre-commit checks
+        run: |
+          pre-commit run --all-files

.gitignore (196 changes: 99 additions & 97 deletions)
@@ -1,97 +1,99 @@
-# OS X
-.DS_Store
-
-# Byte-compiled / optimized / DLL files
-__pycache__/
-*.py[cod]
-*$py.class
-
-# C extensions
-*.so
-
-# Folders
-# data/
-
-# Distribution / packaging
-.Python
-build/
-develop-eggs/
-dist/
-downloads/
-eggs/
-.eggs/
-lib/
-lib64/
-parts/
-sdist/
-var/
-wheels/
-*.egg-info/
-.installed.cfg
-*.egg
-MANIFEST
-
-# PyInstaller
-# Usually these files are written by a python script from a template
-# before PyInstaller builds the exe, so as to inject date/other infos into it.
-*.manifest
-*.spec
-
-# Installer logs
-pip-log.txt
-pip-delete-this-directory.txt
-
-# Unit test / coverage reports
-htmlcov/
-.tox/
-.coverage
-.coverage.*
-.cache
-nosetests.xml
-coverage.xml
-*.cover
-.hypothesis/
-.pytest_cache/
-
-# PyBuilder
-target/
-
-# Jupyter Notebook
-.ipynb_checkpoints
-
-# pyenv
-.python-version
-
-# Environments
-.env
-.venv
-env/
-venv/
-ENV/
-env.bak/
-venv.bak/
-
-# Spyder project settings
-.spyderproject
-.spyproject
-
-# Pycharm Project Settings
-.idea/
-
-# Rope project settings
-.ropeproject
-
-# mypy
-.mypy_cache/
-.dmypy.json
-dmypy.json
-
-# R project files
-**.Rproj
-*.utf8.md
-*.knit.md
-.Rproj.user
-
-# VS code configuration
-.vscode
-.history
+# OS X
+.DS_Store
+
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# C extensions
+*.so
+
+# Folders
+# data/
+
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+# PyInstaller
+# Usually these files are written by a python script from a template
+# before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+.hypothesis/
+.pytest_cache/
+
+# PyBuilder
+target/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# pyenv
+.python-version
+
+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+
+# Spyder project settings
+.spyderproject
+.spyproject
+
+# Pycharm Project Settings
+.idea/
+
+# Rope project settings
+.ropeproject
+
+# mypy
+.mypy_cache/
+.dmypy.json
+dmypy.json
+
+# R project files
+**.Rproj
+*.utf8.md
+*.knit.md
+.Rproj.user
+
+# VS code configuration
+.vscode
+.history
+
+.databricks

.pre-commit-config.yaml (34 changes: 17 additions & 17 deletions)
@@ -1,17 +1,17 @@
-exclude: ^tests/resources/
-repos:
-  - repo: https://github.com/pre-commit/pre-commit-hooks
-    rev: v4.5.0
-    hooks:
-      - id: check-added-large-files
-      - id: check-json
-      - id: check-toml
-      - id: check-yaml
-      - id: end-of-file-fixer
-      - id: trailing-whitespace
-  - repo: https://github.com/astral-sh/ruff-pre-commit
-    rev: v0.6.9
-    hooks:
-      - id: ruff
-        args: [--fix, --exit-non-zero-on-fix, --show-fixes]
-      - id: ruff-format
+exclude: ^tests/resources/
+repos:
+  - repo: https://github.com/pre-commit/pre-commit-hooks
+    rev: v4.5.0
+    hooks:
+      - id: check-added-large-files
+      - id: check-json
+      - id: check-toml
+      - id: check-yaml
+      - id: end-of-file-fixer
+      - id: trailing-whitespace
+  - repo: https://github.com/astral-sh/ruff-pre-commit
+    rev: v0.6.9
+    hooks:
+      - id: ruff
+        args: [--fix, --exit-non-zero-on-fix, --show-fixes]
+      - id: ruff-format

README.md (52 changes: 26 additions & 26 deletions)
@@ -1,26 +1,26 @@
-<h1 align="center">
-Marvelous MLOps End-to-end MLOps with Databricks course
-
-## Practical information
-- Weekly lectures on Wednesdays 16:00-18:00 CET.
-- Code for the lecture is shared before the lecture.
-- Presentation and lecture materials are shared right after the lecture.
-- Video of the lecture is uploaded within 24 hours after the lecture.
-
-- Every week we set up a deliverable, and you implement it with your own dataset.
-- To submit the deliverable, create a feature branch in that repository, and a PR to main branch. The code can be merged after we review & approve & CI pipeline runs successfully.
-- The deliverables can be submitted with a delay (for example, lecture 1 & 2 together), but we expect you to finish all assignments for the course before the 25th of November.
-
-
-## Set up your environment
-In this course, we use Databricks 15.4 LTS runtime, which uses Python 3.11.
-In our examples, we use UV. Check out the documentation on how to install it: https://docs.astral.sh/uv/getting-started/installation/
-
-To create a new environment and create a lockfile, run:
-
-```
-uv venv -p 3.11.0 venv
-source venv/bin/activate
-uv pip install -r pyproject.toml --all-extras
-uv lock
-```
+<h1 align="center">
+Marvelous MLOps End-to-end MLOps with Databricks course
+
+## Practical information
+- Weekly lectures on Wednesdays 16:00-18:00 CET.
+- Code for the lecture is shared before the lecture.
+- Presentation and lecture materials are shared right after the lecture.
+- Video of the lecture is uploaded within 24 hours after the lecture.
+
+- Every week we set up a deliverable, and you implement it with your own dataset.
+- To submit the deliverable, create a feature branch in that repository, and a PR to main branch. The code can be merged after we review & approve & CI pipeline runs successfully.
+- The deliverables can be submitted with a delay (for example, lecture 1 & 2 together), but we expect you to finish all assignments for the course before the 25th of November.
+
+
+## Set up your environment
+In this course, we use Databricks 15.4 LTS runtime, which uses Python 3.11.
+In our examples, we use UV. Check out the documentation on how to install it: https://docs.astral.sh/uv/getting-started/installation/
+
+To create a new environment and create a lockfile, run:
+
+```
+uv venv -p 3.11.0 venv
+source venv/bin/activate
+uv pip install -r pyproject.toml --all-extras
+uv lock
+```
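
The README pins the local environment to Python 3.11 so that it matches the Databricks 15.4 LTS runtime. A minimal sketch of a sanity check to run inside the activated `venv` might look like the following; the helper name `matches_runtime_python` is hypothetical and not part of the course repository, and only the major and minor version are compared.

```python
import sys

# Assumption: the target is Databricks Runtime 15.4 LTS, which the README states uses Python 3.11
EXPECTED_PYTHON = (3, 11)


def matches_runtime_python(version_info=None, expected=EXPECTED_PYTHON) -> bool:
    """Return True if the (major, minor) version equals the expected runtime version."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only major.minor; patch releases (3.11.0, 3.11.9, ...) are all accepted
    return tuple(version_info[:2]) == tuple(expected)


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    if matches_runtime_python():
        print(f"Python {found} matches the Databricks 15.4 LTS runtime (3.11).")
    else:
        print(f"Warning: found Python {found}, expected 3.11.x")
```

Comparing only `major.minor` is a deliberate choice: `uv venv -p 3.11.0` fixes the patch level locally, but the runtime's exact 3.11 patch release may differ.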