- Azure Databricks Environment Setup
- Select the Kaggle dataset: Credit Card Fraud Detection Dataset 2023
- Run Python notebooks on a Databricks cluster for the fraud_credit_cards use case
- Create "DataProcessor" and "FraudModel" classes
- Push data.csv to a Databricks volume
- Push package.whl to a Databricks volume
- Create main.py to preprocess data, train model, and evaluate model
- Fix pre-commit checks
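
The steps above mention "DataProcessor" and "FraudModel" classes and a main.py that preprocesses data, trains a model, and evaluates it. The sketch below shows one way those pieces could fit together. It is a hypothetical illustration, not the package's actual code: the method names (`preprocess`, `train`, `predict`, `evaluate`), the inline `SAMPLE_CSV` data, the `amount` and `hour` columns, and the nearest-centroid classifier are all assumptions chosen to keep the example dependency-free. The `Class` target column is assumed to match the label column of the Kaggle dataset.

```python
import csv
import io
import math
import random

# Tiny made-up stand-in for data.csv; the real dataset has many more rows and columns.
SAMPLE_CSV = """\
amount,hour,Class
12.5,14,0
8.0,10,0
15.2,16,0
9.9,12,0
950.0,3,1
870.5,2,1
990.0,4,1
910.3,1,1
"""


class DataProcessor:
    """Hypothetical preprocessing step: parse CSV text and split it."""

    def __init__(self, csv_text, target="Class", test_ratio=0.25, seed=42):
        self.csv_text = csv_text
        self.target = target
        self.test_ratio = test_ratio
        self.seed = seed

    def preprocess(self):
        """Return (train, test) lists of (features, label) pairs."""
        rows = list(csv.DictReader(io.StringIO(self.csv_text)))
        samples = [
            ([float(v) for k, v in row.items() if k != self.target],
             int(row[self.target]))
            for row in rows
        ]
        # Seeded shuffle so the split is reproducible between runs.
        random.Random(self.seed).shuffle(samples)
        n_test = max(1, int(len(samples) * self.test_ratio))
        return samples[n_test:], samples[:n_test]


class FraudModel:
    """Hypothetical model: a nearest-centroid classifier used as a placeholder."""

    def __init__(self):
        self.centroids = {}

    def train(self, samples):
        """Compute the mean feature vector (centroid) of each class."""
        grouped = {}
        for features, label in samples:
            grouped.setdefault(label, []).append(features)
        self.centroids = {
            label: [sum(col) / len(col) for col in zip(*rows)]
            for label, rows in grouped.items()
        }

    def predict(self, features):
        """Return the label whose centroid is closest in Euclidean distance."""
        return min(
            self.centroids,
            key=lambda label: math.dist(features, self.centroids[label]),
        )

    def evaluate(self, samples):
        """Return the fraction of samples classified correctly."""
        correct = sum(self.predict(f) == y for f, y in samples)
        return correct / len(samples)


def main():
    """Mirror the main.py flow: preprocess, train, evaluate."""
    train, test = DataProcessor(SAMPLE_CSV).preprocess()
    model = FraudModel()
    model.train(train)
    return model.evaluate(test)


if __name__ == "__main__":
    print(f"accuracy: {main():.2f}")
```

Keeping data handling and modelling in separate classes means main.py only wires them together, so swapping the placeholder classifier for a real one would not need to touch the preprocessing code.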
This course uses the Databricks 15.4 LTS runtime, which runs Python 3.11. The examples use uv to manage the environment and dependencies; see the installation guide: https://docs.astral.sh/uv/getting-started/installation/
To create the virtual environment, install the dependencies, and build the package:
uv venv -p 3.11.11 .venv
source .venv/bin/activate
uv pip install -r pyproject.toml --all-extras
uv lock
uv build
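
Because the cluster runtime uses Python 3.11, it may be worth confirming that the activated virtual environment uses the same minor version before installing the wheel. The snippet below is a minimal sketch of such a check; the helper name `matches_runtime` and the `EXPECTED_PYTHON` constant are hypothetical and not part of the package.

```python
import sys

# Assumption: the target runtime uses Python 3.11, as stated for Databricks 15.4 LTS.
EXPECTED_PYTHON = (3, 11)


def matches_runtime(version_info=sys.version_info, expected=EXPECTED_PYTHON):
    """Return True when the major.minor version equals the expected one."""
    return tuple(version_info[:2]) == expected


if __name__ == "__main__":
    local = f"{sys.version_info.major}.{sys.version_info.minor}"
    status = "matches" if matches_runtime() else "differs from"
    print(f"Local Python {local} {status} the expected 3.11")
```

Only the major and minor components are compared, so a patch release such as 3.11.11 still counts as a match.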
To install and run the fraud_credit_cards package:
uv pip install dist/fraud_credit_cards-0.0.1-py3-none-any.whl
uv run python main.py
To run the pre-commit checks:
uv run pre-commit run --all-files