Predictive lead scoring system combining rule-based heuristics with ML models (Logistic Regression + Random Forest) to rank B2B leads by conversion probability.
- ML Model: Logistic Regression with 80%+ AUC
- Explainability: SHAP-based per-lead explanations
- Action Tiers: Hot/Warm/Nurture/Suppress with SLA recommendations
- Portfolio Dashboard: Interactive Streamlit app
- Production API: FastAPI with CSV batch scoring
- GCP Deployment: Cloud Run with weekly Cloud Scheduler
lead_scoring/
│
├── data/
│ ├── raw/ # Original source files — never modified
│ └── processed/ # Feature matrix and cleaned datasets
│
├── notebooks/ # One notebook per stage, numbered sequentially
│ ├── 01_eda.ipynb
│ ├── 02_feature_engineering.ipynb
│ ├── 03_baseline_scoring.ipynb
│ ├── 04_modeling.ipynb
│ └── 05_output_layer.ipynb
│
├── src/ # Importable Python modules
│ ├── features/ # Feature engineering logic
│ ├── models/ # Model training and calibration
│ ├── evaluation/ # Metrics, calibration plots, lift curves
│ └── utils/ # Shared helpers (I/O, logging, config)
│
├── outputs/
│ ├── reports/ # EDA report, model comparison, final scorecard
│ ├── figures/ # Saved charts (referenced in reports)
│ ├── models/ # Serialised model artefacts (.pkl)
│ └── scores/ # Scored lead files per run
│
├── docs/ # Decisions, methodology, feature dictionary
│ ├── decisions.md
│ └── feature_dictionary.md
│
├── app/ # NEW: FastAPI application
│ ├── core/
│ │ └── scorer.py # Core scoring logic
│ └── main.py # FastAPI endpoints
│
├── dashboard/ # NEW: Streamlit portfolio showcase
│ └── app.py
│
├── scripts/
│ └── train_model.py # Model training & serialization
│
├── config.py # Central config: paths, thresholds, hyperparams
├── cloudbuild.yaml # GCP CI/CD
├── deploy.sh # Deployment script
├── docker-compose.yml # Local development
├── Dockerfile # Container config
└── README.md
| Stage | Notebook | Status |
|---|---|---|
| 1 — EDA & Data Audit | 01_eda.ipynb |
✅ Complete |
| 2 — Feature Engineering | 02_feature_engineering.ipynb |
✅ Complete |
| 3 — Rule-Based Baseline | 03_baseline_scoring.ipynb |
✅ Complete |
| 4 — Predictive Modeling | 04_modeling.ipynb |
✅ Complete |
| 5 — Output & Action Layer | 05_output_layer.ipynb |
✅ Complete |
# Install dependencies
uv sync
# Train the model
uv run python scripts/train_model.py
# Run FastAPI (http://localhost:8000)
uv run uvicorn app.main:app --reload
# Run Streamlit Dashboard (http://localhost:8501)
uv run streamlit run dashboard/app.py
# Or use Docker Compose
docker-compose up# Health check
curl http://localhost:8000/health
# Score single lead
curl -X POST http://localhost:8000/score/lead \
-H "Content-Type: application/json" \
-d '{"lead_id": "lead_001", "features": {"high_intent_touch_count": 5, "is_decision_maker": 1}}'
# Batch score from CSV
curl -X POST http://localhost:8000/score/batch \
-F "file=@new_leads.csv"- Install gcloud CLI
- Create a GCP project
- Enable billing (within free tier limits)
# Set your project
gcloud config set project YOUR_PROJECT_ID
# Run deployment script
./deploy.shThis deploys:
- API:
https://lead-scoring-api-xxx-uc.a.run.app - Dashboard:
https://lead-scoring-dashboard-xxx-uc.a.run.app
gcloud scheduler jobs create http lead-scoring-weekly \
--schedule="0 9 * * 1" \
--uri="YOUR_API_URL/score/batch" \
--http-method=POST \
--time-zone="America/New_York"- Algorithm: Logistic Regression with StandardScaler
- Features: 33 behavioral, firmographic, and engagement signals
- Performance: AUC ~0.83, Top-20% captures 50%+ conversions
- Explainability: SHAP LinearExplainer for exact attributions