pii-removal-edge-deployment

End-to-end workflow for generating synthetic PII-containing healthcare text, fine-tuning a small language model (SLM) with LoRA adapters for PII removal / redaction, and preparing an optimized GGUF artifact for edge / resource-constrained deployment.

Project Overview

This repository demonstrates a three-stage pipeline:

- Synthetic Data Generation (notebooks/step_1_synthetic_data_generation.ipynb): Creates structured healthcare-style text with embedded PII (names, dates, MRNs, etc.).
- Model Fine-Tuning (notebooks/step_2_slm_finetuning.ipynb): Applies LoRA to adapt a small base model for PII detection / transformation.
- Edge Deployment (notebooks/step_3_slm_edge_deployment.ipynb): Exports / converts the model to GGUF and provides patterns for lightweight inference.
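
To make the first stage concrete, here is a minimal sketch of what one synthetic training record could look like: a template filled with fake PII, paired with its redacted target. The field names (text, redacted, pii), the placeholder tags ([NAME], [MRN], [DATE]), the template, and the name pool are all assumptions for illustration; the notebook's actual schema and generators may differ.

```python
import json
import random

# Hypothetical value pool and template; not taken from the notebook.
NAMES = ["Alice Martin", "Omar Haddad", "Chen Wei"]
TEMPLATE = "Patient {name} (MRN {mrn}) was admitted on {date}."


def make_example(rng):
    """Return one synthetic record: text with PII plus its redacted target."""
    pii = {
        "name": rng.choice(NAMES),
        "mrn": f"{rng.randrange(10**7):07d}",  # 7-digit fake record number
        "date": f"2024-{rng.randint(1, 12):02d}-{rng.randint(1, 28):02d}",
    }
    text = TEMPLATE.format(**pii)
    redacted = TEMPLATE.format(name="[NAME]", mrn="[MRN]", date="[DATE]")
    return {"text": text, "redacted": redacted, "pii": pii}


def write_jsonl(path, n, seed=0):
    """Write n records to a JSONL file, one JSON object per line."""
    rng = random.Random(seed)
    with open(path, "w", encoding="utf-8") as fh:
        for _ in range(n):
            fh.write(json.dumps(make_example(rng)) + "\n")
```

Seeding the generator keeps the train/val/test splits reproducible across runs.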

Repository Structure

data/
  synthetic_data/           # Train/val/test synthetic JSON/JSONL artifacts
  lora_finetuned_model/     # LoRA adapter + tokenizer assets (LFS tracked)
  gguf_model/               # Exported / quantized GGUF model & tokenizer (LFS tracked)
notebooks/                  # Three sequential workflow notebooks

Quick Start (Local)

Clone the repository and ensure Git LFS is installed so large model artifacts pull correctly:

git clone https://github.com/superlinear-ai/pii-removal-edge-deployment.git
cd pii-removal-edge-deployment
git lfs install
git lfs pull

Running the Notebooks on Google Colab

Prerequisites

Install Git LFS locally if you plan to clone the repository before uploading files to Colab.

Steps

Clone (or download) the repo locally (optional if you upload files manually):

git clone https://github.com/superlinear-ai/pii-removal-edge-deployment.git
cd pii-removal-edge-deployment
git lfs install
git lfs pull

Open Google Colab.
Upload the desired notebook from notebooks/ (File -> Upload notebook).
If the notebook requires the model artifacts, either:
- Upload the needed subfolders from data/ manually, or
- Add a cell to git clone the repo (Colab runtime) and run git lfs install && git lfs pull.
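
The second option can be sketched as a single Colab cell. This is a minimal sketch, assuming the git and git-lfs binaries are available in the runtime (install git-lfs first if they are not); the helper name fetch_repo and its run parameter are hypothetical and exist only to keep the commands in one place.

```python
import subprocess

REPO_URL = "https://github.com/superlinear-ai/pii-removal-edge-deployment.git"


def fetch_repo(dest="pii-removal-edge-deployment", run=subprocess.run):
    """Clone the repo and pull its LFS artifacts; returns the commands executed."""
    commands = [
        ["git", "clone", REPO_URL, dest],
        ["git", "lfs", "install"],
        ["git", "-C", dest, "lfs", "pull"],  # run the pull inside the clone
    ]
    for cmd in commands:
        run(cmd, check=True)  # raise CalledProcessError if a step fails
    return commands
```

Calling fetch_repo() in a cell leaves the repository, including data/, under the current working directory of the Colab runtime.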

Git LFS Usage

Large model & adapter artifacts (*.bin, *.pt, *.safetensors, *.gguf) and dataset JSONL files are tracked via patterns in .gitattributes.

Key commands:

git lfs install          # One-time per machine
git lfs track "*.gguf"   # Example: track new pattern
git add .gitattributes
git add path/to/large_file.gguf
git commit -m "Add new quantized model"

To list the files currently tracked by LFS:

git lfs ls-files
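
If git lfs pull was skipped, the model files on disk are small text stubs rather than real weights. A quick way to detect this from a notebook is to look for the Git LFS pointer header at the start of each file. The sketch below relies on that header format; the function names and the default suffix list are assumptions chosen to mirror the patterns mentioned above.

```python
from pathlib import Path

# Every Git LFS pointer file begins with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """True if the file is still an LFS pointer stub rather than real content."""
    with open(path, "rb") as fh:
        return fh.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def unpulled_artifacts(root, suffixes=(".gguf", ".safetensors", ".bin", ".pt")):
    """List files under root with the given suffixes that are still pointer stubs."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix in suffixes and is_lfs_pointer(p)
    )
```

An empty result from unpulled_artifacts("data") suggests the artifacts were downloaded; any listed path still needs git lfs pull.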

Adding New Data or Models

Place large raw files under an appropriate subfolder in data/.
Ensure a matching pattern is present in .gitattributes (add one with git lfs track if necessary).
Stage and commit the file as shown under Git LFS Usage above.
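
Before committing, it can help to confirm that a new file is actually covered by an LFS pattern. The sketch below parses .gitattributes text for lines carrying filter=lfs (the attribute that git lfs track writes) and matches simple patterns against the file name. It is an approximation: patterns containing a slash are ignored, and the function names are hypothetical. The authoritative check remains git check-attr filter followed by the path.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath


def lfs_patterns(gitattributes_text):
    """Collect the patterns of lines that carry the filter=lfs attribute."""
    patterns = []
    for line in gitattributes_text.splitlines():
        parts = line.split()
        if parts and not parts[0].startswith("#") and "filter=lfs" in parts[1:]:
            patterns.append(parts[0])
    return patterns


def is_lfs_tracked(path, patterns):
    """Approximate check: match slash-free patterns against the file name only."""
    name = PurePosixPath(path).name
    return any(fnmatch(name, pat) for pat in patterns if "/" not in pat)
```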

Contributing

PRs welcome for improvements, tooling, reproducibility, and inference examples.

License

See LICENSE file.
