
pii-removal-edge-deployment

An end-to-end workflow for generating synthetic healthcare text containing PII, fine-tuning a small language model (SLM) with LoRA adapters to remove or redact that PII, and exporting an optimized GGUF artifact for edge and resource-constrained deployment.

Project Overview

This repository demonstrates a three-stage pipeline:

  1. Synthetic Data Generation (notebooks/step_1_synthetic_data_generation.ipynb): Creates structured healthcare-style text with embedded PII (names, dates, MRNs, etc.); the sketch after this list shows the kind of input/output pair involved.
  2. Model Fine-Tuning (notebooks/step_2_slm_finetuning.ipynb): Applies LoRA adapters to teach a small base model to detect and remove the embedded PII.
  3. Edge Deployment (notebooks/step_3_slm_edge_deployment.ipynb): Converts the fine-tuned model to GGUF and demonstrates patterns for lightweight inference.
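
To make the task concrete, a single training example pairs raw text with its redacted form. The record below is purely illustrative: the field names and placeholder tags are hypothetical, and the actual schema is defined in the step 1 notebook.

import json

# Hypothetical record shape for the PII-removal task; the real schema
# is produced by notebooks/step_1_synthetic_data_generation.ipynb.
record = {
    "input": "Patient John Smith (MRN 4839021) was admitted on 03/14/2024.",
    "output": "Patient [NAME] (MRN [MRN]) was admitted on [DATE].",
}

# JSONL artifacts in data/synthetic_data/ store one such object per line.
print(json.dumps(record))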

Repository Structure

data/
  synthetic_data/           # Train/val/test synthetic JSON/JSONL artifacts
  lora_finetuned_model/     # LoRA adapter + tokenizer assets (LFS tracked)
  gguf_model/               # Exported / quantized GGUF model & tokenizer (LFS tracked)
notebooks/                  # Three sequential workflow notebooks
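
For local experimentation, the saved adapter can be re-attached to its base model with peft. This is a minimal sketch, assuming transformers and peft are installed; the base checkpoint name is a placeholder and must match the model fine-tuned in step 2.

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_MODEL = "your-base-checkpoint"  # placeholder: use the step 2 base model

base = AutoModelForCausalLM.from_pretrained(BASE_MODEL)
model = PeftModel.from_pretrained(base, "data/lora_finetuned_model")  # LoRA adapter dir
tokenizer = AutoTokenizer.from_pretrained("data/lora_finetuned_model")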

Quick Start (Local)

Clone the repository and ensure Git LFS is installed so large model artifacts pull correctly:

git clone https://github.com/superlinear-ai/pii-removal-edge-deployment.git
cd pii-removal-edge-deployment
git lfs install
git lfs pull
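
A quick way to confirm the LFS objects resolved to real files (rather than pointer stubs) is to load the GGUF model. A sketch, assuming llama-cpp-python is installed (pip install llama-cpp-python); the file name is a placeholder for whichever .gguf artifact lives under data/gguf_model/, and the prompt format depends on how the model was fine-tuned.

from llama_cpp import Llama

# Placeholder path: substitute the actual .gguf file under data/gguf_model/.
llm = Llama(model_path="data/gguf_model/model.gguf", n_ctx=2048, verbose=False)

out = llm("Redact all PII: Patient John Smith was seen on 03/14/2024.", max_tokens=128)
print(out["choices"][0]["text"])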

Running the Notebooks on Google Colab

Prerequisites

  • Install Git LFS locally if you plan to clone the repository before uploading files to Colab.

Steps

  1. Clone (or download) the repo locally (optional if you upload files manually):
    git clone https://github.com/superlinear-ai/pii-removal-edge-deployment.git
    cd pii-removal-edge-deployment
    git lfs install
    git lfs pull
  2. Open Google Colab.
  3. Upload the desired notebook from notebooks/ (File -> Upload notebook).
  4. If the notebook requires the model artifacts, either:
    • Upload the needed subfolders from data/ manually, or
    • Add a cell that clones the repo inside the Colab runtime and pulls the LFS artifacts, as shown in the sketch after these steps.
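
A minimal sketch of such a cell, assuming git-lfs is present in the Colab image (if it is not, install it first with !apt-get install git-lfs):

# Run in a Colab cell: clone the repo and pull the LFS-tracked artifacts.
!git clone https://github.com/superlinear-ai/pii-removal-edge-deployment.git
%cd pii-removal-edge-deployment
!git lfs install
!git lfs pull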

Git LFS Usage

Large model & adapter artifacts (*.bin, *.pt, *.safetensors, *.gguf) and dataset JSONL files are tracked via patterns in .gitattributes.

Key commands:

git lfs install          # One-time per machine
git lfs track "*.gguf"   # Example: track new pattern
git add .gitattributes
git add path/to/large_file.gguf
git commit -m "Add new quantized model"

To list the files currently tracked as LFS objects:

git lfs ls-files

Adding New Data or Models

  1. Place raw large files under an appropriate subfolder in data/.
  2. Ensure a matching pattern exists in .gitattributes (edit it if necessary).
  3. Commit via LFS (see above).

Contributing

PRs are welcome, especially for tooling improvements, reproducibility, and inference examples.

License

See LICENSE file.
