- Inference Lead (Cost): Vedant
- Infra Lead (Scalability & Latency/Throughput): Arnav Bharti
- Data Lead: Vinayak
- Training Lead (R@1, R@5): Sahitya Singh
- Mentor: Nirant Kasliwal
- Advisor: Dhruv Anand
Imagine asking:
“Need a cheesy, rainy-day dosa”
and getting an instant, mouth-watering shortlist—whether you’re in Pilani or Palo Alto.
Today’s off-the-shelf search and embedding models fail on mood- and context-laden queries like this one.
MasalaEdge will be the first edge-deployed, triplet-tuned model that fuses text, images, mood, and weather to power sub-300 ms search.
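As a hedged illustration of the triplet-tuning idea, here is a minimal training-step sketch assuming a CLIP-style dual encoder in PyTorch; the encoder interfaces, batch fields, and the bracketed mood/weather tags are assumptions, not the final design:

```python
# Minimal sketch (not the final design): triplet tuning so a context-enriched
# query lands closer to the matching dish image than to a hard negative.
import torch
import torch.nn.functional as F

def triplet_step(text_encoder, image_encoder, batch, margin: float = 0.2):
    # Assumed batch fields: "query" already fuses free text with context tags,
    # e.g. "cheesy dosa [mood=comfort] [weather=rain]"; images are tensors.
    q = F.normalize(text_encoder(batch["query"]), dim=-1)         # anchor
    pos = F.normalize(image_encoder(batch["pos_image"]), dim=-1)  # matching dish
    neg = F.normalize(image_encoder(batch["neg_image"]), dim=-1)  # hard negative

    # Cosine-distance triplet margin loss: pull the anchor toward the positive
    # and push it at least `margin` further from the negative.
    d_pos = 1 - (q * pos).sum(dim=-1)
    d_neg = 1 - (q * neg).sum(dim=-1)
    return F.relu(d_pos - d_neg + margin).mean()
```

Encoding mood and weather as text tags (rather than as extra model towers) is one plausible fusion choice, since it leaves the image encoder untouched for later quantization.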
- Training: GPU-heavy Modal jobs (a job sketch follows this list)
- Serving: Quantized to GGUF → Cloudflare Workers AI
- Target: Beat SigLIP and CLIP by +8 R@1 on Recipe1M+
- New Data: 20k Indian-flavour triplets (a slice largely absent from existing academic and industry datasets)
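A minimal sketch of what the GPU-heavy Modal job mentioned above could look like; the app name, GPU type, dependencies, and function body are assumptions for illustration only:

```python
# Hypothetical Modal training job; names and config are illustrative only.
import modal

app = modal.App("masalaedge-train")

# Container image with training dependencies pre-installed.
image = modal.Image.debian_slim().pip_install("torch", "transformers")

@app.function(gpu="A100", image=image, timeout=4 * 60 * 60)
def train_triplets(epochs: int = 3):
    # Placeholder body: in practice this would load the Query2Dish triplets,
    # run a triplet-loss step per batch, and save checkpoints for the later
    # GGUF conversion + Workers AI deployment.
    for epoch in range(epochs):
        print(f"epoch {epoch}: training on triplet batches...")

@app.local_entrypoint()
def main():
    # `modal run train.py` executes this locally and the function remotely.
    train_triplets.remote(epochs=3)
```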
- Hands-on MLOps (Modal)
- Serverless edge deployment (Cloudflare)
- Real-world latency SLOs (a p95 measurement sketch follows this list)
- Visible, demo-ready IP to showcase at APOGEE
- Potential to convert into a top-tier ACL/EMNLP workshop paper
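Since the project's inference targets are p95 latencies, here is a quick sanity-check sketch using only the standard library; the endpoint URL and sample count are hypothetical, and the rubric's `wrk` runs remain the authoritative measurement under load:

```python
# Rough p95 latency probe against a deployed endpoint (URL is hypothetical).
import time
import urllib.request

ENDPOINT = "https://masalaedge.example.workers.dev/search?q=cheesy+rainy-day+dosa"

latencies_ms = []
for _ in range(100):  # small sample; wrk gives sturdier numbers under load
    start = time.perf_counter()
    urllib.request.urlopen(ENDPOINT, timeout=5).read()
    latencies_ms.append((time.perf_counter() - start) * 1000)

latencies_ms.sort()
p95 = latencies_ms[int(0.95 * len(latencies_ms)) - 1]
print(f"p95 latency: {p95:.0f} ms (target: <300 ms)")
```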
- Weekly meetings with the student
- Rubric as road-map: 5 hard checkpoints over the semester
- Examiner column included for faculty audit
- Grade scale: 10 points
| Score | Week | High Speed Inference Pipeline | Multi-modal Food Embedding Model | Examiner Verification | Comments / Risk Flags |
|---|---|---|---|---|---|
| 10 / 10 – Distinction | Wk 10: Ship & publish | 1. Query2Dish model deployed: <300ms p95, <$0.01/1000 requests 2. Production SLA: >99% uptime across 50 concurrent queries 3. Stripe paywall live; first paid key logged | 1. Query2Dish benchmark: R@1 >40%, R@5 >70% 2. 10k+ Query2Dish dataset released (mood/weather/region) 3. 6-page ACL/EMNLP workshop paper submitted | Public URL; run `wrk` for latency; cost dashboard; leaderboard JSON; repo v1.0; EasyChair receipt; Stripe screenshot | Requires weekly "demo-or-die" discipline; dataset QA is the usual downfall |
| 9 / 10 – Very Good | Wk 9: Production optimize | 1. Custom Query2Dish model deployed: <350ms p95, <$0.015/1000 requests 2. Auto-deployment pipeline: Modal → GGUF → Workers AI 3. Stripe paywall functional with usage analytics | 1. Query2Dish model: R@1 >35%, R@5 >65% 2. 8k+ Query2Dish pairs with quality validation 3. Beats SigLIP by +5 R@1 on im2recipe benchmark | Same tests with looser thresholds; cost tracking dashboard | The missing piece is usually cost optimization on Modal |
| 8 / 10 – Good | Wk 8: Custom model deploy | 1. First custom model deployed: <500ms p95, <$0.02/1000 requests 2. End-to-end cost tracking: Modal training + Workers AI serving 3. Load testing: 100+ concurrent requests/sec | 1. Query2Dish model trained on 5k+ pairs 2. R@1 >30%, R@5 >60% on Query2Dish benchmark 3. Beats baseline on Recipe1M+ sampled set | Eval notebook; latency dashboard; cost breakdown; custom model served | Data noise or GPU cost overruns typically hit here |
| 7 / 10 – Satisfactory | Wk 6: Integration + training | 1. Benchmarking pipeline: im2recipe, recipe2im, Query2Dish 2. SigLIP/ColQwen served: <600ms p95, cost tracking setup 3. Integration working: trained model → deployed via pipeline | 1. Query2Dish dataset: 3k+ curated pairs 2. First custom model training on sampled Recipe1M+ 3. Baseline reproduction: SigLIP on 15k samples | Faculty runs `make eval`; benchmark results JSON; API serves trained model | Data cleaning underestimated → common blocker |
| 6 / 10 – Pass | Wk 4: Benchmarking MVP | 1. Cost tracking: Modal training + Workers AI serving costs 2. Benchmark pipeline: automated R@1, R@5 evaluation 3. SigLIP deployed via Workers AI: <800ms p95 | 1. Recipe1M+ sampled: 10-15k representative samples 2. Query2Dish schema + initial 500 pairs 3. Data preprocessing pipeline for multimodal inputs | Examiner hits `/healthz`; benchmark runs automatically; cost dashboard shows $/request | Achievable with solid coding skills; no novelty yet |
| 5 / 10 – Borderline | Wk 3: Foundation + sampling | 1. SigLIP/ColQwen endpoint: basic embeddings via Workers AI 2. Latency baseline: measure existing model performance 3. Modal + Cloudflare deployment setup | 1. Recipe1M+ sampling strategy: 5k samples selected 2. Query2Dish data collection plan + 100 examples 3. Baseline: SigLIP performance on sampled data | PDFs + job screenshot + working API endpoint + sample data verification | Acceptable only if severe blockers are documented |
| 4 / 10 – Bare Minimum | < Wk 3 | Repo exists, some code committed, no runnable pipeline/data/charter | High-diversity Recipe1M+ splits selected | | Passable |
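Because several checkpoints hinge on R@1/R@5, here is a minimal sketch of how those recall numbers could be computed over paired query/dish embeddings; the array names and the paired-eval setup (row i of the queries should retrieve row i of the dishes) are illustrative assumptions:

```python
# Hedged sketch: Recall@k for query->dish retrieval over paired embeddings.
import numpy as np

def recall_at_k(query_emb: np.ndarray, dish_emb: np.ndarray, k: int) -> float:
    # Cosine similarity via normalized dot products.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    d = dish_emb / np.linalg.norm(dish_emb, axis=1, keepdims=True)
    sims = q @ d.T                          # (n_queries, n_dishes)
    # Indices of the k most similar dishes per query.
    topk = np.argsort(-sims, axis=1)[:, :k]
    # A hit means the ground-truth dish (same row index) appears in the top k.
    hits = (topk == np.arange(len(q))[:, None]).any(axis=1)
    return float(hits.mean())

# Usage: r1 = recall_at_k(q, d, 1); r5 = recall_at_k(q, d, 5)
```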
(Agreed with Prof. Dhruv)
| Component | Weight | How Graded |
|---|---|---|
| Checkpoint completion | 70% | Highest fully achieved band = base grade |
| Technical depth & code quality | 15% | Code review rubric: modularity, tests, CI, lint, comments |
| Documentation & replication kit | 10% | Can an external TA reproduce in < 2h? |
| Professionalism | 5% | Weekly demos, logbook, budget tracking (−1 per missed demo, capped −3) |
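One hedged reading of how these weights might combine, shown as arithmetic only; the exact combination rule (and how the capped demo penalty folds in) is the faculty's call:

```python
# Illustrative grade arithmetic only; the actual rule is set by faculty.
def final_grade(band: float, code_quality: float, docs: float, prof: float) -> float:
    # Each component assumed to be scored out of 10, weighted per the table above.
    return 0.70 * band + 0.15 * code_quality + 0.10 * docs + 0.05 * prof

print(final_grade(band=8, code_quality=9, docs=7, prof=10))  # -> 8.15
```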