Architecture: Optimize labs.py nested table joins to prevent Toolforge OOM timeouts #479

Open
ayushshukla1807 wants to merge 1 commit into hatnote:master from ayushshukla1807:perf/labs-sql-optimization-1775813476

Conversation

ayushshukla1807 commented Apr 10, 2026

Optimized the Toolforge query layer in labs.py.

The nested table joins for category imports were triggering OOM timeouts on massive campaigns. Swapped out the expensive subqueries for a paginated cursor approach to keep memory usage flat.
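For readers unfamiliar with the term, here is a minimal sketch of what a paginated cursor (keyset pagination) can look like. It is not the code from this PR: it uses the standard-library `sqlite3` module rather than Montage's actual database layer, and the `files` table, its columns, and both function names are hypothetical, chosen only for illustration.

```python
import sqlite3
from typing import Iterator, List, Tuple

Row = Tuple[int, str]


def build_demo_db(n_rows: int) -> sqlite3.Connection:
    """Create an in-memory table of hypothetical category members."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE files (id INTEGER PRIMARY KEY, category TEXT, title TEXT)"
    )
    conn.executemany(
        "INSERT INTO files (category, title) VALUES (?, ?)",
        (("demo", f"File_{i}.jpg") for i in range(n_rows)),
    )
    conn.commit()
    return conn


def iter_category_pages(
    conn: sqlite3.Connection, category: str, page_size: int = 500
) -> Iterator[List[Row]]:
    """Yield one page of (id, title) rows at a time.

    Each query resumes after the last id already seen, so only
    `page_size` rows are held in memory at once, however large
    the category is.
    """
    last_id = 0  # INTEGER PRIMARY KEY ids start at 1 in SQLite
    while True:
        rows = conn.execute(
            "SELECT id, title FROM files "
            "WHERE category = ? AND id > ? "
            "ORDER BY id LIMIT ?",
            (category, last_id, page_size),
        ).fetchall()
        if not rows:
            return
        yield rows
        last_id = rows[-1][0]  # cursor: highest id in this page


if __name__ == "__main__":
    demo = build_demo_db(1234)
    for page in iter_category_pages(demo, "demo"):
        print(len(page), "rows, last id", page[-1][0])
```

Filtering on `id > last_id` instead of using `OFFSET` means each page is an index range scan, so later pages should cost about the same as the first one.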

Member

mahmoud commented Apr 11, 2026

Sounds promising, but I think this PR might include more than just the labs fix. And you forgot to insert the issue number ;) Looking forward to reviewing all the PRs, though I wonder if you can speak a bit to your process/prompt? And is this part of GSoC / coordinated with someone on the team?

ayushshukla1807 force-pushed the perf/labs-sql-optimization-1775813476 branch from bdd8390 to c2ad6a9 on April 11, 2026 14:56

Author

ayushshukla1807 commented Apr 11, 2026

hi @mahmoud, ah, rough catch on the git stuff, my bad. was running local toolforge oom simulations and accidentally pushed a bunch of unrelated commits onto this branch instead of isolating the labs.py fix. just ran a rebase and force-pushed, so this is strictly just the labs optimization for #478 now.

regarding the prompt thing - i have submitted my proposal on Montage for GSoC 2026, but mostly i've just been really enjoying ripping into the backend to learn how everything ticks.
I started getting pretty deep into the architecture and got a bit overly formal with my PR write-ups (used copilot to help format my markdown because i wanted everything to look super organised).
The actual python logic and local testing is all me though.
I can see how the super formal github text + the sloppy git branch looked weird haha.
I'll tone it down and keep things more natural.

let me know if the nested join in labs.py looks okay on your end!

Author

ayushshukla1807 commented Apr 11, 2026

also just to clarify the setup – i haven't specifically coordinated this directly with the team. i've just been digging through the codebase locally for fun because the stack is really interesting to me.

context wise: i've been around wikimedia since oct 2024 (was in the developer skill development program and did some stuff with imd ug). tracing these sqlalchemy bottlenecks in montage has been a massive learning experience for me. i've attached screenshots/recordings of my local terminal running the execution on my other PRs too (#486 for the wal modes and #489 for the auth drop) just to show the local hardware testing.

definitely planning to stick around and keep contributing here long-term regardless of gsoc. if u get a chance to review those heavier backend prs later when u have free time that'd be awesome. thanks again!
