
Conversation

@mefich (Contributor) commented Oct 30, 2025

This is a small PR that ensures the vision model of a VLM is unloaded and doesn't stay in VRAM indefinitely.

I've used a few exl3 VLMs and noticed that after unloading them, a noticeable amount of VRAM stayed reserved by TabbyAPI.
The exl3 backend's unload function was missing the code to unload the vision part.

This change ensures that when a VLM is unloaded, its vision component is unloaded as well.
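
For context, the shape of the fix looks roughly like the sketch below. This is a minimal illustration only: the class name `ExllamaV3Container`, the attribute names `model`/`vision_model`, and the `.unload()` calls are assumptions about TabbyAPI's internals, not the actual diff.

```python
import gc

import torch


class ExllamaV3Container:
    """Hypothetical stand-in for TabbyAPI's exl3 model container."""

    model = None         # text model handle (assumed attribute name)
    vision_model = None  # vision tower handle (assumed attribute name)

    async def unload(self):
        # Release the text model, as the backend already did.
        if self.model is not None:
            self.model.unload()
            self.model = None

        # The fix: also release the vision component so its weights
        # don't stay resident in VRAM after the VLM is unloaded.
        if self.vision_model is not None:
            self.vision_model.unload()
            self.vision_model = None

        # Drop dangling references and return cached blocks to the driver.
        gc.collect()
        torch.cuda.empty_cache()
```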
