Unload model after being not used for some time #72
I think I'll have this feature implemented by EOW. For now, you can use the
I would like this behind a flag so I can keep the model loaded for improved latency.
Has anyone already found a solution to this problem, similar to how Ollama handles it? Unfortunately, my GPU also constantly draws around 45-50 W at idle, which becomes quite expensive to run at local electricity prices, especially if the model goes unused for an extended period :S
FYI, I'm currently working on this.
The feature has been implemented in #92 and is available on master. I'll wait a couple of days before making a new release with the changes. Any feedback would be appreciated!
Thanks, that was a quick one! Model unloading works, but some GPU memory is still used by the process, which keeps the GPU in the P0 state and continues to consume more power than necessary.
@hgruber I'm aware of this, but unfortunately it's an upstream issue that doesn't seem to be resolved yet. See [this](https://github.com/SYSTRAN/faster-whisper/issues/992). If you find a workaround, please LMK and I'll implement it.
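One workaround pattern sometimes used for this class of leak (not something this project currently implements, and the worker below is a stand-in rather than real faster-whisper code) is to run the GPU work in a short-lived child process, so the CUDA context and any residual allocations are released when that process exits:

```python
import multiprocessing as mp


def _worker(x, q):
    # Stand-in for "load model, run inference, return result". In the
    # real scenario the model would be constructed here, so all GPU
    # memory (including the CUDA context) dies with this process.
    q.put(x * 2)


def run_isolated(x):
    # "fork" keeps this example self-contained on Linux; a real CUDA
    # workload would generally need "spawn", since CUDA contexts do
    # not survive fork() cleanly.
    ctx = mp.get_context("fork")
    q = ctx.Queue()
    p = ctx.Process(target=_worker, args=(x, q))
    p.start()
    result = q.get()
    p.join()
    return result
```

The trade-off is the process startup and model load cost on every call, which is the opposite of what the latency-sensitive users above want, so it would only make sense behind a flag.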
In addition to #66, it's probably also a good idea to unload the model after e.g. 300 s of inactivity.
For the Ollama server this is the default behavior. My graphics card draws 40 W more power whenever a model is loaded.
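The requested behavior can be sketched as a small idle-timeout wrapper. This is only a minimal illustration, not the implementation that later landed in #92; the class name and `load_fn` parameter are hypothetical, and the 300 s default mirrors the timeout suggested above.

```python
import threading
import time


class IdleModelUnloader:
    """Load a model lazily and drop it after a period of inactivity."""

    def __init__(self, load_fn, idle_timeout=300.0):
        self._load_fn = load_fn          # factory that constructs the model
        self._idle_timeout = idle_timeout
        self._model = None
        self._last_used = 0.0
        self._lock = threading.Lock()
        self._timer = None

    def get(self):
        # Load on first use, then refresh the idle clock on every access.
        with self._lock:
            if self._model is None:
                self._model = self._load_fn()
            self._last_used = time.monotonic()
            self._schedule_check()
            return self._model

    def _schedule_check(self):
        # Restart the countdown; must be called with the lock held.
        if self._timer is not None:
            self._timer.cancel()
        self._timer = threading.Timer(self._idle_timeout, self._maybe_unload)
        self._timer.daemon = True
        self._timer.start()

    def _maybe_unload(self):
        with self._lock:
            idle = time.monotonic() - self._last_used
            if self._model is not None and idle >= self._idle_timeout:
                # Drop the reference so the model can be garbage-collected;
                # a real implementation may also need a framework-specific
                # cache flush to actually return VRAM (see the comments above).
                self._model = None
            else:
                self._schedule_check()
```

Note that, as discussed above, simply releasing the Python reference may still leave some GPU memory held by the process.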