[bug]: Excessive VRAM usage on SDXL VAE decode stage #7587

### Is there an existing issue for this problem?

- [x] I have searched the existing issues

### Operating system

Windows

### GPU vendor

Nvidia (CUDA)

### GPU model

3080

### GPU VRAM

10GB

### Version number

5.6.0

### Browser

Invoke Client

### Python dependencies

_No response_

### What happened

* Installed Invoke as a fresh install
* Installed models and did some Flux-dev generations without issues
* Loaded an SDXL checkpoint with sdxl_vae at 32-bit precision
* Generated an image to canvas and noticed that after all the steps complete, nothing happens. This would be at the time of the VAE decode, before the image is completed. Checked Task Manager: VRAM is full and spilling into system RAM by 1.5-3 GB
* Behaviour is consistent, even with a 16-bit VAE
* Afterwards VRAM usage drops to 5-6 GB for the next generation, then again hits very high levels and spills over into RAM at the time of VAE decode.

Tested resolutions: 1152 x 896 and 1024 x 1024, same behaviour.
This used to work well in previous versions, around the time support for drawing pads was introduced (5.1?).
This is a standard installation, except that invoke.yaml was edited to add the parameter below:
`enable_partial_loading: true`

Not sure if it's specific to my system, but I never had this issue with Forge or ComfyUI.
Any idea what may be causing this weirdness with memory management and how one might fix it?
I have sysmem fallback enabled and I don't want to change it. What's weird is that there are no issues with Flux, even with the full-size T5, which consumes far more VRAM.

### What you expected to happen

I expect to generate a picture with an SDXL model without needing to use 11.5-13 GB VRAM at the time of VAE decode.

### How to reproduce the problem

* Use a card with 10 GB of VRAM
* Standard installation of Invoke
* In invoke.yaml, add: `enable_partial_loading: true`
* Generate a 1152 x 896 or 1024 x 1024 image to canvas using SDXL + sdxl_vae.safetensors (or auto)
* At the time of VAE decode, excessive VRAM is used.
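
To put a number on that last step rather than reading it off Task Manager, a small poller like the one below could be left running during a generation. It is only a sketch: it assumes `nvidia-smi` is on the PATH and that the card is GPU index 0, and the helper names (`parse_memory_line`, `read_gpu_memory`, `track_peak`, `poll`) are made up for illustration and are not part of Invoke. As far as I can tell it reports dedicated VRAM only, so any spill into system RAM via sysmem fallback would still have to be read from Task Manager.

```python
import subprocess
import time

# nvidia-smi query that prints "used, total" in MiB, one line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory_line(line: str) -> tuple[int, int]:
    """Parse one 'used, total' CSV line (values in MiB)."""
    used, total = (int(field.strip()) for field in line.split(","))
    return used, total


def read_gpu_memory(gpu_index: int = 0) -> tuple[int, int]:
    """Run nvidia-smi once and return (used, total) for one GPU."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    lines = [ln for ln in out.splitlines() if ln.strip()]
    return parse_memory_line(lines[gpu_index])


def track_peak(samples) -> int:
    """Return the highest 'used' value seen in an iterable of (used, total)."""
    peak = 0
    for used, _total in samples:
        peak = max(peak, used)
    return peak


def poll(duration_s: float = 60.0, interval_s: float = 0.5):
    """Yield (used, total) readings until duration_s has elapsed."""
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        yield read_gpu_memory()
        time.sleep(interval_s)
```

Starting `print(track_peak(poll(duration_s=120)))` just before queueing the generation would print the highest dedicated VRAM reading in MiB over the two minutes that follow, which should cover the VAE decode.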

### Additional context

_No response_

### Discord username

_No response_
