I've noticed an inconsistency in the handling of weights in the second text encoder of the SDXL model, which seems to align with the issue reported in Stability-AI/generative-models#111. It appears that the diffusers and kohya libraries utilize the weights differently, as discussed in huggingface/diffusers#8238.
To further investigate, I conducted some experiments using the following script:
The output shows a difference in the text projection weights (diff of text_projection: 32.875). Surprisingly, the images generated before and after transposing the weights are of the same quality level.
orig_a-cat-in-the-jungle_0.jpg vs tran_a-cat-in-the-jungle_0.jpg:
orig_a-girl-in-the-jungle_0.jpg vs tran_a-girl-in-the-jungle_0.jpg:
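
For anyone who wants to reproduce the weight comparison without loading SDXL, here is a minimal sketch of the idea, not the exact script used for the numbers above. It assumes the text projection is a square, bias-free linear map (which would explain why a transposed weight loads without a shape error), uses a small random matrix as a stand-in for the real weight, and picks the sum of absolute differences as the metric. The size `dim`, the random data, and the choice of metric are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the text_projection weight: square and bias-free (assumed),
# so W and W.T have the same shape and either one loads without error.
dim = 8  # hypothetical small size, for illustration only
W = rng.standard_normal((dim, dim)).astype(np.float32)
W_t = W.T.copy()

# Elementwise difference between the original and the transposed weight
# (assumed metric: sum of absolute differences).
weight_diff = float(np.abs(W - W_t).sum())

# Project the same pooled embedding under both conventions.
pooled = rng.standard_normal(dim).astype(np.float32)
emb_orig = pooled @ W.T  # y = x W^T, the linear-layer convention
emb_tran = pooled @ W    # the result if the stored weight is transposed

# Cosine similarity between the two projected embeddings.
cos = float(
    emb_orig @ emb_tran
    / (np.linalg.norm(emb_orig) * np.linalg.norm(emb_tran))
)

print(f"diff of text_projection: {weight_diff:.3f}")
print(f"cosine similarity of projected embeddings: {cos:.3f}")
```

Running the same comparison on the real pooled embeddings (rather than only on the weights) would show how far the conditioning actually moves when the weight is transposed, which the weight diff alone does not tell you.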

Possible explanations could be:
I am curious if anyone else has observed similar behavior or has insights into the robustness of SDXL's text encoding process.