Are the text padding tokens masked in SD3? #11072
Unanswered
va1bhavagrawal asked this question in Q&A
SD3 takes text embeddings from the CLIP and T5 text encoders. These are padded up to the sequence length (77 for CLIP, `max_sequence_length` for T5). How are the padding tokens masked? Do the image tokens attend directly to the padding tokens? I do not see any `attention_mask` being passed to the transformer. My intuition says that the image tokens should not be allowed to attend to the padding tokens. Any ideas how this is handled?
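To make the intuition concrete, here is a minimal pure-Python sketch of what masking padded keys would mean for a single image-token query. It is an illustration only, not how SD3 or diffusers implements attention; the function name `masked_attention_weights` and the score values are hypothetical.

```python
import math

def masked_attention_weights(scores, key_is_padding):
    """Softmax over one query's scores, with padded keys excluded.

    Assumes at least one key is not padding.
    """
    # Padded keys get a score of -inf so they receive zero weight after softmax.
    masked = [float("-inf") if pad else s for s, pad in zip(scores, key_is_padding)]
    peak = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores of one image token (query) against 5 text keys.
scores = [2.0, 1.0, 0.5, 0.3, 0.3]
# Assume the last two keys are padding tokens.
key_is_padding = [False, False, False, True, True]

weights = masked_attention_weights(scores, key_is_padding)
unmasked = masked_attention_weights(scores, [False] * len(scores))

print("masked:  ", weights)
print("unmasked:", unmasked)
```

With the mask, the two padded positions receive exactly zero weight; without it, they take a share of the attention. The second case is what I would expect to happen if no `attention_mask` reaches the transformer.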