Awesome work! Inspirational! Interested to learn more about:

1. How long did pre-training take, and what infra was used: an 8-GPU P4d/P4de/P5 instance?
2. Any reason to stick with the nvidia/NV-Embed model? Have you instead tried using the same LLaMA model itself for embedding, for example its last-token representation (roughly the pooling sketched below)? Would there be any extra benefit in using the same model for embedding as well as for decoding?
3. Does your approach also generalize to longer text, e.g. ~3k tokens, with respect to the embedding?
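For concreteness, this is the kind of last-token pooling I have in mind (the model name and pooling choice here are just illustrative, not anything from your repo):

```python
# Illustrative sketch only: use a decoder-only LM's final hidden state at the
# last token as a fixed-size text embedding (last-token pooling).
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder; any decoder-only LM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

def last_token_embedding(text: str) -> torch.Tensor:
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, hidden_dim)
    return hidden[0, -1]  # hidden state of the final token
```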
Thanks for your interest! To answer your questions:
Pre-training the projector took only a matter of hours on a 4-GPU machine, since we used a small corpus (wikitext).
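Conceptually, the projector is just a small module that maps the embedding model's output vector into the LLM's input-embedding space. A minimal sketch of that idea (the linear architecture, dimensions, and naming here are simplifications for illustration, not the exact code in our repo):

```python
# Minimal sketch of a projector mapping an embedding-model vector into the
# LLM's input-embedding space. The single linear layer is an illustrative
# assumption; see the repo for the actual module and training script.
import torch
import torch.nn as nn

class Projector(nn.Module):
    def __init__(self, embed_dim: int, llm_dim: int):
        super().__init__()
        self.proj = nn.Linear(embed_dim, llm_dim)

    def forward(self, text_embedding: torch.Tensor) -> torch.Tensor:
        # The projected vector can then be consumed by the LLM like a
        # "soft token" placed in the prompt.
        return self.proj(text_embedding)
```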
For embedding models we experimented with four: NV-Embed (Lee et al., 2024; nvidia/NV-Embed-v1), SFR (Meng et al., 2024; Salesforce/SFR-Embedding-2_R), Stella (Zhang, 2024; dunzhang/stella_en_1.5B_v5), and GTR-T5 (Ni et al., 2021; sentence-transformers/gtr-t5-base). You can find detailed comparisons among them in our paper. The takeaway is that stronger embedding models are more likely to deliver better downstream performance in Vector-ICL (see our analysis in Sec. 6.1).
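If you want to try swapping backbones yourself, the smaller GTR model is the easiest to load, e.g. something like the following (the larger models may additionally need trust_remote_code=True and model-specific query prompts, so check each model card):

```python
# Load one of the embedding backbones and encode some text; swap the model
# name to compare backbones on a downstream Vector-ICL task.
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("sentence-transformers/gtr-t5-base")
vectors = encoder.encode(["an example input to embed"])
print(vectors.shape)  # (1, 768) for gtr-t5-base
```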
I think so, but longer text might be harder to encode while still retaining the original information.