Regarding Training time and Embeddings #2

Open
shivapk opened this issue Jan 18, 2025 · 1 comment
shivapk commented Jan 18, 2025

Awesome work, inspirational! I'd be interested to learn more about:

  1. How long did pre-training take, and what infrastructure was used - an 8-GPU P4d/P4de/P5?
  2. Is there a particular reason to stick with the nvidia/NV-EMBED model? Have you tried using the same LLaMA model itself for embedding instead, for example LLaMA's last-token representation? Would there be any extra benefit in using the same model for the embedding as well as for the decoder?
  3. Does your approach generalize to longer text (e.g., 3k tokens) as well, with respect to embedding?
EvanZhuang (Owner) commented

Thanks for your interest! To answer your questions:

  1. Pre-training the projector only took a matter of hours on a 4-GPU machine, since we used a small corpus (WikiText).
  2. We have experimented with four embedding models: NV-Embed (Lee et al. 2024; nvidia/NV-Embed-v1), SFR (Meng et al. 2024; Salesforce/SFR-Embedding-2_R), Stella (Zhang 2024; dunzhang/stella_en_1.5B_v5), and GTR-T5 (Ni et al. 2021; sentence-transformers/gtr-t5-base). You can find detailed comparisons among them in our paper. The takeaway is that stronger embedding models are more likely to deliver better downstream performance in Vector-ICL (see our analysis in Sec. 6.1). A sketch contrasting a dedicated embedding model with last-token pooling follows after this list.
  3. I think so, but longer text may be harder to encode while retaining the original information.
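
To make the second point concrete, here is a minimal sketch (not code from this repo) contrasting a dedicated embedding model with the last-token pooling suggested in the question. The only model name taken from the thread is sentence-transformers/gtr-t5-base; the LLaMA checkpoint is an illustrative placeholder, and the pooling shown is simply the final layer's last-token hidden state rather than anything specific to the paper.

```python
# Hedged sketch: two ways to turn a text into a single vector.
# Assumes sentence-transformers and transformers are installed;
# "meta-llama/Llama-2-7b-hf" stands in for whichever LLaMA checkpoint is used.
import torch
from sentence_transformers import SentenceTransformer
from transformers import AutoModelForCausalLM, AutoTokenizer

text = "Vector-ICL projects pre-computed embeddings into the LLM's input space."

# (a) A dedicated embedding model (one of the four compared in the paper).
embedder = SentenceTransformer("sentence-transformers/gtr-t5-base")
emb_dedicated = embedder.encode(text, convert_to_tensor=True)  # shape: (768,)

# (b) Last-token hidden state of a causal LM, as the question proposes.
tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
lm = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    torch_dtype=torch.float16,
    output_hidden_states=True,
)
inputs = tok(text, return_tensors="pt")
with torch.no_grad():
    out = lm(**inputs)
# Final layer's hidden state at the last token position.
emb_last_token = out.hidden_states[-1][0, -1, :]  # shape: (hidden_size,)
```

Either vector would still need the projector to be trained against it before it could be fed into Vector-ICL, so this only illustrates how the two kinds of embeddings would be obtained.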
