
feat: deepseek_v1 gqa and correct normalization mode #2715


Closed


akhoroshev
Contributor

@akhoroshev akhoroshev commented Jan 23, 2025

@nv-guomingz
Collaborator

Hi @akhoroshev, thanks for your contribution. We'll take a look at it.

@nv-guomingz
Collaborator

Hi @akhoroshev, we fixed this internally about a month ago.
Because of the 0.17 release, we suspended the weekly updates for a while.
Now that 0.17 has been released, we'll resume the weekly updates soon.

As for the GQA change, DeepSeek doesn't use this kind of attention, AFAIK.
Do you have any specific reason for that?

@nv-guomingz added the "triaged" label (Issue has been triaged by maintainers) on Feb 5, 2025
@akhoroshev
Contributor Author

akhoroshev commented Feb 5, 2025

Do you have any specific reason for that?

I have an internal model based on deepseek_v1 with GQA.

@nv-guomingz
Collaborator

Do you have any specific reason for that?

I have internal model based on deepseek_v1 with gqa

OK. I'm afraid we can't merge this PR, since it only works for a private model at the moment.

@akhoroshev
Contributor Author

akhoroshev commented Feb 6, 2025

OK. I'm afraid we can't merge this PR, since it only works for a private model at the moment.

I also found an open model that uses deepseek v1 with GQA:

https://huggingface.co/ai-sage/GigaChat-20B-A3B-instruct

@juney-nvidia
Collaborator

@akhoroshev Hi, we plan to deprecate DS V1/V2 support and keep only V3/R1 model support.
So we may not accept this MR for now.

Thanks
June

@juney-nvidia added the "Community want to contribute" label (PRs initiated from Community) on Mar 24, 2025
@juney-nvidia self-requested a review on March 24, 2025 05:56
Labels
Community want to contribute (PRs initiated from Community)
triaged (Issue has been triaged by maintainers)
3 participants