Differential Attention is incompatible with Infini-Attention #36

Open
Vectorrent opened this issue Jan 24, 2025 · 0 comments
Labels
bug Something isn't working help wanted Extra attention is needed

Comments

@Vectorrent
Contributor

You can reproduce the errors like this:

python run.py --dev --no_dashboard --memory --differential

The main conflict comes from the fact that differential heads are essentially doubled in size, so that they can be split in half later. Infini-Attention was not designed to handle this, and thus fails here.
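To make the shape conflict concrete, here is a minimal NumPy sketch of differential attention's head doubling, following the general scheme of the Differential Transformer (two attention maps computed from split halves, then subtracted). All shapes, names, and the fixed lambda are illustrative assumptions; the repo's actual implementation is not shown in this issue.

```python
import numpy as np

# Hypothetical shapes; the repo's real config is not shown in the issue.
batch, seq_len, num_heads, head_dim = 1, 8, 4, 16

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Differential attention projects Q and K at TWICE the head size,
# then splits each into two halves (q1, q2) and (k1, k2).
q = np.random.randn(batch, num_heads, seq_len, 2 * head_dim)
k = np.random.randn(batch, num_heads, seq_len, 2 * head_dim)
v = np.random.randn(batch, num_heads, seq_len, head_dim)

q1, q2 = np.split(q, 2, axis=-1)
k1, k2 = np.split(k, 2, axis=-1)

lam = 0.5  # a learned scalar in the real model; fixed here for illustration
a1 = softmax(q1 @ k1.transpose(0, 1, 3, 2) / np.sqrt(head_dim))
a2 = softmax(q2 @ k2.transpose(0, 1, 3, 2) / np.sqrt(head_dim))
out = (a1 - lam * a2) @ v

# An Infini-Attention-style compressive memory assumes q/k carry head_dim
# features per head, so the doubled 2*head_dim projections break its
# shape expectations before the halves are ever split.
print(q.shape, out.shape)  # q is (1, 4, 8, 32) while out is (1, 4, 8, 16)
```

The mismatch above (32-dim q/k vs. 16-dim memory slots) is the kind of failure the repro command triggers.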

I tried to fix it, but could not find an elegant solution that left the math intact. Would love some help here.

@Vectorrent Vectorrent added bug Something isn't working help wanted Extra attention is needed labels Jan 24, 2025