Gaudi: clean cuda/rocm code in hpu backend, enable flat_hpu #3113


Merged: 27 commits merged into huggingface:main on Apr 14, 2025

Conversation

sywangyi (Contributor)

What does this PR do?

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@baptistecolle (Collaborator)

Hey @sywangyi Thanks for contributing to the new upstreamed Gaudi backend 🙌

Here are some guidelines about the new backend that I think could be useful to you.

**Guidelines for contributing to Gaudi on TGI:** All changes should be made within the `backends/gaudi` folder. In general, you should avoid modifying the router, launcher, or benchmark to accommodate Gaudi hardware, as all Gaudi-specific logic should be contained within the `backends/gaudi` folder.
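The containment rule above can be checked mechanically against a PR's changed files. A minimal sketch, assuming the file list comes from something like `git diff --name-only main...HEAD`; the helper name and the exact set of allowed prefixes are illustrative assumptions, not part of TGI's tooling:

```python
# Hypothetical helper: flag changed files that fall outside the Gaudi-specific
# areas of the repo (backends/gaudi, Dockerfile_gaudi, Gaudi docs).
ALLOWED_PREFIXES = ("backends/gaudi/", "Dockerfile_gaudi", "docs/source/backends/gaudi")

def out_of_scope(changed_files):
    """Return the changed files that touch core TGI components
    (router, launcher, benchmark, ...) rather than Gaudi code."""
    return [f for f in changed_files if not f.startswith(ALLOWED_PREFIXES)]

changed = ["backends/gaudi/server/model.py", "router/src/main.rs"]
print(out_of_scope(changed))  # → ['router/src/main.rs']
```

Any non-empty result would be a hint that @Narsil should be tagged for review, per the note below.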

@baptistecolle (Collaborator)

FYI @regisss

When modifying anything outside of strictly Gaudi-related files (e.g., `backends/gaudi`, `Dockerfile_gaudi`, or the Gaudi documentation), we should also tag @Narsil for review. This helps ensure that we don’t unintentionally modify or bloat core TGI components and allows the core maintainers to review the changes.

@baptistecolle
Copy link
Collaborator

Could you please rebase your branch onto the latest main branch of TGI?
Commit: 8c2c348f3c4d1101e862af493b44e4986119b557

This update integrates the latest changes from the old tgi-gaudi fork, allowing us to confirm that your refactoring is fully compatible with the latest version of the Gaudi backend.

Thanks!

sywangyi added 18 commits March 18, 2025 23:11
Signed-off-by: Wang, Yi A <[email protected]>

remove model where pageattn is not used, set block table to None since it's not used
@sywangyi sywangyi marked this pull request as ready for review March 28, 2025 14:03
@baptistecolle (Collaborator) left a comment:

Thanks for the huge effort!

I’ve added @regisss and @Narsil as reviewers. @Narsil, could you check the modifications to the launcher and router and confirm that everything aligns with the main TGI team? Since this is a big PR, it would also be great for @regisss to take a look.

Really appreciate the refactoring and code cleanup!

One quick question: did you happen to benchmark the performance after this refactoring? Specifically, I’m curious about the impact of using the vllm-hpu-extension.

@baptistecolle baptistecolle requested review from Narsil and regisss April 2, 2025 09:31
@baptistecolle baptistecolle added the gaudi Issues related to Intel Gaudi hardware label Apr 2, 2025
@baptistecolle baptistecolle changed the title clean cuda/rocm code in hpu backend, enable flat_hpu Gaudi: clean cuda/rocm code in hpu backend, enable flat_hpu Apr 2, 2025
@sywangyi (Contributor, Author)

sywangyi commented Apr 3, 2025

> Thanks for the huge effort!
>
> I’ve added @regisss and @Narsil as reviewers. @Narsil, could you check the modifications to the launcher and router and confirm that everything aligns with the main TGI team? Since this is a big PR, it would also be great for @regisss to take a look.
>
> Really appreciate the refactoring and code cleanup!
>
> One quick question: did you happen to benchmark the performance after this refactoring? Specifically, I’m curious about the impact of using the vllm-hpu-extension.

This PR is for functional enabling; I am also working on performance optimization. We have compared tgi-gaudi (OH backend) against vllm-gaudi (hpu page path), and vllm-gaudi has better throughput. This refactor helps align our performance with vllm-gaudi.

@sywangyi (Contributor, Author)

@baptistecolle I have rebased the PR onto main. Please help review it.

@sywangyi (Contributor, Author)

@regisss I have rebased the PR onto main. Please help review it and merge.

@regisss (Collaborator) left a comment:

LGTM!

Gently pinging @Narsil for final approval.

@Narsil Narsil merged commit d62c941 into huggingface:main Apr 14, 2025
Labels: gaudi (Issues related to Intel Gaudi hardware)
4 participants