Gaudi: clean cuda/rocm code in hpu backend, enable flat_hpu #3113
Signed-off-by: Wang, Yi A <[email protected]>
Hey @sywangyi Thanks for contributing to the new upstreamed Gaudi backend 🙌 Here are some guidelines about the new backend that I think could be useful to you.
FYI @regisss When modifying anything outside of strictly Gaudi-related files (e.g., backend/gaudi, Dockerfile_gaudi, or Gaudi documentation), we should also tag @Narsil for review. This helps ensure that we don’t unintentionally modify or bloat core TGI components and allows the core maintainers to review the changes.
Could you please rebase your image on the latest main branch of TGI? This update integrates the latest changes from the old tgi-gaudi fork, allowing us to confirm that your refactoring is fully compatible with the latest version of the Gaudi backend. Thanks!
remove model where pageattn is not used, set block table to None since it's not used
…mic shape
Thanks for the huge effort!
I’ve added @regisss and @Narsil as reviewers. @Narsil, could you check the modifications to the launcher and router and confirm that everything aligns with the main TGI team? Since this is a big PR, it would also be great for @regisss to take a look.
Really appreciate the refactoring and code cleanup!
One quick question: did you happen to benchmark the performance after this refactoring? Specifically, I’m curious about the impact of using the vllm-hpu-extension.
This PR is for functional enabling; I am also working on performance optimization. We have compared tgi-gaudi (OH backend) with vllm-gaudi (hpu page path), and vllm-gaudi has better throughput than tgi-gaudi. This refactoring helps align the performance with vllm-gaudi.
@baptistecolle I have rebased the PR onto main. Please help review it.
@regisss I have rebased the PR onto main. Please help review and merge it.
LGTM!
Gently pinging @Narsil for final approval.
What does this PR do?
Fixes # (issue)
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.