[Bugfix] Fix precision loss in LoRA-wrapped RowParallelLinear by fusing bias into GEMM #28972
Issue:
The LoRA-wrapped RowParallelLinear added the bias as a separate bfloat16 operation instead of fusing it into the GEMM kernel as the unwrapped layer does. This caused precision loss: the fused kernel can accumulate in higher precision (FP32) and add the bias before converting to bfloat16, while the separate addition introduces an extra rounding step. The discrepancy appeared even with zero LoRA weights when comparing LoRA-wrapped results against merged-weight results.
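To make the rounding mechanism concrete, here is a standalone PyTorch snippet (illustrative only, not vLLM code). It models the fused path as "accumulate matmul and bias in FP32, cast once" and the separate path as "cast the matmul output to bfloat16, then add a bfloat16 bias", and reports how often the two disagree:

```python
import torch

torch.manual_seed(0)
x = torch.randn(16, 256)
w = torch.randn(128, 256)
b = torch.randn(128)

# "Fused" path: matmul and bias add both happen in FP32,
# with a single final cast to bfloat16.
fused = (x @ w.T + b).to(torch.bfloat16)

# "Separate" path: the matmul output is rounded to bfloat16 first,
# then the bfloat16 bias is added, introducing a second rounding step.
separate = (x @ w.T).to(torch.bfloat16) + b.to(torch.bfloat16)

mismatch = (fused != separate).float().mean().item()
print(f"fraction of elements that differ: {mismatch:.3f}")
```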
Fix:
Pass the bias to apply() only on rank 0 (and only when skip_bias_add=False), allowing the quantization method to fuse the bias addition into the GEMM kernel. This matches the unwrapped layer's behavior and eliminates the precision discrepancy.
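A rough sketch of the fixed call pattern. Names such as apply_gemm, tp_rank, skip_bias_add, and lora_a/lora_b are placeholders for illustration, not vLLM's actual classes or attributes:

```python
import torch
import torch.nn.functional as F


def apply_gemm(x, weight, bias):
    # Stand-in for quant_method.apply(): a bias handed in here can be fused
    # into the GEMM instead of being added as a separate elementwise op.
    return F.linear(x, weight, bias)


def lora_row_parallel_forward(x, weight, bias, lora_a, lora_b,
                              tp_rank: int, skip_bias_add: bool):
    # After the fix: the bias is passed into the GEMM on rank 0 only (and only
    # when bias addition is not deferred), mirroring the unwrapped
    # RowParallelLinear, rather than being added separately afterwards.
    fused_bias = bias if (tp_rank == 0 and not skip_bias_add) else None
    out = apply_gemm(x, weight, fused_bias)
    # The LoRA delta is applied on top of the base output as before.
    out = out + (x @ lora_a) @ lora_b
    return out
```

Passing the bias on a single tensor-parallel rank keeps it from being added multiple times before the all-reduce, which is why the unwrapped layer gates it the same way.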