Commit d109134
[Bugfix] bugfix for the order of dummy run pad and sync (vllm-project#5777)
### What this PR does / why we need it?
This PR fixes an issue in piecewise graph mode when Multi-Token Prediction
(MTP) is enabled, where each step processes 1 + k input tokens (k being the
number of MTP speculative tokens). The dummy run performs the following
steps in order:
1. Sync DP (input length = 1 + k)
2. Dispatch (input length = 1 + k, padded to the graph size)
In the model execution phase, however, the order differs:
1. Padding (input length = 1, with padding)
2. Sync DP (input length = 1 + k)
3. Dispatch (input length = 1 + k, but the padding was computed for length 1, so the padded size no longer matches the graph size)
Because padding happens before the DP sync during model execution, the
input size that reaches dispatch no longer matches the graph size expected
by the dispatch graph, causing an inconsistency in graph size.
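The two orderings can be compared in a toy Python model. This is a sketch under stated assumptions, not the vllm-ascend code: `GRAPH_SIZES`, `pad_to_graph_size`, `sync_dp`, `dummy_run_order`, and `old_execute_order` are hypothetical names, the DP sync is modeled as a max over ranks, padding rounds up to the next captured size, and the old order is assumed to dispatch the larger of the locally padded and the synced length. With k = 2, a rank holding 1 token next to a peer holding 1 + k = 3 tokens shows the mismatch:

```python
import bisect

# Toy model with hypothetical names; not the vllm-ascend implementation.
# Toy captured graph sizes; real sizes come from the deployment config.
GRAPH_SIZES = [1, 2, 4, 8, 16]


def pad_to_graph_size(num_tokens: int) -> int:
    """Round num_tokens up to the smallest captured graph size that fits."""
    idx = bisect.bisect_left(GRAPH_SIZES, num_tokens)
    if idx == len(GRAPH_SIZES):
        raise ValueError(f"{num_tokens} exceeds the largest captured graph")
    return GRAPH_SIZES[idx]


def sync_dp(tokens_per_rank: list[int]) -> int:
    """Toy DP sync (assumption): every rank adopts the maximum token count."""
    return max(tokens_per_rank)


def dummy_run_order(tokens_per_rank: list[int]) -> int:
    """Dummy run: sync DP first, then pad the synced length before dispatch."""
    return pad_to_graph_size(sync_dp(tokens_per_rank))


def old_execute_order(local_tokens: int, tokens_per_rank: list[int]) -> int:
    """Old execution order: pad the local length, then sync DP, then dispatch."""
    padded = pad_to_graph_size(local_tokens)  # step 1: padding sized for length 1
    synced = sync_dp(tokens_per_rank)          # step 2: DP sync yields 1 + k
    # Step 3: dispatch uses the larger value without re-padding it, so the
    # dispatched length can fall between captured graph sizes.
    return max(padded, synced)


k = 2
tokens_per_rank = [1, 1 + k]  # this rank holds 1 token, a peer holds 1 + k
print(dummy_run_order(tokens_per_rank))       # 4, a captured graph size
print(old_execute_order(1, tokens_per_rank))  # 3, never captured
```

In this toy, padding only helps if it is applied to the value that actually reaches dispatch; padding before the sync pads a length that the sync then overrides.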
This PR reorders the operations during model execution to match the dummy
run sequence (sync DP first, then pad and dispatch), so the dispatch graph
size aligns with the padded input size and the mismatch is resolved.
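A toy Python model of the reordered execution path illustrates why syncing before padding is sufficient: padding is applied to the value that actually reaches dispatch, so the dispatched length is always a captured graph size. `GRAPH_SIZES`, `pad_to_graph_size`, `sync_dp`, and `fixed_execute_order` are hypothetical, illustrative names rather than vllm-ascend APIs; the DP sync is modeled as a max over ranks, and the check assumes each decode request contributes 1 + k tokens, covering all two-rank loads of one to four requests for k = 1 to 3.

```python
import bisect
import itertools

# Toy model with hypothetical names; not the vllm-ascend implementation.
# Toy captured graph sizes; real sizes come from the deployment config.
GRAPH_SIZES = [1, 2, 4, 8, 16]


def pad_to_graph_size(num_tokens: int) -> int:
    """Round num_tokens up to the smallest captured graph size that fits."""
    idx = bisect.bisect_left(GRAPH_SIZES, num_tokens)
    if idx == len(GRAPH_SIZES):
        raise ValueError(f"{num_tokens} exceeds the largest captured graph")
    return GRAPH_SIZES[idx]


def sync_dp(tokens_per_rank: list[int]) -> int:
    """Toy DP sync (assumption): every rank adopts the maximum token count."""
    return max(tokens_per_rank)


def fixed_execute_order(tokens_per_rank: list[int]) -> int:
    """Fixed execution order, mirroring the dummy run: sync DP, then pad."""
    num_tokens = sync_dp(tokens_per_rank)  # step 1: agree on one length across ranks
    return pad_to_graph_size(num_tokens)   # step 2: pad the synced length


# Assume each request contributes 1 + k tokens; check every two-rank load of
# 1..4 requests per rank for k = 1..3.
for k in range(1, 4):
    loads = [reqs * (1 + k) for reqs in range(1, 5)]
    for a, b in itertools.product(loads, repeat=2):
        size = fixed_execute_order([a, b])
        assert size in GRAPH_SIZES and size >= max(a, b)
print("every dispatched length is a captured graph size")
```

Padding after the sync also gives every DP rank the same dispatched length in this toy, since all ranks pad the same synced value.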
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@2f4e654
Signed-off-by: LiuYi-UP

Parent commit: a23dfce
1 file changed, 11 additions and 5 deletions.

| Hunk | Original lines | New lines | Added | Removed |
|---|---|---|---|---|
| 1 | 2072-2081 | 2072-2085 | 6 | 2 |
| 2 | 2122-2130 | 2126-2136 | 5 | 3 |