tstesco/benchmark-uplift #63
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
def backport_removeprefix(string: str, prefix: str) -> str:
    return string[len(prefix):] if string.startswith(prefix) else string
Could you add a comment to this function specifying why it's needed (& the Python version it's supporting)? EDIT: Nevermind, see other comment about remove_prefix
benchmarks/backend_request_func.py
Outdated
"best_of": request_func_input.best_of, | ||
# "best_of": request_func_input.best_of, | ||
"max_tokens": request_func_input.output_len, | ||
"logprobs": request_func_input.logprobs, | ||
# "logprobs": request_func_input.logprobs, |
Was this changed by mistake? (I don't think we can upstream this)
We don't support those parameters in our fork yet (#44). I can add a TODO pointing to the issue.
# Since vllm must support Python 3.8, we can't use str.removeprefix(prefix)
# introduced in Python 3.9
def remove_prefix(text: str, prefix: str) -> str:
    if text.startswith(prefix):
        return text[len(prefix):]
    return text
They already had a remove_prefix function for supporting 3.8 but intentionally removed it; they may not want us to add it back.
This should only be here until we can move to Python 3.9+, and that hopefully happens before we upstream. I can add e.g.:
# TODO: remove backport after we can drop support of Python 3.8
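Put together, the annotated backport could look something like this (just a sketch combining the function above with the suggested TODO wording):

def backport_removeprefix(string: str, prefix: str) -> str:
    # TODO: remove backport after we can drop support of Python 3.8;
    # str.removeprefix(prefix) is available from Python 3.9.
    return string[len(prefix):] if string.startswith(prefix) else string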
We are aiming to proceed with the rebase + integration of the dev branch onto upstream in the next week or two, so I'm hesitant to push this since we'll have to remove it again.
Closing as per #63 (comment).
changelog:
- Added the max_concurrency feature to allow for large n when running benchmark sweeps while still measuring correct TTFT and E2EL.

Discussion
Previously the benchmarking script sent all requests at time=0 and vLLM queued them on the server side, so the queue time was counted in TTFT and E2EL. The latest version from upstream uses asyncio.Semaphore(max_concurrency) to stop all requests from running at once at a concurrency higher than the model's max batch size. Setting max_concurrency to the model's max batch size allows correct measurement of TTFT and E2EL.
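For illustration, a minimal sketch of the bounded-concurrency pattern (the request coroutine and timings here are placeholders, not the actual benchmark code):

import asyncio
import time
from typing import List

async def send_request(i: int) -> float:
    # Placeholder for one benchmark request; the real script issues an HTTP
    # request to the vLLM server and records TTFT / E2EL.
    start = time.perf_counter()
    await asyncio.sleep(0.1)
    return time.perf_counter() - start

async def run_benchmark(num_requests: int, max_concurrency: int) -> List[float]:
    # The semaphore caps in-flight requests, so measured latencies reflect
    # serving time rather than time spent waiting in a server-side queue.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def limited(i: int) -> float:
        async with semaphore:
            return await send_request(i)

    return await asyncio.gather(*(limited(i) for i in range(num_requests)))

# e.g. 1000 requests with at most 32 in flight at once:
# latencies = asyncio.run(run_benchmark(1000, max_concurrency=32))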