-
-
Notifications
You must be signed in to change notification settings - Fork 12.1k
[CI/Build] Replace lm-eval gsm8k tests with faster implementation #23002
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: mgoin <mgoin64@gmail.com>
|
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request replaces the lm-eval based gsm8k tests with a custom, faster implementation. The changes include a new evaluation script, pytest integration, and model configurations. My review focuses on correctness and stability. I've identified a race condition in the file download logic and a potential resource exhaustion issue due to an unsafe default for parallel requests. Both are high-severity issues that should be addressed to ensure the new test is robust and reliable.
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
|
can we make this do streaming in a follow up @mgoin ? |
|
@robertgshaw2-redhat yes, that is my plan to also make this a benchmark util
|
…lm-project#23002) Signed-off-by: mgoin <mgoin64@gmail.com>
…lm-project#23002) Signed-off-by: mgoin <mgoin64@gmail.com>
…lm-project#23002) Signed-off-by: mgoin <mgoin64@gmail.com>
…lm-project#23002) Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Duncan Moss <djm.moss@gmail.com>
…lm-project#23002) Signed-off-by: mgoin <mgoin64@gmail.com>
…lm-project#23002) Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Xiao Yu <xiao.yu@amd.com>
…lm-project#23002) Signed-off-by: mgoin <mgoin64@gmail.com>
…lm-project#23002) Signed-off-by: mgoin <mgoin64@gmail.com>
…lm-project#23002) Signed-off-by: mgoin <mgoin64@gmail.com>
…lm-project#23002) Signed-off-by: mgoin <mgoin64@gmail.com>
Purpose
We mostly use
lm-evalas a way to run gsm8k evaluations for end-to-end accuracy testing, but gsm8k is a simple eval that you can replicate in ~100 lines of code so I think we should implement a close approximation of it that focuses on performance and ease of use for server benchmarking.This PR focuses on just replacing the
LLMclass test with a fastervllm serve-based test for accuracy without streaming, but in the future we can repurpose this script to do streaming and measure fine-grained performance alongside accuracy.Test Plan
See if
LM Eval Small Modelsjob is green and hopefully faster than beforeTest Result
Job time reduced from 36min to 18min!
(Optional) Documentation Update
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.