Hi, thanks for the great benchmark!
I have a question regarding the evaluation setup for LiveCodeBench v6.
https://www.emergentmind.com/topics/livecodebench-v5-v6-pro
https://livecodebench.github.io/leaderboard.html
From the documentation, it seems that:
- LiveCodeBench v6 contains 454 problems collected from Aug 2024 to May 2025.
However, in practice I observed that:
- The test_v6 split on HuggingFace contains 175 problems.
- The difficulty distribution appears to be 75 Easy / 75 Medium / 25 Hard, which matches the commonly reported evaluation setup.
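For reference, the counts above can be reproduced with a small sketch. The `difficulty` field name and its values are assumptions about the HuggingFace split's schema; in practice the records would come from `datasets.load_dataset` rather than the mock sample used here:

```python
from collections import Counter

# Mock sample standing in for the test_v6 split's metadata.
# (Real usage would be something like:
#   datasets.load_dataset("livecodebench/...", split="test")
# -- dataset id and split name are assumptions.)
problems = (
    [{"difficulty": "easy"}] * 75
    + [{"difficulty": "medium"}] * 75
    + [{"difficulty": "hard"}] * 25
)

# Tally problems per difficulty level.
counts = Counter(p["difficulty"] for p in problems)

print(len(problems))  # 175
print(dict(counts))   # {'easy': 75, 'medium': 75, 'hard': 25}
```

The same two-line tally, pointed at the real split, is what I used to arrive at the 175-problem / 75-75-25 figures above.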
Could you clarify the intended evaluation protocol?
- Is 454 the total dataset size, while 175 is the standard evaluation subset?
- Should experiments reported in papers follow the 175-task split?
- Is there an official list defining this evaluation subset?
I want to make sure my evaluation setup is consistent with the intended benchmark protocol.
Thanks again for releasing LiveCodeBench!