This document is actively maintained, and the RFC for each item will be linked here. Feel free to comment and request other features!
Unified Evaluation Suite
- Improve the code quality of the current evaluation suite.
- Add more benchmarks.
- Make the evaluation suite scalable.
- Support Best-of-N, MCTS, and PRM-guided generation with basic PRM and policy models.
- Implement decontamination for future development, as we keep adding more training data.
- Add an LLM-based check for results.
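As a rough illustration of the Best-of-N item above, the sketch below samples N candidates and keeps the one a reward model scores highest. The names `best_of_n`, `generate`, and `score` are hypothetical placeholders for a policy model's sampling call and a PRM-style scoring call; they are not taken from this repository.

```python
from typing import Callable, Tuple


def best_of_n(
    prompt: str,
    generate: Callable[[str], str],      # hypothetical: samples one completion from the policy model
    score: Callable[[str, str], float],  # hypothetical: reward-model score for (prompt, completion)
    n: int = 4,
) -> Tuple[str, float]:
    """Sample n completions and return the highest-scoring one with its score."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [generate(prompt) for _ in range(n)]
    scored = [(completion, score(prompt, completion)) for completion in candidates]
    # max() returns the first maximal element, so ties go to the earliest sample.
    return max(scored, key=lambda pair: pair[1])
```

Passing the model calls in as plain callables keeps the selection logic independent of any particular serving backend, which may make it easier to reuse across benchmarks.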
Demo
- Serve the model and add a demo for users ([Doc] Add Chat Demo & Discord Link #36).
Data decontamination
- Implement a data decontamination pipeline using n-gram overlap (n = 10 by default) against an evaluation set.
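A minimal sketch of what such a pipeline could look like, assuming whitespace tokenization, lowercasing, and exact n-gram matching. The function names (`ngrams`, `build_eval_index`, `decontaminate`) are hypothetical and do not reflect the planned implementation.

```python
from typing import Iterable, List, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(text: str, n: int = 10) -> Set[NGram]:
    """Return the set of word-level n-grams in text (assumption: lowercase, whitespace tokens)."""
    tokens = text.lower().split()
    # Texts shorter than n tokens yield an empty set.
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def build_eval_index(eval_texts: Iterable[str], n: int = 10) -> Set[NGram]:
    """Collect every n-gram that appears anywhere in the evaluation set."""
    index: Set[NGram] = set()
    for text in eval_texts:
        index |= ngrams(text, n)
    return index


def decontaminate(train_texts: Iterable[str], eval_texts: Iterable[str], n: int = 10) -> List[str]:
    """Keep only training samples that share no n-gram with the evaluation set."""
    index = build_eval_index(eval_texts, n)
    return [text for text in train_texts if ngrams(text, n).isdisjoint(index)]
```

Under these assumptions, a training sample shorter than n tokens can never be flagged, so a real pipeline might need a separate rule for short samples.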