
[Roadmap] SkyThought Roadmap #13

@caoshiyi


This document will be actively maintained, and the RFC for each item will be linked here. Feel free to comment and request other features!

Unified Evaluation Suite

@richardliaw @kouroshHakha

  • Improve the code quality of the current evaluation suite.
  • Add more benchmarks.
  • Make the evaluation suite scalable.
  • Support Best-of-N, MCTS, and PRM-guided generation with basic PRM / policy models, etc.
  • Implement decontamination for future development, as we are adding more training data.
  • Add an LLM-based check of results.
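To illustrate the Best-of-N item, here is a minimal sketch of PRM-guided Best-of-N selection, assuming each candidate solution is a list of reasoning steps and the PRM returns one score per step. The names `best_of_n`, `aggregate_min`, and `toy_prm` are hypothetical, and taking the minimum step score is just one common aggregation choice, not necessarily what the evaluation suite will use.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: a PRM maps a list of steps to one score per step.
StepScorer = Callable[[List[str]], List[float]]


def aggregate_min(step_scores: List[float]) -> float:
    # One common choice: a solution is only as good as its weakest step.
    return min(step_scores) if step_scores else float("-inf")


def best_of_n(
    solutions: Sequence[List[str]],
    prm: StepScorer,
    aggregate: Callable[[List[float]], float] = aggregate_min,
) -> int:
    """Return the index of the highest-scoring solution (ties go to the earliest)."""
    if not solutions:
        raise ValueError("best_of_n needs at least one candidate")
    scores = [aggregate(prm(sol)) for sol in solutions]
    # max() returns the first index with the maximal score.
    return max(range(len(solutions)), key=lambda i: scores[i])


def toy_prm(steps: List[str]) -> List[float]:
    # Stand-in for a real PRM: penalize any step containing "wrong".
    return [0.0 if "wrong" in s else 1.0 for s in steps]


candidates = [["a", "wrong b"], ["a", "b"], ["c"]]
print(best_of_n(candidates, toy_prm))  # → 1
```

Swapping `aggregate` for a product or last-step score changes how strictly a single weak step is penalized.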

Demo

Data decontamination

  • Implement a data decontamination pipeline that checks n-gram overlap (n = 10 by default) against an evaluation set.
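A minimal sketch of the n-gram check described above, assuming lowercase word-level tokenization and a rule that drops any training sample sharing at least one n-gram with the evaluation set. The function names are hypothetical, and a real pipeline may normalize text or set thresholds differently.

```python
import re
from typing import Iterable, List, Set, Tuple

NGram = Tuple[str, ...]


def tokenize(text: str) -> List[str]:
    # Assumption: lowercase word-level tokens; punctuation is discarded.
    return re.findall(r"\w+", text.lower())


def ngrams(tokens: List[str], n: int) -> Set[NGram]:
    # Texts shorter than n tokens yield no n-grams (empty range).
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def build_eval_index(eval_texts: Iterable[str], n: int = 10) -> Set[NGram]:
    # Collect every n-gram that appears anywhere in the evaluation set.
    index: Set[NGram] = set()
    for text in eval_texts:
        index |= ngrams(tokenize(text), n)
    return index


def is_contaminated(sample: str, eval_index: Set[NGram], n: int = 10) -> bool:
    # Flag the sample if it shares at least one n-gram with the eval set.
    return not ngrams(tokenize(sample), n).isdisjoint(eval_index)


def decontaminate(train_texts: Iterable[str], eval_texts: Iterable[str], n: int = 10) -> List[str]:
    index = build_eval_index(eval_texts, n)
    return [t for t in train_texts if not is_contaminated(t, index, n)]


eval_set = ["The quick brown fox jumps over the lazy dog near the river bank."]
train_set = [
    "Q: The quick brown fox jumps over the lazy dog near the river?",
    "A short unrelated sentence.",
]
print(decontaminate(train_set, eval_set))  # → ['A short unrelated sentence.']
```

Note that samples shorter than n tokens can never be flagged under this rule, so very short items may need a separate exact-match check.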
