20250915 KAT-Coder Submission for SWE-bench Verified #337
zheng-kuaishou wants to merge 5 commits into SWE-bench:main from
Hi @john-b-yang, thank you for your message.
I noticed that you've started reviewing and commenting on other pull requests. Could you please take a look at my submission as well and let me know if anything else is needed? Thank you for all the work you've been doing to maintain this project; it brings great value to the industry.


KAT-Coder
Today, we're thrilled to announce two models in our KAT series, KAT-Dev-32B and KAT-Coder, which represent accessible excellence and top-end performance in code intelligence, respectively. KAT-Dev-32B is our new open-source 32B-parameter model for software engineering tasks; we have released it to the community for further research and development at https://huggingface.co/Kwaipilot/KAT-Dev. KAT-Coder is our most powerful variant.
Key Contributions
KAT-Coder and KAT-Dev-32B are optimized through several training stages: a mid-training stage, a supervised fine-tuning (SFT) and reinforcement fine-tuning (RFT) stage, and a large-scale agentic reinforcement learning (RL) stage. In summary, our contributions include:
🎯 Mid-Training: We observe that adding extensive training on tool use, multi-turn interaction, and instruction following at this stage may not yield large immediate gains on leaderboards such as SWE-bench, but it has a significant impact on the subsequent SFT and RL stages.
🎯 SFT & RFT: During the SFT stage, we carefully curated data spanning eight task types and eight programming scenarios to ensure the model's generalization and broad capabilities. In addition, before RL we introduced an RFT stage that uses "teacher trajectories" annotated by human engineers as guidance during training.
🎯 Agentic RL Scaling: Scaling agentic RL hinges on three challenges: efficient learning over nonlinear trajectory histories, leveraging intrinsic model signals, and building scalable, high-throughput infrastructure. We address these with prefix caching for logprob computation, entropy-based trajectory pruning, and the SeamlessFlow architecture.
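The list above names prefix caching and entropy-based trajectory pruning without detailing them. As a minimal sketch under our own stated assumptions (not KAT-Coder's actual implementation), the first idea can be illustrated by counting how many token positions need a logprob when trajectories that branch from a shared history reuse their common prefix, and the second by keeping only the highest-entropy steps of a trajectory. The function names, the trie-based bookkeeping, and the `keep_ratio` parameter are all hypothetical; SeamlessFlow is not modeled here.

```python
import math
from typing import Dict, List, Sequence


def tokens_to_score(trajectories: List[List[int]], share_prefixes: bool) -> int:
    """Count token positions that need a forward pass to obtain logprobs.

    Hypothetical illustration of prefix caching: trajectories that branch from
    a common history share their prefix, so with sharing enabled each distinct
    (prefix, token) position is scored once, i.e. one trie node per position.
    """
    if not share_prefixes:
        return sum(len(traj) for traj in trajectories)
    trie: Dict[int, dict] = {}
    computed = 0
    for traj in trajectories:
        node = trie
        for tok in traj:
            if tok not in node:
                node[tok] = {}  # unseen prefix: this position must be scored
                computed += 1
            node = node[tok]  # seen prefix: reuse the cached logprob
    return computed


def shannon_entropy(probs: Sequence[float]) -> float:
    """Entropy (in nats) of a single next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def prune_by_entropy(step_dists: List[Sequence[float]], keep_ratio: float) -> List[int]:
    """Return indices of the highest-entropy steps, in original order.

    Assumed criterion: steps where the policy was most uncertain carry the
    most learning signal, so lower-entropy steps are dropped from the update.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    k = max(1, math.ceil(len(step_dists) * keep_ratio))
    ranked = sorted(
        range(len(step_dists)),
        key=lambda i: shannon_entropy(step_dists[i]),
        reverse=True,
    )
    return sorted(ranked[:k])
```

For example, the three token trajectories `[1, 2, 3, 4]`, `[1, 2, 3, 5]`, and `[1, 2, 6]` require 11 scored positions without prefix sharing but only 6 with it, which is the kind of saving that makes learning over branching trajectory histories cheaper.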
To further improve performance on SWE-bench, we equip KAT-Coder with an additional reflection step that iteratively refines the generated outputs.
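The actual reflection procedure is documented in the report linked below. Purely as a hypothetical illustration of the general shape of such a step, a bounded generate-reflect-refine loop might look like this. `reflect_and_refine`, `generate`, `reflect`, `max_rounds`, and the feedback prompt format are illustrative names and assumptions, not KAT-Coder's interface.

```python
from typing import Callable, Optional


def reflect_and_refine(
    generate: Callable[[str], str],
    reflect: Callable[[str, str], Optional[str]],
    task: str,
    max_rounds: int = 3,
) -> str:
    """Generate a candidate, then critique and regenerate it a bounded number of times.

    `generate` maps a prompt to a candidate output (e.g. a patch);
    `reflect` returns feedback text, or None when it accepts the candidate.
    """
    candidate = generate(task)
    for _ in range(max_rounds):
        feedback = reflect(task, candidate)
        if feedback is None:  # reflection accepts the candidate: stop early
            break
        # Hypothetical prompt format: original task, previous attempt, feedback.
        prompt = (
            f"{task}\n\nPrevious attempt:\n{candidate}\n\n"
            f"Reflection feedback:\n{feedback}"
        )
        candidate = generate(prompt)
    return candidate
```

Bounding the loop with `max_rounds` keeps inference cost predictable even when the reflector never accepts a candidate; in that case the last refinement is returned.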
More details about our models are available at https://kwaipilot.github.io/KAT-Coder/, and our reflection procedure is described at https://github.com/kwaipilot/KAT-Coder-Agent/blob/main/KAT-Coder_Report.md.
Performance
This pull request reports KAT-Coder's performance on SWE-bench Verified.
Checklist
Does not use SWE-bench test knowledge (PASS_TO_PASS, FAIL_TO_PASS)
Does not use the hints field in SWE-bench