20250915 KAT-Coder Submission for SWE-bench Verified by zheng-kuaishou · Pull Request #337 · SWE-bench/experiments

zheng-kuaishou · 2025-09-16T04:28:07Z

KAT-Coder

Today we are announcing two models in our KAT series, KAT-Dev-32B and KAT-Coder, which target accessibility and peak performance in code intelligence, respectively. KAT-Dev-32B is our new open-source 32B-parameter model for software engineering tasks; we have released it to the community for further research and development at https://huggingface.co/Kwaipilot/KAT-Dev. KAT-Coder is our most powerful variant.

Key Contributions

KAT-Coder and KAT-Dev-32B are optimized through several stages of training: a mid-training stage, a supervised fine-tuning (SFT) and reinforcement fine-tuning (RFT) stage, and a large-scale agentic reinforcement learning (RL) stage. In summary, our contributions include:

🎯 Mid-Training: We observe that extensive training for tool use, multi-turn interaction, and instruction following at this stage may not yield large gains on current benchmarks (e.g., leaderboards like SWE-bench), but it has a significant impact on the subsequent SFT and RL stages.

🎯 SFT & RFT: We meticulously curated eight task types and eight programming scenarios during the SFT stage to ensure the model's generalization and comprehensive capabilities. Moreover, before RL, we innovatively introduced an RFT stage with "teacher trajectories" annotated by human engineers as guidance during training.

🎯 Agentic RL Scaling: Scaling agentic RL hinges on three challenges: efficient learning over nonlinear trajectory histories, leveraging intrinsic model signals, and building scalable high-throughput infrastructure. We address these with prefix caching on logprob computation, entropy-based trajectory pruning, and SeamlessFlow architecture.
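The PR text does not spell out how entropy-based trajectory pruning works. As one possible reading only, the sketch below scores each trajectory segment by the mean Shannon entropy of its next-token distributions and keeps the highest-entropy segments under a fixed budget. The names `token_entropy` and `prune_by_entropy`, the segment layout, and the `budget` parameter are illustrative assumptions, not KAT-Coder's implementation.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of a single next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def prune_by_entropy(segments, budget):
    """Return indices of the `budget` segments with the highest mean entropy.

    `segments` is a list of non-empty segments; each segment is a list of
    next-token probability distributions (hypothetical layout).
    Indices are returned in their original order.
    """
    scored = []
    for idx, segment in enumerate(segments):
        mean_entropy = sum(token_entropy(p) for p in segment) / len(segment)
        scored.append((mean_entropy, idx))
    top = sorted(scored, reverse=True)[:budget]
    return sorted(idx for _, idx in top)


# Toy example: three one-token segments with increasing uncertainty.
segments = [
    [[1.0, 0.0]],  # fully confident, entropy 0
    [[0.5, 0.5]],  # maximally uncertain over two tokens
    [[0.9, 0.1]],  # mildly uncertain
]
kept = prune_by_entropy(segments, budget=2)
print(kept)
```

In this toy input the fully confident segment is the one dropped; a real system would presumably weigh other signals as well, which the PR does not describe.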

Moreover, to further improve performance on SWE-bench, we equip KAT-Coder with an extra reflection step and iteratively refine the generated outputs.

You can find more details about our models at https://kwaipilot.github.io/KAT-Coder/ and about how we perform reflection at https://github.com/kwaipilot/KAT-Coder-Agent/blob/main/KAT-Coder_Report.md.

Performance

This pull request reports the performance of KAT-Coder on SWE-bench Verified.

Submission summary for 20250916_KAT-Coder on SWE-bench verified split
==================================================
Resolved 382 instances (76.4%)
==================================================
Resolved by Repository
- astropy/astropy: 13/22 (59.09%)
- django/django: 184/231 (79.65%)
- matplotlib/matplotlib: 24/34 (70.59%)
- mwaskom/seaborn: 1/2 (50.0%)
- pallets/flask: 1/1 (100.0%)
- psf/requests: 2/8 (25.0%)
- pydata/xarray: 18/22 (81.82%)
- pylint-dev/pylint: 6/10 (60.0%)
- pytest-dev/pytest: 15/19 (78.95%)
- scikit-learn/scikit-learn: 30/32 (93.75%)
- sphinx-doc/sphinx: 28/44 (63.64%)
- sympy/sympy: 60/75 (80.0%)
==================================================
Resolved by Time
- 2013: 1/3 (33.33%)
- 2014: 0/2 (0.0%)
- 2015: 0/1 (0.0%)
- 2016: 2/2 (100.0%)
- 2017: 14/16 (87.5%)
- 2018: 19/24 (79.17%)
- 2019: 78/98 (79.59%)
- 2020: 89/108 (82.41%)
- 2021: 58/86 (67.44%)
- 2022: 78/102 (76.47%)
- 2023: 43/58 (74.14%)
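
As a quick consistency check, the per-repository and per-year counts above can be summed to confirm the headline figure. The snippet below is a minimal sketch that copies the counts from the summary; the variable and function names are illustrative.

```python
# (resolved, total) counts copied from the submission summary above.
by_repo = {
    "astropy/astropy": (13, 22),
    "django/django": (184, 231),
    "matplotlib/matplotlib": (24, 34),
    "mwaskom/seaborn": (1, 2),
    "pallets/flask": (1, 1),
    "psf/requests": (2, 8),
    "pydata/xarray": (18, 22),
    "pylint-dev/pylint": (6, 10),
    "pytest-dev/pytest": (15, 19),
    "scikit-learn/scikit-learn": (30, 32),
    "sphinx-doc/sphinx": (28, 44),
    "sympy/sympy": (60, 75),
}
by_year = {
    2013: (1, 3), 2014: (0, 2), 2015: (0, 1), 2016: (2, 2),
    2017: (14, 16), 2018: (19, 24), 2019: (78, 98), 2020: (89, 108),
    2021: (58, 86), 2022: (78, 102), 2023: (43, 58),
}


def totals(counts):
    """Sum resolved and total instances over a breakdown."""
    resolved = sum(r for r, _ in counts.values())
    total = sum(t for _, t in counts.values())
    return resolved, total


repo_resolved, repo_total = totals(by_repo)
year_resolved, year_total = totals(by_year)
rate = round(100 * repo_resolved / repo_total, 2)

print(f"by repository: {repo_resolved}/{repo_total} ({rate}%)")
print(f"by year:       {year_resolved}/{year_total}")
```

Both breakdowns sum to 382 resolved out of 500 instances, matching the reported 76.4%.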

Checklist

- Is a pass@1 submission (does not attempt the same task instance more than once)
- Does not use SWE-bench test knowledge (PASS_TO_PASS, FAIL_TO_PASS)
- Does not use the hints field in SWE-bench
- Does not have web-browsing OR has taken steps to prevent lookup of SWE-bench solutions via web-browsing
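
One partial sanity check for the pass@1 item is that the predictions contain at most one entry per task instance. The sketch below assumes predictions are stored as JSON Lines with an `instance_id` field; the function name `duplicate_instance_ids` and the sample IDs are made up for illustration, and a clean result does not by itself prove that each instance was attempted only once.

```python
import json
from collections import Counter


def duplicate_instance_ids(jsonl_lines):
    """Return the sorted instance IDs that appear more than once."""
    counts = Counter()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        counts[json.loads(line)["instance_id"]] += 1
    return sorted(iid for iid, n in counts.items() if n > 1)


# Hypothetical sample; real submissions would read lines from a file.
sample = [
    '{"instance_id": "example__repo-1", "model_patch": "diff A"}',
    '{"instance_id": "example__repo-2", "model_patch": "diff B"}',
    '{"instance_id": "example__repo-1", "model_patch": "diff C"}',
]
print(duplicate_instance_ids(sample))
```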

john-b-yang · 2025-10-01T20:27:27Z

You have uploaded a lot of files - please reduce the representation for trajectories to one file per task instance, thanks.

zheng-kuaishou · 2025-10-04T02:46:09Z

> You have uploaded a lot of files - please reduce the representation for trajectories to one file per task instance, thanks.

Hi @john-b-yang, thank you for your message.
Our initial submission included logs and trajectories from both of our models, which led to more files than expected. I have now removed the results and trajectory files for one of the models to simplify the submission.
Could you kindly check again and let me know if there are still any issues? I really appreciate your time and help!

zheng-kuaishou · 2025-11-25T04:10:27Z

Hi @john-b-yang @ofirpress,

I noticed that you’ve started reviewing and commenting on other pull requests. Could you please take a look at my submission as well and let me know if anything else is needed?

Thank you for all the work you’ve been doing to maintain this project. It brings great value to the industry.

zheng-kuaishou added 4 commits September 16, 2025 01:21

KAT-Dev-32B submission

44cf4bd

update blog

130c4e5

update readme

c217f7d

KAT-Coder Submission

6958fda

zheng-kuaishou changed the title ~~20250915 KAT-Dev-32B Submission for SWE-bench Verified~~ 20250915 KAT-Dev-32B & KAT-Coder Submission for SWE-bench Verified Sep 16, 2025

KAT-Coder submission

84ad403

zheng-kuaishou changed the title ~~20250915 KAT-Dev-32B & KAT-Coder Submission for SWE-bench Verified~~ 20250915 KAT-Coder Submission for SWE-bench Verified Oct 4, 2025

zheng-kuaishou wants to merge 5 commits into SWE-bench:main from KAT-JungleJuice:main
