
Tables & Resources

This page contains statistical tables and resources from our comprehensive survey on Issue Resolution in Software Engineering.


Evaluation & Training Datasets

A comprehensive survey and statistical overview of issue resolution datasets. We categorize each dataset by programming language, modality support, number of source repositories (Repos), data scale (Amount), and the availability of a reproducible execution environment.

| Dataset | Language | Multimodal | Repos | Amount | Environment | Link |
| --- | --- | --- | --- | --- | --- | --- |
| **Single-PL Datasets** | | | | | | |
| SWE-Fixer | Python | No | 856 | 115,406 | No | GitHub, HuggingFace, HuggingFace |
| SWE-smith | Python | No | 128 | 50k | Yes | GitHub, HuggingFace |
| SWE-Lego | Python | No | 3,251 | 32,119 | Yes | GitHub, HuggingFace |
| SWE-rebench | Python | No | 3,468 | 21,336 | Yes | GitHub, HuggingFace |
| SWE-bench-train | Python | No | 37 | 19k | No | GitHub, HuggingFace |
| SWE-Flow | Python | No | 74 | 18,081 | Yes | GitHub |
| Skywork-SWE | Python | No | 2,531 | 10,169 | Yes | - |
| R2E-Gym | Python | No | 10 | 8,135 | Yes | GitHub, HuggingFace |
| RepoForge | Python | No | - | 7.3k | Yes | - |
| SWE-bench-extra | Python | No | 2k | 6.38k | Yes | HuggingFace |
| SWE-Gym | Python | No | 11 | 2,438 | Yes | GitHub, HuggingFace |
| SWE-bench | Python | No | 12 | 2,294 | Yes | GitHub, HuggingFace |
| SWE-bench-java | Java | No | 19 | 1,797 | Yes | GitHub, HuggingFace |
| FEA-bench | Python | No | 83 | 1,401 | Yes | GitHub, HuggingFace |
| SWE-bench-Live | Python | No | 164 | 1,565 | Yes | GitHub, HuggingFace |
| Loc-Bench | Python | No | - | 560 | No | GitHub, HuggingFace |
| SWE-bench Verified | Python | No | - | 500 | Yes | GitHub, HuggingFace |
| SWE-bench Lite | Python | No | 12 | 300 | Yes | GitHub, HuggingFace |
| SWE-MERA | Python | No | 200 | 300 | Yes | GitHub, HuggingFace |
| SWE-Bench-CL | Python | No | 8 | 273 | Yes | GitHub |
| SWE-Sharp-Bench | C# | No | 17 | 150 | Yes | GitHub, HuggingFace |
| SWE-Perf | Python | No | 12 | 140 | Yes | GitHub, HuggingFace |
| Visual SWE-bench | Python | Yes | 11 | 133 | Yes | GitHub, HuggingFace |
| SWE-EVO | Python | No | 7 | 48 | Yes | GitHub |
| **Multi-PL Datasets** | | | | | | |
| SWE-Mirror | Python, Rust, Go | No | 40 | 60k | Yes | - |
| Multi-SWE-bench | Java, JS, TS, Go, Rust, C, C++ | No | 76 | 4,723 | Yes | GitHub, HuggingFace |
| Swing-Bench | Python, Go, C++, Rust | No | 400 | 2,300 | Yes | - |
| SWE-PolyBench | Python, Java, JS, TS | No | 21 | 2,110 | Yes | GitHub, HuggingFace, HuggingFace |
| SWE-Compass | Python, JS, TS, Java, C, C++, Go, Rust, Kotlin, C# | No | - | 2,000 | Yes | GitHub, HuggingFace |
| SWE-Bench Pro | Python, Go, TS | No | 41 | 1,865 | Yes | GitHub, HuggingFace |
| SWE-bench++ | Python, Go, TS, JS, Ruby, PHP, Java, Rust, C++, C#, C | No | 3,971 | 1,782 | Yes | GitHub, HuggingFace |
| SWE-Lancer | JS, TS | No | - | 1,488 | Yes | GitHub |
| OmniGIRL | Python, TS, Java, JS | Yes | 15 | 959 | Yes | GitHub, HuggingFace |
| SWE-bench Multimodal | JS, TS, HTML, CSS | Yes | 17 | 619 | Yes | GitHub, HuggingFace |
| SWE-fficiency | Python, Cython | No | 9 | 498 | Yes | GitHub |
| SWE-Factory | Python, Java, JS, TS | No | 12 | 430 | Yes | GitHub, HuggingFace |
| SWE-bench-Live-MultiLang & Windows | Python, JS, TS, C, C++, C#, Java, Go, Rust | No | 238 | 418 | Yes | GitHub, HuggingFace, HuggingFace |
| SWE-bench Multilingual | C, C++, Go, Java, JS, TS, Rust, Python, Ruby, PHP | No | 42 | 300 | Yes | GitHub, HuggingFace |
| SWE-InfraBench | Python, TS | No | - | 100 | Yes | - |
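The Amount column mixes exact counts (e.g., 115,406) with rounded figures (e.g., 50k, 6.38k). When aggregating these tables programmatically, a small normalizer avoids comparing strings; a minimal sketch (the helper name is ours, not from any dataset's tooling):

```python
def parse_amount(s: str) -> int:
    """Convert amount strings like '115,406', '50k', or '6.38k' to integers."""
    s = s.strip().lower().replace(",", "")
    if s.endswith("k"):
        # Rounded figures such as '6.38k' become approximate integer counts.
        return int(float(s[:-1]) * 1000)
    return int(s)

# Examples drawn from the table above.
print(parse_amount("115,406"))  # 115406
print(parse_amount("50k"))      # 50000
print(parse_amount("6.38k"))    # 6380
```

Note that "k"-suffixed values are approximations reported by the original papers, so totals computed this way are only indicative.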

Training Trajectory Datasets

A survey of trajectory datasets used for agent training or analysis. We list the programming language, number of source repositories, and total trajectories for each dataset.

| Dataset | Language | Repos | Amount | Link |
| --- | --- | --- | --- | --- |
| SWE-Fixer | Python | 856 | 69,752 | GitHub, HuggingFace |
| SWE-rebench | Python | 1,823 | 67,074 | HuggingFace |
| R2E-Gym | Python | 10 | 3,321 | GitHub, HuggingFace |
| SWE-Synth | Python | 11 | 3,018 | GitHub, HuggingFace |
| SWE-Factory | Python | 10 | 2,809 | GitHub, HuggingFace |
| SWE-Gym | Python | 11 | 491 | GitHub, HuggingFace |
| SWE-Lego | Python | 3,251 | 14.6k | GitHub |

SFT-based Methods

Overview of SFT-based methods for issue resolution. This table categorizes models by their base model and training scaffold, sorted by resolution rate (Res. %).

| Model Name | Base Model | Size | Arch. | Training Scaffold | Res. (%) | Code | Data | Model |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SWE-rebench-openhands-Qwen3-235B-A22B | Qwen3-235B-A22B | 235B-A22B | MoE | OpenHands | 59.9 | - | HuggingFace | HuggingFace |
| SWE-Lego-Qwen3-32B | Qwen3-32B | 32B | Dense | OpenHands | 57.6 | GitHub | HuggingFace | HuggingFace |
| CGM-SWE-PY | Qwen2.5-Coder-72B | 72B | Dense | Graph RAG | 50.4 | GitHub | - | HuggingFace |
| SWE-rebench-openhands-Qwen3-30B-A3B | Qwen3-30B-A3B | 30B-A3B | MoE | OpenHands | 49.7 | - | HuggingFace | HuggingFace |
| Devstral | Mistral Small 3 | 22B | Dense | OpenHands | 46.8 | - | - | HuggingFace |
| Co-PatcheR | Qwen2.5-Coder-14B | 3×14B | Dense | PatchPilot-mini | 46.0 | GitHub | - | HuggingFace |
| SWE-Swiss-32B | Qwen2.5-32B-Instruct | 32B | Dense | Agentless | 45.0 | GitHub | HuggingFace | HuggingFace |
| SWE-Lego-Qwen3-8B | Qwen3-8B | 8B | Dense | OpenHands | 44.4 | GitHub | HuggingFace | HuggingFace |
| Lingma SWE-GPT | Qwen2.5-72B-Instruct | 72B | Dense | SWESynInfer | 30.2 | GitHub | - | - |
| SWE-Gym-Qwen-32B | Qwen2.5-Coder-32B | 32B | Dense | OpenHands, MoatlessTools | 20.6 | GitHub | - | HuggingFace |
| SWE-Gym-Qwen-14B | Qwen2.5-Coder-14B | 14B | Dense | OpenHands, MoatlessTools | 16.4 | GitHub | - | HuggingFace |
| SWE-Gym-Qwen-7B | Qwen2.5-Coder-7B | 7B | Dense | OpenHands, MoatlessTools | 10.6 | GitHub | - | HuggingFace |

RL-based Methods

A comprehensive overview of RL-trained models for issue resolution, grouped by parameter size. The table lists each model's base architecture, the training scaffold used for rollouts, the type of reward signal employed (Outcome vs. Process), and performance (Res. %) on issue resolution benchmarks.

| Model Name | Base Model | Size | Arch. | Train. Scaffold | Reward | Res. (%) | Code | Data | Model |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| **560B Models (MoE)** | | | | | | | | | |
| LongCat-Flash-Think | LongCat-Flash-Base | 560B-A27B | MoE | R2E-Gym | Outcome | 60.4 | GitHub | - | HuggingFace |
| **72B Models** | | | | | | | | | |
| Kimi-Dev | Qwen2.5-72B-Base | 72B | Dense | BugFixer + TestWriter | Outcome | 60.4 | GitHub | - | HuggingFace |
| SWE-RL | Llama-3.3-70B-Instruct | 70B | Dense | Agentless-mini | Outcome | 41.0 | GitHub | - | - |
| Multi-turn RL (Nebius) | Qwen2.5-72B-Instruct | 72B | Dense | SWE-agent | Outcome | 39.0 | - | - | - |
| Agent-RLVR-RM-72B | Qwen2.5-Coder-72B | 72B | Dense | Localization + Repair | Outcome | 27.8 | - | - | - |
| Agent-RLVR-72B | Qwen2.5-Coder-72B | 72B | Dense | Localization + Repair | Outcome | 22.4 | - | - | - |
| **32B Models** | | | | | | | | | |
| OpenHands Critic | Qwen2.5-Coder-32B | 32B | Dense | SWE-Gym | - | 66.4 | GitHub | - | HuggingFace |
| KAT-Dev-32B | Qwen3-32B | 32B | Dense | - | - | 62.4 | - | - | HuggingFace |
| SWE-Swiss-32B | Qwen2.5-32B-Instruct | 32B | Dense | - | Outcome | 60.2 | GitHub | HuggingFace | HuggingFace |
| FoldAgent | Seed-OSS-36B-Instruct | 36B | Dense | FoldAgent | Process | 58.0 | GitHub | - | - |
| SeamlessFlow-32B | Qwen3-32B | 32B | Dense | SWE-agent | Outcome | 45.8 | GitHub | - | - |
| DeepSWE | Qwen3-32B | 32B | Dense | R2E-Gym | Outcome | 42.2 | GitHub | HuggingFace | HuggingFace |
| SA-SWE-32B | - | 32B | Dense | SkyRL-Agent | - | 39.4 | - | - | - |
| OpenHands LM v0.1 | Qwen2.5-Coder-32B | 32B | Dense | SWE-Gym | - | 37.2 | GitHub | - | HuggingFace |
| SWE-Dev-32B | Qwen2.5-Coder-32B | 32B | Dense | OpenHands | Outcome | 36.6 | GitHub | - | HuggingFace |
| Satori-SWE | Qwen2.5-Coder-32B | 32B | Dense | Retriever + Code editor | Outcome | 35.8 | GitHub | HuggingFace | HuggingFace |
| SoRFT-32B | Qwen2.5-Coder-32B | 32B | Dense | Agentless | Outcome | 30.8 | - | - | - |
| Agent-RLVR-32B | Qwen2.5-Coder-32B | 32B | Dense | Localization + Repair | Outcome | 21.6 | - | - | - |
| **14B Models** | | | | | | | | | |
| Agent-RLVR-14B | Qwen2.5-Coder-14B | 14B | Dense | Localization + Repair | Outcome | 18.0 | - | - | - |
| SEAlign-14B | Qwen2.5-Coder-14B | 14B | Dense | OpenHands | Process | 17.7 | - | - | - |
| **7–9B Models** | | | | | | | | | |
| SeamlessFlow-8B | Qwen3-8B | 8B | Dense | SWE-agent | Outcome | 27.4 | GitHub | - | - |
| SWE-Dev-7B | Qwen2.5-Coder-7B | 7B | Dense | OpenHands | Outcome | 23.4 | GitHub | - | HuggingFace |
| SoRFT-7B | Qwen2.5-Coder-7B | 7B | Dense | Agentless | Outcome | 21.4 | - | - | - |
| SWE-Dev-8B | Llama-3.1-8B | 8B | Dense | OpenHands | Outcome | 18.0 | GitHub | - | HuggingFace |
| SEAlign-7B | Qwen2.5-Coder-7B | 7B | Dense | OpenHands | Process | 15.0 | - | - | - |
| SWE-Dev-9B | GLM-4-9B | 9B | Dense | OpenHands | Outcome | 13.6 | GitHub | - | HuggingFace |
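The Reward column distinguishes outcome supervision (a single sparse signal per trajectory, typically based on whether the final patch makes the repository's tests pass) from process supervision (dense per-step signals over the agent's trajectory). A minimal sketch of the distinction, with illustrative function names and a mean aggregation of our own choosing (actual methods differ in how they score and aggregate steps):

```python
def outcome_reward(tests_passed: bool) -> float:
    """Outcome supervision: one sparse, binary signal per trajectory."""
    return 1.0 if tests_passed else 0.0

def process_reward(step_scores: list[float]) -> float:
    """Process supervision: dense per-step scores, aggregated here by mean."""
    return sum(step_scores) / len(step_scores) if step_scores else 0.0

print(outcome_reward(True))             # 1.0
print(process_reward([0.2, 0.8, 1.0]))  # ≈ 0.67
```

Outcome rewards are easy to verify but sparse; process rewards give denser credit assignment at the cost of needing a per-step scoring model or heuristic.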

General Foundation Models

Overview of general foundation models evaluated on issue resolution. The table lists the inference scaffold (e.g., OpenHands, Agentless) used during evaluation to obtain each reported result.

| Model Name | Size | Arch. | Inf. Scaffold | Reward | Res. (%) | Code | Model |
| --- | --- | --- | --- | --- | --- | --- | --- |
| KAT-Coder | - | - | Claude Code | Outcome | 73.4 | - | Website |
| MiMo-V2-Flash | 309B-A15B | MoE | Agentless | Outcome | 73.4 | GitHub | HuggingFace |
| DeepSeek V3.2 | 671B-A37B | MoE | Claude Code, RooCode | - | 73.1 | GitHub | HuggingFace |
| Kimi-K2-Instruct | 1T | MoE | Agentless | Outcome | 71.6 | - | HuggingFace |
| Qwen3-Coder | 480B-A35B | MoE | OpenHands | Outcome | 69.6 | GitHub | HuggingFace |
| GLM-4.6 | 355B-A32B | MoE | OpenHands | Outcome | 68.0 | - | HuggingFace |
| gpt-oss-120b | 116.8B-A5.1B | MoE | Internal tool | Outcome | 62.0 | GitHub | HuggingFace |
| MiniMax M2 | 230B-A10B | MoE | R2E-Gym | Outcome | 61.0 | GitHub | HuggingFace |
| gpt-oss-20b | 20.9B-A3.6B | MoE | Internal tool | Outcome | 60.0 | GitHub | HuggingFace |
| GLM-4.5-Air | 106B-A12B | MoE | OpenHands | Outcome | 57.6 | - | - |
| MiniMax M1-80k | 456B-A45.9B | MoE | Agentless | Outcome | 56.0 | GitHub | Website |
| MiniMax M1-40k | 456B-A45.9B | MoE | Agentless | Outcome | 55.6 | GitHub | Website |
| Seed1.5-Thinking | 200B-A20B | MoE | - | Outcome | 47.0 | GitHub | - |
| Llama 4 Maverick | 400B-A17B | MoE | mini-SWE-agent | Outcome | 21.0 | GitHub | HuggingFace |
| Llama 4 Scout | 109B-A17B | MoE | mini-SWE-agent | Outcome | 9.1 | GitHub | HuggingFace |
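Across all of the method tables above, Res. (%) is the resolved rate: the percentage of benchmark instances for which the generated patch passes the held-out tests. A minimal sketch of the computation (the function name and the rounding to one decimal are our conventions, not prescribed by any benchmark):

```python
def resolved_rate(n_resolved: int, n_total: int) -> float:
    """Percentage of benchmark instances resolved, rounded to one decimal."""
    return round(100 * n_resolved / n_total, 1)

# e.g., a hypothetical run resolving 300 of 500 SWE-bench Verified instances:
print(resolved_rate(300, 500))  # 60.0
```

Because benchmarks differ in size (SWE-bench Lite has 300 instances, SWE-bench Verified 500), rates from different tables are not directly comparable unless they were measured on the same benchmark.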