WorldVQA: Measuring Atomic World Knowledge in Multimodal Large Language Models Paper • 2602.02537 • Published 8 days ago • 5
FEA-Bench: A Benchmark for Evaluating Repository-Level Code Generation for Feature Implementation Paper • 2503.06680 • Published Mar 9, 2025 • 20
Scaling Synthetic Data Creation with 1,000,000,000 Personas Paper • 2406.20094 • Published Jun 28, 2024 • 104
The Prompt Report: A Systematic Survey of Prompting Techniques Paper • 2406.06608 • Published Jun 6, 2024 • 68
FlowMind: Automatic Workflow Generation with LLMs Paper • 2404.13050 • Published Mar 17, 2024 • 34
List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs Paper • 2404.16375 • Published Apr 25, 2024 • 18