GitGoodBench: A Novel Benchmark for Evaluating Agentic Performance on Git
Published in the REALM Workshop at ACL 2025 (Spotlight)
GitGoodBench is a benchmark for evaluating how well AI agents perform version control tasks. It covers three core Git scenarios sourced from real open-source repositories and provides three datasets of 900, 120, and 17,469 samples. Together these give an evaluation framework for agentic performance on practical software engineering workflows that involve Git.
Recommended citation: Lindenbauer, T., Bogomolov, E., & Zharov, Y. (2025). "GitGoodBench: A Novel Benchmark for Evaluating Agentic Performance on Git." Proceedings of the 1st Workshop for Research on Agent Language Models (REALM) at ACL 2025.
