GitGoodBench: A Novel Benchmark for Evaluating Agentic Performance on Git
Published in the REALM Workshop at ACL 2025 (Spotlight)
A benchmark for evaluating AI agents' performance on version control tasks. It covers three core Git scenarios, with datasets derived from open-source repositories.
Recommended citation: Lindenbauer, T., Bogomolov, E., & Zharov, Y. (2025). "GitGoodBench: A Novel Benchmark for Evaluating Agentic Performance on Git." Proceedings of the 1st Workshop for Research on Agent Language Models (REALM) at ACL 2025.
