Posts by Collection

publications

GitGoodBench: A Novel Benchmark for Evaluating Agentic Performance on Git

Published in REALM Workshop at ACL 2025 (Spotlight), 2025

A benchmark for evaluating AI agent performance on version control tasks, covering three core Git scenarios with datasets derived from open-source repositories.

Recommended citation: Lindenbauer, T., Bogomolov, E., & Zharov, Y. (2025). "GitGoodBench: A Novel Benchmark for Evaluating Agentic Performance on Git." Proceedings of the 1st Workshop for Research on Agent Language Models (REALM) at ACL 2025.

From Knowledge to Noise: CTIM-Rover and the Pitfalls of Episodic Memory in Software Engineering Agents

Published in REALM Workshop at ACL 2025, 2025

An investigation into episodic memory in software engineering agents, showing that repository-level understanding is pivotal for patch localization and that naive memory augmentation can introduce noise.

Recommended citation: Lindenbauer, T., Groh, G., & Schütze, H. (2025). "From Knowledge to Noise: CTIM-Rover and the Pitfalls of Episodic Memory in Software Engineering Agents." Proceedings of the 1st Workshop for Research on Agent Language Models (REALM) at ACL 2025.

The Complexity Trap: Simple Observation Masking Is as Efficient as LLM Summarization for Agent Context Management

Published in DL4C Workshop at NeurIPS 2025, 2025

We show that simple sliding-window observation masking matches the performance of LLM-based summarization for managing agent context, at a fraction of the computational cost.

Recommended citation: Lindenbauer, T., Slinko, I., Felder, L., Bogomolov, E., & Zharov, Y. (2025). "The Complexity Trap: Simple Observation Masking Is as Efficient as LLM Summarization for Agent Context Management." Deep Learning for Code Workshop (DL4C) at NeurIPS 2025.

Tobias Lindenbauer
