Projects

Here is a selection of projects I’ve worked on.

RouteLLM Reproduction

BERT · XLM-RoBERTa · HuggingFace Trainer · Dataset Rebalancing · LLM Routing · Benchmarking (MMLU, GSM8k, MT-bench)

This project reproduces and improves the BERT-based router from the RouteLLM paper. The original framework aims to reduce LLM costs without compromising response quality, but my reproduction revealed significant methodological flaws in the original BERT router implementation.

Key Findings & Methodological Improvements:

Addressing Overfitting: I identified that the original BERT routers were overfitting to the majority class, achieving poor macro F1 scores (0.23-0.35). This was primarily due to a heavily skewed training dataset (51% strong model wins).
Dataset Rebalancing: To fix this, I implemented oversampling to balance the classes, which enabled meaningful training convergence and significantly improved routing performance.
Improved Performance: My rebalanced BERT router outperforms the original checkpoints on the MMLU, GSM8k, and MT-bench benchmarks, despite being trained on far less data (19k vs. 130k+ samples).
Reproducible Pipeline: I developed a complete training and evaluation pipeline using HuggingFace Trainer and XLM-RoBERTa-base for 3-class classification (strong_win, tie, weak_win).
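
The rebalancing and the macro F1 metric mentioned above can be illustrated with a minimal sketch. This is not the project's actual code: the function names, the `label` key, and the toy class counts are assumptions made for illustration, and random duplication is only one common way to oversample. Only the three label names come from the description above.

```python
import random

# Label names taken from the project description; everything else is hypothetical.
LABELS = ("strong_win", "tie", "weak_win")


def oversample(rows, label_key="label", seed=0):
    """Duplicate random minority-class rows until every class matches the largest one."""
    rng = random.Random(seed)
    by_label = {}
    for row in rows:
        by_label.setdefault(row[label_key], []).append(row)
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        # Sample with replacement to fill the gap up to the majority-class size.
        balanced.extend(rng.choices(group, k=target - len(group)))
    rng.shuffle(balanced)
    return balanced


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of per-class F1, so every class counts equally."""
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data with invented class counts (not the real dataset distribution).
toy_rows = (
    [{"label": "strong_win"}] * 51
    + [{"label": "tie"}] * 30
    + [{"label": "weak_win"}] * 19
)
balanced_rows = oversample(toy_rows)

# A router that always predicts the majority class scores zero F1 on the other two.
y_true = [row["label"] for row in toy_rows]
majority_only_f1 = macro_f1(y_true, ["strong_win"] * len(y_true))
```

Because macro F1 averages the per-class scores without weighting, a router that collapses onto the majority class is penalized heavily even when its plain accuracy looks acceptable, which is why it is a useful check here. In a real pipeline, oversampling should be applied to the training split only, so that evaluation data keeps its original distribution.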

Source code

Tobias Lindenbauer
