Beyond Chained LLM Calls: Differentiable Routing for Efficient LLMs

2025-07-06

Modern large language model (LLM) agent architectures rely heavily on chaining LLM calls, which drives up cost and latency and scales poorly. This paper introduces a differentiable router that models tool selection as a trainable function rather than delegating each routing decision to an LLM call. The router learns tool selection from data via reinforcement learning or supervised fine-tuning and runs outside the LLM, avoiding external API calls, improving determinism and composability, and reducing cost. Experiments show that this approach significantly reduces cost, improves performance, and makes model behavior easier to interpret, marking a step toward LLM systems that look less like prompt chains and more like programs.
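The abstract does not spell out the router's architecture, so the following is only a rough sketch of the idea: tool selection framed as a small supervised classifier over query embeddings, trained with cross-entropy, rather than handled by an LLM call. The class name `ToolRouter`, the frozen-encoder assumption, and all dimensions are hypothetical placeholders, not the paper's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToolRouter(nn.Module):
    """Small MLP mapping a query embedding to a distribution over tools."""
    def __init__(self, embed_dim: int, num_tools: int, hidden_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_tools),
        )

    def forward(self, query_embedding: torch.Tensor) -> torch.Tensor:
        # Logits over tools; softmax gives a differentiable "soft"
        # selection, so the router can be trained end to end.
        return self.net(query_embedding)

# Supervised fine-tuning on (query embedding, correct tool) pairs.
# In practice the embeddings would come from a frozen encoder;
# random tensors stand in here so the sketch runs on its own.
EMBED_DIM, NUM_TOOLS, BATCH = 384, 8, 32
router = ToolRouter(EMBED_DIM, NUM_TOOLS)
optimizer = torch.optim.AdamW(router.parameters(), lr=1e-3)

for step in range(100):
    queries = torch.randn(BATCH, EMBED_DIM)          # placeholder embeddings
    labels = torch.randint(0, NUM_TOOLS, (BATCH,))   # placeholder tool labels
    loss = F.cross_entropy(router(queries), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# At inference, routing is a single cheap forward pass -- no LLM call.
tool_id = router(torch.randn(1, EMBED_DIM)).argmax(dim=-1).item()
```

Because the router is an ordinary trainable module, it is deterministic at inference, composes with other program components, and incurs none of the per-call cost of querying an LLM for each routing decision.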
