Factorio Learning Environment: A New Benchmark for LLMs

2025-03-11

Large Language Models (LLMs) are rapidly saturating existing benchmarks, which creates a need for new open-ended evaluations. The Factorio Learning Environment (FLE) is built on the game Factorio and tests agents on long-term planning, program synthesis, and resource optimization. Its challenges are open-ended and scale exponentially, from basic automation to complex factories that process millions of resource units per second. FLE provides two settings: lab-play, which consists of 24 structured tasks with fixed resources, and open-play, the unbounded task of building the largest possible factory from scratch on a procedurally generated map. Across both settings, LLMs still lack strong spatial reasoning. In lab-play, they show promising short-horizon skills but cannot operate effectively in constrained environments, which reflects limitations in error analysis. In open-play, they discover automation strategies that improve growth (e.g., electric-powered drilling) but fail to achieve complex automation (e.g., electronic circuit manufacturing).

AI