CPython's Performance Boost: A Tale of Unexpected Twists and LLVM Regressions

2025-03-10

A recent CPython merge introduced a new bytecode interpreter implementation based on tail calls, and initial benchmarks showed gains of 10-15% across a range of workloads. Most of that improvement, however, came from inadvertently sidestepping a regression in LLVM 19. Measured against a sounder baseline (such as GCC, or LLVM 19 with tuned flags), the real speedup shrinks to 1-5%. Weeks of compiling, benchmarking, and disassembling traced the regression to a limit on tail duplication in LLVM 19, which penalized the older interpreter. The tail-call interpreter is still a worthwhile improvement, but the episode illustrates how hard benchmarking and performance engineering are: picking a fair baseline is difficult, and compiler optimizations interact in ways that are hard to predict.
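To make the baseline effect concrete, here is a minimal sketch of the arithmetic. The timings and build labels are hypothetical, chosen only to fall inside the ranges quoted above; they are not measurements from the original analysis.

```python
def speedup_percent(baseline_seconds: float, candidate_seconds: float) -> float:
    """Return how much faster the candidate is than the baseline, in percent."""
    return (baseline_seconds / candidate_seconds - 1.0) * 100.0


# Hypothetical wall-clock times in seconds for one benchmark run.
# These are illustrative values, NOT data from the article.
timings = {
    "old_interpreter_regressed_compiler": 1.15,  # baseline hit by the regression
    "old_interpreter_healthy_compiler": 1.03,    # baseline without the regression
    "tail_call_interpreter": 1.00,               # the new implementation
}

candidate = timings["tail_call_interpreter"]
for name in ("old_interpreter_regressed_compiler", "old_interpreter_healthy_compiler"):
    gain = speedup_percent(timings[name], candidate)
    print(f"vs {name}: {gain:.1f}% faster")
```

The same candidate time yields a large or a small headline number depending only on which baseline it is divided into, which is why the choice of baseline matters as much as the measurement itself.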
