How Modern CPUs Efficiently Predict Bytecode Interpreter Loops

2025-07-05

While investigating the performance of a new Python interpreter, the author found that modern CPUs can predict the indirect jumps in bytecode interpreter loops remarkably well. The credit goes to advanced branch predictors such as TAGE (which predicts conditional-branch directions) and ITTAGE (its counterpart for indirect-branch targets). These predictors map the program counter (PC), combined with recent branch history, to previously observed outcomes. They use multiple tables indexed with geometrically increasing history lengths and prefer the prediction from the longest history that matches. The author also explores applying ITTAGE's principles to coverage-guided fuzzing and program-state exploration, suggesting they could help tools better understand and explore interpreters and similar programs.
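The core ITTAGE idea can be illustrated with a toy sketch in Python. This is not the author's code and not a hardware-accurate model. It makes several simplifying assumptions: the history is the sequence of recent indirect-jump targets, standing in for the global branch and path history real hardware keeps; each table is a Python dict keyed by the exact (PC, history) tuple instead of a hashed index plus partial tag, so there is no aliasing; and allocation simply writes into the next-longer table on a misprediction. The dispatch address `DISPATCH_PC`, the class `TinyITTAGE`, and the opcode names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Entry:
    target: str
    confidence: int = 0  # small saturating counter (0..3)


class TinyITTAGE:
    """Toy ITTAGE-style indirect-target predictor (illustrative only)."""

    def __init__(self, history_lengths=(2, 4, 8, 16)):
        # Geometrically increasing history lengths, one table per length.
        self.history_lengths = history_lengths
        # Idealization (assumption): tables are keyed by the exact
        # (pc, history) tuple rather than a hashed index + partial tag.
        self.tables = [dict() for _ in history_lengths]
        self.base = {}  # pc -> last seen target (BTB-like fallback)
        self.history = deque(maxlen=max(history_lengths))

    def _key(self, pc, n):
        return (pc, tuple(self.history)[-n:])

    def predict(self, pc):
        # The provider is the table with the longest history that matches.
        for i in reversed(range(len(self.tables))):
            entry = self.tables[i].get(self._key(pc, self.history_lengths[i]))
            if entry is not None:
                return entry.target, i
        return self.base.get(pc), -1  # -1 means "fell back to the base table"

    def predict_and_train(self, pc, actual):
        """Predict, then learn the actual target; return True if correct."""
        predicted, provider = self.predict(pc)
        if provider >= 0:
            key = self._key(pc, self.history_lengths[provider])
            entry = self.tables[provider][key]
            if entry.target == actual:
                entry.confidence = min(entry.confidence + 1, 3)
            elif entry.confidence > 0:
                entry.confidence -= 1
            else:
                entry.target = actual  # low confidence: retrain in place
        # On a misprediction, allocate an entry in the next-longer table.
        if predicted != actual and provider + 1 < len(self.tables):
            nxt = provider + 1
            self.tables[nxt][self._key(pc, self.history_lengths[nxt])] = Entry(actual)
        self.base[pc] = actual
        self.history.append(actual)
        return predicted == actual


# Hypothetical interpreter: every opcode is dispatched from ONE indirect jump.
DISPATCH_PC = 0x4000
loop_body = ["LOAD", "LOAD", "ADD", "STORE", "JUMP"]  # hypothetical opcodes
trace = loop_body * 100

predictor = TinyITTAGE()
misses = sum(not predictor.predict_and_train(DISPATCH_PC, op) for op in trace)

# A last-target predictor just guesses "same handler as last time".
last_target_hits = sum(a == b for a, b in zip(trace, trace[1:]))

print(f"ITTAGE-style misses: {misses}/{len(trace)}")
print(f"last-target hits: {last_target_hits}/{len(trace)}")
```

On this trace the sketch prints 5 misses out of 500, all during the first pass through the loop, while the last-target guess is right only 100 times out of 500. The reason is that a single shared dispatch jump looks random when viewed in isolation, but the previous two handlers uniquely determine the next one, so a history-indexed table learns each context after one miss.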


CPython's Performance Boost: A Tale of Unexpected Twists and LLVM Regressions

2025-03-10

A recent CPython merge introduced a new bytecode interpreter implementation that initially showed impressive 10-15% speedups across a range of benchmarks. However, much of this improvement came from inadvertently sidestepping a regression in LLVM 19. When measured against a better baseline (such as GCC, or a tuned LLVM 19 build), the real speedup shrinks to a more modest 1-5%. Weeks of compiling, benchmarking, and disassembling traced the regression to limits that LLVM 19 placed on tail duplication. The tail-call interpreter is still a valuable improvement, but the episode highlights broader challenges in benchmarking, performance engineering, and software engineering, such as choosing an appropriate baseline and understanding how compiler optimizations interact with interpreter design.
