Pointers Are Complicated II: The Subtle Bugs in LLVM Optimizations
2025-02-02
This post explains why compiler intermediate representations (IRs) need precise semantics, especially when they serve languages such as C, C++, and Rust that allow unsafe pointer manipulation. The author shows how three LLVM optimizations, each seemingly correct on its own, combine to produce incorrect program behavior. The root cause is pointer provenance: the extra information a pointer carries beyond its memory address, recording how it was computed. Fixing this requires a more precise LLVM IR specification, including a precise definition of undefined behavior (UB). The broader argument is that compiler IRs should be treated as programming languages in their own right, with rigorous specifications.
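To make the idea of provenance concrete, here is a minimal toy model in C. It is not taken from the post and is not how LLVM represents pointers. Every name in it (ModelAlloc, ModelPtr, model_ptr_to, model_same_address, model_may_access, model_demo) and both addresses are hypothetical, chosen only for illustration. The sketch assumes a pointer can be modeled as an address plus the identifier of the allocation it was derived from, and shows that two pointers can share an address yet differ in what they may access.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of one allocation (all names and values are hypothetical). */
typedef struct {
    int id;          /* made-up allocation identifier */
    uintptr_t base;  /* first address of the allocation */
    size_t size;     /* size in bytes */
} ModelAlloc;

/* Toy model of a pointer: an address plus the allocation it came from. */
typedef struct {
    uintptr_t addr;
    int prov;        /* id of the allocation this pointer was derived from */
} ModelPtr;

/* Derive a pointer from an allocation; it inherits that allocation's id. */
static ModelPtr model_ptr_to(const ModelAlloc *a, size_t offset) {
    ModelPtr p = { a->base + offset, a->id };
    return p;
}

/* Address comparison ignores provenance entirely. */
static bool model_same_address(ModelPtr a, ModelPtr b) {
    return a.addr == b.addr;
}

/* An access is allowed only if provenance matches and the address is in bounds. */
static bool model_may_access(ModelPtr p, const ModelAlloc *a) {
    return p.prov == a->id && p.addr >= a->base && p.addr - a->base < a->size;
}

/* Two adjacent one-byte allocations: one-past-the-end of p meets the start of q. */
static bool model_demo(void) {
    ModelAlloc p = { 1, 0x1000, 1 };
    ModelAlloc q = { 2, 0x1001, 1 };
    ModelPtr p_end = model_ptr_to(&p, 1);    /* one past the end of p */
    ModelPtr q_start = model_ptr_to(&q, 0);  /* first byte of q */

    return model_same_address(p_end, q_start)   /* equal addresses...       */
        && !model_may_access(p_end, &q)         /* ...but p_end cannot touch q */
        && model_may_access(q_start, &q);       /* while q_start can        */
}
```

In this model, model_demo() returns true: the two pointers compare equal as addresses, yet only the one derived from q may access q. An optimization that swaps one for the other because their addresses are equal therefore changes which accesses are permitted, which is the kind of subtlety a precise IR specification has to settle.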
Development
pointer semantics