Bazel Caching, Remote Execution, and glibc Version Mismatch Crash Production

This article details a production crash caused by the interaction between Bazel caching, remote execution, and differing glibc versions across environments. A developer builds and tests a change locally, CI reuses the cached results to build a release, and deployment to production then fails because the binary requires the 'GLIBC_2.28' symbol version, which the production glibc does not provide. The article analyzes how glibc version discrepancies break build reproducibility and presents three solutions in increasing order of robustness. The quick hack captures the local and remote glibc versions and selects the higher one for the C++ toolchain. A more robust solution restricts Action Cache writes, forcing builds to run on remote executors. The ultimate solution uses sysroots: multiple glibc versions are installed across environments, and each build explicitly specifies which one to use. The article stresses the importance of reproducible builds and recommends choosing among these solutions based on context.
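The version comparison behind the quick hack can be sketched in a few lines of Python. This is a minimal illustration, not the article's implementation: the function names `parse_glibc_version`, `pick_toolchain_glibc`, and `can_run` are hypothetical, and the version strings are hard-coded examples rather than values read from real machines.

```python
import re


def parse_glibc_version(text):
    """Parse 'GLIBC_2.28', 'glibc 2.28', or '2.28' into a tuple like (2, 28)."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no glibc version found in {text!r}")
    return (int(match.group(1)), int(match.group(2)))


def pick_toolchain_glibc(local, remote):
    """Return the higher of the local and remote glibc versions.

    Hypothetical helper: it mirrors the 'select the higher one' step only.
    """
    return max(parse_glibc_version(local), parse_glibc_version(remote))


def can_run(required, available):
    """A binary that needs `required` can only load if the host glibc is at least that new."""
    return parse_glibc_version(required) <= parse_glibc_version(available)


if __name__ == "__main__":
    # Example values only (assumptions, not taken from the article).
    print(pick_toolchain_glibc("glibc 2.17", "glibc 2.28"))
    print(can_run("GLIBC_2.28", "glibc 2.17"))
```

Comparing versions as integer tuples rather than as strings matters here: as plain strings, "2.9" would sort above "2.28". On a glibc-based Linux host, the local value could plausibly be obtained with the standard library's `platform.libc_ver()`, while the remote value would have to be collected on the executor itself.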