Compiler Optimization & Load-Store Conflicts: A Performance Cliffhanger

2025-05-04

This article examines an unexpected performance problem: a simple geometry decoder whose speed varies dramatically across compilers and compiler versions. The root cause is a little-known microarchitectural detail, load-store conflicts. GCC 14 vectorized the code and delivered a performance boost. GCC 15, however, changed its optimization strategy, and the code it generates triggers frequent load-store conflicts, causing a significant regression. Clang, surprisingly, excelled on ARM by taking advantage of that architecture's load-store characteristics. The takeaway is that compiler optimization is not a silver bullet: performance-critical code requires close attention to both the generated code and the underlying hardware microarchitecture.

Astonishing Discrepancies: A Comparison of Acceleration Structure Memory Usage Across GPUs

2025-04-02

This article benchmarks the memory consumed when building ray tracing acceleration structures (BVHs) on GPUs from different vendors. The results reveal significant discrepancies: the latest NVIDIA GPUs use as little as one-third, or even one-twentieth, of the memory of their AMD counterparts. The article examines the internal structure of BVHs, contrasting different driver implementations and the effects of hardware architecture. It then analyzes BVH implementation details on AMD's RDNA2/3 and RDNA4 architectures to explain the differences in memory usage. The author concludes that BVH memory consumption is heavily influenced by hardware, drivers, and algorithms, and that there is room for future improvement.
