Learning GPU Architecture Through Memory Bandwidth Microbenchmarks

Traverse Research examined GPU architecture by measuring memory bandwidth on several GPUs with custom microbenchmarks. The article explains how GPUs access memory, including descriptors, buffer types (byte address, structured, typed), and texture units. It also covers the GPU memory hierarchy, cache write policies (write-through, write-back, write-around), and latency hiding techniques. The experiments revealed significant differences in cache and VRAM bandwidth across architectures: the Meta Quest 3's Adreno 740 showed a dramatic bandwidth improvement when using textures; the AMD Radeon RX 9070 XT behaved differently for floating-point and integer loads; the Intel Arc B580 showed distinct patterns across data types; and the NVIDIA GeForce RTX 5070 Ti bottlenecked when many writes targeted the same small memory area. These findings can guide GPU software optimization, particularly for projects that target specific hardware.
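
The three cache write policies named above differ mainly in when a write reaches backing memory and whether the written line stays cached. The following is a minimal Python sketch of that difference, not code from the article and not a model of any GPU listed here. The `ToyCache` class, its fully associative LRU design, the write-allocate behaviour for write-through, and the `run` helper are all assumptions made for illustration.

```python
from collections import OrderedDict

POLICIES = ("write-through", "write-back", "write-around")


class ToyCache:
    """Hypothetical fully associative LRU cache that counts backing-memory traffic."""

    def __init__(self, capacity_lines, policy):
        if policy not in POLICIES:
            raise ValueError(f"unknown policy: {policy}")
        self.capacity = capacity_lines
        self.policy = policy
        self.lines = OrderedDict()  # line address -> dirty flag, oldest first
        self.mem_reads = 0
        self.mem_writes = 0

    def _touch(self, addr, dirty):
        """Insert or refresh a line; evict the least recently used line if full."""
        if addr in self.lines:
            was_dirty = self.lines.pop(addr)
            dirty = dirty or was_dirty
        elif len(self.lines) >= self.capacity:
            _, victim_dirty = self.lines.popitem(last=False)
            if victim_dirty:
                self.mem_writes += 1  # dirty victim is written back on eviction
        self.lines[addr] = dirty

    def read(self, addr):
        if addr not in self.lines:
            self.mem_reads += 1  # miss: fetch the line from backing memory
        self._touch(addr, dirty=False)

    def write(self, addr):
        if self.policy == "write-through":
            # Every write goes to memory; the line is also kept in the cache
            # (write-allocate is an assumption of this sketch).
            self.mem_writes += 1
            self._touch(addr, dirty=False)
        elif self.policy == "write-back":
            # The write stays in the cache until eviction or flush.
            self._touch(addr, dirty=True)
        else:
            # write-around: write straight to memory and drop any cached copy.
            self.mem_writes += 1
            self.lines.pop(addr, None)

    def flush(self):
        """Write every dirty line back to memory."""
        for addr in list(self.lines):
            if self.lines[addr]:
                self.mem_writes += 1
                self.lines[addr] = False


def run(policy, writes=100):
    """Write the same line repeatedly, read it once, then flush."""
    cache = ToyCache(capacity_lines=4, policy=policy)
    for _ in range(writes):
        cache.write(0)
    cache.read(0)
    cache.flush()
    return cache.mem_writes, cache.mem_reads


if __name__ == "__main__":
    for policy in POLICIES:
        mem_writes, mem_reads = run(policy)
        print(f"{policy}: {mem_writes} memory writes, {mem_reads} memory reads")
```

In this toy model, repeated writes to one line cost one memory write under write-back and one per store under the other two policies, while only write-around forces the later read to miss. Real hardware adds details such as partial-line writes and multiple cache levels that the sketch ignores.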