GPUPrefixSums: Portable GPU Prefix Sum Library for High-Performance Computing

2025-08-28
GPUPrefixSums: Portable GPU Prefix Sum Library for High-Performance Computing

GPUPrefixSums brings state-of-the-art GPU prefix sum techniques from CUDA to portable compute shaders. It introduces 'Decoupled Fallback,' a novel technique enabling prefix sum calculations even on devices lacking forward thread progress guarantees. The D3D12 implementation includes a comprehensive survey of algorithms, benchmarked against Nvidia's CUB library. Versions are available for Unity and as a barebones testbed. GPUPrefixSums aims to improve efficiency and portability, supporting parallel computing tasks like sorting, compression, and graph traversal.

Development prefix sum