Optimizing WMMA Kernels on AMD RDNA 4 Architecture

2025-07-21

AMD RDNA 4 GPUs, which feature third-generation Matrix Cores, significantly improve General Matrix Multiply (GEMM) performance. This article shows how to optimize matrix operations in HIP kernels on RDNA 4 GPUs using WMMA (Wave Matrix Multiply Accumulate) intrinsics. It explains how WMMA works, how to use new intrinsics such as `__builtin_amdgcn_wmma_f32_16x16x16_f16_w32_gfx12`, and the key differences from RDNA 3. A simplified Multilayer Perceptron (MLP) implementation demonstrates efficient matrix computation on RDNA 4.
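As a rough host-side model of what one `__builtin_amdgcn_wmma_f32_16x16x16_f16_w32_gfx12` call computes, the sketch below performs a single 16×16×16 tile multiply-accumulate, D = A·B + C, with FP16 inputs and FP32 accumulation. It is a CPU reference for checking kernel output, not the intrinsic itself: it ignores how the tile is distributed across the 32 lanes of a wave, and the helpers `to_f16`, `to_f32`, and `wmma_reference` are hypothetical names. It also assumes rounding to FP32 after every multiply-add, which may not match the hardware's summation order or internal precision.

```python
import struct

TILE = 16  # M = N = K = 16 for the f32_16x16x16_f16 WMMA variant


def to_f16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_f32(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def wmma_reference(a, b, c):
    """Reference for one 16x16x16 tile: D = A * B + C (row-major nested lists).

    A and B are rounded to FP16, mimicking the f16 inputs; C and D are FP32.
    Assumption: rounding to FP32 after every multiply-add step. The real
    hardware's summation order and internal precision may differ.
    """
    a16 = [[to_f16(x) for x in row] for row in a]
    b16 = [[to_f16(x) for x in row] for row in b]
    d = []
    for i in range(TILE):
        row = []
        for j in range(TILE):
            acc = to_f32(c[i][j])
            for k in range(TILE):
                acc = to_f32(acc + a16[i][k] * b16[k][j])
            row.append(acc)
        d.append(row)
    return d


# Usage: A = 2 * identity, B[i][j] = i + j, C = all ones, so D = 2 * B + 1.
A = [[2.0 if i == j else 0.0 for j in range(TILE)] for i in range(TILE)]
B = [[float(i + j) for j in range(TILE)] for i in range(TILE)]
C = [[1.0] * TILE for _ in range(TILE)]
D = wmma_reference(A, B, C)
print(D[0][0], D[15][15])  # 1.0 61.0
```

In a full GEMM, such as a fully connected MLP layer, this step would be repeated over the K dimension in 16-wide chunks for each 16×16 output tile.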


Beyond Triangles: A Novel Quadrilateral Rendering Approach

2025-04-11

Real-time computer graphics has long relied on triangles because GPUs provide hardware-accelerated rasterization only for triangles. When quadrilaterals are split into triangles, vertex attributes such as texture coordinates and normals become C^1-discontinuous along the shared edge. This article presents a novel method that preserves C^1 continuity across the common edge of the two triangles generated from a convex quadrilateral, using an algebraic solution for the bilinear interpolation coefficients expressed in barycentric coordinates. The method is implemented with geometry, tessellation, and mesh shaders and significantly improves rendering quality with negligible computational overhead.
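The summary does not spell out the article's barycentric formulation, so the sketch below illustrates the underlying idea with a different, textbook technique: inverting the bilinear map of a convex quad in 2D to recover (u, v), then blending the four corner attributes bilinearly. For comparison it also computes the ordinary triangle-split result with barycentric weights. All function names (`inverse_bilinear`, `bilinear`, `barycentric`, `cross`) are hypothetical, and the quad is assumed convex and non-degenerate.

```python
import math


def cross(a, b):
    """z-component of the 2D cross product a x b."""
    return a[0] * b[1] - a[1] * b[0]


def inverse_bilinear(p, q00, q10, q11, q01, eps=1e-12):
    """Find (u, v) in [0, 1]^2 with
    p = (1-u)(1-v) q00 + u(1-v) q10 + u v q11 + (1-u) v q01.

    Assumes a non-degenerate convex quad with corners ordered q00, q10, q11, q01.
    Returns None if p lies outside the quad.
    """
    e = (q10[0] - q00[0], q10[1] - q00[1])            # edge along u
    f = (q01[0] - q00[0], q01[1] - q00[1])            # edge along v
    g = (q00[0] - q10[0] + q11[0] - q01[0],
         q00[1] - q10[1] + q11[1] - q01[1])           # non-parallelogram term
    h = (p[0] - q00[0], p[1] - q00[1])
    # p - q00 = e*u + f*v + g*u*v; eliminating u gives k2*v^2 + k1*v + k0 = 0.
    k2 = cross(g, f)
    k1 = cross(e, f) + cross(h, g)
    k0 = cross(h, e)
    if abs(k2) < eps:                                 # linear in v (e.g. parallelogram)
        candidates = [-k0 / k1]
    else:
        disc = k1 * k1 - 4.0 * k0 * k2
        if disc < 0.0:
            return None
        root = math.sqrt(disc)
        candidates = [(-k1 - root) / (2.0 * k2), (-k1 + root) / (2.0 * k2)]
    for v in candidates:
        # Recover u from h - f*v = u * (e + g*v), dividing by the larger component.
        denom = (e[0] + g[0] * v, e[1] + g[1] * v)
        axis = 0 if abs(denom[0]) >= abs(denom[1]) else 1
        u = (h[axis] - f[axis] * v) / denom[axis]
        if -eps <= u <= 1.0 + eps and -eps <= v <= 1.0 + eps:
            return u, v
    return None


def bilinear(a00, a10, a11, a01, u, v):
    """Bilinearly blend four corner attribute values."""
    return ((1 - u) * (1 - v) * a00 + u * (1 - v) * a10
            + u * v * a11 + (1 - u) * v * a01)


def barycentric(p, a, b, c):
    """Barycentric weights of p in triangle (a, b, c), for the triangle-split baseline."""
    ab = (b[0] - a[0], b[1] - a[1])
    ac = (c[0] - a[0], c[1] - a[1])
    ap = (p[0] - a[0], p[1] - a[1])
    area = cross(ab, ac)
    wb = cross(ap, ac) / area
    wc = cross(ab, ap) / area
    return 1.0 - wb - wc, wb, wc


# Usage: unit square, attribute is 1 at q11 and 0 elsewhere.
square = ((0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0))
values = (0.0, 0.0, 1.0, 0.0)
p = (0.75, 0.25)
u, v = inverse_bilinear(p, *square)
print(bilinear(*values, u, v))  # 0.1875
w = barycentric(p, square[0], square[1], square[2])  # triangle (q00, q10, q11)
print(sum(wi * vi for wi, vi in zip(w, values[:3])))  # 0.25
```

In the triangle-split version the attribute equals v on one side of the diagonal u = v and u on the other, so its gradient jumps across the shared edge; the bilinear blend u·v has no such jump.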
