SVDQuant: 3x Speedup on Blackwell GPUs with NVFP4
2025-02-22
MIT researchers have developed SVDQuant, a 4-bit quantization method that uses a low-rank branch to absorb outliers. On NVIDIA's Blackwell GPUs with the NVFP4 format, SVDQuant achieves better image quality than INT4 and runs 3x faster than BF16 while using 3.5x less memory. The work is open source and includes an interactive demo.
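
To illustrate why a low-rank branch helps with outliers, here is a minimal NumPy sketch. It is not the SVDQuant implementation. It assumes the low-rank part is kept in full precision and only the residual is quantized, and it uses a simple symmetric per-tensor INT4 quantizer as a stand-in for NVFP4. All function names (`fake_quantize_int4`, `split_low_rank`, `make_weight_with_outliers`, `compare_errors`) are hypothetical and exist only for this example.

```python
import numpy as np


def fake_quantize_int4(x):
    """Quantize to 4-bit signed integers (symmetric, per tensor), then dequantize.

    Assumption: a plain INT4 stand-in, not the NVFP4 format.
    """
    scale = np.abs(x).max() / 7.0
    if scale == 0:
        return x.copy()
    q = np.clip(np.round(x / scale), -8, 7)
    return q * scale


def split_low_rank(w, rank):
    """Split w into a rank-`rank` approximation (truncated SVD) and a residual."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank, :]
    return low_rank, w - low_rank


def relative_error(w, approx):
    """Frobenius-norm error of approx relative to w."""
    return np.linalg.norm(w - approx) / np.linalg.norm(w)


def make_weight_with_outliers(rng, size=64, outlier_scale=20.0):
    """Synthetic weight: Gaussian noise plus a large rank-1 outlier pattern."""
    base = rng.normal(size=(size, size))
    outliers = outlier_scale * np.outer(rng.normal(size=size), rng.normal(size=size))
    return base + outliers


def compare_errors(w, rank):
    """Return (naive_error, branch_error) for 4-bit quantization of w."""
    # Naive: quantize the whole matrix, so outliers stretch the 4-bit range.
    naive = fake_quantize_int4(w)
    # With a low-rank branch: outliers go to the unquantized low-rank part,
    # and only the small-magnitude residual is quantized.
    low_rank, residual = split_low_rank(w, rank)
    branch = low_rank + fake_quantize_int4(residual)
    return relative_error(w, naive), relative_error(w, branch)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = make_weight_with_outliers(rng)
    naive_err, branch_err = compare_errors(w, rank=4)
    print(f"naive 4-bit error:       {naive_err:.4f}")
    print(f"with low-rank branch:    {branch_err:.4f}")
```

On this synthetic matrix the residual has a much smaller dynamic range than the original weights, so the same 4-bit grid covers it with finer steps. The extra cost is a second, small matrix multiply for the low-rank part.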