Luminal: A High-Performance Deep Learning Library with Search-Based Compilation
2025-08-20
Luminal is a deep learning library that achieves high performance through search-based compilation. Its core is deliberately minimal: just 12 primitive operations, which are nonetheless enough to express complex models such as Transformers and convolutional networks. By aggressively fusing operations and generating shape-specific kernels ahead of time, Luminal overcomes the performance limitations typical of a RISC-style set of small primitives, and it automatically derives complex optimizations such as Flash Attention. Its static compilation approach avoids runtime overhead, and Metal and CUDA support enables fast execution on Macs and NVIDIA GPUs. Significant performance gains have been demonstrated on models such as Llama 3 8B.
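To make the idea of a small primitive set concrete, here is a minimal Python sketch of how higher-level operations can be composed from a handful of primitives. This is not Luminal's API. The text does not list the 12 primitives, so the six used here (`exp2`, `recip`, `add`, `mul`, `sum_reduce`, `max_reduce`) and the `broadcast` helper are assumptions chosen for illustration, operating on plain Python lists rather than tensors.

```python
import math

# --- Assumed primitives (illustrative only; not Luminal's actual op list) ---
def exp2(xs):        return [2.0 ** x for x in xs]
def recip(xs):       return [1.0 / x for x in xs]
def add(a, b):       return [x + y for x, y in zip(a, b)]
def mul(a, b):       return [x * y for x, y in zip(a, b)]
def sum_reduce(xs):  return [sum(xs)]
def max_reduce(xs):  return [max(xs)]

# Hypothetical helper standing in for shape expansion of a 1-element result.
def broadcast(v, n):
    return v * n

LOG2_E = math.log2(math.e)

# --- Derived operations, built only from the primitives above ---
def neg(a):
    return mul(a, [-1.0] * len(a))

def sub(a, b):
    # a - b == a + (b * -1)
    return add(a, neg(b))

def div(a, b):
    # a / b == a * (1 / b)
    return mul(a, recip(b))

def exp(a):
    # e^x == 2^(x * log2(e))
    return exp2(mul(a, [LOG2_E] * len(a)))

def softmax(xs):
    n = len(xs)
    # Subtract the max for numerical stability, then normalize.
    shifted = sub(xs, broadcast(max_reduce(xs), n))
    e = exp(shifted)
    return div(e, broadcast(sum_reduce(e), n))

if __name__ == "__main__":
    print(softmax([1.0, 2.0, 3.0]))
```

Run naively, `softmax` here makes many separate passes over the data, one per primitive. That is the overhead a compiler in this style has to remove by fusing the chain of primitives into fewer kernels.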
Development