AI-Generated CUDA Kernels Outperform PyTorch?

2025-05-30

Researchers used large language models and a novel branching search strategy to automatically generate pure CUDA-C kernels without relying on libraries like CUTLASS or Triton. Surprisingly, some of these AI-generated kernels outperform even the expert-optimized production kernels in PyTorch, achieving a nearly 2x speedup on Conv2D.

The method has the model reason about optimization strategies in natural language, then uses a branching search to pursue multiple optimization hypotheses in parallel rather than greedily refining a single candidate, which helps it avoid local optima. While FP16 matrix multiplication and Flash Attention performance still lag, this work opens a new frontier in automatic high-performance kernel generation and hints at the potential of AI in compiler optimization.
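The branching search can be pictured as a beam-search-style loop over kernel candidates. The sketch below is a minimal illustration, not the researchers' actual system: `propose_ideas` and `benchmark` are hypothetical stand-ins for the real steps (asking an LLM for a natural-language optimization idea and applying it to produce a new kernel variant, then compiling and timing that variant), here stubbed with strings and a deterministic toy score so the loop structure is runnable.

```python
import random

# Hypothetical stand-in (assumption): in the real system this would query an
# LLM for natural-language optimization ideas (e.g. "use shared-memory tiling",
# "vectorize global loads") and apply each one to yield a new kernel variant.
def propose_ideas(kernel: str, n: int) -> list[str]:
    return [f"{kernel}+idea{i}" for i in range(n)]

# Hypothetical stand-in (assumption): in the real system this would compile
# the CUDA kernel and measure its runtime. Lower score = faster kernel.
def benchmark(kernel: str) -> float:
    rng = random.Random(kernel)  # deterministic toy score per variant
    return rng.random()

def branching_search(seed_kernel: str, rounds: int = 3,
                     beam: int = 4, branch: int = 3) -> str:
    """Keep several candidate kernels alive each round instead of greedily
    refining a single one, so one bad hypothesis cannot trap the search."""
    frontier = [seed_kernel]
    for _ in range(rounds):
        # Branch: expand every surviving candidate with several new ideas.
        children = [c for k in frontier for c in propose_ideas(k, branch)]
        # Prune: benchmark all variants, keep the `beam` fastest ones.
        frontier = sorted(children, key=benchmark)[:beam]
    return min(frontier, key=benchmark)

best = branching_search("naive_conv2d")
```

The key design choice mirrored here is that pruning happens only after all hypotheses in a round are scored, so an idea that looks unpromising at first can still survive if its refinements benchmark well relative to the rest of the frontier.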