MLC-LLM: Bringing Competitive LLM Inference to AMD GPUs

2024-12-24

NVIDIA GPUs have long dominated Large Language Model (LLM) inference. The MLC-LLM project uses machine learning compilation to deploy LLMs on AMD GPUs with competitive performance: running through ROCm (with Vulkan as an alternative backend), the AMD Radeon RX 7900 XTX reaches 80% of the speed of the NVIDIA RTX 4090 and 94% of the RTX 3090 Ti on Llama2-7B/13B inference. This makes AMD GPUs a practical option for LLM deployment and extends support to AMD APUs such as the one in the Steam Deck. Planned work on MLC-LLM includes batching optimizations, multi-GPU support, broader quantization schemes and model architectures, and further closing the performance gap with NVIDIA, helping ease the shortage of AI compute.

Read more