Modular Unveils MAX 24.6: A Native GPU Generative AI Platform

2024-12-17

Modular has released MAX 24.6, a native GPU generative AI platform designed to redefine how AI is developed and deployed. At its core is MAX GPU, a vertically integrated generative AI serving stack that eliminates reliance on vendor-specific computation libraries such as NVIDIA CUDA. Built on MAX Engine, a high-performance AI model compiler and runtime, and MAX Serve, a Python-native serving layer, the stack supports the entire AI development lifecycle, from experimentation to production deployment. MAX 24.6 runs on a range of hardware platforms, including NVIDIA A100, L40, L4, and A10 accelerators, with support for H100, H200, and AMD GPUs planned. It is compatible with Hugging Face models and exposes an OpenAI-compatible client API. On Llama 3.1, MAX 24.6 achieves a throughput of 3,860 output tokens per second, matching vLLM's performance while shipping in a smaller Docker image.
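Because the serving layer exposes an OpenAI-compatible API, existing OpenAI client code should work with few changes. Below is a minimal sketch using the official `openai` Python client against an assumed locally running MAX Serve instance; the base URL, port, and model identifier are illustrative assumptions, not values from the announcement.

```python
# Minimal sketch: calling an OpenAI-compatible endpoint such as the one MAX Serve exposes.
# Assumptions (not from the announcement): the server is running locally on port 8000
# and serving a Llama 3.1 model pulled from Hugging Face.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local MAX Serve endpoint
    api_key="EMPTY",                      # local OpenAI-compatible servers typically ignore the key
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # assumed Hugging Face model ID
    messages=[{"role": "user", "content": "Summarize what MAX 24.6 provides."}],
    max_tokens=128,
)

print(response.choices[0].message.content)
```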
