Cerebras Shatters Inference Speed Record with Llama 4 Maverick 400B

2025-05-31
Cerebras Systems has achieved an inference speed of more than 2,500 tokens per second (TPS) on Meta's 400-billion-parameter Llama 4 Maverick model, more than double Nvidia's performance. The record, independently verified by Artificial Analysis, matters for latency-sensitive AI applications such as agents, code generation, and complex reasoning, where faster token generation shortens response times and improves the user experience. Unlike Nvidia's result, which relied on custom optimizations that are not publicly available, Cerebras' performance will be accessible to developers and enterprise AI users through Meta's upcoming API.

AI