MiniMax-M1: A 456B Parameter Hybrid-Attention Reasoning Model

2025-06-18

MiniMax-M1 is an open-weight, large-scale hybrid-attention reasoning model with 456 billion total parameters, of which roughly 45.9 billion are activated per token. Built on a hybrid Mixture-of-Experts (MoE) architecture combined with a lightning attention mechanism, it natively supports a context length of 1 million tokens. Trained with large-scale reinforcement learning, MiniMax-M1 outperforms leading open-weight models such as DeepSeek-R1 and Qwen3-235B on complex tasks, particularly software engineering and long-context understanding. Its efficient test-time compute makes it a strong foundation for next-generation language model agents.
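
For orientation, here is a minimal sketch of how the released weights might be loaded with Hugging Face transformers. The repository ID, dtype handling, and generation settings below are assumptions for illustration only; a 456B-parameter model of this size needs a multi-GPU deployment, and the official release points to a dedicated serving stack (e.g. vLLM) for production use.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository ID; check the official MiniMax release for the exact checkpoint name.
MODEL_ID = "MiniMaxAI/MiniMax-M1-80k"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype="auto",      # keep the checkpoint's native precision
    device_map="auto",       # shard the MoE weights across available devices
    trust_remote_code=True,  # the hybrid-attention architecture ships custom modeling code
)

messages = [{"role": "user", "content": "Explain lightning attention in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```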