SpikingBrain: A Brain-Inspired, Highly Efficient Large Language Model

2025-09-14

SpikingBrain is a 7B-parameter large language model inspired by brain mechanisms. It integrates hybrid efficient attention, MoE modules, and spike encoding, supported by a universal conversion pipeline compatible with the open-source model ecosystem. This enables continual pre-training with less than 2% of the data typically required, while matching the performance of mainstream open-source models.

In addition, the framework, operators, parallel strategies, and communication primitives are adapted to non-NVIDIA (MetaX) clusters, ensuring stable large-scale training and inference. SpikingBrain achieves over 100x speedup in time to first token (TTFT) on 4M-token sequences, and spike encoding delivers over 69% sparsity at the micro level. Combined with macro-level MoE sparsity, these results provide valuable guidance for the design of next-generation neuromorphic chips.

The repository provides the full implementation and weights of SpikingBrain-7B, including HuggingFace and vLLM inference versions as well as quantized variants, enabling flexible deployment and research across a range of scenarios.
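To make the micro-level sparsity claim concrete, the sketch below shows a minimal threshold-based spike encoding in PyTorch: activations are converted to non-negative integer spike counts, and anything below the firing threshold emits no spike at all. This is an illustrative simplification under assumed parameters (a fixed threshold of 1.0), not the exact adaptive encoder used in SpikingBrain.

```python
import torch

def spike_encode(x: torch.Tensor, threshold: float = 1.0) -> torch.Tensor:
    """Illustrative spike encoding: quantize activations into integer
    spike counts. Values below the threshold emit zero spikes, so the
    resulting tensor is mostly zeros (micro-level sparsity)."""
    return torch.floor(torch.relu(x) / threshold)

x = torch.randn(4, 8)            # stand-in for a layer's activations
spikes = spike_encode(x)
sparsity = (spikes == 0).float().mean().item()
print(f"sparsity: {sparsity:.0%}")  # well above 50% for N(0,1) inputs
```

Because downstream matrix multiplies can skip zero entries entirely, sparsity of this kind translates directly into reduced compute and energy on event-driven hardware, which is the link to neuromorphic chip design mentioned above.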
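As a starting point for deployment, the following is a minimal HuggingFace loading sketch. The checkpoint path is a placeholder (substitute the released SpikingBrain-7B weights from this repository or the Hub), and `trust_remote_code=True` is assumed to be needed for the custom architecture.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical path; replace with the released SpikingBrain-7B checkpoint.
model_id = "path/to/SpikingBrain-7B"

# trust_remote_code loads the model's custom architecture code.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype="auto", device_map="auto"
)

inputs = tokenizer("The brain is efficient because", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For long-context or production serving, the vLLM inference version provided in the repository is the more appropriate path; the snippet above is only a quick functional check.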