DeepEP: A High-Performance Communication Library for Mixture-of-Experts

2025-02-25

DeepEP is a communication library tailored for Mixture-of-Experts (MoE) models and expert parallelism (EP). It provides high-throughput, low-latency all-to-all GPU kernels, also known as MoE dispatch and combine, and supports low-precision operations, including FP8.

To align with the group-limited gating algorithm used in DeepSeek-V3, DeepEP offers kernels optimized for asymmetric-domain bandwidth forwarding, such as forwarding data from the NVLink domain to the RDMA domain. These kernels deliver high throughput, making them suitable for training and inference prefilling, and allow control over the number of SMs (Streaming Multiprocessors) used.

For latency-sensitive inference decoding, DeepEP includes low-latency kernels that use pure RDMA to minimize delays. It also introduces a hook-based communication-computation overlap method that consumes no SM resources.

The library has been tested with InfiniBand and is theoretically compatible with RoCE (RDMA over Converged Ethernet).
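To make the dispatch/combine terminology concrete, here is a minimal, single-process sketch of what those two phases compute logically. This is not DeepEP's API: the function names, top-k routing representation, and the toy `expert_fn` are all hypothetical, and real dispatch/combine kernels move tokens across GPUs over NVLink/RDMA rather than through Python lists.

```python
# Conceptual sketch (hypothetical names, NOT DeepEP's API):
# "dispatch" routes each token to its top-k experts;
# "combine" gathers expert outputs back per token, weighted by gating scores.

def dispatch(topk_ids, num_experts):
    """Group token indices by destination expert (the all-to-all send pattern)."""
    buckets = [[] for _ in range(num_experts)]
    for tok_idx, expert_ids in enumerate(topk_ids):
        for e in expert_ids:
            buckets[e].append(tok_idx)
    return buckets

def combine(tokens, topk_ids, topk_weights, expert_fn, num_experts):
    """Apply each expert to its bucket, then sum weighted outputs per token."""
    buckets = dispatch(topk_ids, num_experts)
    out = [0.0 for _ in tokens]
    for e, tok_idxs in enumerate(buckets):
        for i in tok_idxs:
            k = topk_ids[i].index(e)  # position of expert e in token i's top-k
            out[i] += topk_weights[i][k] * expert_fn(e, tokens[i])
    return out

# Toy example: 3 scalar tokens, 4 experts, top-2 routing.
tokens = [1.0, 2.0, 3.0]
topk_ids = [[0, 2], [1, 2], [0, 3]]
topk_weights = [[0.6, 0.4], [0.5, 0.5], [0.7, 0.3]]
expert_fn = lambda e, x: (e + 1) * x  # hypothetical expert: scale by e + 1
print([round(v, 6) for v in combine(tokens, topk_ids, topk_weights,
                                    expert_fn, num_experts=4)])
```

In an EP deployment, the experts in each bucket live on different GPUs, so the dispatch and combine steps become all-to-all communication, which is exactly the traffic DeepEP's kernels accelerate.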
