3FS: A High-Performance Distributed File System for AI

2025-02-28

3FS is a high-performance distributed file system built for AI training and inference workloads. It runs on modern SSDs and RDMA networks and exposes a shared storage layer that simplifies the development of distributed applications. Key features include high performance and usability, strong consistency via CRAQ (Chain Replication with Apportioned Queries), standard file interfaces, and support for diverse workloads: data preparation, dataloaders, checkpointing, and KVCache for inference. Reported benchmarks show up to 6.6 TiB/s of aggregate read throughput on large clusters and 3.66 TiB/min of sort throughput on the GraySort benchmark. For LLM inference, the KVCache workload reaches a peak read throughput of 40 GiB/s. The project is open source and ships with detailed setup and run instructions.
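For readers unfamiliar with CRAQ, the toy Python model below sketches the general idea behind the protocol: writes travel down a chain of replicas, the tail commits them, and any replica can serve a read, falling back to the tail's committed version when its own copy is still in flight. This is a minimal single-process illustration, not 3FS code; the `Node` and `Chain` classes, their methods, and the explicit version numbers are all hypothetical names chosen for the example.

```python
class Node:
    """One replica: every version it has seen, plus the highest committed one."""

    def __init__(self):
        self.versions = {}  # key -> {version: value}
        self.clean = {}     # key -> highest version known to be committed


class Chain:
    """Toy CRAQ chain: nodes[0] is the head, nodes[-1] is the tail."""

    def __init__(self, length):
        self.nodes = [Node() for _ in range(length)]

    def put_dirty(self, key, version, value, index):
        """Deliver a write to one node; the tail commits it on receipt."""
        node = self.nodes[index]
        node.versions.setdefault(key, {})[version] = value
        if index == len(self.nodes) - 1:
            node.clean[key] = version

    def ack(self, key, version):
        """The acknowledgement travels tail -> head, marking the version clean."""
        for node in reversed(self.nodes):
            node.clean[key] = max(node.clean.get(key, 0), version)

    def write(self, key, version, value):
        """A complete write: propagate head -> tail, then acknowledge."""
        for index in range(len(self.nodes)):
            self.put_dirty(key, version, value, index)
        self.ack(key, version)

    def read(self, key, index):
        """Read from any node; a dirty node asks the tail what is committed."""
        node = self.nodes[index]
        versions = node.versions.get(key, {})
        if not versions:
            return None
        latest = max(versions)
        if node.clean.get(key, 0) == latest:
            return versions[latest]  # clean: answer locally
        committed = self.nodes[-1].clean.get(key, 0)
        return versions.get(committed)


def demo():
    chain = Chain(3)
    chain.write("k", 1, "v1")
    # A second write reaches the head and middle node but not yet the tail.
    chain.put_dirty("k", 2, "v2", 0)
    chain.put_dirty("k", 2, "v2", 1)
    in_flight = chain.read("k", 0)     # head is dirty, tail still has version 1
    chain.put_dirty("k", 2, "v2", 2)   # tail receives and commits version 2
    after_commit = chain.read("k", 0)  # head is dirty, tail now reports version 2
    chain.ack("k", 2)
    return in_flight, after_commit, chain.read("k", 1)


if __name__ == "__main__":
    print(demo())
```

The point of the design is that reads are spread across all replicas rather than funneled through the tail, while a reader never observes a version the tail has not committed.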