TScale: Training LLMs on Consumer Hardware
2025-05-04
TScale is a transformer training and inference framework written in C++ and CUDA and designed to run on consumer-grade hardware. It aims to cut training cost and time through an optimized architecture, low-precision computation (fp8 and int8), CPU offloading, and both synchronous and asynchronous distributed training. The project also describes an indexing technique intended to make models of up to 1T parameters tractable on typical home computers. Taken together, these features are meant to lower the barrier to entry for LLM training.
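To make the low-precision idea concrete, here is a minimal C++ sketch of symmetric per-tensor int8 quantization, one common way to store values in 1 byte instead of the 4 bytes of fp32. It is an illustration only and is not taken from TScale's code: the names QuantizedTensor, quantize_int8, and dequantize_int8 are hypothetical, and TScale's actual fp8/int8 scheme may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical container for illustration; not a TScale type.
struct QuantizedTensor {
    std::vector<std::int8_t> values;  // quantized values in [-127, 127]
    float scale;                      // real value ~= values[i] * scale
};

// Symmetric per-tensor quantization: map [-max_abs, max_abs] onto [-127, 127].
QuantizedTensor quantize_int8(const std::vector<float>& x) {
    float max_abs = 0.0f;
    for (float v : x) {
        max_abs = std::max(max_abs, std::fabs(v));
    }

    QuantizedTensor q;
    // Fall back to a scale of 1 for an all-zero (or empty) tensor.
    q.scale = (max_abs > 0.0f) ? max_abs / 127.0f : 1.0f;
    q.values.reserve(x.size());
    for (float v : x) {
        float r = std::round(v / q.scale);
        r = std::min(127.0f, std::max(-127.0f, r));  // guard against rounding overshoot
        q.values.push_back(static_cast<std::int8_t>(r));
    }
    return q;
}

// Recover approximate float values; the rounding error per element is
// about half of one quantization step (scale / 2).
std::vector<float> dequantize_int8(const QuantizedTensor& q) {
    assert(q.scale > 0.0f);
    std::vector<float> out;
    out.reserve(q.values.size());
    for (std::int8_t v : q.values) {
        out.push_back(static_cast<float>(v) * q.scale);
    }
    return out;
}
```

Calling quantize_int8 on a weight vector and then dequantize_int8 on the result returns values close to the originals, with a per-element error of roughly half a quantization step. A single scale per tensor keeps the sketch short; finer-grained scales (per row or per block) are a common refinement when a few large values would otherwise dominate the range.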