Training Large Diffusion Models on a Shoestring Budget: $1890
2025-01-16
Sony Research has open-sourced micro_diffusion, a project showing that large-scale diffusion models can be trained on a very small budget of about $1890. Using 37 million publicly available real and synthetic images, the team trained a 1.16-billion-parameter sparse transformer that reaches a zero-shot FID of 12.7 on the COCO dataset. The release includes training code, code for preparing the datasets, and pre-trained model weights. It also documents a staged training recipe that cuts compute cost in two ways: progressively raising the image resolution from low to high, and masking a portion of the image patches during training so the model processes fewer tokens per image.
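To make the patch-masking idea concrete, here is a minimal NumPy sketch of generic random patch masking (the MAE-style variant). It is not code from micro_diffusion. The function names `patchify` and `random_patch_mask`, the 32×32×4 latent shape, the patch size of 2, and the 75% mask ratio are all illustrative assumptions.

```python
import numpy as np


def patchify(images: np.ndarray, patch: int) -> np.ndarray:
    """Split a batch (B, H, W, C) into flattened patch tokens (B, N, patch*patch*C)."""
    b, h, w, c = images.shape
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch size"
    gh, gw = h // patch, w // patch
    x = images.reshape(b, gh, patch, gw, patch, c)
    x = x.transpose(0, 1, 3, 2, 4, 5)  # (B, gh, gw, patch, patch, C)
    return x.reshape(b, gh * gw, patch * patch * c)


def random_patch_mask(tokens: np.ndarray, mask_ratio: float, rng: np.random.Generator):
    """Keep a random subset of tokens per sample; return kept tokens and their indices."""
    b, n, _ = tokens.shape
    n_keep = max(1, int(round(n * (1.0 - mask_ratio))))
    # Independent random permutation per sample; the first n_keep positions survive.
    noise = rng.random((b, n))
    keep_idx = np.sort(np.argsort(noise, axis=1)[:, :n_keep], axis=1)
    kept = np.take_along_axis(tokens, keep_idx[:, :, None], axis=1)
    return kept, keep_idx


# Usage on a hypothetical batch of latents (assumed shape, not from the project).
rng = np.random.default_rng(0)
latents = rng.standard_normal((2, 32, 32, 4))
tokens = patchify(latents, patch=2)  # 16x16 grid -> 256 tokens of dim 16
kept, idx = random_patch_mask(tokens, mask_ratio=0.75, rng=rng)
print(tokens.shape, kept.shape)  # (2, 256, 16) (2, 64, 16)
```

Because the transformer only sees the kept tokens, a 75% mask shrinks each sequence from 256 to 64 tokens. Per-token layers such as MLPs then cost roughly a quarter as much, and self-attention, which grows with the square of sequence length, roughly a sixteenth. The released code may choose or pre-process masked patches differently from this sketch.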