Goku: Flow-Based Video Generative Foundation Models Achieve SOTA Performance
2025-02-15
A joint team from ByteDance and the University of Hong Kong (HKU) has introduced Goku, a family of image and video generation models built on rectified flow Transformers. Goku reaches industry-leading visual generation performance through careful data curation, model design, and its flow formulation. It supports text-to-video, image-to-video, and text-to-image generation, and posts top scores on major benchmarks including GenEval, DPG-Bench, and VBench. Notably, Goku-T2V scored 84.85 on VBench, ranking second overall as of October 7, 2024, and surpassing several leading commercial text-to-video models.