4Real-Video-V2: Efficient 4D Video Diffusion Model
2025-06-24
Snap Inc. and KAUST have collaborated on 4Real-Video-V2, a 4D video diffusion model built on a feedforward architecture. It efficiently computes a 4D spatio-temporal grid of video frames, along with 3D Gaussian particles for each time step. The key design choice is a sparse attention pattern in which each token attends only to tokens in the same frame, at the same timestamp, or from the same viewpoint. This makes the model scalable to large pre-trained video models, efficient to train, and able to generalize well, and it achieves significant improvements without adding parameters to the base video model.
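To make the attention pattern concrete, here is a minimal sketch of how such a mask could be built for a grid of views and timestamps. It is an illustration, not the authors' implementation: the function name `sparse_4d_attention_mask`, the view-major token layout, and the toy grid size are all assumptions made for this example. Since a frame is one (viewpoint, timestamp) pair, the "same frame" case is already covered by either of the other two conditions.

```python
import numpy as np


def sparse_4d_attention_mask(num_views, num_times, tokens_per_frame):
    """Boolean mask where entry [i, j] is True if token i may attend to token j.

    Hypothetical layout (an assumption): tokens are flattened in
    (view, time, spatial) order, so each frame owns a contiguous block.
    """
    v_idx, t_idx, _ = np.meshgrid(
        np.arange(num_views),
        np.arange(num_times),
        np.arange(tokens_per_frame),
        indexing="ij",
    )
    v = v_idx.reshape(-1)  # viewpoint index of every token
    t = t_idx.reshape(-1)  # timestamp index of every token

    same_view = v[:, None] == v[None, :]  # same viewpoint, any timestamp
    same_time = t[:, None] == t[None, :]  # same timestamp, any viewpoint
    # Same frame means same view AND same time, so the union already covers it.
    return same_view | same_time


# Toy grid: 4 viewpoints x 4 timestamps, 2 spatial tokens per frame.
mask = sparse_4d_attention_mask(num_views=4, num_times=4, tokens_per_frame=2)
allowed_per_token = mask.sum(axis=1)
density = mask.mean()
```

In this toy grid each of the 32 tokens may attend to 14 others (8 sharing its viewpoint, 8 sharing its timestamp, minus the 2 in its own frame that are counted twice), so under half of the dense attention matrix is active. With V views, T timestamps, and S tokens per frame, each token attends to S(V + T - 1) tokens rather than all S·V·T, which is where the savings over full 4D attention would come from.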