TarFlow: Transformer-based Normalizing Flows Achieve SOTA Image Likelihood Estimation

Researchers introduce TarFlow, a normalizing flow model that combines Transformers with masked autoregressive flows. TarFlow efficiently estimates density and generates images by processing image patches with autoregressive Transformer blocks, alternating the autoregression direction between layers. Three techniques boost sample quality: Gaussian noise augmentation during training, a post-training denoising step, and an effective guidance method for both class-conditional and unconditional generation. TarFlow achieves state-of-the-art results in image likelihood estimation, significantly outperforming previous methods, and generates samples comparable in quality and diversity to diffusion models—a first for a standalone normalizing flow model.
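A minimal numerical sketch of the masked-autoregressive-flow idea described above: each patch's affine parameters (shift and log-scale) depend only on strictly earlier patches, and the autoregression direction flips between layers. The toy "context network" here (a cumulative mean followed by a linear map) merely stands in for the paper's causal Transformer blocks, and all class and variable names are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

class ToyARFlowLayer:
    """One autoregressive flow layer over a sequence of patch vectors.

    Stand-in for a causal Transformer block: position t's affine
    parameters depend only on positions < t, so the transform is
    invertible and sampling is sequential.
    """

    def __init__(self, dim, reverse):
        self.reverse = reverse  # alternate autoregression direction per layer
        self.W_mu = rng.normal(scale=0.1, size=(dim, dim))  # shift head
        self.W_ls = rng.normal(scale=0.1, size=(dim, dim))  # log-scale head

    def _params(self, z):
        # Causal context: mean of strictly-previous patches (toy network).
        ctx = np.zeros_like(z)
        for t in range(1, z.shape[0]):
            ctx[t] = z[:t].mean(axis=0)
        return ctx @ self.W_mu, ctx @ self.W_ls  # (shift, log-scale)

    def forward(self, x):
        # Density-estimation direction: fully parallel over positions.
        z = x[::-1] if self.reverse else x
        mu, log_s = self._params(z)
        out = (z - mu) * np.exp(-log_s)
        return out[::-1] if self.reverse else out

    def inverse(self, y):
        # Sampling direction: sequential decode, one position at a time.
        z = y[::-1] if self.reverse else y
        x = np.zeros_like(z)
        for t in range(z.shape[0]):
            mu, log_s = self._params(x)  # ctx[t] uses only filled x[:t]
            x[t] = z[t] * np.exp(log_s[t]) + mu[t]
        return x[::-1] if self.reverse else x

# Stack two layers with alternating direction and check invertibility.
layers = [ToyARFlowLayer(dim=4, reverse=(i % 2 == 1)) for i in range(2)]
x = rng.normal(size=(8, 4))  # 8 "patches" of dimension 4
y = x
for lyr in layers:
    y = lyr.forward(y)
x_rec = y
for lyr in reversed(layers):
    x_rec = lyr.inverse(x_rec)
assert np.allclose(x, x_rec, atol=1e-8)
```

Alternating the direction between layers lets later layers condition on patches that earlier layers could not see, which is the role the direction-flipping plays in the stacked Transformer blocks described above.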