RenderFormer: Global Illumination Neural Rendering without Per-Scene Training

2025-06-01

RenderFormer is a neural rendering pipeline that directly renders an image from a triangle-based scene representation with full global illumination effects, requiring no per-scene training or fine-tuning. Rather than taking a physics-based approach to rendering, it formulates rendering as a sequence-to-sequence transformation: a sequence of tokens representing triangles with reflectance properties is converted into a sequence of output tokens representing small pixel patches. RenderFormer uses a two-stage transformer-based pipeline: a view-independent stage that models triangle-to-triangle light transport, and a view-dependent stage that transforms tokens representing ray bundles into pixel values, guided by the triangle sequence produced by the view-independent stage. The pipeline requires neither rasterization nor ray tracing.
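
The two-stage data flow described above can be illustrated with a toy sketch. This is not the RenderFormer implementation: it assumes a single attention head, no learned projections, no feed-forward layers, and hand-written 4-dimensional tokens. All function names (`attend`, `view_independent_stage`, `view_dependent_stage`, `render`) are hypothetical and chosen only for illustration. It shows which sequence attends to which, nothing more.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Single-head scaled dot-product attention without learned weights."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    dim = len(values[0])
    out = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, values))
                    for d in range(dim)])
    return out


def view_independent_stage(triangle_tokens):
    # Self-attention: every triangle token attends to every triangle token,
    # standing in for triangle-to-triangle light transport.
    return attend(triangle_tokens, triangle_tokens, triangle_tokens)


def view_dependent_stage(ray_bundle_tokens, scene_tokens):
    # Cross-attention: each ray-bundle token queries the scene tokens
    # produced by the view-independent stage.
    return attend(ray_bundle_tokens, scene_tokens, scene_tokens)


def render(triangle_tokens, ray_bundle_tokens):
    scene_tokens = view_independent_stage(triangle_tokens)
    return view_dependent_stage(ray_bundle_tokens, scene_tokens)


# Hypothetical inputs: three triangle tokens and two ray-bundle tokens.
tri = [[1.0, 0.0, 0.0, 0.5], [0.0, 1.0, 0.0, 0.5], [0.0, 0.0, 1.0, 0.5]]
rays = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
out = render(tri, rays)  # one output token per ray bundle
```

In this sketch the view-independent result depends only on the triangle tokens, so it could be computed once and reused across camera views, while only the second stage would need to be rerun per view. In a trained model, each output token would additionally be decoded into a small patch of pixels; that step is omitted here.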