Helix: A Vision-Language-Action Model for General-Purpose Robotic Manipulation

2025-02-20

Figure introduces Helix, a Vision-Language-Action (VLA) model that unifies perception, language understanding, and learned control to address long-standing challenges in robotics. Figure reports several firsts for Helix: high-rate continuous control of the full upper body, collaboration between multiple robots, and the ability to pick up virtually any small household object from natural language instructions alone. A single neural network learns all behaviors without task-specific fine-tuning, and it runs on embedded low-power GPUs, which makes it suitable for commercial deployment. Helix couples two components: "System 2", an internet-pretrained VLM, and "System 1", a fast reactive visuomotor policy. This pairing enables both broad generalization and fast, precise control, paving the way for scaling humanoid robots to home environments.
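
The split between a slow "System 2" and a fast "System 1" can be illustrated with a toy two-rate loop. The following is a minimal sketch, not Figure's implementation: the function names (`slow_system2`, `fast_system1`, `run`), the `Latent` type, and the `fast_per_slow` ratio are all hypothetical, and the two systems are interleaved in a single synchronous loop purely for readability. The only idea it shows is that the slow model refreshes a latent summary occasionally, while the fast policy acts on every tick using the most recent latent.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Latent:
    """Hypothetical semantic summary produced by the slow model."""
    created_at: int      # fast-loop tick at which this latent was computed
    instruction: str     # the language instruction it was conditioned on


def slow_system2(tick: int, instruction: str) -> Latent:
    # Stand-in for the internet-pretrained VLM (assumption: it emits one latent).
    return Latent(created_at=tick, instruction=instruction)


def fast_system1(latent: Latent, observation: int) -> Tuple[int, int]:
    # Stand-in for the reactive visuomotor policy. The toy "action" just
    # records which latent and which observation it was computed from.
    return (latent.created_at, observation)


def run(n_fast_steps: int, fast_per_slow: int, instruction: str) -> List[Tuple[int, int]]:
    """Run the fast policy every tick; refresh the latent every `fast_per_slow` ticks."""
    actions: List[Tuple[int, int]] = []
    latent: Optional[Latent] = None
    for tick in range(n_fast_steps):
        if tick % fast_per_slow == 0:
            latent = slow_system2(tick, instruction)   # slow, infrequent update
        assert latent is not None                      # set at tick 0
        actions.append(fast_system1(latent, observation=tick))
    return actions


if __name__ == "__main__":
    for action in run(n_fast_steps=6, fast_per_slow=3, instruction="pick up the cup"):
        print(action)
```

In this sketch the fast policy never waits for the slow model: between refreshes it keeps reacting to new observations with a slightly stale latent, which is the trade-off a two-rate design of this kind accepts.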