Real-Time Chunking for Vision-Language-Action Models

2025-06-17

This paper introduces Real-Time Chunking (RTC), an algorithm for running Vision-Language-Action (VLA) models on robots in real time. VLA inference is slow, and naive switching between consecutively generated action chunks produces discontinuities that destabilize robot behavior. RTC addresses both problems by generating the next action chunk while the robot is still executing the current one, so the robot does not pause for inference and the handoff between chunks stays smooth. In the reported experiments, RTC improves execution speed and accuracy and remains robust under high inference latency. The work is a step toward robots that can handle complex tasks in real time.
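
The overlap between execution and generation can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the paper's implementation: `toy_policy`, `run_overlapped`, `horizon`, and `delay` are hypothetical names, inference latency is modeled as a fixed number of control steps, and the handoff is kept continuous by reusing the actions already committed for the latency window, which is an assumption made here for illustration.

```python
from typing import Callable, List, Optional, Tuple

Chunk = List[float]
Policy = Callable[[int, Chunk, int], Chunk]


def toy_policy(start_step: int, frozen_prefix: Chunk, horizon: int) -> Chunk:
    """Hypothetical stand-in for a VLA. Returns `horizon` actions for control
    steps start_step, start_step + 1, ...; the frozen prefix is kept verbatim."""
    fresh = [float(start_step + i) for i in range(len(frozen_prefix), horizon)]
    return list(frozen_prefix) + fresh


def run_overlapped(
    policy: Policy, total_steps: int, horizon: int, delay: int
) -> Tuple[Chunk, int]:
    """Execute one action per control step while the next chunk is 'computed'.

    `delay` is the assumed inference latency in control steps. Returns the
    executed actions and the number of chunk handoffs.
    """
    if not 0 < 2 * delay <= horizon:
        raise ValueError("need 0 < 2 * delay <= horizon")

    chunk = policy(0, [], horizon)  # first chunk is ready before motion starts
    chunk_start = 0                 # control step at which chunk[0] applies
    pending: Optional[Tuple[int, int, Chunk]] = None  # (ready_step, start, chunk)
    executed: Chunk = []
    swaps = 0

    for t in range(total_steps):
        # Switch to the new chunk once its simulated inference has finished.
        if pending is not None and pending[0] == t:
            _, chunk_start, chunk = pending
            pending = None
            swaps += 1

        idx = t - chunk_start

        # Request the next chunk `delay` steps before the current one runs out.
        # The actions executed during the latency window are passed as a
        # frozen prefix so both chunks agree on them.
        if pending is None and idx == horizon - delay:
            frozen = chunk[idx:idx + delay]
            # The result is computed here but withheld until step t + delay,
            # which models the inference latency.
            pending = (t + delay, t, policy(t, frozen, horizon))

        executed.append(chunk[idx])  # the robot never waits for inference

    return executed, swaps


if __name__ == "__main__":
    actions, handoffs = run_overlapped(toy_policy, total_steps=20, horizon=8, delay=2)
    print(len(actions), handoffs)
```

In this sketch, one action is executed on every control step regardless of `delay`, and the new chunk takes over only after the steps it shares with the old chunk have passed. A real system would run inference in a separate thread or process and would need a policy that can condition on the committed actions.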
