Accidentally Solving Robotics by Watching 1 Million Hours of YouTube

2025-06-30

Researchers accidentally solved a long-standing robotics problem by training a model called V-JEPA 2 on one million hours of YouTube videos. Instead of predicting the next word, as a language model does, V-JEPA 2 predicts the next moment in reality, learning physics through observation. Unlike earlier models that depend on language, V-JEPA 2 shows strong zero-shot generalization, completing tasks such as grasping and placing objects in environments it has never seen. Limitations remain, including sensitivity to camera pose and drift over long horizons, but the research opens new avenues for robotics and hints at a future where robots might possess comprehension comparable to ChatGPT's.

AI