RoboPianist: Mastering the Piano with Deep Reinforcement Learning
Researchers trained anthropomorphic robot hands to play the piano using deep reinforcement learning (RL). They built a simulated environment in MuJoCo featuring an 88-key digital keyboard and two Shadow Dexterous Hands, each with 24 degrees of freedom. MIDI files were converted into time-indexed note trajectories, which serve as the goal representation for the RL agent: at each timestep, the goal specifies which keys should be pressed. To ease exploration in the high-dimensional action space, human priors in the form of fingering labels were incorporated into the reward function. The agent was trained with DroQ, a state-of-the-art model-free RL algorithm, and learned to perform a variety of pieces, with performance measured by F1 score on the Etude-12 subset. The authors also release the simulated benchmark and dataset to advance research on high-dimensional control.
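To make the goal representation and the evaluation metric concrete, the following is a minimal sketch, not the benchmark's actual code. It assumes notes are given as hypothetical (key index, onset, offset) tuples, discretizes them into a binary key-by-timestep matrix, and computes F1 from key presses pooled over all timesteps. The names `midi_to_key`, `notes_to_piano_roll`, and `f1_score`, the tuple layout, the timestep `dt`, and the pooled averaging are all illustrative assumptions; the paper's exact procedure may differ.

```python
from typing import List, Tuple

NUM_KEYS = 88      # keyboard size stated in the text
LOWEST_MIDI = 21   # MIDI pitch of A0, the lowest key on an 88-key piano

# Hypothetical note layout: (key_index, onset_seconds, offset_seconds).
Note = Tuple[int, float, float]


def midi_to_key(midi_pitch: int) -> int:
    """Map a MIDI pitch (21..108) to a 0-based key index on an 88-key piano."""
    key = midi_pitch - LOWEST_MIDI
    if not 0 <= key < NUM_KEYS:
        raise ValueError(f"MIDI pitch {midi_pitch} is outside the 88-key range")
    return key


def notes_to_piano_roll(notes: List[Note], dt: float) -> List[List[int]]:
    """Discretize notes into a (timesteps x NUM_KEYS) binary goal matrix."""
    if not notes:
        return []
    end_time = max(offset for _, _, offset in notes)
    num_steps = max(1, round(end_time / dt))
    roll = [[0] * NUM_KEYS for _ in range(num_steps)]
    for key, onset, offset in notes:
        first = round(onset / dt)
        last = max(first + 1, round(offset / dt))  # each note spans >= 1 step
        for t in range(first, min(last, num_steps)):
            roll[t][key] = 1
    return roll


def f1_score(goal: List[List[int]], played: List[List[int]]) -> float:
    """F1 of played keys against goal keys, pooled over all timesteps."""
    tp = fp = fn = 0
    for goal_row, played_row in zip(goal, played):
        for g, p in zip(goal_row, played_row):
            if g and p:
                tp += 1   # key should be pressed and is pressed
            elif p:
                fp += 1   # key is pressed but should not be
            elif g:
                fn += 1   # key should be pressed but is not
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Two-note example: middle C (MIDI 60) followed by E4 (MIDI 64).
example_notes = [(midi_to_key(60), 0.0, 0.2), (midi_to_key(64), 0.2, 0.4)]
goal_roll = notes_to_piano_roll(example_notes, dt=0.1)

# Simulated performance: one wrong key on the final timestep.
played_roll = [row[:] for row in goal_roll]
played_roll[3][midi_to_key(64)] = 0
played_roll[3][midi_to_key(65)] = 1

print(f1_score(goal_roll, played_roll))
```

Each row of `goal_roll` is the kind of per-timestep target an agent could be conditioned on, and comparing it with the keys actually pressed yields precision, recall, and F1. Pooling counts over all timesteps keeps the sketch short; averaging per-timestep scores instead would be an equally reasonable choice.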