YuE: Open Foundation Model for Long-Form Music Generation
Researchers introduce YuE, a family of open foundation models built on the LLaMA2 architecture, tackling the challenging lyrics-to-song problem in long-form music generation. YuE generates up to five minutes of music while maintaining lyrics alignment, coherent musical structure, and engaging melodies with appropriate accompaniment. This is achieved through track-decoupled next-token prediction, structural progressive conditioning, and a multitask, multiphase pre-training recipe. Improved in-context learning enables versatile style transfer (e.g., Japanese city pop to English rap) and bidirectional generation. Evaluations show YuE matching or exceeding proprietary systems in musicality and vocal agility. Fine-tuning adds further controls and improved support for tail (low-resource) languages. YuE's representations also excel in music understanding tasks, achieving state-of-the-art results on the MARBLE benchmark.
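To make the track-decoupled objective more concrete, here is a minimal Python sketch of one way per-frame vocal and accompaniment codec tokens could be interleaved into a single sequence for standard next-token prediction. The token offsets, helper names, and codebook sizes are illustrative assumptions, not YuE's actual implementation.

```python
# Sketch of track-decoupled next-token prediction (hypothetical token
# layout; not YuE's actual code). Each audio frame yields one vocal token
# and one accompaniment token; interleaving them lets a single LM model
# both tracks jointly while keeping each track's codebook distinct.

from typing import List, Tuple

VOCAL_OFFSET = 0        # assumed: vocal codebook occupies IDs [0, 1024)
ACCOMP_OFFSET = 1024    # assumed: accompaniment codebook occupies [1024, 2048)

def interleave_tracks(vocal: List[int], accomp: List[int]) -> List[int]:
    """Interleave per-frame tokens as [v0, a0, v1, a1, ...], offsetting
    accompaniment IDs so the two codebooks never collide in the shared
    vocabulary."""
    assert len(vocal) == len(accomp), "tracks must be frame-aligned"
    seq: List[int] = []
    for v, a in zip(vocal, accomp):
        seq.append(VOCAL_OFFSET + v)
        seq.append(ACCOMP_OFFSET + a)
    return seq

def next_token_targets(seq: List[int]) -> Tuple[List[int], List[int]]:
    """Standard LM objective: predict token t+1 from tokens up to t."""
    return seq[:-1], seq[1:]

if __name__ == "__main__":
    vocal_tokens = [3, 17, 42]    # toy per-frame codec IDs
    accomp_tokens = [7, 7, 99]
    seq = interleave_tracks(vocal_tokens, accomp_tokens)
    inputs, targets = next_token_targets(seq)
    print(seq)              # [3, 1031, 17, 1031, 42, 1123]
    print(inputs, targets)  # shifted input/target pairs for training
```

The design intuition: because vocal and accompaniment tokens alternate at a fixed period, the model conditions each track's prediction on the other's recent context, which is one plausible reading of how decoupled tracks stay mutually coherent over long sequences.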