Sesame AI Releases 1B Parameter Conversational Speech Model

2025-03-18
Sesame AI Releases 1B Parameter Conversational Speech Model

Sesame AI Labs has released CSM (Conversational Speech Model), a 1 billion parameter speech generation model based on the Llama architecture. CSM generates RVQ audio codes from text and audio inputs and its checkpoint is available on Hugging Face. An interactive voice demo and a Hugging Face space for testing audio generation are also provided. While capable of producing varied voices, CSM hasn't been fine-tuned to specific voices and has limited multilingual support. Sesame AI emphasizes its use for research and educational purposes only, prohibiting impersonation, misinformation, and illegal activities.