Dia: A 1.6B Parameter Text-to-Speech Model from Nari Labs

2025-04-21

Nari Labs introduces Dia, a 1.6B-parameter text-to-speech model that generates highly realistic dialogue directly from a transcript. Users can control emotion and tone by conditioning the output on audio, and the model can also produce nonverbal cues such as laughter and coughing. To accelerate research, the pretrained model checkpoints and inference code are available on Hugging Face, and a demo page compares Dia with ElevenLabs Studio and Sesame CSM-1B. Dia currently requires a GPU with around 10GB of VRAM (CPU support is planned) and generates roughly 40 tokens/second on an A4000 GPU. A quantized version is planned to improve memory efficiency. The model is released under the Apache License 2.0, and misuse such as identity theft, generating deceptive content, or illegal activity is strictly prohibited.

(github.com)
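
To put the quoted 40 tokens/second in perspective, here is a rough back-of-the-envelope sketch of how long generation might take for a clip of a given length. The summary does not say how many tokens correspond to one second of audio, so the value 86 used below is an assumption for illustration only and should be checked against the project's own documentation. The function names are hypothetical and are not part of Dia's code.

```python
def realtime_factor(tokens_per_second: float, tokens_per_audio_second: float) -> float:
    """Seconds of audio produced per second of wall-clock time."""
    if tokens_per_second <= 0 or tokens_per_audio_second <= 0:
        raise ValueError("rates must be positive")
    return tokens_per_second / tokens_per_audio_second


def generation_time(audio_seconds: float,
                    tokens_per_second: float,
                    tokens_per_audio_second: float) -> float:
    """Estimated wall-clock seconds needed to generate `audio_seconds` of speech."""
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    return audio_seconds / realtime_factor(tokens_per_second, tokens_per_audio_second)


# Throughput quoted in the summary for an A4000 GPU.
A4000_TOKENS_PER_SECOND = 40.0
# ASSUMPTION: tokens per second of audio; not stated in the summary.
ASSUMED_TOKENS_PER_AUDIO_SECOND = 86.0

if __name__ == "__main__":
    seconds = generation_time(10.0, A4000_TOKENS_PER_SECOND, ASSUMED_TOKENS_PER_AUDIO_SECOND)
    print(f"Estimated time for a 10 s clip: {seconds:.1f} s")  # 21.5 s under the assumed ratio
```

Under that assumed ratio, generation on an A4000 would run at a little under half of real time; a different tokens-per-audio-second value changes the estimate proportionally.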
