Auto-AVSR: Open-Source Lip-Reading Speech Recognition Framework Achieves SOTA
2025-02-03
Auto-AVSR is an open-source, end-to-end audio-visual speech recognition (AV-ASR) framework with a primary focus on visual speech (lip-reading). On the LRS3 benchmark it achieves a word error rate (WER) of 20.3% for visual speech recognition (VSR) and 1.0% for audio speech recognition (ASR). The project provides code and tutorials for training, evaluation, and API usage, and it supports multi-node training. Users can start from the pre-trained models or train from scratch, adjusting hyperparameters as needed.
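For readers unfamiliar with the metric, WER is conventionally the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the recognizer's output, divided by the number of reference words. The sketch below illustrates that standard definition only. It is not taken from the Auto-AVSR codebase, the function name `word_error_rate` is hypothetical, and it assumes plain whitespace tokenization with no text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: assumes whitespace tokenization and exact,
    case-sensitive word matching.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)
```

Under this definition, a hypothesis with one wrong word against a six-word reference scores 1/6, or roughly 16.7%. A WER of 20.3% therefore corresponds to about one word error for every five reference words.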
AI
lip reading