Multimodal Siamese Networks for Dementia Detection from Speech in Women

2025-08-24

This study uses a multimodal Siamese network to detect dementia from speech, focusing specifically on female participants. Using audio recordings and transcripts from the Pitt Corpus in the DementiaBank database, it extracts acoustic features from the audio (MFCCs, zero-crossing rate, etc.) and applies text preprocessing to the transcripts. The network combines the audio and text features to improve dementia detection accuracy, and data augmentation is applied to improve model robustness. Together, these steps form a multimodal learning approach to dementia diagnosis from speech.
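To make the idea concrete, the following is a minimal sketch of one acoustic feature mentioned above (zero-crossing rate) and of how a Siamese comparison over combined audio and text features might look. It is not the study's implementation: the class `MultimodalSiamese`, the single tanh layer per modality, the concatenation-based fusion, the contrastive loss, and all dimensions are assumptions chosen for illustration, and the weights are random rather than trained.

```python
import math
import random


def zero_crossing_rate(signal):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(signal) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(signal, signal[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(signal) - 1)


def make_linear(in_dim, out_dim, rng):
    """Random weight matrix with out_dim rows and in_dim columns."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]


def encode(weights, features):
    """One tanh layer: a stand-in for a real audio or text encoder."""
    return [math.tanh(sum(w * x for w, x in zip(row, features))) for row in weights]


class MultimodalSiamese:
    """Hypothetical model: both inputs of a pair pass through the SAME weights."""

    def __init__(self, audio_dim, text_dim, embed_dim=4, seed=0):
        rng = random.Random(seed)
        self.audio_w = make_linear(audio_dim, embed_dim, rng)
        self.text_w = make_linear(text_dim, embed_dim, rng)

    def embed(self, audio_feats, text_feats):
        # Assumed fusion: concatenate the per-modality embeddings.
        return encode(self.audio_w, audio_feats) + encode(self.text_w, text_feats)

    def distance(self, sample_a, sample_b):
        """Euclidean distance between the fused embeddings of two samples."""
        ea = self.embed(*sample_a)
        eb = self.embed(*sample_b)
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(ea, eb)))


def contrastive_loss(distance, same_class, margin=1.0):
    """Pull same-class pairs together; push different-class pairs past the margin."""
    if same_class:
        return distance ** 2
    return max(0.0, margin - distance) ** 2


# Toy usage with made-up feature vectors (not real Pitt Corpus data).
model = MultimodalSiamese(audio_dim=3, text_dim=2)
speaker_a = ([0.2, 0.1, zero_crossing_rate([1, -1, 1, -1])], [0.5, 0.3])
speaker_b = ([0.9, 0.4, zero_crossing_rate([1, 2, 3, 4])], [0.1, 0.8])
pair_distance = model.distance(speaker_a, speaker_b)
pair_loss = contrastive_loss(pair_distance, same_class=False)
```

Because the two branches share weights, identical inputs always map to distance zero, which is the property that lets a Siamese model learn a similarity measure rather than a direct classifier. In practice the toy encoders would be replaced by trained networks over real acoustic features and transcript embeddings.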