Multimodal Siamese Networks for Dementia Detection from Speech in Women
2025-08-24

This study uses a multimodal Siamese network to detect dementia from speech, focusing specifically on female participants. The data are audio recordings and transcripts from the Pitt Corpus in the DementiaBank database. Acoustic features such as Mel-frequency cepstral coefficients (MFCCs) and zero-crossing rate are extracted from the audio, and the transcripts undergo text preprocessing. The network combines the audio and text features with the aim of improving detection accuracy, and data augmentation is applied to improve model robustness. The result is an approach to multimodal learning for dementia diagnosis that covers feature extraction, feature fusion, and augmentation.
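
To make the idea of a multimodal Siamese network concrete, the following is a minimal sketch in plain Python, not the study's implementation. The abstract does not specify the architecture, so everything here is an assumption for illustration: fusion by simple concatenation of audio and text features, a single shared linear layer with tanh as the encoder, Euclidean distance between embeddings, and a contrastive loss. The names `SharedEncoder`, `fuse`, and `contrastive_loss` are hypothetical, and the feature values are made up.

```python
import math
import random


def zero_crossing_rate(signal):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(signal) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(signal, signal[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(signal) - 1)


def fuse(audio_features, text_features):
    """Assumed fusion strategy: concatenate the two feature vectors."""
    return list(audio_features) + list(text_features)


class SharedEncoder:
    """One linear layer followed by tanh (hypothetical stand-in encoder).

    The Siamese property is that the SAME weights embed both members
    of a pair, so distances between embeddings are comparable.
    """

    def __init__(self, in_dim, out_dim, seed=0):
        rng = random.Random(seed)
        self.weights = [
            [rng.uniform(-0.5, 0.5) for _ in range(in_dim)]
            for _ in range(out_dim)
        ]

    def __call__(self, features):
        return [
            math.tanh(sum(w * x for w, x in zip(row, features)))
            for row in self.weights
        ]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(distance, same_class, margin=1.0):
    """Pull same-class pairs together; push different-class pairs
    apart until they are at least `margin` away."""
    if same_class:
        return distance ** 2
    return max(0.0, margin - distance) ** 2


# Toy pair: one audio feature (zero-crossing rate) plus two made-up
# text features per speaker. Real inputs would be much richer.
speaker_a = fuse([zero_crossing_rate([0.2, -0.1, 0.3, -0.4])], [0.7, 0.1])
speaker_b = fuse([zero_crossing_rate([0.2, 0.1, 0.3, 0.4])], [0.2, 0.9])

encoder = SharedEncoder(in_dim=3, out_dim=2)
pair_distance = euclidean(encoder(speaker_a), encoder(speaker_b))
pair_loss = contrastive_loss(pair_distance, same_class=False)
```

In a trained system the encoder weights would be learned by minimizing this loss over many labeled pairs; here they are fixed random values, so the sketch shows only the data flow.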