Whisper's Embeddings Surprisingly Align with Human Brain Activity During Speech

A study reveals a surprising alignment between the internal embeddings of OpenAI's Whisper speech recognition model and human brain activity during natural conversations. Comparing Whisper's embeddings to neural recordings from regions such as the inferior frontal gyrus (IFG) and superior temporal gyrus (STG), researchers found that language embeddings peaked before speech embeddings during speech production, and vice versa during comprehension. This suggests that Whisper, despite not being designed with brain mechanisms in mind, captures key aspects of human language processing. The findings also point to a 'soft hierarchy' in the brain's language processing: higher-order areas like the IFG prioritize semantic and syntactic information while still processing lower-level auditory features, whereas lower-order areas like the STG prioritize acoustic and phonemic processing while still capturing word-level information.
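Comparisons like this are typically done with an encoding model: a regularized linear regression that predicts each electrode's activity from the model's embedding features, then scores the fit on held-out data. Below is a minimal sketch of that idea using closed-form ridge regression; both the "embeddings" and the "brain signal" are synthetic stand-ins, and none of the study's actual data, features, or pipeline are reproduced here.

```python
import numpy as np

# Hypothetical encoding-model sketch: predict neural activity from model
# embeddings via ridge regression. All data below is synthetic.
rng = np.random.default_rng(0)
n_train, n_test, dim = 400, 100, 32

# Stand-in for Whisper embeddings: timepoints (rows) x features (cols)
X_train = rng.standard_normal((n_train, dim))
X_test = rng.standard_normal((n_test, dim))

# Simulated electrode response: a linear readout of the embeddings plus noise
w_true = rng.standard_normal(dim)
y_train = X_train @ w_true + 0.5 * rng.standard_normal(n_train)
y_test = X_test @ w_true + 0.5 * rng.standard_normal(n_test)

# Closed-form ridge regression: w = (X^T X + alpha*I)^(-1) X^T y
alpha = 1.0
w_hat = np.linalg.solve(
    X_train.T @ X_train + alpha * np.eye(dim),
    X_train.T @ y_train,
)

# Encoding performance: correlation between predicted and held-out activity
y_pred = X_test @ w_hat
r = np.corrcoef(y_pred, y_test)[0, 1]
print(f"held-out correlation: {r:.3f}")
```

In this framing, "language embeddings peaking before speech embeddings" corresponds to fitting separate encoding models per embedding type and per time lag, then comparing when each model's predictive correlation is highest.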