EconPapers
A unified acoustic-to-speech-to-language embedding space captures the neural basis of natural language processing in everyday conversations

Ariel Goldstein, Haocheng Wang, Leonard Niekerken, Mariano Schain, Zaid Zada, Bobbi Aubrey, Tom Sheffer, Samuel A. Nastase, Harshvardhan Gazula, Aditi Singh, Aditi Rao, Gina Choe, Catherine Kim, Werner Doyle, Daniel Friedman, Sasha Devore, Patricia Dugan, Avinatan Hassidim, Michael Brenner, Yossi Matias, Orrin Devinsky, Adeen Flinker and Uri Hasson
Additional contact information
Ariel Goldstein: Hebrew University
Haocheng Wang: Princeton University
Leonard Niekerken: Princeton University
Mariano Schain: Google Research
Zaid Zada: Princeton University
Bobbi Aubrey: Princeton University
Tom Sheffer: Google Research
Samuel A. Nastase: Princeton University
Harshvardhan Gazula: Princeton University
Aditi Singh: Princeton University
Aditi Rao: Princeton University
Gina Choe: Princeton University
Catherine Kim: Princeton University
Werner Doyle: New York University School of Medicine
Daniel Friedman: New York University School of Medicine
Sasha Devore: New York University School of Medicine
Patricia Dugan: New York University School of Medicine
Avinatan Hassidim: Google Research
Michael Brenner: Google Research
Yossi Matias: Google Research
Orrin Devinsky: New York University School of Medicine
Adeen Flinker: New York University School of Medicine
Uri Hasson: Princeton University

Nature Human Behaviour, 2025, vol. 9, issue 5, 1041-1055

Abstract: This study introduces a unified computational framework connecting acoustic, speech and word-level linguistic structures to study the neural basis of everyday conversations in the human brain. We used electrocorticography to record neural signals across 100 h of speech production and comprehension as participants engaged in open-ended real-life conversations. We extracted low-level acoustic, mid-level speech and contextual word embeddings from a multimodal speech-to-text model (Whisper). We developed encoding models that linearly map these embeddings onto brain activity during speech production and comprehension. Remarkably, this model accurately predicts neural activity at each level of the language processing hierarchy across hours of new conversations not used in training the model. The internal processing hierarchy in the model is aligned with the cortical hierarchy for speech and language processing, where sensory and motor regions better align with the model’s speech embeddings, and higher-level language areas better align with the model’s language embeddings. The Whisper model captures the temporal sequence of language-to-speech encoding before word articulation (speech production) and speech-to-language encoding post-articulation (speech comprehension). The embeddings learned by this model outperform symbolic models in capturing neural activity supporting natural speech and language. These findings support a paradigm shift towards unified computational models that capture the entire processing hierarchy for speech comprehension and production in real-world conversations.
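The core method the abstract describes, a linear encoding model mapping model embeddings onto recorded brain activity and evaluating on held-out conversations, can be sketched in a few lines. This is an illustrative toy with synthetic data, not the authors' pipeline: the array shapes, the ridge penalty, and the variable names are assumptions for demonstration only.

```python
# Toy sketch of a linear encoding model: ridge regression from word-level
# embeddings (e.g. extracted from a speech-to-text model) to per-electrode
# neural activity, scored by correlation on held-out samples.
import numpy as np

rng = np.random.default_rng(0)

n_words, emb_dim, n_electrodes = 500, 64, 10
X = rng.standard_normal((n_words, emb_dim))            # embedding per word
W_true = rng.standard_normal((emb_dim, n_electrodes))  # hidden linear map
Y = X @ W_true + 0.1 * rng.standard_normal((n_words, n_electrodes))

# Hold out the last fifth, standing in for conversations unseen in training.
X_tr, X_te, Y_tr, Y_te = X[:400], X[400:], Y[:400], Y[400:]

# Closed-form ridge solution: W = (X'X + alpha*I)^-1 X'Y
alpha = 1.0
W = np.linalg.solve(X_tr.T @ X_tr + alpha * np.eye(emb_dim), X_tr.T @ Y_tr)

# Per-electrode Pearson correlation between predicted and actual activity.
Y_pred = X_te @ W
r = [np.corrcoef(Y_pred[:, e], Y_te[:, e])[0, 1] for e in range(n_electrodes)]
mean_r = float(np.mean(r))
```

In the study this scoring is done per electrode and per embedding level (acoustic, speech, language), which is what lets the authors compare the model's internal hierarchy against the cortical hierarchy.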

Date: 2025

Downloads: (external link)
https://www.nature.com/articles/s41562-025-02105-9 Abstract (text/html)
Access to the full text of the articles in this series is restricted.



Persistent link: https://EconPapers.repec.org/RePEc:nat:nathum:v:9:y:2025:i:5:d:10.1038_s41562-025-02105-9

Ordering information: This journal article can be ordered from
https://www.nature.com/nathumbehav/

DOI: 10.1038/s41562-025-02105-9


Nature Human Behaviour is currently edited by Stavroula Kousta

More articles in Nature Human Behaviour from Nature
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-06-03
Handle: RePEc:nat:nathum:v:9:y:2025:i:5:d:10.1038_s41562-025-02105-9