A high-performance neuroprosthesis for speech decoding and avatar control

Metzger, Sean L.; Littlejohn, Kaylo T.; Silva, Alexander B.; Moses, David A.; Seaton, Margaret P.; Wang, Ran; Dougherty, Maximilian E.; Liu, Jessie R.; Wu, Peter; Berger, Michael A.; Zhuravleva, Inga; Tu-Chan, Adelyn; Ganguly, Karunesh; Anumanchipalli, Gopala K.; Chang, Edward F.

Affiliations:
Sean L. Metzger: University of California, San Francisco
Kaylo T. Littlejohn: University of California, San Francisco
Alexander B. Silva: University of California, San Francisco
David A. Moses: University of California, San Francisco
Margaret P. Seaton: University of California, San Francisco
Ran Wang: University of California, San Francisco
Maximilian E. Dougherty: University of California, San Francisco
Jessie R. Liu: University of California, San Francisco
Peter Wu: University of California, Berkeley
Michael A. Berger: Speech Graphics Ltd
Inga Zhuravleva: University of California, Berkeley
Adelyn Tu-Chan: University of California, San Francisco
Karunesh Ganguly: University of California, San Francisco
Gopala K. Anumanchipalli: University of California, San Francisco
Edward F. Chang: University of California, San Francisco

Nature, 2023, vol. 620, issue 7976, 1037-1046

Abstract: Speech neuroprostheses have the potential to restore communication to people living with paralysis, but naturalistic speed and expressivity are elusive [1]. Here we use high-density surface recordings of the speech cortex in a clinical-trial participant with severe limb and vocal paralysis to achieve high-performance real-time decoding across three complementary speech-related output modalities: text, speech audio and facial-avatar animation. We trained and evaluated deep-learning models using neural data collected as the participant attempted to silently speak sentences. For text, we demonstrate accurate and rapid large-vocabulary decoding with a median rate of 78 words per minute and median word error rate of 25%. For speech audio, we demonstrate intelligible and rapid speech synthesis and personalization to the participant’s pre-injury voice. For facial-avatar animation, we demonstrate the control of virtual orofacial movements for speech and non-speech communicative gestures. The decoders reached high performance with less than two weeks of training. Our findings introduce a multimodal speech-neuroprosthetic approach that has substantial promise to restore full, embodied communication to people living with severe paralysis.

Date: 2023

Downloads: https://www.nature.com/articles/s41586-023-06443-4 (abstract, text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:nat:nature:v:620:y:2023:i:7976:d:10.1038_s41586-023-06443-4

Ordering information: This journal article can be ordered from
https://www.nature.com/

DOI: 10.1038/s41586-023-06443-4

Nature is currently edited by Magdalena Skipper

Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.