Posted on by Dr. Francis Collins
Credit: Adapted from Nima Mesgarani, Columbia University’s Zuckerman Institute, New York
Computers have learned to do some amazing things, from beating the world’s ranking chess masters to providing the equivalent of feeling in prosthetic limbs. Now, as heard in this brief audio clip counting from zero to nine, an NIH-supported team has combined innovative speech synthesis technology and artificial intelligence to teach a computer to read a person’s thoughts and translate them into intelligible speech.
Turning brain waves into speech isn’t just fascinating science. It might also prove life changing for people who have lost the ability to speak from conditions such as amyotrophic lateral sclerosis (ALS) or a debilitating stroke.
When people speak or even think about talking, their brains fire off distinctive, but previously poorly decoded, patterns of neural activity. Nima Mesgarani and his team at Columbia University’s Zuckerman Institute, New York, wanted to learn how to decode this neural activity.
Mesgarani and his team started out with a vocoder, a voice synthesizer that produces sounds based on an analysis of speech. It’s the very same technology used by Amazon’s Alexa, Apple’s Siri, or other similar devices to listen and respond appropriately to everyday commands.
As reported in Scientific Reports, the first task was to train a vocoder to produce synthesized sounds in response to brain waves instead of speech . To do it, Mesgarani teamed up with neurosurgeon Ashesh Mehta, Hofstra Northwell School of Medicine, Manhasset, NY, who frequently performs brain mapping in people with epilepsy to pinpoint the sources of seizures before performing surgery to remove them.
In five patients already undergoing brain mapping, the researchers monitored activity in the auditory cortex, where the brain processes sound. The patients listened to recordings of short stories read by four speakers. In the first test, eight different sentences were repeated multiple times. In the next test, participants heard four new speakers repeat numbers from zero to nine.
From these exercises, the researchers reconstructed the words that people heard from their brain activity alone. Then the researchers tried various methods to reproduce intelligible speech from the recorded brain activity. They found it worked best to combine the vocoder technology with a form of computer artificial intelligence known as deep learning.
Deep learning is inspired by how our own brain’s neural networks process information, learning to focus on some details but not others. In deep learning, computers look for patterns in data. As they begin to “see” complex relationships, some connections in the network are strengthened while others are weakened.
In this case, the researchers used the deep learning networks to interpret the sounds produced by the vocoder in response to the brain activity patterns. When the vocoder-produced sounds were processed and “cleaned up” by those neural networks, it made the reconstructed sounds easier for a listener to understand as recognizable words, though this first attempt still sounds pretty robotic.
The researchers will continue testing their system with more complicated words and sentences. They also want to run the same tests on brain activity, comparing what happens when a person speaks or just imagines speaking. They ultimately envision an implant, similar to those already worn by some patients with epilepsy, that will translate a person’s thoughts into spoken words. That might open up all sorts of awkward moments if some of those thoughts weren’t intended for transmission!
Along with recently highlighted new ways to catch irregular heartbeats and cervical cancers, it’s yet another remarkable example of the many ways in which computers and artificial intelligence promise to transform the future of medicine.
 Towards reconstructing intelligible speech from the human auditory cortex. Akbari H, Khalighinejad B, Herrero JL, Mehta AD, Mesgarani N. Sci Rep. 2019 Jan 29;9(1):874.
Advances in Neuroprosthetic Learning and Control. Carmena JM. PLoS Biol. 2013;11(5):e1001561.
Nima Mesgarani (Columbia University, New York)
NIH Support: National Institute on Deafness and Other Communication Disorders; National Institute of Mental Health