
speech

From Brain Waves to Real-Time Text Messaging

Posted by Lawrence Tabak, D.D.S., Ph.D.

People who have lost the ability to speak due to a severe disability still want to get the words out. They just can’t physically do it. But in our digital age, there is now a fascinating way to overcome such profound physical limitations. Computers are being taught to decode brain waves as a person tries to speak and then to translate them, interactively and in real time, into words on a computer screen.

The latest progress, demonstrated in the video above, establishes that it’s quite possible for computers trained with the help of current artificial intelligence (AI) methods to restore a vocabulary of more than 1,000 words for people with the mental but not the physical ability to speak. That vocabulary covers more than 85 percent of day-to-day communication in English. With further refinements, the researchers say a 9,000-word vocabulary is well within reach.

The findings published in the journal Nature Communications come from a team led by Edward Chang, University of California, San Francisco [1]. Earlier, Chang and colleagues established that this AI-enabled system could directly decode 50 full words in real time from brain waves alone in a person with paralysis trying to speak [2]. The study is known as BRAVO, short for Brain-computer interface Restoration Of Arm and Voice.

In the latest BRAVO study, the team wanted to figure out how to condense the English language into compact units for easier decoding and expand that 50-word vocabulary. They did it in the same way we all do: by focusing not on complete words, but on the 26-letter alphabet.

The study involved a 36-year-old male with severe limb and vocal paralysis. The team designed a sentence-spelling pipeline that enabled him to silently spell out messages in his head, using code words corresponding to each of the 26 letters. As he did so, a high-density array of electrodes implanted over the brain’s sensorimotor cortex, part of the cerebral cortex, recorded his brain waves.

A sophisticated system including signal processing, speech detection, word classification, and language modeling then translated those thoughts into coherent words and complete sentences on a computer screen. This so-called speech neuroprosthesis system allows those who have lost their speech to perform roughly the equivalent of text messaging.
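For readers who want a feel for how such a pipeline fits together, here is a minimal, hypothetical Python sketch of the general idea: classify each silently spelled letter from a window of neural features, then let a vocabulary-constrained language-model step settle on the most plausible word. Every piece of it, from the stand-in classifier to the tiny word list and the random “brain activity,” is illustrative only and is not the study’s actual code or data.

```python
# A minimal, hypothetical sketch of a spelling-decoder pipeline (illustrative
# only: the classifier, vocabulary, and "brain activity" below are stand-ins,
# not the study's actual models or data).
import numpy as np

ALPHABET = "abcdefghijklmnopqrstuvwxyz"
VOCAB = ["i", "am", "very", "good", "morning"]  # stand-in for the 1,152-word list

def classify_letters(neural_features):
    """Stand-in letter classifier: map each window of neural features to a
    probability distribution over the 26 letters. A real system would use a
    network trained on high-density electrode recordings."""
    rng = np.random.default_rng(0)
    logits = neural_features @ rng.normal(size=(neural_features.shape[1], 26))
    expz = np.exp(logits - logits.max(axis=1, keepdims=True))
    return expz / expz.sum(axis=1, keepdims=True)  # shape: (n_letters, 26)

def decode_word(letter_probs):
    """Language-model step: score vocabulary words of the right length against
    the per-letter probabilities and return the best match."""
    candidates = [w for w in VOCAB if len(w) == letter_probs.shape[0]]
    if not candidates:  # no word of that length: fall back to raw letter guesses
        return "".join(ALPHABET[i] for i in letter_probs.argmax(axis=1))
    score = lambda w: sum(np.log(letter_probs[i, ALPHABET.index(c)] + 1e-9)
                          for i, c in enumerate(w))
    return max(candidates, key=score)

# Fake "brain activity": one 16-dimensional feature vector per spelled letter.
features = np.random.default_rng(1).normal(size=(4, 16))  # a 4-letter attempt
print(decode_word(classify_letters(features)))  # e.g., "good" or "very"
```

The published system layers on more machinery, including the speech-detection and large-vocabulary language-modeling stages described above, but the overall flow is similar in spirit.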

Chang’s team put their spelling system to the test first by asking the participant to silently reproduce a sentence displayed on a screen. They then moved on to conversations, in which the participant was asked a question and could answer freely. For instance, as in the video above, when the computer asked, “How are you today?” he responded, “I am very good.” When asked about his favorite time of year, he answered, “summertime.” An attempted hand movement signaled the computer when he was done speaking.

The computer didn’t get it exactly right every time. For instance, in the initial trials with the target sentence “good morning,” the computer got it exactly right in one case and in another came up with “good for legs.” But, overall, the tests showed that the AI device could decode silently spelled letters with a high degree of accuracy, producing sentences from a 1,152-word vocabulary at a speed of about 29 characters per minute.

On average, the spelling system got it wrong 6 percent of the time. That’s really good when you consider how common it is for errors to arise with dictation software or in any text message conversation.
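Accuracy like this is typically scored with an edit-distance-based character error rate. As a purely illustrative aside, here is a short Python sketch of that calculation applied to the “good morning” example above; it uses a standard textbook metric, not the study’s own scoring code.

```python
# Illustrative character-error-rate calculation (a standard Levenshtein-based
# metric, not the study's own scoring code).
def edit_distance(a, b):
    """Minimum number of insertions, deletions, and substitutions to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # delete ca
                           cur[j - 1] + 1,               # insert cb
                           prev[j - 1] + (ca != cb)))    # substitute ca -> cb
        prev = cur
    return prev[-1]

target, decoded = "good morning", "good for legs"
cer = edit_distance(target, decoded) / len(target)
print(f"character error rate: {cer:.0%}")  # errors relative to the target length
```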

Of course, much more work is needed to test this approach in many more people. The researchers don’t yet know how individual differences or specific medical conditions might affect the outcomes. They suspect the general approach will work for anyone who remains mentally capable of thinking through words and attempting to speak them.

They also envision future improvements as part of their BRAVO study. For instance, it may be possible to develop a system capable of more rapid decoding of many commonly used words or phrases. Such a system could then reserve the slower spelling method for other, less common words.

But, as these results clearly demonstrate, this combination of artificial intelligence and silently controlled speech neuroprostheses holds fantastic potential to restore not just speech but meaningful communication and authentic connection between individuals who’ve lost the ability to speak and their loved ones. For that, I say BRAVO.

References:

[1] Generalizable spelling using a speech neuroprosthesis in an individual with severe limb and vocal paralysis. Metzger SL, Liu JR, Moses DA, Dougherty ME, Seaton MP, Littlejohn KT, Chartier J, Anumanchipalli GK, Tu-Chan A, Ganguly K, Chang EF. Nat Commun. 2022;13(1):6510.

[2] Neuroprosthesis for decoding speech in a paralyzed person with anarthria. Moses DA, Metzger SL, Liu JR, et al. N Engl J Med. 2021 Jul 15;385(3):217-227.

Links:

Voice, Speech, and Language (National Institute on Deafness and Other Communication Disorders/NIH)

ECoG BMI for Motor and Speech Control (BRAVO) (ClinicalTrials.gov)

Chang Lab (University of California, San Francisco)

NIH Support: National Institute on Deafness and Other Communication Disorders


Can a Mind-Reading Computer Speak for Those Who Cannot?

Posted by Dr. Francis Collins

Credit: Adapted from Nima Mesgarani, Columbia University’s Zuckerman Institute, New York

Computers have learned to do some amazing things, from beating the world’s top chess masters to providing the equivalent of feeling in prosthetic limbs. Now, as heard in this brief audio clip counting from zero to nine, an NIH-supported team has combined innovative speech synthesis technology and artificial intelligence to teach a computer to read a person’s thoughts and translate them into intelligible speech.

Turning brain waves into speech isn’t just fascinating science. It might also prove life changing for people who have lost the ability to speak from conditions such as amyotrophic lateral sclerosis (ALS) or a debilitating stroke.

When people speak or even think about talking, their brains fire off distinctive, but previously poorly decoded, patterns of neural activity. Nima Mesgarani and his team at Columbia University’s Zuckerman Institute, New York, wanted to learn how to decode this neural activity.

Mesgarani and his team started out with a vocoder, a voice synthesizer that produces sounds based on an analysis of speech. It’s the very same technology used by Amazon’s Alexa, Apple’s Siri, and other similar devices to listen and respond appropriately to everyday commands.

As reported in Scientific Reports, the first task was to train a vocoder to produce synthesized sounds in response to brain waves instead of speech [1]. To do it, Mesgarani teamed up with neurosurgeon Ashesh Mehta, Hofstra Northwell School of Medicine, Manhasset, NY, who frequently performs brain mapping in people with epilepsy to pinpoint the sources of seizures before performing surgery to remove them.

In five patients already undergoing brain mapping, the researchers monitored activity in the auditory cortex, where the brain processes sound. The patients listened to recordings of short stories read by four speakers. In the first test, eight different sentences were repeated multiple times. In the next test, participants heard four new speakers repeat numbers from zero to nine.

From these exercises, the researchers reconstructed the words that people heard from their brain activity alone. Then the researchers tried various methods to reproduce intelligible speech from the recorded brain activity. They found it worked best to combine the vocoder technology with a form of computer artificial intelligence known as deep learning.

Deep learning is inspired by how our own brain’s neural networks process information, learning to focus on some details but not others. In deep learning, computers look for patterns in data. As they begin to “see” complex relationships, some connections in the network are strengthened while others are weakened.

In this case, the researchers used the deep learning networks to interpret the sounds produced by the vocoder in response to the brain activity patterns. When the vocoder-produced sounds were processed and “cleaned up” by those neural networks, the reconstructed sounds became easier for a listener to understand as recognizable words, though this first attempt still sounds pretty robotic.
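To make that mapping more concrete, here is a deliberately simplified, hypothetical sketch of the kind of decoder being learned: from frames of recorded neural activity to the parameters a vocoder would need to synthesize sound. The dimensions, the random “recordings,” and the plain least-squares fit are all stand-ins; the study itself trained deep neural networks on real auditory-cortex data.

```python
# A toy, hypothetical illustration of learning a mapping from auditory-cortex
# activity to vocoder parameters. The real work uses deep neural networks and
# actual electrode recordings; this stand-in uses random data and plain
# least-squares so it stays short and runnable.
import numpy as np

rng = np.random.default_rng(0)
n_frames, n_electrodes, n_vocoder_params = 500, 64, 32

# Pretend recordings: neural features per time frame, plus the "true" vocoder
# parameters for the sound the person heard during those frames.
neural = rng.normal(size=(n_frames, n_electrodes))
true_mapping = rng.normal(size=(n_electrodes, n_vocoder_params))
vocoder_params = neural @ true_mapping + 0.1 * rng.normal(size=(n_frames, n_vocoder_params))

# "Training": fit a linear decoder (the study's deep networks play this role).
decoder, *_ = np.linalg.lstsq(neural, vocoder_params, rcond=None)

# "Decoding": reconstruct vocoder parameters from new brain activity alone.
new_neural = rng.normal(size=(10, n_electrodes))
reconstructed = new_neural @ decoder
print(reconstructed.shape)  # (10, 32) -> frames of parameters ready for a vocoder
```

In the actual study, deep networks take the place of the simple linear fit here, and that extra modeling power is a big part of what makes the reconstructed audio intelligible.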

The researchers will continue testing their system with more complicated words and sentences. They also want to run the same tests on brain activity, comparing what happens when a person speaks or just imagines speaking. They ultimately envision an implant, similar to those already worn by some patients with epilepsy, that will translate a person’s thoughts into spoken words. That might open up all sorts of awkward moments if some of those thoughts weren’t intended for transmission!

Along with recently highlighted new ways to catch irregular heartbeats and cervical cancers, it’s yet another remarkable example of the many ways in which computers and artificial intelligence promise to transform the future of medicine.

Reference:

[1] Towards reconstructing intelligible speech from the human auditory cortex. Akbari H, Khalighinejad B, Herrero JL, Mehta AD, Mesgarani N. Sci Rep. 2019 Jan 29;9(1):874.

Links:

Advances in Neuroprosthetic Learning and Control. Carmena JM. PLoS Biol. 2013;11(5):e1001561.

Nima Mesgarani (Columbia University, New York)

NIH Support: National Institute on Deafness and Other Communication Disorders; National Institute of Mental Health


How the Brain Regulates Vocal Pitch

Posted by Dr. Francis Collins

Credit: University of California, San Francisco

Whether it’s hitting a high note, delivering a punch line, or reading a bedtime story, the pitch of our voices is a vital part of human communication. Now, as part of their ongoing quest to produce a dynamic picture of neural function in real time, researchers funded by the NIH’s Brain Research through Advancing Innovative Neurotechnologies (BRAIN) Initiative have identified the part of the brain that controls vocal pitch [1].

This improved understanding of how the human brain regulates the pitch of sounds emanating from the voice box, or larynx, is more than cool neuroscience. It could aid in the development of new, more natural-sounding technologies to assist people who have speech disorders or who’ve had their larynxes removed due to injury or disease.


Creative Minds: A Baby’s Eye View of Language Development

Posted by Dr. Francis Collins

If you are a fan of wildlife shows, you’ve probably seen those tiny video cameras rigged to animals in the wild that provide a sneak peek into their secret domains. But not all research cams are mounted on creatures with fur, feathers, or fins. One of NIH’s 2014 Early Independence Award winners has developed a baby-friendly, head-mounted camera system (shown above) that captures the world from an infant’s perspective and explores one of our most human, but still imperfectly understood, traits: language.

Elika Bergelson
Credit: Zachary T. Kern

Elika Bergelson, a young researcher at the University of Rochester in New York, wants to know exactly how and when infants acquire the ability to understand spoken words. Using innovative camera gear and other investigative tools, she hopes to refine current thinking about the natural timeline for language acquisition. Bergelson also hopes her work will pay off in a firmer theoretical foundation to help clinicians assess children with poor verbal skills or with neurodevelopmental conditions that impair information processing, such as autism spectrum disorders.


The Science of Stuttering

Posted by Dr. Francis Collins

Credit: White House

Stuttering is a speech disorder that’s affected some very famous people, including King George VI, actress Marilyn Monroe, and, believe it or not, even Vice President Joe Biden.

About 5 percent of children stutter, but many, like the Vice President, outgrow the disorder.

About 1 percent of adults stutter. That’s about 3 million people in the United States and 60 million worldwide.

Until recently, the cause of most stuttering was a mystery. However, researchers at the NIH’s National Institute on Deafness and Other Communication Disorders have identified several genes involved in inherited forms of stuttering and are busy looking for additional clues that may open new avenues for treatment. Find out more about what science is doing to help.