

Artificial Intelligence Accurately Predicts RNA Structures, Too


A mechanical claw grabs molecular models
Credit: Camille L.L. Townshend

Researchers recently showed that a computer could “learn” from many examples of protein folding to predict the 3D structure of proteins with great speed and precision. Now a recent study in the journal Science shows that a computer also can predict the 3D shapes of RNA molecules [1]. This includes the mRNA that codes for proteins and the non-coding RNA that performs a range of cellular functions.

This work marks an important basic science advance. RNA therapeutics—from COVID-19 vaccines to cancer drugs—have already benefited millions of people and will help many more in the future. Now, the ability to predict RNA shapes quickly and accurately on a computer will help to accelerate understanding of these critical molecules and expand their healthcare uses.

Like proteins, the shapes of single-stranded RNA molecules are important for their ability to function properly inside cells. Yet far less is known about these RNA structures and the rules that determine their precise shapes. The RNA elements (bases) can form internal hydrogen-bonded pairs, but the number of possible combinations of pairings is almost astronomical for any RNA molecule with more than a few dozen bases.

In hopes of moving the field forward, a team led by Stephan Eismann and Raphael Townshend in the lab of Ron Dror, Stanford University, Palo Alto, CA, looked to a machine learning approach known as deep learning. It is inspired by how our own brain’s neural networks process information, learning to focus on some details but not others.

In deep learning, computers look for patterns in data. As they begin to “see” complex relationships, some connections in the network are strengthened while others are weakened.
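A toy numerical sketch (not from the study) of that strengthening and weakening: a single artificial neuron trained by gradient descent learns to put a large weight on an informative input while driving the weight on an irrelevant input toward zero.

```python
# Toy illustration: one linear neuron trained by gradient descent.
# The weight on the informative input grows ("strengthened") while the
# weight on the noise input shrinks toward zero ("weakened").
import random

random.seed(0)
w = [0.5, 0.5]  # start with equal weights on both inputs
lr = 0.1        # learning rate

for _ in range(500):
    x_signal = random.choice([-1.0, 1.0])  # informative feature
    x_noise = random.uniform(-1.0, 1.0)    # irrelevant feature
    target = x_signal                      # label depends only on the signal
    pred = w[0] * x_signal + w[1] * x_noise
    err = pred - target
    # Gradient step: each weight moves in proportion to its input's
    # contribution to the error.
    w[0] -= lr * err * x_signal
    w[1] -= lr * err * x_noise

print(w)  # w[0] ends near 1.0 (strengthened), w[1] near 0.0 (weakened)
```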

One of the things that makes deep learning so powerful is that it doesn’t rely on any preconceived notions. It also can pick up on important features and patterns that humans can’t possibly detect. But, as successful as this approach has been in solving many different kinds of problems, it has primarily been applied to areas of biology, such as protein folding, in which lots of data were available to train the computers.

That’s not the case with RNA molecules. To work around this problem, Dror’s team designed a neural network they call ARES. (No, it’s not the Greek god of war. It’s short for Atomic Rotationally Equivariant Scorer.)

To start, the researchers trained ARES on just 18 small RNA molecules for which structures had been experimentally determined. They gave ARES these structural models specified only by their atomic structure and chemical elements.

The next test was to see if ARES could determine from this small training set the best structural model for RNA sequences it had never seen before. The researchers put it to the test with RNA molecules whose structures had been determined more recently.

ARES, however, doesn’t come up with the structures itself. Instead, the researchers give ARES a sequence and at least 1,500 possible 3D structures it might take, all generated using another computer program. Based on patterns in the training set, ARES scores each of the possible structures to find the one it predicts is closest to the actual structure. Remarkably, it does this without being provided any prior information about features important for determining RNA shapes, such as nucleotides, steric constraints, and hydrogen bonds.
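In outline, that selection step might look like the following sketch. The scoring function and candidate data here are hypothetical stand-ins: the real ARES scorer is a trained neural network operating on raw atomic coordinates and chemical elements.

```python
# Hypothetical sketch of the selection step: given many candidate 3D
# models for one RNA sequence, score each one and keep the model
# predicted to be closest to the (unknown) true structure.
def predicted_rmsd(candidate):
    # Stand-in for the learned scorer: an estimate of how far each
    # candidate model is from the real structure (lower = better).
    return candidate["model_score"]

def pick_best(candidates):
    return min(candidates, key=predicted_rmsd)

candidates = [
    {"id": "model_0001", "model_score": 7.2},
    {"id": "model_0002", "model_score": 3.1},
    {"id": "model_0003", "model_score": 5.8},
]
best = pick_best(candidates)
print(best["id"])  # model_0002
```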

It turns out that ARES consistently outperforms all previous methods, including expert human modelers. In fact, it outperformed at least nine other approaches to come out on top in the community-wide RNA-Puzzles contest. It also can make predictions about RNA molecules that are significantly larger and more complex than those on which it was trained.

The success of ARES and this deep learning approach will help to elucidate RNA molecules with potentially important implications for health and disease. It’s another compelling example of how deep learning promises to solve many other problems in structural biology, chemistry, and the material sciences when—at the outset—very little is known.


[1] Geometric deep learning of RNA structure. Townshend RJL, Eismann S, Watkins AM, Rangan R, Karelina M, Das R, Dror RO. Science. 2021 Aug 27;373(6558):1047-1051.


Structural Biology (National Institute of General Medical Sciences/NIH)

The Structures of Life (National Institute of General Medical Sciences/NIH)

RNA Biology (NIH)

RNA Puzzles

Dror Lab (Stanford University, Palo Alto, CA)

NIH Support: National Cancer Institute; National Institute of General Medical Sciences

Artificial Intelligence Accurately Predicts Protein Folding


Caption: Researchers used artificial intelligence to map hundreds of new protein structures, including this 3D view of human interleukin-12 (blue) bound to its receptor (purple). Credit: Ian Haydon, University of Washington Institute for Protein Design, Seattle

Proteins are the workhorses of the cell. Mapping the precise shapes of the most important of these workhorses helps to unlock their life-supporting functions or, in the case of disease, potential for dysfunction. While the amino acid sequence of a protein provides the basis for its 3D structure, deducing the atom-by-atom map from principles of quantum mechanics has been beyond the ability of computer programs—until now. 

In a recent study in the journal Science, researchers reported they have developed artificial intelligence approaches for predicting the three-dimensional structure of proteins in record time, based solely on their one-dimensional amino acid sequences [1]. This groundbreaking approach will not only aid researchers in the lab, but guide drug developers in coming up with safer and more effective ways to treat and prevent disease.

This new NIH-supported advance is now freely available to scientists around the world. In fact, it has already helped to solve especially challenging protein structures in cases where experimental data were lacking and other modeling methods hadn’t been enough to get a final answer. It also can now provide key structural information about proteins for which more time-consuming and costly imaging data are not yet available.

The new work comes from a group led by David Baker and Minkyung Baek, University of Washington, Seattle, Institute for Protein Design. Over the course of the pandemic, Baker’s team has been working hard to design promising COVID-19 therapeutics. They’ve also been working to design proteins that might offer promising new ways to treat cancer and other conditions. As part of this effort, they’ve developed new computational approaches for determining precisely how a chain of amino acids, which are the building blocks of proteins, will fold up in space to form a finished protein.

But the ability to predict a protein’s precise structure or shape from its sequence alone had proven to be a difficult problem to solve despite decades of effort. In search of a solution, research teams from around the world have come together every two years since 1994 at the Critical Assessment of Structure Prediction (CASP) meetings. At these gatherings, teams compete against each other with the goal of developing computational methods and software capable of predicting any of nature’s 200 million or more protein structures from sequences alone with the greatest accuracy.

Last year, a London-based company called DeepMind shook up the structural biology world with their entry into CASP called AlphaFold. (AlphaFold was one of Science’s 2020 Breakthroughs of the Year.) They showed that their artificial intelligence approach—which took advantage of the 170,000 proteins with known structures through an iterative process called deep learning—could predict protein structure with amazing accuracy. In fact, it could predict most protein structures almost as accurately as today’s go-to high-resolution protein mapping techniques, X-ray crystallography and cryo-EM.

The DeepMind performance showed what was possible, but because the advances were made by a world-leading deep learning company, the details on how it worked weren’t made publicly available at the time. The findings left Baker, Baek, and others eager to learn more and to see if they could replicate the impressive predictive ability of AlphaFold outside of such a well-resourced company.

In the new work, Baker and Baek’s team has made stunning progress—using only a fraction of the computational processing power and time required by AlphaFold. The new software, called RoseTTAFold, also relies on a deep learning approach. In deep learning, computers look for patterns in large collections of data. As they begin to recognize complex relationships, some connections in the network are strengthened while others are weakened. The finished network is typically composed of multiple information-processing layers, which operate on the data to return a result—in this case, a protein structure.

Given the complexity of the problem, instead of using a single neural network, RoseTTAFold relies on three. This three-track neural network simultaneously integrates one-dimensional protein sequence information, two-dimensional information about the distances between amino acids, and three-dimensional atomic structure. Information from these separate tracks flows back and forth to generate accurate models of proteins rapidly from sequence information alone, including structures in complex with other proteins.
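A highly simplified sketch of that back-and-forth (this is illustrative Python, not the RoseTTAFold code; the simple mixing arithmetic stands in for learned attention layers that exchange information between tracks):

```python
# Schematic three-track update: 1D sequence features, 2D pairwise
# features, and 3D coordinate features are each refined per round while
# "seeing" summaries of the other two tracks.
def refine(track_1d, track_2d, track_3d, rounds=3):
    for _ in range(rounds):
        # Summaries are computed from the current state of every track
        # before any track is updated, so information flows both ways.
        s1, s2, s3 = sum(track_1d), sum(track_2d), sum(track_3d)
        track_1d = [x + 0.1 * (s2 + s3) / len(track_1d) for x in track_1d]
        track_2d = [x + 0.1 * (s1 + s3) / len(track_2d) for x in track_2d]
        track_3d = [x + 0.1 * (s1 + s2) / len(track_3d) for x in track_3d]
    return track_1d, track_2d, track_3d

seq, pair, coords = refine([1.0], [1.0], [1.0])
print(seq, pair, coords)  # all three tracks shift together each round
```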

As soon as the researchers had what they thought was a reasonable working approach to solve protein structures, they began sharing it with their structural biologist colleagues. In many cases, it became immediately clear that RoseTTAFold worked remarkably well. What’s more, it has been put to work to solve challenging structural biology problems that had vexed scientists for many years with earlier methods.

RoseTTAFold already has solved hundreds of new protein structures, many of which represent poorly understood human proteins. The 3D rendering of a complex showing a human protein called interleukin-12 in complex with its receptor (above image) is just one example. The researchers have generated other structures directly relevant to human health, including some that are related to lipid metabolism, inflammatory conditions, and cancer. The program is now available on the web and has been downloaded by dozens of research teams around the world.

Cryo-EM and other experimental mapping methods will remain essential to solve protein structures in the lab. But with the artificial intelligence advances demonstrated by RoseTTAFold and AlphaFold, which has now also been released in an open-source version and reported in the journal Nature [2], researchers now can make the critical protein structure predictions at their desktops. This newfound ability will be a boon to basic science studies and has great potential to speed life-saving therapeutic advances.


[1] Accurate prediction of protein structures and interactions using a three-track neural network. Baek M, DiMaio F, Anishchenko I, Dauparas J, Grishin NV, Adams PD, Read RJ, Baker D, et al. Science. 2021 Jul 15:eabj8754.

[2] Highly accurate protein structure prediction with AlphaFold. Jumper J, Evans R, Pritzel A, Green T, Senior AW, Kavukcuoglu K, Kohli P, Hassabis D, et al. Nature. 2021 Jul 15.


Structural Biology (National Institute of General Medical Sciences/NIH)

The Structures of Life (NIGMS)

Baker Lab (University of Washington, Seattle)

CASP 14 (University of California, Davis)

NIH Support: National Institute of Allergy and Infectious Diseases; National Institute of General Medical Sciences

Artificial Intelligence Speeds Brain Tumor Diagnosis


Caption: Artificial intelligence speeds diagnosis of brain tumors. Top, doctor reviews digitized tumor specimen in operating room; left, the AI program predicts diagnosis; right, surgeons review results in near real-time.
Credit: Joe Hallisy, Michigan Medicine, Ann Arbor

Computers are now being trained to “see” the patterns of disease often hidden in our cells and tissues. Now comes word of yet another remarkable use of computer-generated artificial intelligence (AI): swiftly providing neurosurgeons with valuable, real-time information about what type of brain tumor is present, while the patient is still on the operating table.

This latest advance comes from an NIH-funded clinical trial of 278 patients undergoing brain surgery. The researchers found they could take a small tumor biopsy during surgery, feed it into a trained computer in the operating room, and receive a diagnosis that rivals the accuracy of an expert pathologist.

Traditionally, sending out a biopsy to an expert pathologist and getting back a diagnosis optimally takes about 40 minutes. But the computer can do it in the operating room on average in under 3 minutes. The time saved helps to inform surgeons how to proceed with their delicate surgery and make immediate and potentially life-saving treatment decisions to assist their patients.

As reported in Nature Medicine, researchers led by Daniel Orringer, NYU Langone Health, New York, and Todd Hollon, University of Michigan, Ann Arbor, took advantage of AI and another technological advance called stimulated Raman histology (SRH) [1]. The latter is an emerging clinical imaging technique that makes it possible to generate detailed images of a tissue sample without the usual processing steps.

The SRH technique starts off by bouncing laser light rapidly through a tissue sample. This light enables a nearby fiberoptic microscope to capture the cellular and structural details within the sample. Remarkably, it does so by picking up on subtle differences in the way lipids, proteins, and nucleic acids vibrate when exposed to the light.

Then, using a virtual coloring program, the microscope quickly pieces together and colors in the fine structural details, pixel by pixel. The result: a high-resolution, detailed image that you might expect from a pathology lab, minus the staining of cells, mounting of slides, and the other time-consuming processing procedures.

To interpret the SRH images, the researchers turned to computers and machine learning. To teach a computer a given task, researchers must feed it large datasets of examples. In this case, they used a special class of machine learning called deep neural networks, or deep learning. It’s inspired by the way neural networks in the human brain process information.

In deep learning, computers look for patterns in large collections of data. As they begin to recognize complex relationships, some connections in the network are strengthened while others are weakened. The finished network is typically composed of multiple information-processing layers, which operate on the data to return a result, in this case a brain tumor diagnosis.

The team trained the computer to classify tissue samples into one of 13 categories commonly found in a brain tumor sample. Those categories included the most common brain tumors: malignant glioma, lymphoma, metastatic tumors, and meningioma. The training was based on more than 2.5 million labeled images representing samples from 415 patients.

Next, they put the machine to the test. The researchers split each of 278 brain tissue samples into two specimens. One was sent to a conventional pathology lab for prepping and diagnosis. The other was imaged with SRH, and then the trained machine made a diagnosis.

Overall, the machine’s performance was quite impressive, returning the right answer about 95 percent of the time. That’s compared to an accuracy of 94 percent for conventional pathology.

Interestingly, the machine made a correct diagnosis in all 17 cases that a pathologist got wrong. Likewise, the pathologist got the right answer in all 14 cases in which the machine slipped up.
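A quick check of those figures. Only the counts come from the study (278 specimens, 14 machine errors, 17 pathologist errors, no overlap); the arithmetic is ours, and it shows that at least one of the two readers was right on every specimen.

```python
# Verify the reported accuracies and the no-overlap observation.
total = 278
machine_errors = 14
pathologist_errors = 17
missed_by_both = 0  # every error by one was caught by the other

machine_accuracy = (total - machine_errors) / total          # ~0.95
pathologist_accuracy = (total - pathologist_errors) / total  # ~0.94
either_correct = total - missed_by_both                      # all 278

print(round(machine_accuracy, 2), round(pathologist_accuracy, 2), either_correct)
```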

The findings show that the combination of SRH and AI can be used to make real-time predictions of a patient’s brain tumor diagnosis to inform surgical decision-making. That may be especially important in places where expert neuropathologists are hard to find.

Ultimately, the researchers suggest that AI may yield even more useful information about a tumor’s underlying molecular alterations, adding ever greater precision to the diagnosis. Similar approaches are also likely to work in supplying timely information to surgeons operating on patients with other cancers too, including cancers of the skin and breast. The research team has made a brief video to give you a more detailed look at the new automated tissue-to-diagnosis pipeline.


[1] Near real-time intraoperative brain tumor diagnosis using stimulated Raman histology and deep neural networks. Hollon TC, Pandian B, Adapa AR, Urias E, Save AV, Khalsa SSS, Eichberg DG, D’Amico RS, Farooq ZU, Lewis S, Petridis PD, Marie T, Shah AH, Garton HJL, Maher CO, Heth JA, McKean EL, Sullivan SE, Hervey-Jumper SL, Patil PG, Thompson BG, Sagher O, McKhann GM 2nd, Komotar RJ, Ivan ME, Snuderl M, Otten ML, Johnson TD, Sisti MB, Bruce JN, Muraszko KM, Trautman J, Freudiger CW, Canoll P, Lee H, Camelo-Piragua S, Orringer DA. Nat Med. 2020 Jan 6.


Video: Artificial Intelligence: Collecting Data to Maximize Potential (NIH)

New Imaging Technique Allows Quick, Automated Analysis of Brain Tumor Tissue During Surgery (National Institute of Biomedical Imaging and Bioengineering/NIH)

Daniel Orringer (NYU Langone, Perlmutter Cancer Center, New York City)

Todd Hollon (University of Michigan, Ann Arbor)

NIH Support: National Cancer Institute; National Institute of Biomedical Imaging and Bioengineering

Whole-Genome Sequencing Plus AI Yields Same-Day Genetic Diagnoses


Caption: Rapid whole-genome sequencing helped doctors diagnose Sebastiana Manuel with Ohtahara syndrome, a neurological condition that causes seizures. Her data are now being used as part of an effort to speed the diagnosis of other children born with unexplained illnesses. Credits: Getty Images (left); Jenny Siegwart (right).

Back in April 2003, when the international Human Genome Project successfully completed the first reference sequence of the human DNA blueprint, we were thrilled to have achieved that feat in just 13 years. Sure, the U.S. contribution to that first human reference sequence cost an estimated $400 million, but we knew (or at least we hoped) that the costs would come down quickly, and the speed would accelerate. How far we’ve come since then! A new study shows that whole genome sequencing—combined with artificial intelligence (AI)—can now be used to diagnose genetic diseases in seriously ill babies in less than 24 hours.

Take a moment to absorb this. I would submit that there is no other technology in the history of planet Earth that has experienced this degree of progress in speed and affordability. And, at the same time, DNA sequence technology has achieved spectacularly high levels of accuracy. The time-honored adage that you can only get two out of three for “faster, better, and cheaper” has been broken—all three have been dramatically enhanced by the advances of the last 16 years.

Rapid diagnosis is critical for infants born with mysterious conditions because it enables them to receive potentially life-saving interventions as soon as possible after birth. In a study in Science Translational Medicine, NIH-funded researchers describe development of a highly automated, genome-sequencing pipeline that’s capable of routinely delivering a diagnosis to anxious parents and health-care professionals dramatically earlier than typically has been possible [1].

While the cost of rapid DNA sequencing continues to fall, challenges remain in utilizing this valuable tool to make quick diagnostic decisions. In most clinical settings, the wait for whole-genome sequencing results still runs more than two weeks. Attempts to obtain faster results also have been labor intensive, requiring dedicated teams of experts to sift through the data, one sample at a time.

In the new study, a research team led by Stephen Kingsmore, Rady Children’s Institute for Genomic Medicine, San Diego, CA, describes a streamlined approach that accelerates every step in the process, making it possible to obtain whole-genome test results in a median time of about 20 hours and with much less manual labor. They propose that the system could deliver answers for 30 patients per week using a single genome sequencing instrument.

Here’s how it works: Instead of manually preparing blood samples, the team used special microbeads to isolate DNA much more rapidly and with very little labor. The approach reduced the time for sample preparation from 10 hours to less than three. Then, using a state-of-the-art DNA sequencer, they obtained good-quality whole-genome data in just 15.5 hours.
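Adding up those stage times shows how tight the schedule is. The roughly hour-and-a-half left over for everything else, including analysis, is our inference from the reported numbers, not a figure from the study.

```python
# Back-of-the-envelope timing budget from the stage times reported here.
prep_hours = 3.0           # microbead DNA isolation (down from ~10 hours)
sequencing_hours = 15.5    # whole-genome sequencing run
median_total_hours = 20.0  # approximate median time to a diagnosis

analysis_budget_hours = median_total_hours - prep_hours - sequencing_hours
print(analysis_budget_hours)  # 1.5
```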

The next potentially time-consuming challenge is making sense of all that data. To speed up the analysis, Kingsmore’s team took advantage of a machine-learning system called MOON. The automated platform sifts through all the data using artificial intelligence to search for potentially disease-causing variants.

The researchers paired MOON with a clinical language processing system, which allowed them to extract relevant information from the child’s electronic health records within seconds. Teaming that patient-specific information with data on more than 13,000 known genetic diseases in the scientific literature, the machine-learning system could pick out a likely disease-causing mutation from 4.5 million candidate variants in an impressive 5 minutes or less!
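A hypothetical sketch of the kind of prioritization such a system performs: rank candidate variants by combining a pathogenicity score with how well each variant’s associated disease matches the patient’s recorded phenotypes. All gene names, scores, and phenotype terms below are made up for illustration; MOON’s actual scoring model is proprietary.

```python
# Illustrative variant prioritization: boost variants whose associated
# disease phenotypes overlap the patient's clinical features.
def rank_variants(variants, patient_phenotypes):
    def score(v):
        overlap = len(set(v["disease_phenotypes"]) & set(patient_phenotypes))
        return v["pathogenicity"] * (1 + overlap)
    return sorted(variants, key=score, reverse=True)

variants = [
    {"gene": "GENE_A", "pathogenicity": 0.9,
     "disease_phenotypes": ["seizures", "hypotonia"]},
    {"gene": "GENE_B", "pathogenicity": 0.95,
     "disease_phenotypes": ["hearing loss"]},
    {"gene": "GENE_C", "pathogenicity": 0.4,
     "disease_phenotypes": ["seizures"]},
]
ranked = rank_variants(variants, ["seizures", "hypotonia"])
print(ranked[0]["gene"])  # GENE_A: high score plus full phenotype match
```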

To put the system to the test, the researchers first evaluated its ability to reach a correct diagnosis in a sample of 101 children with 105 previously diagnosed genetic diseases. In nearly every case, the automated diagnosis matched the opinions reached previously via the more lengthy and laborious manual interpretation of experts.

Next, the researchers tested the automated system in assisting diagnosis of seven seriously ill infants in the intensive care unit, and three previously diagnosed infants. They showed that their automated system could reach a diagnosis in less than 20 hours. That’s compared to the fastest manual approach, which typically took about 48 hours. The automated system also required about 90 percent less manpower.

The system nailed a rapid diagnosis for 3 of 7 infants without returning any false-positive results. Those diagnoses were made with an average time savings of more than 22 hours. In each case, the early diagnosis immediately influenced the treatment those children received. That’s key given that, for young children suffering from serious and unexplained symptoms such as seizures, metabolic abnormalities, or immunodeficiencies, time is of the essence.

Of course, artificial intelligence may never replace doctors and other healthcare providers. Kingsmore notes that 106 years after the invention of the autopilot, two pilots are still required to fly a commercial aircraft. Likewise, health care decisions based on genome interpretation also will continue to require the expertise of skilled physicians.

Still, such a rapid automated system will prove incredibly useful. For instance, this system can provide immediate provisional diagnosis, allowing the experts to focus their attention on more difficult unsolved cases or other needs. It may also prove useful in re-evaluating the evidence in the many cases in which manual interpretation by experts fails to provide an answer.

The automated system may also be useful for periodically reanalyzing data in the many cases that remain unsolved. Keeping up with such reanalysis is a particular challenge considering that researchers continue to discover hundreds of disease-associated genes and thousands of variants each and every year. The hope is that in the years ahead, the combination of whole genome sequencing, artificial intelligence, and expert care will make all the difference in the lives of many more seriously ill babies and their families.


[1] Diagnosis of genetic diseases in seriously ill children by rapid whole-genome sequencing and automated phenotyping and interpretation. Clark MM, Hildreth A, Batalov S, Ding Y, Chowdhury S, Watkins K, Ellsworth K, Camp B, Kint CI, Yacoubian C, Farnaes L, Bainbridge MN, Beebe C, Braun JJA, Bray M, Carroll J, Cakici JA, Caylor SA, Clarke C, Creed MP, Friedman J, Frith A, Gain R, Gaughran M, George S, Gilmer S, Gleeson J, Gore J, Grunenwald H, Hovey RL, Janes ML, Lin K, McDonagh PD, McBride K, Mulrooney P, Nahas S, Oh D, Oriol A, Puckett L, Rady Z, Reese MG, Ryu J, Salz L, Sanford E, Stewart L, Sweeney N, Tokita M, Van Der Kraan L, White S, Wigby K, Williams B, Wong T, Wright MS, Yamada C, Schols P, Reynders J, Hall K, Dimmock D, Veeraraghavan N, Defay T, Kingsmore SF. Sci Transl Med. 2019 Apr 24;11(489).


DNA Sequencing Fact Sheet (National Human Genome Research Institute/NIH)

Genomics and Medicine (NHGRI/NIH)

Genetic and Rare Disease Information Center (National Center for Advancing Translational Sciences/NIH)

Stephen Kingsmore (Rady Children’s Institute for Genomic Medicine, San Diego, CA)

NIH Support: National Institute of Child Health and Human Development; National Human Genome Research Institute; National Center for Advancing Translational Sciences

Can a Mind-Reading Computer Speak for Those Who Cannot?


Credit: Adapted from Nima Mesgarani, Columbia University’s Zuckerman Institute, New York

Computers have learned to do some amazing things, from beating the world’s ranking chess masters to providing the equivalent of feeling in prosthetic limbs. Now, as heard in this brief audio clip counting from zero to nine, an NIH-supported team has combined innovative speech synthesis technology and artificial intelligence to teach a computer to read a person’s thoughts and translate them into intelligible speech.

Turning brain waves into speech isn’t just fascinating science. It might also prove life changing for people who have lost the ability to speak from conditions such as amyotrophic lateral sclerosis (ALS) or a debilitating stroke.

When people speak or even think about talking, their brains fire off distinctive, but previously poorly decoded, patterns of neural activity. Nima Mesgarani and his team at Columbia University’s Zuckerman Institute, New York, wanted to learn how to decode this neural activity.

Mesgarani and his team started out with a vocoder, a voice synthesizer that produces sounds based on an analysis of speech. It’s the very same technology used by Amazon’s Alexa, Apple’s Siri, or other similar devices to listen and respond appropriately to everyday commands.

As reported in Scientific Reports, the first task was to train a vocoder to produce synthesized sounds in response to brain waves instead of speech [1]. To do it, Mesgarani teamed up with neurosurgeon Ashesh Mehta, Hofstra Northwell School of Medicine, Manhasset, NY, who frequently performs brain mapping in people with epilepsy to pinpoint the sources of seizures before performing surgery to remove them.

In five patients already undergoing brain mapping, the researchers monitored activity in the auditory cortex, where the brain processes sound. The patients listened to recordings of short stories read by four speakers. In the first test, eight different sentences were repeated multiple times. In the next test, participants heard four new speakers repeat numbers from zero to nine.

From these exercises, the researchers reconstructed the words that people heard from their brain activity alone. Then the researchers tried various methods to reproduce intelligible speech from the recorded brain activity. They found it worked best to combine the vocoder technology with a form of computer artificial intelligence known as deep learning.

Deep learning is inspired by how our own brain’s neural networks process information, learning to focus on some details but not others. In deep learning, computers look for patterns in data. As they begin to “see” complex relationships, some connections in the network are strengthened while others are weakened.

In this case, the researchers used the deep learning networks to interpret the sounds produced by the vocoder in response to the brain activity patterns. When the vocoder-produced sounds were processed and “cleaned up” by those neural networks, the reconstructed sounds became easier for a listener to recognize as words, though this first attempt still sounds pretty robotic.
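The study’s clean-up step used a trained deep network; as a rough stand-in for the general idea of post-processing a noisy reconstructed signal, here is a simple moving-average smoothing filter applied to made-up waveform values.

```python
# Illustrative post-processing: smooth a noisy signal with a small
# moving-average window (a crude stand-in for a learned clean-up network).
def smooth(signal, window=3):
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

noisy = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
print(smooth(noisy))  # jagged alternation is flattened toward its mean
```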

The researchers will continue testing their system with more complicated words and sentences. They also want to run the same tests on brain activity, comparing what happens when a person speaks or just imagines speaking. They ultimately envision an implant, similar to those already worn by some patients with epilepsy, that will translate a person’s thoughts into spoken words. That might open up all sorts of awkward moments if some of those thoughts weren’t intended for transmission!

Along with recently highlighted new ways to catch irregular heartbeats and cervical cancers, it’s yet another remarkable example of the many ways in which computers and artificial intelligence promise to transform the future of medicine.


[1] Towards reconstructing intelligible speech from the human auditory cortex. Akbari H, Khalighinejad B, Herrero JL, Mehta AD, Mesgarani N. Sci Rep. 2019 Jan 29;9(1):874.


Advances in Neuroprosthetic Learning and Control. Carmena JM. PLoS Biol. 2013;11(5):e1001561.

Nima Mesgarani (Columbia University, New York)

NIH Support: National Institute on Deafness and Other Communication Disorders; National Institute of Mental Health
