Posted on by Dr. Francis Collins
Researchers recently showed that a computer could “learn” from many examples of protein folding to predict the 3D structure of proteins with great speed and precision. Now a study in the journal Science shows that a computer can also predict the 3D shapes of RNA molecules. These include the mRNAs that code for proteins and the non-coding RNAs that perform a range of cellular functions.
This work marks an important basic science advance. RNA therapeutics—from COVID-19 vaccines to cancer drugs—have already benefited millions of people and will help many more in the future. Now, the ability to predict RNA shapes quickly and accurately on a computer will help to accelerate understanding these critical molecules and expand their healthcare uses.
Like proteins, the shapes of single-stranded RNA molecules are important for their ability to function properly inside cells. Yet far less is known about these RNA structures and the rules that determine their precise shapes. The RNA elements (bases) can form internal hydrogen-bonded pairs, but the number of possible combinations of pairings is almost astronomical for any RNA molecule with more than a few dozen bases.
In hopes of moving the field forward, a team led by Stephan Eismann and Raphael Townshend in the lab of Ron Dror, Stanford University, Palo Alto, CA, looked to a machine learning approach known as deep learning. It is inspired by how our own brain’s neural networks process information, learning to focus on some details but not others.
In deep learning, computers look for patterns in data. As they begin to “see” complex relationships, some connections in the network are strengthened while others are weakened.
One of the things that makes deep learning so powerful is that it doesn’t rely on any preconceived notions. It also can pick up on important features and patterns that humans can’t possibly detect. But, as successful as this approach has been in solving many different kinds of problems, it has primarily been applied to areas of biology, such as protein folding, in which lots of data were available for researchers to train the computers.
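The “strengthening and weakening of connections” described above can be sketched with a single artificial neuron. This toy Python example (not from the study) trains a perceptron to learn the logical AND of two inputs by nudging its weights after each mistake; deep networks stack millions of such units, but the spirit of the update is the same:

```python
# Toy sketch of learning by adjusting connection strengths:
# a single artificial neuron learns the logical AND of two inputs.

def train_neuron(examples, epochs=20, lr=0.1):
    """Perceptron training: nudge weights toward correct answers."""
    w = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for (x1, x2), target in examples:
            output = 1 if w[0] * x1 + w[1] * x2 + bias > 0 else 0
            error = target - output
            # Connections that contributed to a wrong answer are adjusted;
            # correct answers leave the weights unchanged (error == 0).
            w[0] += lr * error * x1
            w[1] += lr * error * x2
            bias += lr * error
    return w, bias

AND_EXAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_neuron(AND_EXAMPLES)

def predict(x1, x2):
    return 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
```

After training, `predict` returns 1 only when both inputs are 1, because the weights on the two input connections were strengthened and the bias weakened until the examples were all classified correctly.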
That’s not the case with RNA molecules. To work around this problem, Dror’s team designed a neural network they call ARES. (No, it’s not the Greek god of war. It’s short for Atomic Rotationally Equivariant Scorer.)
To start, the researchers trained ARES on just 18 small RNA molecules for which structures had been experimentally determined. They gave ARES these structural models specified only by the 3D position and chemical element of each atom.
The next test was to see if ARES could determine from this small training set the best structural model for RNA sequences it had never seen before. The researchers put it to the test with RNA molecules whose structures had been determined more recently.
ARES, however, doesn’t come up with the structures itself. Instead, the researchers give ARES a sequence and at least 1,500 possible 3D structures it might take, all generated using another computer program. Based on patterns in the training set, ARES scores each of the possible structures to find the one it predicts is closest to the actual structure. Remarkably, it does this without being provided any prior information about features important for determining RNA shapes, such as nucleotides, steric constraints, and hydrogen bonds.
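The score-and-select step described above can be sketched in a few lines. Everything here is illustrative: in the real system the score comes from the trained ARES network operating on raw atomic coordinates, not from a stored `predicted_error` field, and the candidate pool holds 1,500 or more models rather than three:

```python
# Illustrative sketch of the "score and select" step performed by ARES.

def score_model(model):
    """Toy scoring function (lower = predicted closer to the true structure).

    In ARES, this would be the neural network's predicted accuracy score
    computed from the model's atomic coordinates and element types.
    """
    return model["predicted_error"]

def select_best_structure(candidate_models):
    """Rank all candidate 3D models and return the best-scoring one."""
    return min(candidate_models, key=score_model)

candidates = [
    {"name": "model_a", "predicted_error": 7.2},
    {"name": "model_b", "predicted_error": 3.1},
    {"name": "model_c", "predicted_error": 5.8},
]
best = select_best_structure(candidates)
print(best["name"])  # model_b, the lowest-scoring candidate
```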
It turns out that ARES consistently outperformed humans and all previous methods. In fact, it bested at least nine other methods to come out on top in a community-wide RNA-Puzzles contest. It also can make predictions about RNA molecules that are significantly larger and more complex than those on which it was trained.
The success of ARES and this deep learning approach will help to elucidate RNA molecules with potentially important implications for health and disease. It’s another compelling example of how deep learning promises to solve many other problems in structural biology, chemistry, and the material sciences when—at the outset—very little is known.
Geometric deep learning of RNA structure. Townshend RJL, Eismann S, Watkins AM, Rangan R, Karelina M, Das R, Dror RO. Science. 2021 Aug 27;373(6558):1047-1051.
Structural Biology (National Institute of General Medical Sciences/NIH)
The Structures of Life (National Institute of General Medical Sciences/NIH)
RNA Biology (NIH)
Dror Lab (Stanford University, Palo Alto, CA)
NIH Support: National Cancer Institute; National Institute of General Medical Sciences
Posted on by Dr. Francis Collins
Back in April 2003, when the international Human Genome Project successfully completed the first reference sequence of the human DNA blueprint, we were thrilled to have achieved that feat in just 13 years. Sure, the U.S. contribution to that first human reference sequence cost an estimated $400 million, but we knew (or at least we hoped) that the costs would come down quickly, and the speed would accelerate. How far we’ve come since then! A new study shows that whole genome sequencing—combined with artificial intelligence (AI)—can now be used to diagnose genetic diseases in seriously ill babies in less than 24 hours.
Take a moment to absorb this. I would submit that there is no other technology in the history of planet Earth that has experienced this degree of progress in speed and affordability. And, at the same time, DNA sequence technology has achieved spectacularly high levels of accuracy. The time-honored adage that you can only get two out of three for “faster, better, and cheaper” has been broken—all three have been dramatically enhanced by the advances of the last 16 years.
Rapid diagnosis is critical for infants born with mysterious conditions because it enables them to receive potentially life-saving interventions as soon as possible after birth. In a study in Science Translational Medicine, NIH-funded researchers describe the development of a highly automated genome-sequencing pipeline that’s capable of routinely delivering a diagnosis to anxious parents and health-care professionals dramatically earlier than has typically been possible.
While the cost of rapid DNA sequencing continues to fall, challenges remain in utilizing this valuable tool to make quick diagnostic decisions. In most clinical settings, the wait for whole-genome sequencing results still runs more than two weeks. Attempts to obtain faster results also have been labor intensive, requiring dedicated teams of experts to sift through the data, one sample at a time.
In the new study, a research team led by Stephen Kingsmore, Rady Children’s Institute for Genomic Medicine, San Diego, CA, describes a streamlined approach that accelerates every step in the process, making it possible to obtain whole-genome test results in a median time of about 20 hours and with much less manual labor. They propose that the system could deliver answers for 30 patients per week using a single genome sequencing instrument.
Here’s how it works: Instead of manually preparing blood samples, Kingsmore’s team used special microbeads to isolate DNA much more rapidly and with very little labor, reducing sample preparation time from 10 hours to less than three. Then, using a state-of-the-art DNA sequencer, they sequenced those samples, obtaining good-quality whole-genome data in just 15.5 hours.
The next potentially time-consuming challenge is making sense of all that data. To speed up the analysis, Kingsmore’s team took advantage of a machine-learning system called MOON. The automated platform sifts through all the data using artificial intelligence to search for potentially disease-causing variants.
The researchers paired MOON with a clinical language processing system, which allowed them to extract relevant information from the child’s electronic health records within seconds. Teaming that patient-specific information with data on more than 13,000 known genetic diseases in the scientific literature, the machine-learning system could pick out a likely disease-causing mutation from 4.5 million potential variants in an impressive 5 minutes or less!
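A heavily simplified sketch of that kind of phenotype-driven ranking (not MOON’s actual algorithm) might look like the following, with made-up gene names and phenotype terms standing in for the real health-record extractions and disease databases:

```python
# Illustrative sketch: rank candidate variants by how well the phenotypes
# of their associated diseases overlap the phenotype terms extracted from
# the patient's electronic health record.

def phenotype_overlap(patient_terms, disease_terms):
    """Fraction of the disease's phenotype terms observed in the patient."""
    if not disease_terms:
        return 0.0
    return len(set(patient_terms) & set(disease_terms)) / len(disease_terms)

def rank_variants(variants, patient_terms):
    """Sort candidate variants, best phenotype match first."""
    return sorted(
        variants,
        key=lambda v: phenotype_overlap(patient_terms, v["disease_phenotypes"]),
        reverse=True,
    )

# Hypothetical patient and candidate variants (gene names invented).
patient = ["seizures", "hypotonia", "metabolic acidosis"]
variants = [
    {"gene": "GENE_X", "disease_phenotypes": ["hearing loss"]},
    {"gene": "GENE_Y", "disease_phenotypes": ["seizures", "hypotonia"]},
]
top = rank_variants(variants, patient)[0]
print(top["gene"])  # GENE_Y, whose disease phenotypes best match the patient
```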
To put the system to the test, the researchers first evaluated its ability to reach a correct diagnosis in a sample of 101 children with 105 previously diagnosed genetic diseases. In nearly every case, the automated diagnosis matched the opinions reached previously via the more lengthy and laborious manual interpretation of experts.
Next, the researchers tested the automated system in assisting diagnosis of seven seriously ill infants in the intensive care unit, and three previously diagnosed infants. They showed that their automated system could reach a diagnosis in less than 20 hours. That’s compared to the fastest manual approach, which typically took about 48 hours. The automated system also required about 90 percent less manpower.
The system nailed a rapid diagnosis for three of the seven infants without returning any false-positive results. Those diagnoses were made with an average time savings of more than 22 hours. In each case, the early diagnosis immediately influenced the treatment those children received. That’s key given that, for young children suffering from serious and unexplained symptoms such as seizures, metabolic abnormalities, or immunodeficiencies, time is of the essence.
Of course, artificial intelligence may never replace doctors and other healthcare providers. Kingsmore notes that 106 years after the invention of the autopilot, two pilots are still required to fly a commercial aircraft. Likewise, health care decisions based on genome interpretation also will continue to require the expertise of skilled physicians.
Still, such a rapid automated system will prove incredibly useful. For instance, this system can provide immediate provisional diagnosis, allowing the experts to focus their attention on more difficult unsolved cases or other needs. It may also prove useful in re-evaluating the evidence in the many cases in which manual interpretation by experts fails to provide an answer.
The automated system may also be useful for periodically reanalyzing data in the many cases that remain unsolved. Keeping up with such reanalysis is a particular challenge considering that researchers continue to discover hundreds of disease-associated genes and thousands of variants each and every year. The hope is that in the years ahead, the combination of whole genome sequencing, artificial intelligence, and expert care will make all the difference in the lives of many more seriously ill babies and their families.
Diagnosis of genetic diseases in seriously ill children by rapid whole-genome sequencing and automated phenotyping and interpretation. Clark MM, Hildreth A, Batalov S, Ding Y, Chowdhury S, Watkins K, Ellsworth K, Camp B, Kint CI, Yacoubian C, Farnaes L, Bainbridge MN, Beebe C, Braun JJA, Bray M, Carroll J, Cakici JA, Caylor SA, Clarke C, Creed MP, Friedman J, Frith A, Gain R, Gaughran M, George S, Gilmer S, Gleeson J, Gore J, Grunenwald H, Hovey RL, Janes ML, Lin K, McDonagh PD, McBride K, Mulrooney P, Nahas S, Oh D, Oriol A, Puckett L, Rady Z, Reese MG, Ryu J, Salz L, Sanford E, Stewart L, Sweeney N, Tokita M, Van Der Kraan L, White S, Wigby K, Williams B, Wong T, Wright MS, Yamada C, Schols P, Reynders J, Hall K, Dimmock D, Veeraraghavan N, Defay T, Kingsmore SF. Sci Transl Med. 2019 Apr 24;11(489).
DNA Sequencing Fact Sheet (National Human Genome Research Institute/NIH)
Genomics and Medicine (NHGRI/NIH)
Genetic and Rare Disease Information Center (National Center for Advancing Translational Sciences/NIH)
Stephen Kingsmore (Rady Children’s Institute for Genomic Medicine, San Diego, CA)
NIH Support: National Institute of Child Health and Human Development; National Human Genome Research Institute; National Center for Advancing Translational Sciences
Posted on by Dr. Francis Collins
My last post highlighted the use of artificial intelligence (AI) to create an algorithm capable of detecting 10 different kinds of irregular heart rhythms. But that’s just one of the many potential medical uses of AI. In this post, I’ll tell you how NIH researchers are pairing AI analysis with smartphone cameras to help more women avoid cervical cancer.
In work described in the Journal of the National Cancer Institute, researchers used a high-performance computer to analyze thousands of cervical photographs, obtained more than 20 years ago from volunteers in a cancer screening study. The computer learned to recognize specific patterns associated with pre-cancerous and cancerous changes of the cervix, and that information was used to develop an algorithm for reliably detecting such changes in the collection of images. In fact, the AI-generated algorithm outperformed human expert reviewers and all standard screening tests in detecting pre-cancerous changes.
Nearly all cervical cancers are caused by the human papillomavirus (HPV). Cervical cancer screening—first with Pap smears and now also with HPV testing—has greatly reduced deaths from cervical cancer. But this cancer still claims the lives of more than 4,000 U.S. women each year, with higher frequency among women who are black or older. Around the world, more than a quarter-million women die of this preventable disease, mostly in poor and remote areas.
These troubling numbers have kept researchers on the lookout for low-cost, easy-to-use tools that could be highly effective at detecting the HPV infections most likely to advance to cervical cancer. Such tools would also need to work well in areas with limited resources for sample preparation and lab analysis. That’s what led to this collaboration involving researchers from NIH’s National Cancer Institute (NCI) and Global Good, Bellevue, WA, which is an Intellectual Ventures collaboration with Bill Gates to invent life-changing technologies for the developing world.
Global Good researchers contacted NCI experts hoping to apply AI to a large dataset of cervical images. The NCI experts suggested an 18-year cervical cancer screening study in Costa Rica. The NCI-supported project, completed in the 1990s, generated nearly 60,000 cervical images, later digitized by NIH’s National Library of Medicine and stored away safely.
The researchers agreed that all these images, obtained in a highly standardized way, would serve as perfect training material for a computer to develop a detection algorithm for cervical cancer. This type of AI, called machine learning, involves feeding tens of thousands of images into a computer equipped with one or more high-powered graphics processing units (GPUs), similar to something you’d find in an Xbox or PlayStation. The GPUs allow the computer to crunch large sets of visual data in the images and devise a set of rules, or algorithms, that allow it to learn to “see” physical features.
Here’s how they did it. First, the researchers got the computer to create a convolutional neural network. That’s a fancy way of saying that they trained it to read images, filter out the millions of non-essential bytes, and retain the few hundred bytes in the photo that make it uniquely identifiable. They fed 1.28 million color images covering hundreds of common objects into the computer to create layers of processing ability that, like the human visual system, can distinguish objects and their qualities.
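The core operation such a network repeats, layer after layer, is convolution: a small filter slides across the image and responds strongly wherever its pattern appears. Here is a minimal pure-Python illustration (not the study’s code) using a hand-built vertical-edge filter; in a trained network, the filter values are learned rather than chosen by hand:

```python
# Minimal sketch of the convolution operation inside a convolutional
# neural network: a small filter slides over the image and produces a
# large response where its pattern (here, a vertical edge) appears.

def convolve2d(image, kernel):
    """Valid-mode 2D convolution (cross-correlation, as CNNs compute it)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out

# A tiny grayscale "image" with a dark left half and bright right half.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# This filter responds where intensity jumps from left to right.
edge_kernel = [
    [-1, 1],
    [-1, 1],
]
response = convolve2d(image, edge_kernel)
print(response)  # [[0, 2, 0], [0, 2, 0]] — peak right at the edge
```

Early layers of a CNN learn many such filters for edges and textures; deeper layers combine their responses into detectors for increasingly complex shapes, which is what lets the network distinguish the physical features of a cervix.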
Once the convolutional neural network was formed, the researchers took the next big step: training the system to see the physical properties of a healthy cervix, a cervix with worrisome cellular changes, or a cervix with pre-cancer. That’s where the thousands of cervical images from the Costa Rican screening trial literally entered the picture.
When all these layers of processing ability were formed, the researchers had created the “automated visual evaluation” algorithm. It went on to identify with remarkable accuracy the images associated with the Costa Rican study’s 241 known precancers and 38 known cancers. The algorithm’s few minor hiccups came mainly from suboptimal images with faded colors or slightly blurred focus.
These minor glitches have the researchers now working hard to optimize the process, including determining how health workers can capture good quality photos of the cervix with a smartphone during a routine pelvic exam and how to outfit smartphones with the necessary software to analyze cervical photos quickly in real-world settings. The goal is to enable health workers to use a smartphone or similar device to provide women with cervical screening and treatment during a single visit.
In fact, the researchers are already field testing their AI-inspired approach on smartphones in the United States and abroad. If all goes well, this low-cost, mobile approach could provide a valuable new tool to help reduce the burden of cervical cancer among underserved populations.
The day that cervical cancer no longer steals the lives of hundreds of thousands of women a year worldwide will be a joyful moment for cancer researchers, as well as a major victory for women’s health.
An observational study of Deep Learning and automated evaluation of cervical images for cancer screening. Hu L, Bell D, Antani S, Xue Z, Yu K, Horning MP, Gachuhi N, Wilson B, Jaiswal MS, Befano B, Long LR, Herrero R, Einstein MH, Burk RD, Demarco M, Gage JC, Rodriguez AC, Wentzensen N, Schiffman M. J Natl Cancer Inst. 2019 Jan 10. [Epub ahead of print]
“Study: Death Rate from Cervical Cancer Higher Than Thought,” American Cancer Society, Jan. 25, 2017.
“World Cancer Day,” World Health Organization, Feb. 2, 2017.
Posted on by Dr. Francis Collins
Thanks to advances in wearable health technologies, it’s now possible for people to monitor their heart rhythms at home for days, weeks, or even months via wireless electrocardiogram (EKG) patches. In fact, my Apple Watch makes it possible to record a real-time EKG whenever I want. (I’m glad to say I am in normal sinus rhythm.)
For true medical benefit, however, the challenge lies in analyzing the vast amounts of data—often hundreds of hours worth per person—to distinguish reliably between harmless rhythm irregularities and potentially life-threatening problems. Now, NIH-funded researchers have found that artificial intelligence (AI) can help.
A powerful computer “studied” more than 90,000 EKG recordings, from which it “learned” to recognize patterns, form rules, and apply them accurately to future EKG readings. The computer became so “smart” that it could classify 10 different types of irregular heart rhythms, including atrial fibrillation (AFib). In fact, after just seven months of training, the computer-devised algorithm was as good as, and in some cases even better than, cardiology experts at making the correct diagnostic call.
EKG tests measure electrical impulses in the heart, which signal the heart muscle to contract and pump blood to the rest of the body. The precise, wave-like features of the electrical impulses allow doctors to determine whether a person’s heart is beating normally.
For example, in people with AFib, the heart’s upper chambers (the atria) contract rapidly and unpredictably, causing the ventricles (the main heart muscle) to contract irregularly rather than in a steady rhythm. This is an important arrhythmia to detect, even if it may only be present occasionally over many days of monitoring. That’s not always easy to do with current methods.
Here’s where the team, led by computer scientists Awni Hannun and Andrew Ng, Stanford University, Palo Alto, CA, saw an AI opportunity. As published in Nature Medicine, the Stanford team started by assembling a large EKG dataset from more than 53,000 people. The data included various forms of arrhythmia and normal heart rhythms from people who had worn the FDA-approved Zio patch for about two weeks.
The Zio patch is a 2-by-5-inch adhesive patch, worn much like a bandage, on the upper left side of the chest. It’s water resistant and can be kept on around the clock while a person sleeps, exercises, or takes a shower. The wireless patch continuously monitors heart rhythms, storing EKG data for later analysis.
The Stanford researchers looked to machine learning to process all the EKG data. In machine learning, computers rely on large datasets of examples in order to learn how to perform a given task. The accuracy improves as the machine “sees” more data.
But the team’s real interest was in utilizing a special class of machine learning called deep neural networks, or deep learning. Deep learning is inspired by how our own brain’s neural networks process information, learning to focus on some details but not others.
In deep learning, computers look for patterns in data. As they begin to “see” complex relationships, some connections in the network are strengthened while others are weakened. The network is typically composed of multiple information-processing layers, which operate on the data and compute increasingly complex and abstract representations.
Those data reach the final output layer, which acts as a classifier, assigning each bit of data to a particular category or, in the case of the EKG readings, a diagnosis. In this way, computers can learn to analyze and sort highly complex data using both more obvious and hidden features.
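The final classifier layer described above can be sketched as a softmax over per-class scores: the network’s raw outputs are converted to probabilities, and the highest-probability class becomes the diagnostic call. The class names and scores below are invented for illustration; the real network distinguishes 12 output classes (10 arrhythmias, sinus rhythm, and noise):

```python
import math

# Sketch of an output layer: raw per-class scores (logits) become
# probabilities via softmax, and the top class is the diagnosis.
# Class names and scores here are made up for illustration.

RHYTHM_CLASSES = ["normal sinus", "atrial fibrillation", "noise"]

def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    exps = [math.exp(x - max(logits)) for x in logits]  # shift for stability
    total = sum(exps)
    return [e / total for e in exps]

def classify(logits):
    """Return (diagnosis, probability) for the top-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return RHYTHM_CLASSES[best], probs[best]

diagnosis, prob = classify([0.2, 3.1, -1.0])
print(diagnosis)  # atrial fibrillation
```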
Ultimately, the computer in the new study could differentiate between EKG readings representing 10 different arrhythmias as well as a normal heart rhythm. It could also tell the difference between irregular heart rhythms and background “noise” caused by interference of one kind or another, such as a jostled or disconnected Zio patch.
For validation, the computer attempted to assign a diagnosis to the EKG readings of 328 additional patients. Independently, several expert cardiologists also read those EKGs and reached a consensus diagnosis for each patient. In almost all cases, the computer’s diagnosis agreed with the consensus of the cardiologists. The computer also made its calls much faster.
Next, the researchers compared the computer’s diagnoses to those of six individual cardiologists who weren’t part of the original consensus committee. The results showed that the computer actually outperformed these experienced cardiologists!
The findings suggest that artificial intelligence can be used to improve the accuracy and efficiency of EKG readings. In fact, Hannun reports that iRhythm Technologies, maker of the Zio patch, has already incorporated the algorithm into the interpretation now being used to analyze data from real patients.
As impressive as this is, we are surely just at the beginning of AI applications to health and health care. In recognition of the opportunities ahead, NIH has recently launched a working group on AI to explore ways to make the best use of existing data, and harness the potential of artificial intelligence and machine learning to advance biomedical research and the practice of medicine.
Meanwhile, more and more impressive NIH-supported research featuring AI is being published. In my next blog, I’ll highlight a recent paper that uses AI to make a real difference for cervical cancer, particularly in low resource settings.
Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network. Hannun AY, Rajpurkar P, Haghpanahi M, Tison GH, Bourn C, Turakhia MP, Ng AY. Nat Med. 2019 Jan;25(1):65-69.
Arrhythmia (National Heart, Lung, and Blood Institute/NIH)
Andrew Ng (Palo Alto, CA)
NIH Support: National Heart, Lung, and Blood Institute