Posted by Dr. Francis Collins
Researchers recently showed that a computer could “learn” from many examples of protein folding to predict the 3D structure of proteins with great speed and precision. Now a recent study in the journal Science shows that a computer also can predict the 3D shapes of RNA molecules. This includes the mRNA that codes for proteins and the non-coding RNA that performs a range of cellular functions.
This work marks an important basic science advance. RNA therapeutics, from COVID-19 vaccines to cancer drugs, have already benefited millions of people and will help many more in the future. Now, the ability to predict RNA shapes quickly and accurately on a computer will help to accelerate our understanding of these critical molecules and expand their healthcare uses.
Like proteins, the shapes of single-stranded RNA molecules are important for their ability to function properly inside cells. Yet far less is known about these RNA structures and the rules that determine their precise shapes. The RNA elements (bases) can form internal hydrogen-bonded pairs, but the number of possible combinations of pairings is almost astronomical for any RNA molecule with more than a few dozen bases.
In hopes of moving the field forward, a team led by Stephan Eismann and Raphael Townshend in the lab of Ron Dror, Stanford University, Palo Alto, CA, looked to a machine learning approach known as deep learning. It is inspired by how our own brain’s neural networks process information, learning to focus on some details but not others.
In deep learning, computers look for patterns in data. As they begin to “see” complex relationships, some connections in the network are strengthened while others are weakened.
One of the things that makes deep learning so powerful is it doesn’t rely on any preconceived notions. It also can pick up on important features and patterns that humans can’t possibly detect. But, as successful as this approach has been in solving many different kinds of problems, it has primarily been applied to areas of biology, such as protein folding, in which lots of data were available for researchers to train the computers.
That’s not the case with RNA molecules. To work around this problem, Dror’s team designed a neural network they call ARES. (No, it’s not the Greek god of war. It’s short for Atomic Rotationally Equivariant Scorer.)
To start, the researchers trained ARES on just 18 small RNA molecules for which structures had been experimentally determined. They gave ARES these structural models specified only by their atomic structure and chemical elements.
The next step was to see whether ARES, working from this small training set, could identify the best structural model for RNA sequences it had never seen before. The researchers tested it on RNA molecules whose structures had been determined more recently.
ARES, however, doesn’t come up with the structures itself. Instead, the researchers give ARES a sequence and at least 1,500 possible 3D structures it might take, all generated using another computer program. Based on patterns in the training set, ARES scores each of the possible structures to find the one it predicts is closest to the actual structure. Remarkably, it does this without being provided any prior information about features important for determining RNA shapes, such as nucleotides, steric constraints, and hydrogen bonds.
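To make that scoring step concrete, here is a minimal Python sketch of the select-the-best-candidate workflow. It is not the ARES network. The scorer `toy_scorer` is a made-up stand-in that simply prefers compact structures, and the names `Candidate`, `rank_candidates`, and `pick_best` are hypothetical. The sketch also assumes that a lower score means "predicted closer to the true structure." What it shares with the description above is only the overall flow: each candidate is described by chemical elements and 3D coordinates, each gets a score, and the best-scoring candidate is taken as the prediction.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Tuple

# One atom: (chemical element, x, y, z). Illustrative layout, not a real file format.
Atom = Tuple[str, float, float, float]


@dataclass
class Candidate:
    """A hypothetical candidate 3D structure for one RNA sequence."""
    name: str
    atoms: List[Atom]


def toy_scorer(candidate: Candidate) -> float:
    """Stand-in for a learned scorer: mean distance of atoms from their centroid.

    Assumption for this sketch: lower is better. A real scorer would be a
    trained neural network, not a hand-written formula.
    """
    n = len(candidate.atoms)
    centroid = tuple(sum(atom[i] for atom in candidate.atoms) / n for i in (1, 2, 3))
    return sum(math.dist(atom[1:], centroid) for atom in candidate.atoms) / n


def rank_candidates(
    candidates: List[Candidate], scorer: Callable[[Candidate], float]
) -> List[Candidate]:
    """Return the candidates sorted from best (lowest score) to worst."""
    return sorted(candidates, key=scorer)


def pick_best(
    candidates: List[Candidate], scorer: Callable[[Candidate], float]
) -> Candidate:
    """Return the single candidate with the lowest score."""
    if not candidates:
        raise ValueError("need at least one candidate structure")
    return min(candidates, key=scorer)


# Two tiny made-up candidates; the study supplied at least 1,500 per sequence.
extended = Candidate("extended", [("C", 0.0, 0.0, 0.0), ("N", 10.0, 0.0, 0.0)])
compact = Candidate("compact", [("C", 0.0, 0.0, 0.0), ("N", 1.0, 0.0, 0.0)])
best = pick_best([extended, compact], toy_scorer)
print(best.name)
```

The design point is that the scorer and the candidate generator are separate pieces: any function that maps a structure to a number can be swapped in for `toy_scorer` without touching the ranking code.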
It turns out that ARES consistently outperforms humans and all previous methods. In fact, it outperformed at least nine other methods to come out on top in a community-wide RNA-puzzles contest. It also can make predictions about RNA molecules that are significantly larger and more complex than those upon which it was trained.
The success of ARES and this deep learning approach will help to elucidate RNA molecules with potentially important implications for health and disease. It’s another compelling example of how deep learning promises to solve many other problems in structural biology, chemistry, and the material sciences when—at the outset—very little is known.
 Geometric deep learning of RNA structure. Townshend RJL, Eismann S, Watkins AM, Rangan R, Karelina M, Das R, Dror RO. Science. 2021 Aug 27;373(6558):1047-1051.
Structural Biology (National Institute of General Medical Sciences/NIH)
The Structures of Life (National Institute of General Medical Sciences/NIH)
RNA Biology (NIH)
Dror Lab (Stanford University, Palo Alto, CA)
NIH Support: National Cancer Institute; National Institute of General Medical Sciences
Posted by Dr. Francis Collins
More than 30 years ago, I co-led the Michigan-Toronto team that discovered that cystic fibrosis (CF) is caused by an inherited misspelling in the cystic fibrosis transmembrane conductance regulator (CFTR) gene. The CFTR protein’s normal function on the surface of epithelial cells is to serve as a gated channel for chloride ions to pass in and out of the cell. But this function is lost in individuals for whom both copies of CFTR are misspelled. As a consequence, water and salt get out of balance, leading to the production of the thick mucus that leaves people with CF prone to life-threatening lung infections.
It took three decades, but that CFTR gene discovery has now led to the development of a precise triple drug therapy that activates the dysfunctional CFTR protein and provides major benefit to most children and adults with CF. But about 10 percent of individuals with CF have mutations that result in the production of virtually no CFTR protein, which means there is nothing for current triple therapy to correct or activate.
That’s why more basic research is needed to tease out other factors that contribute to CF and, if treatable, could help even more people control the condition and live longer lives with less chronic illness. A recent NIH-supported study, published in the journal Nature Medicine, offers an interesting basic clue, and it’s visible in the image above.
The healthy lung tissue (left) shows a well-defined and orderly layer of ciliated cells (green), which use hair-like extensions to clear away mucus and debris. Running closely alongside it is a layer of basal cells (outlined in red), which includes stem cells that are essential for repairing and regenerating upper airway tissue. (DNA, which indicates the position of each cell, is stained in blue.)
In the CF-affected airways (right), those same cell types are present. However, compared to the healthy lung tissue, they appear to be in a state of disarray. Upon closer inspection, there’s something else unusual: large numbers of a third, transitional cell subtype (outlined in red with green in the nucleus) that combines properties of both basal stem cells and ciliated cells. The image below more clearly shows these cells (yellow arrows).
The increased number of cells with transitional characteristics suggests an unsuccessful attempt by the lungs to produce more cells capable of clearing the mucus buildup that occurs in airways of people with CF. The data offer an important foundation and reference for continued study.
These findings come from a team led by Kathrin Plath and Brigitte Gomperts, University of California, Los Angeles; John Mahoney, Cystic Fibrosis Foundation, Lexington, MA; and Barry Stripp, Cedars-Sinai, Los Angeles. Together with their lab members, they’re part of a larger research team assembled through the Cystic Fibrosis Foundation’s Epithelial Stem Cell Consortium, which seeks to learn how the disease changes the lung’s cellular makeup and use that new knowledge to make treatment advances.
In this study, researchers analyzed the lungs of 19 people with CF and another 19 individuals with no evidence of lung disease. Those with CF had donated their lungs for research in the process of receiving a lung transplant. Those with healthy lungs were organ donors who died of other causes.
The researchers analyzed, one by one, many thousands of cells from the airway and classified them into subtypes based on their distinctive RNA patterns. Those patterns indicate which genes are switched on or off in each cell, as well as the degree to which they are activated. Using a sophisticated computer-based approach to sift through and compare data, the team created a comprehensive catalog of cell types and subtypes present in healthy airways and in those affected by CF.
The new catalogs also revealed that the airways of people with CF had alterations in the types and proportions of basal cells. Those differences included a relative overabundance of cells that appeared to be transitioning from basal stem cells into the specialized ciliated cells, which are so essential for clearing mucus from the lungs.
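The comparison of cell-type proportions can be illustrated with a short Python sketch. Everything here is hypothetical: the function name `subtype_fractions`, the three subtype labels, and especially the cell counts, which are invented for illustration and are not data from the study. The sketch assumes each cell has already been assigned a subtype label and only shows the final step of turning labels into proportions that can be compared between groups.

```python
from collections import Counter
from typing import Dict, Iterable


def subtype_fractions(labels: Iterable[str]) -> Dict[str, float]:
    """Return the fraction of cells carrying each subtype label."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no cells to summarize")
    return {subtype: n / total for subtype, n in counts.items()}


# Invented labels for ten cells per group (NOT study data).
healthy_cells = ["basal"] * 6 + ["ciliated"] * 3 + ["transitional"] * 1
cf_cells = ["basal"] * 5 + ["ciliated"] * 2 + ["transitional"] * 3

healthy = subtype_fractions(healthy_cells)
cf = subtype_fractions(cf_cells)

# Compare the share of transitional cells between the two groups.
for subtype in sorted(set(healthy) | set(cf)):
    print(subtype, healthy.get(subtype, 0.0), cf.get(subtype, 0.0))
```

In a real analysis the labels would come from clustering each cell's RNA profile, and the comparison would need a statistical test across donors rather than a simple side-by-side printout.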
We are not yet at our journey’s end when it comes to realizing the full dream of defeating CF. For the 10 percent of CF patients who don’t benefit from the triple-drug therapy, the continuing work to find other treatment strategies should be encouraging news. Keep daring to dream of breathing free. Through continued research, we can make the story of CF into history!
Identification of the cystic fibrosis gene: chromosome walking and jumping. Rommens JM, Iannuzzi MC, Kerem B, Drumm ML, Melmer G, Dean M, Rozmahel R, Cole JL, Kennedy D, Hidaka N, et al. Science. 1989 Sep 8;245(4922):1059-65.
 Transcriptional analysis of cystic fibrosis airways at single-cell resolution reveals altered epithelial cell states and composition. Carraro G, Langerman J, Sabri S, Lorenzana Z, Purkayastha A, Zhang G, Konda B, Aros CJ, Calvert BA, Szymaniak A, Wilson E, Mulligan M, Bhatt P, Lu J, Vijayaraj P, Yao C, Shia DW, Lund AJ, Israely E, Rickabaugh TM, Ernst J, Mense M, Randell SH, Vladar EK, Ryan AL, Plath K, Mahoney JE, Stripp BR, Gomperts BN. Nat Med. 2021 May;27(5):806-814.
Cystic Fibrosis (National Heart, Lung, and Blood Institute/NIH)
Kathrin Plath (University of California, Los Angeles)
Brigitte Gomperts (UCLA)
Stripp Lab (Cedars-Sinai, Los Angeles)
Cystic Fibrosis Foundation (Lexington, MA)
Epithelial Stem Cell Consortium (Cystic Fibrosis Foundation, Lexington, MA)
NIH Support: National Heart, Lung, and Blood Institute; National Institute of Diabetes and Digestive and Kidney Diseases; National Institute of General Medical Sciences; National Cancer Institute; National Center for Advancing Translational Sciences
Posted by Dr. Francis Collins
One of the best ways to learn how something works is to understand how it’s built. How it came to be. That’s true not only if you play a guitar or repair motorcycle engines, but also if you study the biological systems that make life possible. Evolutionary studies, comparing the development of these systems across animals and other organisms, are now leading to many unexpected biological discoveries and promising possibilities for preventing and treating human disease.
While there are many evolutionary questions to ask, Brenda Bass, a distinguished biochemist at University of Utah, Salt Lake City, has set her sights on a particularly profound one: How has innate immunity evolved through the millennia in all living things, including humans? Innate immunity is the immune system’s frontline defense, the first responders that take control of an emerging infectious situation and, if needed, signal for backup.
Exploring the millennia for clues about innate immunity takes a special team, and Bass has assembled a talented one. It includes her Utah colleague Nels Elde, a geneticist; immunologist Dan Stetson, University of Washington, Seattle; and biochemist Jane Jackman, Ohio State University, Columbus.
With a 2020 NIH Director’s Transformative Research Award, this hard-working team will embark on studies looking back at 450 million years of evolution: the point in time when animals diverged to develop very distinct methods of innate immune defense. The team members hope to uncover new possibilities encoded in the innate immune system, especially those that might be latent but still workable. The researchers will then explore whether their finds can be repurposed not only to boost our body’s natural response to external threats but also to internal threats like cancer.
Bass brings a unique perspective to the project. As a postdoc in the 1980s, she stumbled upon a whole new class of enzymes, called ADARs, that edit RNA. Their function was mysterious at the time. It turns out that ADARs specifically edit a molecule called double-stranded RNA (dsRNA). When viruses infect cells in animals, including humans, they make dsRNA, which the innate immune system detects as a sign that a cell has been invaded.
It also turns out that animal cells make their own dsRNA. Over the years, Bass and her lab have identified thousands of dsRNAs made in animal cells. In fact, a significant number of human genes produce dsRNA. Also of interest, ADARs are crucial to marking our own dsRNA as “self” to avoid triggering an immune response when we don’t need it.
Bass and others have found that evolution has produced dramatic differences in the biochemical pathways powering the innate immune system. In vertebrate animals, dsRNA leads to release of the immune chemical interferon, a signaling pathway that invertebrate species don’t have. Instead, in response to detecting dsRNA from an invader, and repelling it, worms and other invertebrates trigger a gene-silencing pathway known as RNA interference, or RNAi.
With the new funding, Bass and team plan to mix and match immune strategies from simple and advanced species, across evolutionary time, to craft an entirely new set of immune tools to fight disease. The team will also build new types of targeted immunotherapies based on the principles of innate immunity. Current immunotherapies, which harness a person’s own immune system to fight disease, target infections, autoimmune disorders, and cancer. But they work through our second-line adaptive immune response, which is a biological system unique to vertebrates.
Bass and her team will first hunt for more molecules like ADARs: innate immune checkpoints, as they refer to them. The name comes from a functional resemblance to the better-known adaptive immune checkpoints PD-1 and CTLA-4, which sparked a revolution in cancer immunotherapy. The team will run several screens that sort molecules successful at activating innate immune responses—both in invertebrates and in mammals—hoping to identify a range of durable new immune switches that evolution skipped over but that might be repurposed today.
Another intriguing direction for this research stems from the observation that decreasing normal levels of ADARs in tumors kickstarts innate immune responses that kill cancer cells. Along these lines, the scientists plan to test newly identified immune switches to look for novel ways to fight cancer where existing approaches have not worked.
Evolution is the founding principle for all of biology—organisms learn from what works to improve their ability to survive. In this case, research to re-examine such lessons and apply them for new uses may help transform bygone evolution into a therapeutic revolution!
 Evolution of adaptive immunity from transposable elements combined with innate immune systems. Koonin EV, Krupovic M. Nat Rev Genet. 2015 Mar;16(3):184-192.
 A developmentally regulated activity that unwinds RNA duplexes. Bass BL, Weintraub H. Cell. 1987 Feb 27;48(4):607-613.
 Mapping the dsRNA World. Reich DP, Bass BL. Cold Spring Harb Perspect Biol. 2019 Mar 1;11(3):a035352.
 To protect and modify double-stranded RNA – the critical roles of ADARs in development, immunity and oncogenesis. Erdmann EA, Mahapatra A, Mukherjee P, Yang B, Hundley HA. Crit Rev Biochem Mol Biol. 2021 Feb;56(1):54-87.
 Loss of ADAR1 in tumours overcomes resistance to immune checkpoint blockade. Ishizuka JJ, Manguso RT, Cheruiyot CK, Bi K, Panda A, et al. Nature. 2019 Jan;565(7737):43-48.
Bass Lab (University of Utah, Salt Lake City)
Elde Lab (University of Utah)
Jackman Lab (Ohio State University, Columbus)
Stetson Lab (University of Washington, Seattle)
Bass/Elde/Jackman/Stetson Project Information (NIH RePORTER)
NIH Director’s Transformative Research Award Program (Common Fund)
NIH Support: Common Fund; National Cancer Institute
Posted by Dr. Francis Collins
Contact tracing, a term that’s been in the news lately, is a crucial tool for controlling the spread of SARS-CoV-2, the novel coronavirus that causes COVID-19. It depends on quick, efficient identification of an infected individual, followed by identification of all who’ve recently been in close contact with that person so the contacts can self-quarantine to break the chain of transmission.
Properly carried out, contact tracing can be extremely effective. It can also be extremely challenging when battling a stealth virus like SARS-CoV-2, especially when the virus is spreading rapidly.
But there are some innovative ways to enhance contact tracing. In a new study, published in the journal Nature Medicine, researchers in Australia demonstrate one of them: assembling genomic data about the virus to assist contact tracing efforts. This so-called genomic surveillance builds on the idea that when the virus is passed from person to person over a few months, it can acquire random variations in the sequence of its genetic material. These unique variations serve as distinctive genomic “fingerprints.”
When COVID-19 starts circulating in a community, researchers can fingerprint the genomes of SARS-CoV-2 obtained from newly infected people. This timely information helps to tell whether that particular virus has been spreading locally for a while or has just arrived from another part of the world. It can also show where the viral subtype has been spreading through a community or, best of all, when it has stopped circulating.
The recent study was led by Vitali Sintchenko at the University of Sydney. His team worked in parallel with contact tracers at the Ministry of Health in New South Wales (NSW), Australia’s most populous state, to contain the initial SARS-CoV-2 outbreak from late January through March 2020.
The team performed genomic surveillance, using sequencing data obtained within about five days, to understand local transmission patterns. They also wanted to compare what they learned from genomic surveillance to predictions made by a sophisticated computer model of how the virus might spread amongst Australia’s approximately 24 million citizens.
Of the 1,617 known cases in Sydney over the three-month study period, researchers sequenced viral genomes from 209 (13 percent) of them. By comparing those sequences to others circulating overseas, they found a lot of sequence diversity, indicating that the novel coronavirus had been introduced to Sydney many times from many places all over the world.
They then used the sequencing data to better understand how the virus was spreading through the local community. Their analysis found that the 209 cases under study included 27 distinct genomic fingerprints. Based on the close similarity of their genomic fingerprints, a significant share of the COVID-19 cases appeared to have stemmed from the direct spread of the virus among people in specific places or facilities.
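The idea of grouping cases by genomic fingerprint can be sketched in a few lines of Python. This is a deliberate simplification and not the study's pipeline: it assumes each sequenced case has already been reduced to a set of variants relative to a reference genome, and it groups only cases whose variant sets match exactly, whereas the study relied on close similarity. The function name `cluster_by_fingerprint`, the case IDs, and the variant labels are all invented for illustration.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List


def cluster_by_fingerprint(samples: Dict[str, FrozenSet[str]]) -> List[List[str]]:
    """Group case IDs that share an identical set of variants.

    Returns clusters sorted largest first, with case IDs sorted inside each.
    """
    groups: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for case_id, variants in samples.items():
        groups[variants].append(case_id)
    clusters = [sorted(ids) for ids in groups.values()]
    return sorted(clusters, key=lambda ids: (-len(ids), ids))


# Invented cases and variant labels (NOT data from the study).
samples = {
    "case1": frozenset({"pos100:A>G", "pos2500:C>T"}),
    "case2": frozenset({"pos100:A>G"}),
    "case3": frozenset({"pos100:A>G", "pos2500:C>T"}),
}

for cluster in cluster_by_fingerprint(samples):
    print(cluster)
```

Cases that land in the same cluster are candidates for a shared chain of transmission, which contact tracers can then check against what they know about where people have been.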
What was most striking was that the genomic evidence helped to provide information that contact tracers otherwise would have lacked. For instance, the genomic data allowed the researchers to identify previously unsuspected links between certain cases of COVID-19. It also helped to confirm other links that were otherwise unclear.
All told, researchers used the genomic evidence to cluster almost 40 percent of COVID-19 cases (81 of 209) for which the community-based data alone couldn’t identify a known contact source for the infection. That included 26 cases in which an individual who’d recently arrived in Australia from overseas spread the infection to others who hadn’t traveled. The genomic information also helped to identify likely sources in the community for another 15 locally acquired cases that weren’t known based on community data.
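For readers who want to check the percentages, the short Python sketch below recomputes them from the counts given in the text: 209 sequenced genomes out of 1,617 known cases, and 81 genomically clustered cases out of those 209. The helper name `percent` is just an illustrative choice.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


# Counts taken from the text of this post.
sequenced_share = percent(209, 1617)  # share of known cases that were sequenced
clustered_share = percent(81, 209)    # share of sequenced cases clustered by genomics

print(round(sequenced_share, 1))  # 12.9, reported as 13 percent
print(round(clustered_share, 1))  # 38.8, reported as almost 40 percent
```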
The researchers compared their genome surveillance data to SARS-CoV-2’s expected spread as modeled in a computer simulation based on travel to and from Australia over the time period in question. Because the study involved just 13 percent of all known COVID-19 cases in Sydney from late January through March, it’s not surprising that the genomic data present an incomplete picture, detecting only a portion of the possible chains of transmission expected in the simulation model.
Nevertheless, the findings demonstrate the value of genomic data for tracking the virus and pinpointing exactly where in the community it is spreading. This can help to fill in important gaps in the community-based data that contact tracers often use. Even more exciting, by combining traditional contact tracing, genomic surveillance, and mathematical modeling with other emerging tools at our disposal, it may be possible to get a clearer picture of the movement of SARS-CoV-2 and put more targeted public health measures in place to slow and eventually stop its deadly spread.
 Revealing COVID-19 transmission in Australia by SARS-CoV-2 genome sequencing and agent-based modeling. Rockett RJ, Arnott A, Lam C, et al. Nat Med. 2020 July 9. [Published online ahead of print]
Coronavirus (COVID-19) (NIH)
Vitali Sintchenko (University of Sydney, Australia)