RNA
New Approach to ‘Liquid Biopsy’ Relies on Repetitive RNA in the Bloodstream
Posted by Lawrence Tabak, D.D.S., Ph.D.

It’s always best to diagnose cancer at an early stage when treatment is most likely to succeed. Unfortunately, far too many cancers are still detected only after cancer cells have escaped from a primary tumor and spread to distant parts of the body. This explains why there’s been so much effort in recent years to develop liquid biopsies, which are tests that can pick up on circulating cancer cells or molecular signs of cancer in blood or other bodily fluids and reliably trace them back to the organ in which a potentially life-threatening tumor is growing.
Earlier methods to develop liquid biopsies for detecting cancers often have relied on the presence of cancer-related proteins and/or DNA in the bloodstream. Now, an NIH-supported research team has encouraging evidence to suggest that this general approach to detecting cancers—including aggressive pancreatic cancers—may work even better by taking advantage of signals from a lesser-known form of genetic material called noncoding RNA.
The findings, reported in Nature Biomedical Engineering, suggest that the new liquid biopsy approach may aid in the diagnosis of many forms of cancer [1]. The studies show that the test’s sensitivity varies; a highly sensitive test is one that rarely misses cases of disease. Even so, the researchers already have evidence that millions of circulating RNA molecules may hold promise for detecting cancers of the liver, esophagus, colon, stomach, and lung.
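To make the term concrete, sensitivity is the share of people who truly have the disease that a test correctly flags. The short sketch below is illustrative only: the function name and the sample counts are hypothetical and are not taken from the study.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of people with the disease whom the test correctly flags."""
    total_with_disease = true_positives + false_negatives
    if total_with_disease == 0:
        raise ValueError("need at least one person with the disease")
    return true_positives / total_with_disease


# Hypothetical counts (not from the study): 100 cancer samples, 91 detected.
detected, missed = 91, 9
print(f"sensitivity = {sensitivity(detected, missed):.2f}")  # prints "sensitivity = 0.91"
```

A test that misses fewer cases has fewer false negatives, so its sensitivity moves closer to 1.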
How does it work? The human genome contains about 3 billion paired DNA letters. Most of those letters are transcribed, or copied, into single-stranded RNA molecules. While RNA is best known for encoding proteins that do the work of the cell, most RNA never gets translated into proteins at all. This noncoding RNA includes repetitive RNA that can be transcribed from millions of repeat elements—patterns of the same few DNA letters occurring multiple times in the genome.
Common approaches to studying RNA don’t analyze repetitive RNA, so its usefulness as a diagnostic tool has been unclear—until recently. Last year, the lab of Daniel Kim at the University of California, Santa Cruz reported [2] that a key genetic mutation that occurs early on in some cancers causes repetitive RNA molecules to be secreted in large quantities from cancer cells, even at the earliest stages of cancer. Non-cancerous cells, by comparison, release much less repetitive RNA.
The findings suggested that liquid biopsy tests that look for this repetitive, noncoding RNA might offer a powerful new way to detect cancers sooner, according to the authors. Because its functions are often uncertain, the researchers have referred to repetitive, noncoding RNA as “dark matter.” But first they needed a method capable of measuring it.
Using a liquid biopsy platform they developed called COMPLETE-seq, Kim’s team trained computers to detect cancers by looking for patterns in RNA data. The platform enables sequencing and analysis of all protein-coding and noncoding RNAs present in a blood sample, including any RNA from more than 5 million repeat elements. They found that their classifiers worked better when repetitive RNAs were included. The findings lend support to the idea that repetitive, noncoding RNA in the bloodstream is a rich, previously overlooked source of information for detecting cancers.
In a study comparing blood samples from healthy people to those with pancreatic cancer, the COMPLETE-seq technology showed that nearly all people in the study with pancreatic cancer had more repetitive, noncoding RNA in their blood samples compared to healthy people, according to the researchers. They used the COMPLETE-seq test on blood samples from people with other types of cancer as well. For example, their test accurately detected 91% of colorectal cancer samples and 93% of lung cancer samples.
They now plan to look at many more cancer types with samples from additional patients representing a broad range of cancer stages. The goal is to develop a single RNA liquid biopsy test that could detect multiple forms of cancer with a high degree of accuracy and specificity. They note that such a test might also be used to guide treatment decisions and more readily detect a cancer’s recurrence. The hope is that one day a comprehensive liquid biopsy test including coding and noncoding RNA will catch many more cancers sooner, when treatment can be most successful.
References:
[1] RE Reggiardo et al. Profiling of repetitive RNA sequences in the blood plasma of patients with cancer. Nature Biomedical Engineering DOI: 10.1038/s41551-023-01081-7 (2023).
[2] RE Reggiardo et al. Mutant KRAS regulates transposable element RNA and innate immunity via KRAB zinc-finger genes. Cell Reports DOI: 10.1016/j.celrep.2022.111104 (2022).
Links:
Daniel Kim Lab (UC Santa Cruz)
Cancer Screening Overview (National Cancer Institute/NIH)
Early Detection (National Cancer Institute/NIH)
NIH Support: National Cancer Institute, National Heart, Lung, and Blood Institute, National Institute of Diabetes and Digestive and Kidney Diseases
New Tool Predicts Response to Immunotherapy in Lung Cancer Patients
Posted by Douglas M. Sheeley, Sc.D., NIH Common Fund

With just a blood sample from a patient, a promising technology has the potential to accurately diagnose non-small cell lung cancer (NSCLC), the most common form of the disease, more than 90 percent of the time. The same technology can even predict from the same blood sample whether a patient will respond well to a targeted immunotherapy treatment.
This work is a good example of research supported by the NIH Common Fund. Many Common Fund programs support development of new tools that catalyze research across the full spectrum of biomedical science without focusing on a single disease or organ system.
The emerging NSCLC prediction technology was developed as part of our Extracellular RNA Communication Program. The program develops technologies to understand RNA circulating in the body, known as extracellular RNA (exRNA). These molecules can be easily accessed in bodily fluids such as blood, urine, and saliva, and they have enormous potential as biomarkers to better understand cancer and other diseases.
When the body’s immune system detects a developing tumor, it activates various immune cells that work together to kill the suspicious cells. But many tumors have found a way to evade the immune system by producing a protein called PD-L1.
Displayed on the surface of a cancer cell, PD-L1 can bind to a protein found on immune cells with the similar designation of PD-1. The binding of the two proteins keeps immune cells from killing tumor cells. One type of immunotherapy interferes with this binding process and can restore the natural ability of the immune system to kill the tumor cells.
However, tumors differ from person to person, and this form of cancer immunotherapy doesn’t work for everyone. People with higher levels of PD-L1 in their tumors generally have better response rates to immunotherapy, and that’s why oncologists test for the protein before attempting the treatment.
Because cancer cells within a tumor can vary greatly, a single biopsy taken at a single site in the tumor may miss cells with PD-L1. In fact, current prediction technologies using tissue biopsies correctly identify just 20 to 40 percent of the NSCLC patients who will respond well to immunotherapy. This means some people receive immunotherapy who shouldn’t, while others who might benefit don’t get it.
To improve these predictions, a research team led by Eduardo Reátegui, The Ohio State University, Columbus, engineered a new technology to measure exRNA and proteins found within and on the surface of extracellular vesicles (EVs) [1]. EVs are tiny molecular containers released by cells. They carry RNA and proteins (including PD-L1) throughout the body and are known to play a role in communication between cells.
As the illustration above shows, EVs can be shed from tumors and then circulate in the bloodstream. That means their characteristics and internal cargo, including exRNA, can provide insight into the features of a tumor. But collecting EVs, breaking them open, and pooling their contents for assessment means that molecules occurring in small quantities (like PD-L1) can get lost in the mix. It also exposes delicate exRNA molecules to potential breakdown outside the protective EV.
The new technology solves these problems. It sorts and isolates individual EVs and measures both PD-1 and PD-L1 proteins, as well as exRNA that contains their genetic codes. This provides a more comprehensive picture of PD-L1 production within the tumor compared to a single biopsy sample. In addition, measuring surface proteins and the contents of individual EVs makes this technique exquisitely sensitive.
By measuring proteins and the exRNA cargo from individual EVs, Reátegui and team found that the technology correctly predicted whether a patient had NSCLC 93.2 percent of the time. It also predicted immunotherapy response with an accuracy of 72.2 percent, far exceeding the current gold standard method.
The researchers are working on scaling up the technology, which would increase precision and allow for more simultaneous measurements. They are also working with the James Comprehensive Cancer Center at The Ohio State University to expand their testing. That includes validating the technology using banked clinical samples of blood and other bodily fluids from large groups of cancer patients. With continued development, this new technology could improve NSCLC treatment while, critically, lowering its cost.
The real power of the technology, though, lies in its flexibility. Its components can be swapped out to recognize any number of marker molecules for other diseases and conditions. That includes other cancers, neurodegenerative diseases, traumatic brain injury, viral diseases, and cardiovascular diseases. This broad applicability is an example of how Common Fund investments catalyze advances across the research spectrum that will help many people now and in the future.
Reference:
[1] An immunogold single extracellular vesicular RNA and protein (AuSERP) biochip to predict responses to immunotherapy in non-small cell lung cancer patients. Nguyen LTH, Zhang J, Rima XY, Wang X, Kwak KJ, Okimoto T, Amann J, Yoon MJ, Shukuya T, Chiang CL, Walters N, Ma Y, Belcher D, Li H, Palmer AF, Carbone DP, Lee LJ, Reátegui E. J Extracell Vesicles. 11(9):e12258. doi: 10.1002/jev2.12258.
Links:
Video: Unlocking the Mysteries of Extracellular RNA Communication (Common Fund)
Extracellular RNA Communication Program (ERCC) (Common Fund)
Upcoming Meeting: ERCC19 Research Meeting (May 1-2, 2023)
Eduardo Reátegui Group for Bioengineering Research (The Ohio State University College of Engineering, Columbus)
Note: Dr. Lawrence Tabak, who performs the duties of the NIH Director, has asked the heads of NIH’s Institutes, Centers, and Offices to contribute occasional guest posts to the blog to highlight some of the interesting science that they support and conduct. This is the 27th in the series of NIH guest posts that will run until a new permanent NIH director is in place.
Visualizing The Placenta, a Critical but Poorly Understood Organ
Posted by Diana W. Bianchi, M.D., Eunice Kennedy Shriver National Institute of Child Health and Human Development

The placenta is the Rodney Dangerfield of organs; it gets no respect, no respect at all. This short-lived but critical organ supports pregnancy by bringing nutrients and oxygen to the fetus, removing waste, providing immune protection, and producing hormones to support fetal development.
It also influences the lifelong health of both mother and child. Problems with the placenta can lead to preeclampsia, gestational diabetes, poor fetal growth, preterm birth, and stillbirth. Although we were all connected to one, the placenta is the least understood, and least studied, of all human organs.
What we do know about the human placenta largely comes from studying it after delivery. But that’s like studying the heart after it’s stopped beating. It doesn’t help us predict complications in time to avert a crisis.
To fill these knowledge gaps, NIH’s Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD) developed the Human Placenta Project (HPP) to noninvasively study the placenta during pregnancy. Since 2014, this approximately $88 million collaborative research effort has been developing ultrasound, magnetic resonance imaging (MRI), and blood-based biomarker methods to study how the placenta functions in real time and in greater detail.
As illustrated in the image above, advanced ultrasound tools allowed HPP researchers at Eastern Virginia Medical School, Norfolk, and the University of Texas Medical Branch, Galveston, to gain a detailed look at the placenta’s intricate arrangement of blood vessels, or vasculature. By evaluating both fetal (left panel) and maternal (right panel) placental vasculature in 610 pregnant people starting at 13 weeks of gestation, the investigators aimed to identify early changes that predicted later complications.
They observed that such changes can start in the first trimester and affect both the vasculature and placental tissue. While further research is needed, these findings suggest that placental ultrasound monitoring can inform efforts to prevent and treat pregnancy complications.
Another HPP team led by Boston Children’s Hospital is developing an MRI strategy to monitor blood flow and oxygen transport through the placenta during pregnancy. Interpreting and visualizing MRI data of the placenta is challenging because of its variable shape, the tendency of muscles in the uterus to begin tightening or contracting well before labor [1], and other factors.
As shown in the video above, the researchers developed a way to account for the motion of the uterus and “freeze” the placenta to make it easier to study (left two panels of video) [2]. They also developed algorithms to better visualize the complex patterns of placental oxygen content during contractions (center panel) [3]. The scientists then carried out initial visualizations of blood flow through the placenta shortly after delivery (second panel from right) [4].
They now intend to map these MRI findings to the placenta itself after delivery (far right panel), which will allow them to explore how additional factors such as gene expression patterns and genetic variants contribute to placental function. Ultimately, they plan to apply these MRI techniques to monitor the placenta in real time during pregnancy and identify changes that indicate compromised function early enough to adjust maternal management as needed.
Other HPP efforts focus on identifying components in maternal blood that reflect the status of the placenta. For example, an HPP research team led by scientists at the University of California, Los Angeles, adapted non-invasive prenatal testing methods to analyze genetic material shed from the placenta into the maternal bloodstream. Their findings suggest that distinctive patterns in this genetic material detected early in pregnancy may indicate risk for later complications [5].
Another HPP team, led by investigators at Columbia University, New York, helped establish that extracellular RNAs (exRNAs) released by the placenta into maternal circulation reflect the placenta’s status at a cellular level beginning in the first trimester. To harness the potential of exRNA biomarkers, the investigators are optimizing methods to isolate, sequence, and analyze exRNAs in maternal blood.
These are just a few examples of the cutting-edge work being funded through the HPP, which complements NICHD’s longstanding investment in basic research to unravel the physiology of and real-time gene expression in the placenta. Unlocking the secrets of the placenta may one day help us to prevent and treat a range of common pregnancy complications, while also providing insights into other areas of science and medicine such as cardiovascular disease and aging. NICHD is committed to giving this important organ the respect it deserves.
References:
[1] Placental MRI: Effect of maternal position and uterine contractions on placental BOLD MRI measurements. Abaci Turk E, Abulnaga SM, Luo J, Stout JN, Feldman H, Turk A, Gagoski B, Wald LL, Adalsteinsson E, Roberts DJ, Bibbo C, Robinson JN, Golland P, Grant PE, Barth, Jr WH. Placenta. 2020 Jun 1; 95: 69-77.
[2] Spatiotemporal alignment of in utero BOLD-MRI series. Turk EA, Luo J, Gagoski B, Pascau J, Bibbo C, Robinson JN, Grant PE, Adalsteinsson E, Golland P, Malpica N. J Magn Reson Imaging. 2017 Aug;46(2):403-412.
[3] Volumetric parameterization of the placenta to a flattened template. Abulnaga SM, Turk EA, Bessmeltsev M, Grant PE, Solomon J, Golland P. IEEE transactions on medical imaging. 2022 April;41(4):925-936.
[4] Placental MRI: development of an MRI compatible ex vivo system for whole placenta dual perfusion. Stout JN, Rouhani S, Turk EA, Ha CG, Luo J, Rich K, Wald LL, Adalsteinsson E, Barth, Jr WH, Grant PE, Roberts DJ. Placenta. 2020 Nov 1; 101: 4-12.
[5] Cell-free DNA methylation and transcriptomic signature prediction of pregnancies with adverse outcomes. Del Vecchio G, Li Q, Li W, Thamotharan S, Tosevska A, Morselli M, Sung K, Janzen C, Zhou X, Pellegrini M, Devaskar SU. Epigenetics. 2021 Jun;16(6):642-661.
Links:
Human Placenta Project (Eunice Kennedy Shriver National Institute of Child Health and Human Development/NIH)
Preeclampsia (NICHD)
Understanding Gestational Diabetes (NICHD)
Preterm Labor and Birth (NICHD)
Stillbirth (NICHD)
Abuhamad Project Information (NIH RePORTER)
Grant Project Information (NIH RePORTER)
Devaskar Project Information (NIH RePORTER)
Williams Project Information (NIH RePORTER)
Note: Acting NIH Director Lawrence Tabak has asked the heads of NIH’s Institutes and Centers (ICs) to contribute occasional guest posts to the blog to highlight some of the interesting science that they support and conduct. This is the 10th in the series of NIH IC guest posts that will run until a new permanent NIH director is in place.
Millions of Single-Cell Analyses Yield Most Comprehensive Human Cell Atlas Yet
Posted by Lawrence Tabak, D.D.S., Ph.D.

There are 37 trillion or so cells in our bodies that work together to give us life. But it may surprise you that we still haven’t put a good number on how many distinct cell types there are within those trillions of cells.
That’s why in 2016, a team of researchers from around the globe launched a historic project called the Human Cell Atlas (HCA) consortium to identify and define the hundreds of presumed distinct cell types in our bodies. Knowing where each cell type resides in the body, and which genes each one turns on or off to create its own unique molecular identity, will revolutionize our studies of human biology and medicine across the board.
Since its launch, the HCA has progressed rapidly. In fact, it has already reached an important milestone with the recent publication in the journal Science of four studies that, together, make up the first multi-tissue draft of the human cell atlas. This draft, based on analyses of millions of cells, defines more than 500 different cell types in more than 30 human tissues. A second draft, with even finer definition, is already in the works.
Making the HCA possible are recent technological advances in RNA sequencing. RNA sequencing is a topic that’s been mentioned frequently on this blog in a range of research areas, from neuroscience to skin rashes. Researchers use it to detect and analyze all the messenger RNA (mRNA) molecules in a biological sample, in this case individual human cells from a wide range of tissues, organs, and individuals who voluntarily donated their tissues.
By quantifying these RNA messages, researchers can capture the thousands of genes that any given cell actively expresses at any one time. These precise gene expression profiles can be used to catalogue cells from throughout the body and understand the important similarities and differences among them.
In one of the published studies, funded in part by the NIH, a team co-led by Aviv Regev, a founding co-chair of the consortium at the Broad Institute of MIT and Harvard, Cambridge, MA, established a framework for multi-tissue human cell atlases [1]. (Regev is now on leave from the Broad Institute and MIT and has recently moved to Genentech Research and Early Development, South San Francisco, CA.)
Among its many advances, Regev’s team optimized single-cell RNA sequencing for use on cell nuclei isolated from frozen tissue. This technological advance paved the way for single-cell analyses of the vast numbers of samples that are stored in research collections and freezers all around the world.
Using their new pipeline, Regev and team built an atlas including more than 200,000 single-cell RNA sequence profiles from eight tissue types collected from 16 individuals. These samples were archived earlier by NIH’s Genotype-Tissue Expression (GTEx) project. The team’s data revealed unexpected differences among cell types but surprising similarities, too.
For example, they found that genetic profiles seen in muscle cells were also present in connective tissue cells in the lungs. Using novel machine learning approaches to help make sense of their data, they’ve linked the cells in their atlases with thousands of genetic diseases and traits to identify cell types and genetic profiles that may contribute to a wide range of human conditions.
By cross-referencing 6,000 genes previously implicated in causing specific genetic disorders with their single-cell genetic profiles, they identified new cell types that may play unexpected roles. For instance, they found some non-muscle cells that may play a role in muscular dystrophy, a group of conditions in which muscles progressively weaken. More research will be needed to make sense of these fascinating and vital discoveries.
The team also compared genes that are more active in specific cell types to genes with previously identified links to more complex conditions. Again, their data surprised them. They identified new cell types that may play a role in conditions such as heart disease and inflammatory bowel disease.
Two of the other papers, one of which was funded in part by NIH, explored the immune system, especially the similarities and differences among immune cells that reside in specific tissues, such as scavenging macrophages [2,3]. This is a critical area of study. Most of our understanding of the immune system comes from immune cells that circulate in the bloodstream, not these resident macrophages and other immune cells.
These immune cell atlases, which are still first drafts, already provide an invaluable resource toward designing new treatments to bolster immune responses, such as vaccines and anti-cancer treatments. They also may have implications for understanding what goes wrong in various autoimmune conditions.
Scientists have been working for more than 150 years to characterize the trillions of cells in our bodies. Thanks to this timely effort and its advances in describing and cataloguing cell types, we now have a much better foundation for understanding these fundamental units of the human body.
But the latest data are just the tip of the iceberg, with vast flows of biological information from throughout the human body sure to be released in the years ahead. And while consortium members continue making history, their hard work to date is freely available to the scientific community to explore critical biological questions with far-reaching implications for human health and disease.
References:
[1] Single-nucleus cross-tissue molecular reference maps toward understanding disease gene function. Eraslan G, Drokhlyansky E, Anand S, Fiskin E, Subramanian A, Segrè AV, Aguet F, Rozenblatt-Rosen O, Ardlie KG, Regev A, et al. Science. 2022 May 13;376(6594):eabl4290.
[2] Cross-tissue immune cell analysis reveals tissue-specific features in humans. Domínguez Conde C, Xu C, Jarvis LB, Rainbow DB, Farber DL, Saeb-Parsy K, Jones JL, Teichmann SA, et al. Science. 2022 May 13;376(6594):eabl5197.
[3] Mapping the developing human immune system across organs. Suo C, Dann E, Goh I, Jardine L, Marioni JC, Clatworthy MR, Haniffa M, Teichmann SA, et al. Science. 2022 May 12:eabo0510.
Links:
Ribonucleic acid (RNA) (National Human Genome Research Institute/NIH)
Studying Cells (National Institute of General Medical Sciences/NIH)
Regev Lab (Broad Institute of MIT and Harvard, Cambridge, MA)
NIH Support: Common Fund; National Cancer Institute; National Human Genome Research Institute; National Heart, Lung, and Blood Institute; National Institute on Drug Abuse; National Institute of Mental Health; National Institute on Aging; National Institute of Allergy and Infectious Diseases; National Institute of Neurological Disorders and Stroke; National Eye Institute
New Clues to Delta Variant’s Spread in Studies of Virus-Like Particles
Posted by Dr. Francis Collins

About 70,000 people in the United States are diagnosed with COVID-19 each and every day. It’s clear that these new cases are being driven by the more-infectious Delta variant of SARS-CoV-2, the novel coronavirus that causes COVID-19. But why does the Delta variant spread more easily than other viral variants from one person to the next?
Now, an NIH-funded team has discovered at least part of Delta’s secret, and it’s not all attributable to those widely studied mutations in the spike protein that links up to human cells through the ACE2 receptor. It turns out that a specific mutation found within the N protein coding region of the Delta genome also enables the virus to pack more of its RNA code into the infected host cell. As a result, there is increased production of fully functional new viral particles, which can go on to infect someone else.
This finding, published in the journal Science [1], comes from the lab of Nobel laureate Jennifer Doudna at the Howard Hughes Medical Institute, the Gladstone Institutes, San Francisco, and the Innovative Genomics Institute at the University of California, Berkeley. Co-leading the team was Melanie Ott, Gladstone Institutes.
The Doudna and Ott teams have developed an exciting new tool to study variants of the coronavirus. It’s a lab construct called a virus-like particle (VLP). These specially made VLPs have all the structural proteins of SARS-CoV-2 (shown above), but they contain no genetic material. Consequently, they are non-infectious replicas of the real virus that can be studied safely in any lab. Scientists don’t have to reserve time in labs equipped with heightened levels of biosafety, as is required when working with whole virus.
The VLPs also allow researchers to explore changes found in the coronavirus’s other essential proteins, not just the spike protein on its surface. In fact, all of the SARS-CoV-2 variants of concern, as defined by the World Health Organization (WHO), carry at least one mutation within the same stretch of seven amino acids in a viral protein known as the nucleocapsid (N protein). This protein, which hasn’t been widely studied, is required for the virus to make more of itself. It is also involved in the virus’s ability to package and release infectious RNA.
In the Science paper, Doudna and colleagues took a closer look at the N protein. They did so by developing a special system that used VLPs to package and deliver viral RNA messages into human cells.
Here’s how it works: The VLPs include all four of SARS-CoV-2’s structural proteins, including the spike and N proteins. In addition, they contain the RNA sequence that allows the virus to recognize its genetic material within the cell, so that it can be packaged into the next generation of viral particles.
Though the particles look just like SARS-CoV-2 from the outside, they lack the vast majority of the viral genome on the inside. But they do have one other key component: a snippet of RNA that makes cells invaded by VLPs glow. In fact, the more RNA messages a VLP delivers, the brighter the cells will glow. This glow allowed the researchers to spot successful invasions, while also quantifying the amount of RNA a particular VLP packed into a cell.
The researchers then produced SARS-CoV-2 VLPs including four mutations that are universally found within the N proteins of more transmissible variants of concern. That’s when they discovered those variants produced and delivered 10 times more RNA messages into cells.
The increased RNA also fits with what has been observed in people infected with the Delta variant. They produce about 10 times more virus in their nose and throat compared to people infected with the older variants.
But did those findings match what happens in the real virus? To find out, the researchers and their colleagues tested the N protein mutation found in the Delta variant in a high-level biosafety lab. And, indeed, their studies showed that the mutated virus within infected human lung cells produced about 50 times more infectious virus compared to the original SARS-CoV-2 variant.
The findings suggest that the N protein could be an important new target for effective COVID-19 therapeutics, and that tracking newly emerging mutations in the N protein might also be important for identifying new viral variants of concern. This new system is a powerful tool, and one that can also be used for exploring how newly arising variants in the future might affect the course of this terrible pandemic.
Reference:
[1] Rapid assessment of SARS-CoV-2 evolved variants using virus-like particles. Syed AM, Taha TY, Tabata T, Chen IP, Ciling A, Khalid MM, Sreekumar B, Chen PY, Hayashi JM, Soczek KM, Ott M, Doudna JA. Science. 2021 Nov 4:eabl6184.
Links:
COVID-19 Research (NIH)
NIH Support: National Institute of Allergy and Infectious Diseases
Artificial Intelligence Accurately Predicts RNA Structures, Too
Posted by Dr. Francis Collins

Researchers recently showed that a computer could “learn” from many examples of protein folding to predict the 3D structure of proteins with great speed and precision. Now a recent study in the journal Science shows that a computer also can predict the 3D shapes of RNA molecules [1]. This includes the mRNA that codes for proteins and the non-coding RNA that performs a range of cellular functions.
This work marks an important basic science advance. RNA therapeutics—from COVID-19 vaccines to cancer drugs—have already benefited millions of people and will help many more in the future. Now, the ability to predict RNA shapes quickly and accurately on a computer will help to accelerate understanding these critical molecules and expand their healthcare uses.
As with proteins, the shapes of single-stranded RNA molecules are important for their ability to function properly inside cells. Yet far less is known about these RNA structures and the rules that determine their precise shapes. The RNA elements (bases) can form internal hydrogen-bonded pairs, but the number of possible combinations of pairings is almost astronomical for any RNA molecule with more than a few dozen bases.
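To get a feel for how quickly the possibilities pile up, here is a minimal counting sketch. It rests on simplifying assumptions that are not from the study: any two bases are allowed to pair, pairs never cross one another, and at least `min_loop` unpaired bases must sit between the two partners of a pair. The function name is hypothetical, and real RNA is constrained further by which bases can actually pair.

```python
def count_pairing_patterns(n_bases: int, min_loop: int = 3) -> int:
    """Count non-crossing base-pairing patterns for a chain of n_bases.

    Simplifying assumptions (not from the study): any two bases may pair,
    pairs never cross, and at least `min_loop` unpaired bases separate
    the two partners of a pair.
    """
    counts = [1] * (n_bases + 1)  # counts[k] = patterns for a chain of k bases
    for length in range(1, n_bases + 1):
        total = counts[length - 1]  # case 1: the last base stays unpaired
        # case 2: the last base pairs with base j (1-indexed)
        for j in range(1, length - min_loop):
            left = counts[j - 1]             # bases before position j
            inside = counts[length - j - 1]  # bases enclosed by the pair
            total += left * inside
        counts[length] = total
    return counts[n_bases]


for n in (10, 20, 40, 80):
    print(n, count_pairing_patterns(n))
```

Each added base multiplies the number of patterns rather than adding to it, which is why checking every pairing by brute force quickly becomes impractical.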
In hopes of moving the field forward, a team led by Stephan Eismann and Raphael Townshend in the lab of Ron Dror, Stanford University, Palo Alto, CA, looked to a machine learning approach known as deep learning. It is inspired by how our own brain’s neural networks process information, learning to focus on some details but not others.
In deep learning, computers look for patterns in data. As they begin to “see” complex relationships, some connections in the network are strengthened while others are weakened.
One of the things that makes deep learning so powerful is that it doesn’t rely on any preconceived notions. It also can pick up on important features and patterns that humans can’t possibly detect. But, as successful as this approach has been in solving many different kinds of problems, it has primarily been applied to areas of biology, such as protein folding, in which lots of data were available for researchers to train the computers.
That’s not the case with RNA molecules. To work around this problem, Dror’s team designed a neural network they call ARES. (No, it’s not the Greek god of war. It’s short for Atomic Rotationally Equivariant Scorer.)
To start, the researchers trained ARES on just 18 small RNA molecules for which structures had been experimentally determined. They gave ARES these structural models specified only by their atomic structure and chemical elements.
The next test was to see if ARES could determine from this small training set the best structural model for RNA sequences it had never seen before. The researchers put it to the test with RNA molecules whose structures had been determined more recently.
ARES, however, doesn’t come up with the structures itself. Instead, the researchers give ARES a sequence and at least 1,500 possible 3D structures it might take, all generated using another computer program. Based on patterns in the training set, ARES scores each of the possible structures to find the one it predicts is closest to the actual structure. Remarkably, it does this without being provided any prior information about features important for determining RNA shapes, such as nucleotides, steric constraints, and hydrogen bonds.
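The score-and-select step described above can be sketched in a few lines of Python. This is not the ARES code: the function name pick_best_candidate, the model labels, and the numbers in toy_scores are all hypothetical stand-ins, and the sketch assumes only that a scorer returns a predicted distance from the true structure, where lower means closer.

```python
from typing import Callable, Sequence, TypeVar

Candidate = TypeVar("Candidate")


def pick_best_candidate(
    candidates: Sequence[Candidate],
    predicted_error: Callable[[Candidate], float],
) -> Candidate:
    """Return the candidate whose predicted distance from the truth is lowest."""
    if not candidates:
        raise ValueError("need at least one candidate structure")
    return min(candidates, key=predicted_error)


# Hypothetical stand-ins: three candidate models with made-up predicted errors.
# In the workflow described above, a trained network would supply these scores
# for each of the 1,500 or more candidates generated by another program.
toy_scores = {"model_a": 12.4, "model_b": 3.1, "model_c": 7.8}
best = pick_best_candidate(list(toy_scores), toy_scores.__getitem__)
print(best)
```

Separating the scorer from the selection step mirrors the division of labor in the post: one program proposes structures, and the scorer only ranks them.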
It turns out that ARES consistently outperforms humans and all previous methods. In fact, it outperformed at least nine other methods to come out on top in a community-wide RNA-puzzles contest. It also can make predictions about RNA molecules that are significantly larger and more complex than those upon which it was trained.
The success of ARES and this deep learning approach will help to elucidate RNA molecules with potentially important implications for health and disease. It’s another compelling example of how deep learning promises to solve many other problems in structural biology, chemistry, and the material sciences when—at the outset—very little is known.
Reference:
[1] Geometric deep learning of RNA structure. Townshend RJL, Eismann S, Watkins AM, Rangan R, Karelina M, Das R, Dror RO. Science. 2021 Aug 27;373(6558):1047-1051.
Links:
Structural Biology (National Institute of General Medical Sciences/NIH)
The Structures of Life (National Institute of General Medical Sciences/NIH)
RNA Biology (NIH)
Dror Lab (Stanford University, Palo Alto, CA)
NIH Support: National Cancer Institute; National Institute of General Medical Sciences
Single-Cell Study Offers New Clue into Causes of Cystic Fibrosis
Posted on by Dr. Francis Collins

More than 30 years ago, I co-led the Michigan-Toronto team that discovered that cystic fibrosis (CF) is caused by an inherited misspelling in the cystic fibrosis transmembrane conductance regulator (CFTR) gene [1]. The CFTR protein’s normal function on the surface of epithelial cells is to serve as a gated channel for chloride ions to pass in and out of the cell. But this function is lost in individuals for whom both copies of CFTR are misspelled. As a consequence, water and salt get out of balance, leading to the production of the thick mucus that leaves people with CF prone to life-threatening lung infections.
It took three decades, but that CFTR gene discovery has now led to the development of a precise triple drug therapy that activates the dysfunctional CFTR protein and provides major benefit to most children and adults with CF. But about 10 percent of individuals with CF have mutations that result in the production of virtually no CFTR protein, which means there is nothing for current triple therapy to correct or activate.
That’s why more basic research is needed to tease out other factors that contribute to CF and, if treatable, could help even more people control the condition and live longer lives with less chronic illness. A recent NIH-supported study, published in the journal Nature Medicine [2], offers an interesting basic clue, and it’s visible in the image above.
The healthy lung tissue (left) shows a well-defined and orderly layer of ciliated cells (green), which use hair-like extensions to clear away mucus and debris. Running closely alongside it is a layer of basal cells (outlined in red), which includes stem cells that are essential for repairing and regenerating upper airway tissue. (DNA, which indicates the position of each cell, is stained in blue.)
In the CF-affected airways (right), those same cell types are present. However, compared to the healthy lung tissue, they appear to be in a state of disarray. A closer look reveals something else that’s unusual: large numbers of a third cell subtype (outlined in red with green in the nucleus) that combines properties of both basal stem cells and ciliated cells, which is suggestive of cells in transition. The image below more clearly shows these cells (yellow arrows).

The increased number of cells with transitional characteristics suggests an unsuccessful attempt by the lungs to produce more cells capable of clearing the mucus buildup that occurs in airways of people with CF. The data offer an important foundation and reference for continued study.
These findings come from a team led by Kathrin Plath and Brigitte Gomperts, University of California, Los Angeles; John Mahoney, Cystic Fibrosis Foundation, Lexington, MA; and Barry Stripp, Cedars-Sinai, Los Angeles. Together with their lab members, they’re part of a larger research team assembled through the Cystic Fibrosis Foundation’s Epithelial Stem Cell Consortium, which seeks to learn how the disease changes the lung’s cellular makeup and use that new knowledge to make treatment advances.
In this study, researchers analyzed the lungs of 19 people with CF and another 19 individuals with no evidence of lung disease. Those with CF had donated their lungs for research in the process of receiving a lung transplant. Those with healthy lungs were organ donors who died of other causes.
The researchers analyzed, one by one, many thousands of cells from the airway and classified them into subtypes based on their distinctive RNA patterns. Those patterns indicate which genes are switched on or off in each cell, as well as the degree to which they are activated. Using a sophisticated computer-based approach to sift through and compare data, the team created a comprehensive catalog of cell types and subtypes present in healthy airways and in those affected by CF.
The new catalogs also revealed that the airways of people with CF had alterations in the types and proportions of basal cells. Those differences included a relative overabundance of cells that appeared to be transitioning from basal stem cells into the specialized ciliated cells, which are so essential for clearing mucus from the lungs.
We are not yet at our journey’s end when it comes to realizing the full dream of defeating CF. For the 10 percent of CF patients who don’t benefit from the triple-drug therapy, the continuing work to find other treatment strategies should be encouraging news. Keep daring to dream of breathing free. Through continued research, we can make the story of CF into history!
References:
[1] Identification of the cystic fibrosis gene: chromosome walking and jumping. Rommens JM, Iannuzzi MC, Kerem B, Drumm ML, Melmer G, Dean M, Rozmahel R, Cole JL, Kennedy D, Hidaka N, et al. Science. 1989 Sep 8;245(4922):1059-65.
[2] Transcriptional analysis of cystic fibrosis airways at single-cell resolution reveals altered epithelial cell states and composition. Carraro G, Langerman J, Sabri S, Lorenzana Z, Purkayastha A, Zhang G, Konda B, Aros CJ, Calvert BA, Szymaniak A, Wilson E, Mulligan M, Bhatt P, Lu J, Vijayaraj P, Yao C, Shia DW, Lund AJ, Israely E, Rickabaugh TM, Ernst J, Mense M, Randell SH, Vladar EK, Ryan AL, Plath K, Mahoney JE, Stripp BR, Gomperts BN. Nat Med. 2021 May;27(5):806-814.
Links:
Cystic Fibrosis (National Heart, Lung, and Blood Institute/NIH)
Kathrin Plath (University of California, Los Angeles)
Brigitte Gomperts (UCLA)
Stripp Lab (Cedars-Sinai, Los Angeles)
Cystic Fibrosis Foundation (Lexington, MA)
Epithelial Stem Cell Consortium (Cystic Fibrosis Foundation, Lexington, MA)
NIH Support: National Heart, Lung, and Blood Institute; National Institute of Diabetes and Digestive and Kidney Diseases; National Institute of General Medical Sciences; National Cancer Institute; National Center for Advancing Translational Sciences
An Evolutionary Guide to New Immunotherapies
Posted on by Dr. Francis Collins

One of the best ways to learn how something works is to understand how it’s built. How it came to be. That’s true not only if you play a guitar or repair motorcycle engines, but also if you study the biological systems that make life possible. Evolutionary studies, comparing the development of these systems across animals and organisms, are now leading to many unexpected biological discoveries and promising possibilities for preventing and treating human disease.
While there are many evolutionary questions to ask, Brenda Bass, a distinguished biochemist at University of Utah, Salt Lake City, has set her sights on a particularly profound one: How has innate immunity evolved through the millennia in all living things, including humans? Innate immunity is the immune system’s frontline defense, the first responders that take control of an emerging infectious situation and, if needed, signal for backup.
Exploring the millennia for clues about innate immunity takes a special team, and Bass has assembled a talented one. It includes her Utah colleague Nels Elde, a geneticist; immunologist Dan Stetson, University of Washington, Seattle; and biochemist Jane Jackman, Ohio State University, Columbus.
With a 2020 NIH Director’s Transformative Research Award, this hard-working team will embark on studies looking back at 450 million years of evolution: the point in time when animals diverged to develop very distinct methods of innate immune defense [1]. The team members hope to uncover new possibilities encoded in the innate immune system, especially those that might be latent but still workable. The researchers will then explore whether their finds can be repurposed not only to boost our body’s natural response to external threats but also to internal threats like cancer.
Bass brings a unique perspective to the project. As a postdoc in the 1980s, she stumbled upon a whole new class of enzymes, called ADARs, that edit RNA [2]. Their function was mysterious at the time. It turns out that ADARs specifically edit a molecule called double-stranded RNA (dsRNA). When viruses infect cells in animals, including humans, they make dsRNA, which the innate immune system detects as a sign that a cell has been invaded.
It also turns out that animal cells make their own dsRNA. Over the years, Bass and her lab have identified thousands of dsRNAs made in animal cells—in fact, a significant number of human genes produce dsRNA [3]. Also interesting, ADARs are crucial to marking our own dsRNA as “self” to avoid triggering an immune response when we don’t need it [4].
Bass and others have found that evolution has produced dramatic differences in the biochemical pathways powering the innate immune system. In vertebrate animals, dsRNA leads to release of the immune chemical interferon, a signaling pathway that invertebrate species don’t have. Instead, when worms and other invertebrates detect dsRNA from an invader, they repel it by triggering a gene-silencing pathway known as RNA interference, or RNAi.
With the new funding, Bass and team plan to mix and match immune strategies from simple and advanced species, across evolutionary time, to craft an entirely new set of immune tools to fight disease. The team will also build new types of targeted immunotherapies based on the principles of innate immunity. Current immunotherapies, which harness a person’s own immune system to fight disease, target infections, autoimmune disorders, and cancer. But they work through our second-line adaptive immune response, which is a biological system unique to vertebrates.
Bass and her team will first hunt for more molecules like ADARs: innate immune checkpoints, as they refer to them. The name comes from a functional resemblance to the better-known adaptive immune checkpoints PD-1 and CTLA-4, which sparked a revolution in cancer immunotherapy. The team will run several screens that sort molecules successful at activating innate immune responses—both in invertebrates and in mammals—hoping to identify a range of durable new immune switches that evolution skipped over but that might be repurposed today.
Another intriguing direction for this research stems from the observation that decreasing normal levels of ADARs in tumors kickstarts innate immune responses that kill cancer cells [5]. Along these lines, the scientists plan to test newly identified immune switches to look for novel ways to fight cancer where existing approaches have not worked.
Evolution is the founding principle for all of biology—organisms learn from what works to improve their ability to survive. In this case, research to re-examine such lessons and apply them for new uses may help transform bygone evolution into a therapeutic revolution!
References:
[1] Evolution of adaptive immunity from transposable elements combined with innate immune systems. Koonin EV, Krupovic M. Nat Rev Genet. 2015 Mar;16(3):184-192.
[2] A developmentally regulated activity that unwinds RNA duplexes. Bass BL, Weintraub H. Cell. 1987 Feb 27;48(4):607-613.
[3] Mapping the dsRNA World. Reich DP, Bass BL. Cold Spring Harb Perspect Biol. 2019 Mar 1;11(3):a035352.
[4] To protect and modify double-stranded RNA – the critical roles of ADARs in development, immunity and oncogenesis. Erdmann EA, Mahapatra A, Mukherjee P, Yang B, Hundley HA. Crit Rev Biochem Mol Biol. 2021 Feb;56(1):54-87.
[5] Loss of ADAR1 in tumours overcomes resistance to immune checkpoint blockade. Ishizuka JJ, Manguso RT, Cheruiyot CK, Bi K, Panda A, et al. Nature. 2019 Jan;565(7737):43-48.
Links:
Bass Lab (University of Utah, Salt Lake City)
Elde Lab (University of Utah)
Jackman Lab (Ohio State University, Columbus)
Stetson Lab (University of Washington, Seattle)
Bass/Elde/Jackman/Stetson Project Information (NIH RePORTER)
NIH Director’s Transformative Research Award Program (Common Fund)
NIH Support: Common Fund; National Cancer Institute
Genome Data Help Track Community Spread of COVID-19
Posted on by Dr. Francis Collins

Contact tracing, a term that’s been in the news lately, is a crucial tool for controlling the spread of SARS-CoV-2, the novel coronavirus that causes COVID-19. It depends on quick, efficient identification of an infected individual, followed by identification of all who’ve recently been in close contact with that person so the contacts can self-quarantine to break the chain of transmission.
Properly carried out, contact tracing can be extremely effective. It can also be extremely challenging when battling a stealth virus like SARS-CoV-2, especially when the virus is spreading rapidly.
But there are some innovative ways to enhance contact tracing. In a new study, published in the journal Nature Medicine, researchers in Australia demonstrate one of them: assembling genomic data about the virus to assist contact tracing efforts [1]. This so-called genomic surveillance builds on the idea that when the virus is passed from person to person over a few months, it can acquire random variations in the sequence of its genetic material. These unique variations serve as distinctive genomic “fingerprints.”
When COVID-19 starts circulating in a community, researchers can fingerprint the genomes of SARS-CoV-2 obtained from newly infected people. This timely information helps to tell whether that particular virus has been spreading locally for a while or has just arrived from another part of the world. It can also show where the viral subtype has been spreading through a community or, best of all, when it has stopped circulating.
The recent study was led by Vitali Sintchenko at the University of Sydney. His team worked in parallel with contact tracers at the Ministry of Health in New South Wales (NSW), Australia’s most populous state, to contain the initial SARS-CoV-2 outbreak from late January through March 2020.
The team performed genomic surveillance, using sequencing data obtained within about five days, to understand local transmission patterns. They also wanted to compare what they learned from genomic surveillance to predictions made by a sophisticated computer model of how the virus might spread amongst Australia’s approximately 24 million citizens.
Of the 1,617 known cases in Sydney over the three-month study period, researchers sequenced viral genomes from 209 (13 percent) of them. By comparing those sequences to others circulating overseas, they found a lot of sequence diversity, indicating that the novel coronavirus had been introduced to Sydney many times from many places all over the world.
They then used the sequencing data to better understand how the virus was spreading through the local community. Their analysis found that the 209 cases under study included 27 distinct genomic fingerprints. Based on the close similarity of their genomic fingerprints, a significant share of the COVID-19 cases appeared to have stemmed from the direct spread of the virus among people in specific places or facilities.
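The basic idea of grouping cases by genomic fingerprint can be illustrated with a toy Python sketch. Everything here is an assumption made for illustration: the ten-letter REFERENCE and CASES sequences are invented, and the rule of grouping only cases whose differences from the reference match exactly is a simplification, not the method used in the study.

```python
from collections import defaultdict


def fingerprint(genome: str, reference: str) -> frozenset:
    """Return the set of (position, base) differences from the reference."""
    if len(genome) != len(reference):
        raise ValueError("this sketch assumes pre-aligned, equal-length genomes")
    return frozenset(
        (pos, base)
        for pos, (base, ref_base) in enumerate(zip(genome, reference))
        if base != ref_base
    )


def group_by_fingerprint(cases: dict, reference: str) -> dict:
    """Group case IDs that share exactly the same set of variations."""
    groups = defaultdict(list)
    for case_id, genome in cases.items():
        groups[fingerprint(genome, reference)].append(case_id)
    return dict(groups)


# Invented toy data: real SARS-CoV-2 genomes are about 30,000 letters long.
REFERENCE = "ACGTACGTAC"
CASES = {
    "case_1": "ACGTACGTAC",  # identical to the reference
    "case_2": "ACGAACGTAC",  # one change at position 3
    "case_3": "ACGAACGTAC",  # same change as case_2
    "case_4": "ACGTACGTTC",  # a different change at position 8
}
groups = group_by_fingerprint(CASES, REFERENCE)
for variants, case_ids in groups.items():
    print(sorted(variants), case_ids)
```

In this toy run, case_2 and case_3 land in the same group, which is the kind of signal that could prompt contact tracers to look for a link between two cases.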
What was most striking was that the genomic evidence helped to provide information that contact tracers otherwise would have lacked. For instance, the genomic data allowed the researchers to identify previously unsuspected links between certain cases of COVID-19. It also helped to confirm other links that were otherwise unclear.
All told, researchers used the genomic evidence to cluster almost 40 percent of COVID-19 cases (81 of 209) for which the community-based data alone couldn’t identify a known contact source for the infection. That included 26 cases in which an individual who’d recently arrived in Australia from overseas spread the infection to others who hadn’t traveled. The genomic information also helped to identify likely sources in the community for another 15 locally acquired cases that weren’t known based on community data.
The researchers compared their genome surveillance data to SARS-CoV-2’s expected spread as modeled in a computer simulation based on travel to and from Australia over the time period in question. Because the study involved just 13 percent of all known COVID-19 cases in Sydney between late January and March, it’s not surprising that the genomic data presents an incomplete picture, detecting only a portion of the possible chains of transmission expected in the simulation model.
Nevertheless, the findings demonstrate the value of genomic data for tracking the virus and pinpointing exactly where in the community it is spreading. This can help to fill in important gaps in the community-based data that contact tracers often use. Even more exciting, by combining traditional contact tracing, genomic surveillance, and mathematical modeling with other emerging tools at our disposal, it may be possible to get a clearer picture of the movement of SARS-CoV-2 and put more targeted public health measures in place to slow and eventually stop its deadly spread.
Reference:
[1] Revealing COVID-19 transmission in Australia by SARS-CoV-2 genome sequencing and agent-based modeling. Rockett RJ, Arnott A, Lam C, et al. Nat Med. 2020 July 9. [Published online ahead of print]
Links:
Coronavirus (COVID-19) (NIH)
Vitali Sintchenko (University of Sydney, Australia)