Using R2D2 to Understand RNA Folding

If you love learning more about biology at a fundamental level, I have a great video for you! It simulates the 3D folding of RNA. RNA is a single-stranded molecule, but it is still capable of forming internal loops that can be stabilized by base pairing, just like its famously double-stranded parent, DNA. Understanding more about RNA folding may be valuable in many different areas of biomedical research, including developing ways to help people with RNA-related diseases, such as certain cancers and neuromuscular disorders, and designing better mRNA vaccines against infectious disease threats (like COVID-19).

Because RNA folding starts even while an RNA is still being made in the cell, the process has proven hugely challenging to follow closely. An innovative solution, shown in this video, comes from the labs of NIH grantees Julius Lucks, Northwestern University, Evanston, IL, and Alan Chen, State University of New York at Albany. The team, led by graduate student Angela Yu and including several diehard Star Wars fans, realized that to visualize RNA folding they needed a technology platform that, like a Star Wars droid, is able to “see” things that others can’t. So, they created R2D2, which is short for Reconstructing RNA Dynamics from Data.

What’s so groundbreaking about the R2D2 approach, which was published recently in Molecular Cell, is that it combines experimental data on RNA folding at the nucleotide level with predictive algorithms at the atomic level to simulate RNA folding in ultra-slow motion [1]. While other computer simulations have been available for decades, they have lacked much-needed experimental data on this complex folding process to validate their mathematical models.

As a gene is transcribed into RNA one building block, or nucleotide, at a time, the elongating RNA strand starts folding right away, before the whole molecule is fully assembled. But such folding can create a problem: the new strand can tie itself up into a knot-like structure that’s incompatible with the shape it needs to function in a cell.

To slip this knot, the cell has evolved immediate corrective pathways, or countermoves. In this R2D2 video, you can see one countermove called a toehold-mediated strand displacement. In this example, the maneuver is performed by an ancient molecule called a signal recognition particle (SRP) RNA. Though SRP RNAs are found in all forms of life, this one comes from the bacterium Escherichia coli and is made up of 114 nucleotides.

The colors in this video highlight different domains of the RNA molecule, all at different stages in the folding process. Some (orange, turquoise) have already folded properly, while another domain (dark purple) is temporarily knotted. For this knotted domain to slip its knot, about 5 seconds into the video, another newly forming region (fuchsia) wiggles down to gain a “toehold.” About 9 seconds in, the temporarily knotted domain untangles and unwinds, and, finally, at about 23 seconds, the strand starts to get reconfigured into the shape it needs to do its job in the cell.
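The idea that an RNA strand starts folding before it is complete can be illustrated with a toy calculation. The Python sketch below is not R2D2, which combines experimental data with atomic-level simulation; it swaps in the classic textbook Nussinov algorithm, which simply finds the largest set of nested base pairs (A-U, G-C, and G-U wobble), and reruns it on each growing prefix of a short, made-up sequence, as though the strand were emerging one nucleotide at a time. The minimum hairpin-loop size of 3 is an assumed parameter.

```python
# Toy sketch: Nussinov base-pair maximization (a classic algorithm, NOT the
# R2D2 method) applied to each growing prefix of a hypothetical RNA.

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # assumed minimum number of unpaired bases inside a hairpin loop


def max_base_pairs(seq):
    """Return the maximum number of nested base pairs in seq."""
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] = most pairs achievable within seq[i..j] (inclusive)
    best = [[0] * n for _ in range(n)]
    # Spans shorter than MIN_LOOP + 1 cannot close a loop, so they stay 0
    for span in range(MIN_LOOP + 1, n):
        for i in range(n - span):
            j = i + span
            # Option 1 and 2: leave base i or base j unpaired
            score = max(best[i + 1][j], best[i][j - 1])
            # Option 3: pair base i with base j around an inner structure
            if (seq[i], seq[j]) in PAIRS:
                score = max(score, best[i + 1][j - 1] + 1)
            # Option 4: split into two independent substructures
            for k in range(i + 1, j):
                score = max(score, best[i][k] + best[k + 1][j])
            best[i][j] = score
    return best[0][n - 1]


if __name__ == "__main__":
    toy_rna = "GGGAAAUCC"  # hypothetical sequence, not the E. coli SRP RNA
    # Re-fold after each nucleotide is "transcribed"
    for length in range(1, len(toy_rna) + 1):
        prefix = toy_rna[:length]
        print(f"{length:2d} nt  {prefix:<9}  max pairs = {max_base_pairs(prefix)}")
```

The best-available structure changes as each new nucleotide appears. Real cotranscriptional folding is governed by free energies and kinetics rather than raw pair counts, so this only conveys the general idea.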

Why would evolution favor such a seemingly inefficient folding process? Well, it might not be as inefficient as it first appears. In fact, as Chen noted, some nanotechnologists previously invented toehold displacement as a design principle for generating synthetic DNA and RNA circuits. Little did they know that nature may have scooped them many millennia ago!


[1] Computationally reconstructing cotranscriptional RNA folding from experimental data reveals rearrangement of non-native folding intermediates. Yu AM, Gasper PM, Cheng L, Chen AA, Lucks JB, et al. Molecular Cell 8, 1-14. 18 February 2021.


Genome Data Help Track Community Spread of COVID-19

RNA Virus
Credit: iStock/vchal

Contact tracing, a term that’s been in the news lately, is a crucial tool for controlling the spread of SARS-CoV-2, the novel coronavirus that causes COVID-19. It depends on quick, efficient identification of an infected individual, followed by identification of all who’ve recently been in close contact with that person so the contacts can self-quarantine to break the chain of transmission.

Properly carried out, contact tracing can be extremely effective. It can also be extremely challenging when battling a stealth virus like SARS-CoV-2, especially when the virus is spreading rapidly.

But there are some innovative ways to enhance contact tracing. In a new study, published in the journal Nature Medicine, researchers in Australia demonstrate one of them: assembling genomic data about the virus to assist contact tracing efforts [1]. This so-called genomic surveillance builds on the idea that when the virus is passed from person to person over a few months, it can acquire random variations in the sequence of its genetic material. These unique variations serve as distinctive genomic “fingerprints.”

When COVID-19 starts circulating in a community, researchers can fingerprint the genomes of SARS-CoV-2 obtained from newly infected people. This timely information helps to tell whether that particular virus has been spreading locally for a while or has just arrived from another part of the world. It can also show where the viral subtype has been spreading through a community or, best of all, when it has stopped circulating.

The recent study was led by Vitali Sintchenko at the University of Sydney. His team worked in parallel with contact tracers at the Ministry of Health in New South Wales (NSW), Australia’s most populous state, to contain the initial SARS-CoV-2 outbreak from late January through March 2020.

The team performed genomic surveillance, using sequencing data obtained within about five days, to understand local transmission patterns. They also wanted to compare what they learned from genomic surveillance to predictions made by a sophisticated computer model of how the virus might spread amongst Australia’s approximately 24 million citizens.

Of the 1,617 known cases in Sydney over the three-month study period, researchers sequenced viral genomes from 209 (13 percent) of them. By comparing those sequences to others circulating overseas, they found a lot of sequence diversity, indicating that the novel coronavirus had been introduced to Sydney many times from many places all over the world.

They then used the sequencing data to better understand how the virus was spreading through the local community. Their analysis found that the 209 cases under study included 27 distinct genomic fingerprints. Based on the close similarity of their genomic fingerprints, a significant share of the COVID-19 cases appeared to have stemmed from the direct spread of the virus among people in specific places or facilities.
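Grouping cases by the close similarity of their genomic fingerprints can be sketched in a few lines. The Python below is a simplified illustration under stated assumptions, not the study's pipeline: it treats each genome as an already-aligned string, counts single-nucleotide differences between every pair of cases, and links any two cases whose difference is at or below a threshold (single-linkage clustering). The threshold, the case names, and the short toy sequences are all hypothetical.

```python
from itertools import combinations


def snp_distance(a, b):
    """Count positions where two aligned, equal-length genomes differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def cluster_by_snps(genomes, max_snps):
    """Single-linkage clustering: link any two genomes within max_snps differences."""
    parent = {name: name for name in genomes}  # union-find forest

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for a, b in combinations(genomes, 2):
        if snp_distance(genomes[a], genomes[b]) <= max_snps:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for name in genomes:
        clusters.setdefault(find(name), set()).add(name)
    return sorted(clusters.values(), key=sorted)


# Hypothetical toy "genomes" (real SARS-CoV-2 genomes are ~30,000 bases)
genomes = {
    "case1": "ACGTACGTAC",
    "case2": "ACGTACGTAT",  # 1 difference from case1
    "case3": "ACGTTCGTAT",  # 1 difference from case2
    "case4": "TTGTACGAAC",  # several differences from all others
}

for cluster in cluster_by_snps(genomes, max_snps=1):
    print(sorted(cluster))
```

Because linkage is chained, case1 and case3 land in the same cluster even though they differ at two positions; that is a deliberate property of single linkage and one reason real analyses also draw on phylogenetic and epidemiological context.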

What was most striking was that the genomic evidence helped to provide information that contact tracers otherwise would have lacked. For instance, the genomic data allowed the researchers to identify previously unsuspected links between certain cases of COVID-19. It also helped to confirm other links that were otherwise unclear.

All told, researchers used the genomic evidence to cluster almost 40 percent of COVID-19 cases (81 of 209) for which the community-based data alone couldn’t identify a known contact source for the infection. That included 26 cases in which an individual who’d recently arrived in Australia from overseas spread the infection to others who hadn’t traveled. The genomic information also helped to identify likely sources in the community for another 15 locally acquired cases that weren’t known based on community data.
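The percentages quoted in this post follow directly from the reported case counts. A quick Python check:

```python
# Case counts as reported in the post
known_cases = 1617          # known COVID-19 cases in Sydney during the study
sequenced = 209             # cases with sequenced viral genomes
clustered_by_genome = 81    # sequenced cases clustered using genomic evidence


def percent(part, whole):
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(f"Sequenced share of known cases: {percent(sequenced, known_cases)}%")
print(f"Genomically clustered share:    {percent(clustered_by_genome, sequenced)}%")
```

This gives 12.9 percent (the "13 percent" cited) and 38.8 percent (the "almost 40 percent").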

The researchers compared their genome surveillance data to SARS-CoV-2’s expected spread as modeled in a computer simulation based on travel to and from Australia over the time period in question. Because the study involved just 13 percent of all known COVID-19 cases in Sydney from late January through March, it’s not surprising that the genomic data presents an incomplete picture, detecting only a portion of the possible chains of transmission expected in the simulation model.

Nevertheless, the findings demonstrate the value of genomic data for tracking the virus and pinpointing exactly where in the community it is spreading. This can help to fill in important gaps in the community-based data that contact tracers often use. Even more exciting, by combining traditional contact tracing, genomic surveillance, and mathematical modeling with other emerging tools at our disposal, it may be possible to get a clearer picture of the movement of SARS-CoV-2 and put more targeted public health measures in place to slow and eventually stop its deadly spread.


[1] Revealing COVID-19 transmission in Australia by SARS-CoV-2 genome sequencing and agent-based modeling. Rockett RJ, Arnott A, Lam C, et al. Nat Med. 2020 Jul 9. [Published online ahead of print]


The Prime Cellular Targets for the Novel Coronavirus

Credit: NIH

There’s still a lot to learn about SARS-CoV-2, the novel coronavirus that causes COVID-19. But it has been remarkable and gratifying to watch researchers from around the world pull together and share their time, expertise, and hard-earned data in the urgent quest to control this devastating virus.

That collaborative spirit was on full display in a recent study that characterized the specific human cells that SARS-CoV-2 likely singles out for infection [1]. This information can now be used to study precisely how each cell type interacts with the virus. It might ultimately help to explain why some people are more susceptible to SARS-CoV-2 than others, and how exactly to target the virus with drugs, immunotherapies, and vaccines to prevent or treat infections.

This work was driven by the mostly shuttered labs of Alex K. Shalek, Massachusetts Institute of Technology, Ragon Institute of MGH, MIT, and Harvard, and Broad Institute of MIT and Harvard, Cambridge; and Jose Ordovas-Montanes at Boston Children’s Hospital. In the end, it brought together (if only remotely) dozens of their colleagues in the Human Cell Atlas Lung Biological Network and others across the U.S., Europe, and South Africa.

The project began when Shalek, Ordovas-Montanes, and others read that before infecting human cells, SARS-CoV-2 docks on a protein receptor called angiotensin-converting enzyme 2 (ACE2). This enzyme plays a role in helping the body maintain blood pressure and fluid balance.

The group was intrigued, especially when they also learned about a second enzyme that the virus uses to enter cells. This enzyme goes by the long acronym TMPRSS2, and it gets “tricked” into priming the spike proteins that cover SARS-CoV-2 to attack the cell. It’s the combination of these two proteins that provides a welcome mat for the virus.

Shalek, Ordovas-Montanes, and an international team including graduate students, post-docs, staff scientists, and principal investigators decided to dig a little deeper to find out precisely where in the body one finds cells that express this gene combination. Their curiosity took them to the wealth of data they and others had generated from model organisms and humans, the latter as part of the Human Cell Atlas. This collaborative international project is producing a comprehensive reference map of all human cells. For its first draft, the Human Cell Atlas aims to gather information on at least 10 billion cells.

To gather this information, the project relies, in part, on relatively new capabilities in sequencing the RNA of individual cells. Keep in mind that every cell in the body has essentially the same DNA genome. But different cells use different programs to decide which genes to turn on, expressing those as RNA molecules that can be translated into protein. Single-cell analysis of RNA allows researchers to characterize the gene expression and activities within each and every unique cell type. Based on what was known about the virus and the symptoms of COVID-19, the team focused their attention on the hundreds of cell types they identified in the lungs, nasal passages, and intestines.

As reported in Cell, by filtering through the data to identify cells that express ACE2 and TMPRSS2, the researchers narrowed the list of cell types in the nasal passages down to the mucus-producing goblet secretory cells. In the lung, evidence for activity of these two genes turned up in cells called type II pneumocytes, which line small air sacs known as alveoli and help to keep them open. In the intestine, it was the absorptive enterocytes, which play an important role in the body’s ability to take in nutrients.
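The filtering step described above, finding cells that express both ACE2 and TMPRSS2, can be illustrated with a toy example. In this Python sketch, each cell is a dictionary of hypothetical RNA counts plus an assumed cell-type label, and a cell counts as "double positive" if both genes have at least one detected transcript. Real single-cell analyses rely on normalization, quality control, and statistical thresholds that this sketch leaves out; the cell types and counts here are invented.

```python
from collections import defaultdict


def double_positive_fraction(cells, gene_a="ACE2", gene_b="TMPRSS2"):
    """Return, per cell type, the fraction of cells with counts > 0 for both genes."""
    totals = defaultdict(int)
    both = defaultdict(int)
    for cell in cells:
        cell_type = cell["cell_type"]
        counts = cell["counts"]
        totals[cell_type] += 1
        # A gene missing from the counts dictionary is treated as zero
        if counts.get(gene_a, 0) > 0 and counts.get(gene_b, 0) > 0:
            both[cell_type] += 1
    return {cell_type: both[cell_type] / totals[cell_type] for cell_type in totals}


# Hypothetical toy data: two made-up cell types
toy_cells = [
    {"cell_type": "cell_type_A", "counts": {"ACE2": 2, "TMPRSS2": 5}},
    {"cell_type": "cell_type_A", "counts": {"ACE2": 1, "TMPRSS2": 3}},
    {"cell_type": "cell_type_A", "counts": {"ACE2": 0, "TMPRSS2": 4}},
    {"cell_type": "cell_type_B", "counts": {"ACE2": 0, "TMPRSS2": 2}},
    {"cell_type": "cell_type_B", "counts": {"TMPRSS2": 1}},
]

fractions = double_positive_fraction(toy_cells)
for cell_type, fraction in sorted(fractions.items(), key=lambda item: -item[1]):
    print(f"{cell_type}: {fraction:.2f} double positive")
```

Ranking cell types by this fraction is one simple way to narrow a long list down to candidate targets.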

The data also turned up another unexpected and potentially important connection. In these cells of interest, all of which are found in epithelial tissues that cover or line body surfaces, the ACE2 gene appeared to ramp up its activity in concert with other genes known to respond to interferon, a protein that the body makes in response to viral infections.
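One simple way to ask whether a gene "ramps up in concert" with others is to correlate its expression with an averaged score for a gene set across cells. The Python sketch below computes a Pearson correlation between hypothetical ACE2 counts and a per-cell interferon-response score, defined here (as an assumption) as the mean count of a few interferon-stimulated genes. The gene names ISG_1 to ISG_3 and all numbers are placeholders, and the post does not say which statistic the researchers used.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length (at least 2 items)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-cell counts (placeholders, not real measurements)
ace2 = [0, 1, 1, 3, 4, 6]
isg_counts = {
    "ISG_1": [1, 2, 2, 5, 6, 9],
    "ISG_2": [0, 1, 2, 3, 5, 7],
    "ISG_3": [2, 2, 3, 4, 6, 8],
}

# Interferon-response score: mean of the placeholder ISG counts in each cell
n_cells = len(ace2)
isg_score = [sum(counts[i] for counts in isg_counts.values()) / len(isg_counts)
             for i in range(n_cells)]

print(f"ACE2 vs. interferon score: r = {pearson(ace2, isg_score):.2f}")
```

A correlation like this only shows that the genes rise and fall together; it cannot show that interferon drives ACE2, which is why a direct experiment is needed.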

To dig further in the lab, the researchers treated cultured cells that line airways in the lungs with interferon. And indeed, the treatment increased ACE2 expression.

Earlier studies have suggested that ACE2 helps the lungs to tolerate damage. Completely missed was its connection to the interferon response. The researchers now suspect that’s because it hadn’t been studied in these specific human epithelial cells before.

The discovery suggests that SARS-CoV-2 and potentially other coronaviruses that rely on ACE2 may take advantage of the immune system’s natural defenses. When the body responds to the infection by producing more interferon, that in turn results in production of more ACE2, potentially making it easier for the virus to attach to lung cells. While much more work is needed, the finding indicates that any potential use of interferon as a treatment to fight COVID-19 will require careful monitoring to determine if and when it might help patients.

It’s clear that these new findings, from data that weren’t originally generated with COVID-19 in mind, contained several potentially important new leads. This is another demonstration of the value of basic science. We can also rest assured that, with the outpouring of effort from members of the scientific community around the globe to meet this new challenge, progress along these and many other fronts will continue at a remarkable pace.


[1] SARS-CoV-2 receptor ACE2 is an interferon-stimulated gene in human airway epithelial cells and is detected in specific cell subsets across tissues. Ziegler CGK, et al. Cell. 2020 Apr 20.


A Neuronal Light Show

Credit: Chen X, Cell, 2019

These colorful lights might look like a video vignette from one of the spectacular evening light shows taking place this holiday season. But they actually aren’t. These lights are illuminating the way to a much fuller understanding of the mammalian brain.

The video features a new research method called BARseq (Barcoded Anatomy Resolved by Sequencing). Created by a team of NIH-funded researchers led by Anthony Zador, Cold Spring Harbor Laboratory, NY, BARseq enables scientists to map in a matter of weeks the location of thousands of neurons in the mouse brain with greater precision than has ever been possible before.

How does it work? With BARseq, researchers generate uniquely identifying RNA barcodes and then tag each individual neuron within brain tissue with one. As reported recently in the journal Cell, those barcodes allow them to keep track of the location of an individual cell amid millions of neurons [1]. This also enables researchers to map the tangled paths of individual neurons from one region of the mouse brain to the next.
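The logic of using barcodes to connect a neuron's location to its distant targets can be sketched as a lookup: barcodes read at cell bodies are matched against barcodes recovered from other brain regions. The Python below is a simplified illustration, not the BARseq pipeline; the barcodes, soma positions, region names, and read counts are all hypothetical, and real data would need error-tolerant matching because sequencing is imperfect.

```python
from collections import defaultdict

# Hypothetical barcodes read at cell bodies, with (x, y) positions in a slice
soma_positions = {
    "GAUC": (120, 45),
    "UCCC": (98, 60),
    "AAGU": (150, 30),
}

# Hypothetical barcode read counts recovered from distant target regions
target_reads = {
    "region_1": {"GAUC": 12, "AAGU": 3},
    "region_2": {"GAUC": 5, "UCCC": 9},
    "region_3": {"CCCC": 4},  # barcode with no matching cell body
}


def projection_map(somas, targets):
    """For each barcoded neuron, record the target regions where its barcode appears."""
    projections = defaultdict(dict)
    for region, reads in targets.items():
        for barcode, count in reads.items():
            if barcode in somas:  # keep only barcodes traced to a known cell body
                projections[barcode][region] = count
    return dict(projections)


for barcode, regions in sorted(projection_map(soma_positions, target_reads).items()):
    print(barcode, "at", soma_positions[barcode], "projects to", regions)
```

The four-letter barcodes here are only for readability. A barcode of n nucleotides can take 4^n distinct values, so real barcodes must be much longer to give thousands of neurons distinct labels.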

The video shows how the researchers read the barcodes. Each twinkling light is a barcoded neuron within a thin slice of mouse brain tissue. From frame to frame, each color corresponds to one of the four letters, or chemical bases, in RNA (A=purple, G=blue, U=yellow, and C=white). A neuron that flashes blue, purple, yellow, white is tagged with a barcode that reads GAUC, while one that flashes yellow, white, white, white reads UCCC.
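The color key in the video amounts to a simple lookup table. This Python sketch decodes a series of flash colors into a barcode using exactly the color assignments given in the post; it assumes one clean color call per imaging round, whereas real image analysis has to cope with ambiguous or missing signals.

```python
# Color-to-base key as described in the post
COLOR_TO_BASE = {"purple": "A", "blue": "G", "yellow": "U", "white": "C"}


def decode_barcode(colors):
    """Translate a neuron's sequence of flash colors into an RNA barcode."""
    try:
        return "".join(COLOR_TO_BASE[color] for color in colors)
    except KeyError as err:
        raise ValueError(f"unknown color: {err.args[0]}") from None


print(decode_barcode(["blue", "purple", "yellow", "white"]))  # GAUC
print(decode_barcode(["yellow", "white", "white", "white"]))  # UCCC
```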

By sequencing and reading the barcodes to distinguish among seemingly identical cells, the researchers mapped the connections of more than 3,500 neurons in a mouse’s auditory cortex, a part of the brain involved in hearing. In fact, they report they’re now able to map tens of thousands of individual neurons in a mouse in a matter of weeks.

What makes BARseq even better than the team’s previous mapping approach, called MAPseq, is its ability to read the barcodes at their original location in the brain tissue [2]. As a result, they can produce maps with much finer resolution. It’s also possible to maintain other important information about each mapped neuron’s identity and function, including the expression of its genes.

Zador reports that they’re continuing to use BARseq to produce maps of other essential areas of the mouse brain with more detail than had previously been possible. Ultimately, these maps will provide a firm foundation for better understanding of human thought, consciousness, and decision-making, along with how such mental processes get altered in conditions such as autism spectrum disorder, schizophrenia, and depression.

Here’s wishing everyone a safe and happy holiday season. It’s been a fantastic year in science, and I look forward to bringing you more cool NIH-supported research in 2020!


[1] High-Throughput Mapping of Long-Range Neuronal Projection Using In Situ Sequencing. Chen X, Sun YC, Zhan H, Kebschull JM, Fischer S, Matho K, Huang ZJ, Gillis J, Zador AM. Cell. 2019 Oct 17;179(3):772-786.e19.

[2] High-Throughput Mapping of Single-Neuron Projections by Sequencing of Barcoded RNA. Kebschull JM, Garcia da Silva P, Reid AP, Peikon ID, Albeanu DF, Zador AM. Neuron. 2016 Sep 7;91(5):975-987.


Using MicroRNA to Starve a Tumor?


Endothelial cells are inhibited from sprouting
Credit: Dudley Lab, University of Virginia School of Medicine, Charlottesville

Tumor cells thrive by exploiting the willingness of normal cells in their neighborhood to act as accomplices. One of their sneakier stunts involves tricking the body into helping them form new blood vessels. This growth-enabling process of sprouting new blood vessels, called tumor angiogenesis, remains a vital area of cancer research and continues to yield important clues about how to beat this deadly disease.

The two-panel image above shows one such promising lead from recent lab studies with endothelial cells, specialized cells that line the inside of all blood vessels. In tumors, endothelial cells are induced to issue non-stop SOS signals that falsely alert the body to dispatch needed materials to rescue these cells. The endothelial cells then use the help to replicate and sprout new blood vessels.

The left panel demonstrates the basics of this growth process under normal conditions. Endothelial cells (red and blue) were cultured under special conditions that help them grow in the lab. When given the right cues, those cells sprout spiky extensions to form new vessels.

But in the right panel, the cells can’t sprout. That’s because the cells are bathed in a molecule called miR-30c, which isn’t visible in the photo. These specialized microRNA molecules (humans make a few thousand different versions of them) control protein production by binding to and disabling longer RNA templates, called messenger RNA.
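How a microRNA finds its targets can be illustrated with the widely used "seed" rule: a short stretch near the microRNA's start (commonly nucleotides 2 through 8) pairs with a complementary site, often in the messenger RNA's 3' untranslated region. The Python sketch below scans a made-up mRNA for perfect matches to that seed. Both sequences are hypothetical (this is not the real miR-30c sequence), and real target prediction also weighs site type, sequence context, and pairing beyond the seed.

```python
# Watson-Crick complements for RNA bases
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_match_sites(mirna, mrna, start=2, end=8):
    """Return 0-based positions in mrna that perfectly pair with the miRNA seed.

    The seed is mirna nucleotides start..end (1-based, inclusive); 2-8 is a
    common convention, assumed here.
    """
    seed = mirna[start - 1:end]
    site = reverse_complement(seed)  # what the seed would pair with in the mRNA
    return [i for i in range(len(mrna) - len(site) + 1)
            if mrna[i:i + len(site)] == site]


# Hypothetical sequences for illustration only
toy_mirna = "UCAGUCAAGGCAUUACGGAAUC"
toy_mrna = "AA" + "UUGACUG" + "CCCA" + "UUGACUG" + "AA"

print(seed_match_sites(toy_mirna, toy_mrna))  # [2, 13]
```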

This new anti-angiogenic lead, published in the Journal of Clinical Investigation, comes from a research team led by Andrew Dudley, University of Virginia School of Medicine, Charlottesville [1]. The team made its discovery while studying a protein called TGF-beta that tumors like to exploit to fuel their growth.

Their studies in mice showed that loss of TGF-beta signals in endothelial cells blocked the growth of new blood vessels and thus tumors. Further study showed that those effects were due in part to elevated levels of miR-30c. The two interact in endothelial cells as part of a previously unrecognized signaling pathway that coordinates the growth of new blood vessels in tumors.

Dudley’s team went on to show that levels of miR-30c vary widely amongst endothelial cells, even when those cells come from the very same tumor. Cells rich in miR-30c struggled to sprout new vessels, while those with less of this microRNA grew new vessels with ease.

Intriguingly, they found that levels of this microRNA also predicted the outcomes for patients with breast cancer. Those whose cancers had high levels of the vessel-stunting miR-30c fared better than those with lower miR-30c levels. While more research is needed, the finding offers a potentially promising new lead in the fight against cancer.
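The kind of comparison described above, grouping patients by whether their tumors have high or low levels of a molecule and comparing outcomes, can be sketched with a median split. The Python below uses invented numbers purely for illustration; it is not the study's analysis, which the post does not detail, and a real survival analysis would account for follow-up time and other clinical factors.

```python
from statistics import median

# Hypothetical patients: (miR-30c level, 1 if a favorable outcome else 0)
patients = [
    (0.2, 0), (0.4, 0), (0.5, 1), (0.7, 0),
    (1.1, 1), (1.3, 1), (1.6, 0), (2.0, 1),
]


def outcome_rates_by_median(records):
    """Split records at the median level; return the favorable-outcome rate per group."""
    cutoff = median(level for level, _ in records)
    high = [outcome for level, outcome in records if level > cutoff]
    low = [outcome for level, outcome in records if level <= cutoff]
    if not high or not low:
        raise ValueError("median split produced an empty group")
    return {"high": sum(high) / len(high), "low": sum(low) / len(low)}


print(outcome_rates_by_median(patients))
```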


[1] Endothelial miR-30c suppresses tumor growth via inhibition of TGF-β-induced Serpine1. McCann JV, Xiao L, Kim DJ, Khan OF, Kowalski PS, Anderson DG, Pecot CV, Azam SH, Parker JS, Tsai YS, Wolberg AS, Turner SD, Tatsumi K, Mackman N, Dudley AC. J Clin Invest. 2019 Mar 11;130:1654-1670.


