structural biology

Artificial Intelligence Accurately Predicts Protein Folding

Caption: Researchers used artificial intelligence to map hundreds of new protein structures, including this 3D view of human interleukin-12 (blue) bound to its receptor (purple). Credit: Ian Haydon, University of Washington Institute for Protein Design, Seattle

Proteins are the workhorses of the cell. Mapping the precise shapes of the most important of these workhorses helps to unlock their life-supporting functions or, in the case of disease, potential for dysfunction. While the amino acid sequence of a protein provides the basis for its 3D structure, deducing the atom-by-atom map from principles of quantum mechanics has been beyond the ability of computer programs—until now. 

In a recent study in the journal Science, researchers reported they have developed artificial intelligence approaches for predicting the three-dimensional structure of proteins in record time, based solely on their one-dimensional amino acid sequences [1]. This groundbreaking approach will not only aid researchers in the lab, but also guide drug developers in coming up with safer and more effective ways to treat and prevent disease.

This new NIH-supported advance is now freely available to scientists around the world. In fact, it has already helped to solve especially challenging protein structures in cases where experimental data were lacking and other modeling methods hadn’t been enough to get a final answer. It also can now provide key structural information about proteins for which more time-consuming and costly imaging data are not yet available.

The new work comes from a group led by David Baker and Minkyung Baek, University of Washington, Seattle, Institute for Protein Design. Over the course of the pandemic, Baker’s team has been working hard to design promising COVID-19 therapeutics. They’ve also been working to design proteins that might offer promising new ways to treat cancer and other conditions. As part of this effort, they’ve developed new computational approaches for determining precisely how a chain of amino acids, which are the building blocks of proteins, will fold up in space to form a finished protein.

But the ability to predict a protein’s precise structure or shape from its sequence alone had proven to be a difficult problem to solve despite decades of effort. In search of a solution, research teams from around the world have come together every two years since 1994 at the Critical Assessment of Structure Prediction (CASP) meetings. At these gatherings, teams compete against each other with the goal of developing computational methods and software capable of predicting any of nature’s 200 million or more protein structures from sequences alone with the greatest accuracy.

Last year, a London-based company called DeepMind shook up the structural biology world with their entry into CASP called AlphaFold. (AlphaFold was one of Science’s 2020 Breakthroughs of the Year.) They showed that their artificial intelligence approach, which took advantage of the 170,000 proteins with known structures in an iterative process called deep learning, could predict protein structure with amazing accuracy. In fact, it could predict most protein structures almost as accurately as other high-resolution protein mapping techniques, including today’s go-to strategies of X-ray crystallography and cryo-EM.

The DeepMind performance showed what was possible, but because the advances were made by a world-leading deep learning company, the details on how it worked weren’t made publicly available at the time. The findings left Baker, Baek, and others eager to learn more and to see if they could replicate the impressive predictive ability of AlphaFold outside of such a well-resourced company.

In the new work, Baker and Baek’s team has made stunning progress—using only a fraction of the computational processing power and time required by AlphaFold. The new software, called RoseTTAFold, also relies on a deep learning approach. In deep learning, computers look for patterns in large collections of data. As they begin to recognize complex relationships, some connections in the network are strengthened while others are weakened. The finished network is typically composed of multiple information-processing layers, which operate on the data to return a result—in this case, a protein structure.

Given the complexity of the problem, RoseTTAFold relies on a neural network with three tracks rather than one. This three-track network simultaneously processes one-dimensional protein sequence information, two-dimensional information about the distances between amino acids, and three-dimensional atomic structure. Information from these separate tracks flows back and forth to generate accurate models of proteins rapidly from sequence information alone, including structures in complex with other proteins.
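To make the three kinds of information concrete, here is a minimal Python sketch of how a 1D sequence, a 2D distance map, and 3D coordinates relate to one another. It is not RoseTTAFold code and involves no neural network. The four-residue sequence, the coordinates, the 5.0 cutoff, and the helper names `distance_map` and `contacts` are all made up for illustration.

```python
import math

def distance_map(coords):
    """Return the symmetric matrix of pairwise Euclidean distances (2D information)."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]

def contacts(dmap, cutoff=5.0):
    """List residue pairs closer than `cutoff`, skipping neighbors along the chain."""
    n = len(dmap)
    return [(i, j) for i in range(n) for j in range(i + 2, n) if dmap[i][j] < cutoff]

# Hypothetical toy protein: one point per residue, arranged in a square.
sequence = "MKTA"                      # 1D: amino acid letters (invented)
coords = [(0.0, 0.0, 0.0),             # 3D: one (x, y, z) position per residue
          (3.8, 0.0, 0.0),
          (3.8, 3.8, 0.0),
          (0.0, 3.8, 0.0)]
dmap = distance_map(coords)            # 2D: distance between every pair of residues

print(contacts(dmap))                  # [(0, 3)]: the chain ends sit close together
```

The point of the sketch is only that the same protein can be described at three levels. A distance map can be computed from coordinates, as here, while a structure predictor has to work in the harder direction, from sequence toward distances and coordinates.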

As soon as the researchers had what they thought was a reasonable working approach to solve protein structures, they began sharing it with their structural biologist colleagues. In many cases, it became immediately clear that RoseTTAFold worked remarkably well. What’s more, it has been put to work to solve challenging structural biology problems that had vexed scientists for many years with earlier methods.

RoseTTAFold already has solved hundreds of new protein structures, many of which represent poorly understood human proteins. The 3D rendering of human interleukin-12 in complex with its receptor (above image) is just one example. The researchers have generated other structures directly relevant to human health, including some that are related to lipid metabolism, inflammatory conditions, and cancer. The program is now available on the web and has been downloaded by dozens of research teams around the world.

Cryo-EM and other experimental mapping methods will remain essential to solve protein structures in the lab. But with the artificial intelligence advances demonstrated by RoseTTAFold and AlphaFold, which has now also been released in an open-source version and reported in the journal Nature [2], researchers now can make the critical protein structure predictions at their desktops. This newfound ability will be a boon to basic science studies and has great potential to speed life-saving therapeutic advances.


[1] Accurate prediction of protein structures and interactions using a three-track neural network. Baek M, DiMaio F, Anishchenko I, Dauparas J, Grishin NV, Adams PD, Read RJ, Baker D, et al. Science. 2021 Jul 15:eabj8754.

[2] Highly accurate protein structure prediction with AlphaFold. Jumper J, Evans R, Pritzel A, Green T, Senior AW, Kavukcuoglu K, Kohli P, Hassabis D, et al. Nature. 2021 Jul 15.


Structural Biology (National Institute of General Medical Sciences/NIH)

The Structures of Life (NIGMS)

Baker Lab (University of Washington, Seattle)

CASP 14 (University of California, Davis)

NIH Support: National Institute of Allergy and Infectious Diseases; National Institute of General Medical Sciences

Human Antibodies Target Many Parts of Coronavirus Spike Protein

Viral spike with labels Receptor-binding domain (RBD) antibody, N-terminal domain (NTD) antibody, S2 subunit antibody
Caption: People who recovered from mild COVID-19 infections produced antibodies circulating in their blood that target three different parts of the coronavirus’s spike protein (gray). Credit: University of Texas at Austin

For many people who’ve had COVID-19, the infections were thankfully mild and relatively brief. But these individuals’ immune systems still hold onto enduring clues about how best to neutralize SARS-CoV-2, the coronavirus that causes COVID-19. Discovering these clues could point the way for researchers to design highly targeted treatments that could help to save the lives of folks with more severe infections.

An NIH-funded study, published recently in the journal Science [1], offers the most-detailed picture yet of the array of antibodies against SARS-CoV-2 found in people who’ve fully recovered from mild cases of COVID-19. This picture suggests that an effective neutralizing immune response targets a wider swath of the virus’ now-infamous spike protein than previously recognized.

To date, most studies of natural antibodies that block SARS-CoV-2 have zeroed in on those that target a specific portion of the spike protein known as the receptor-binding domain (RBD)—and with good reason. The RBD is the portion of the spike that attaches directly to human cells. As a result, antibodies specifically targeting the RBD were an excellent place to begin the search for antibodies capable of fending off SARS-CoV-2.

The new study, led by Gregory Ippolito and Jason Lavinder, The University of Texas at Austin, took a different approach. Rather than narrowing the search, Ippolito, Lavinder, and colleagues analyzed the complete repertoire of antibodies against the spike protein from four people soon after their recoveries from mild COVID-19.

What the researchers found was a bit of a surprise: the vast majority of antibodies, about 84 percent, targeted portions of the spike protein other than the RBD. This suggests a successful immune response doesn’t concentrate on the RBD. It involves production of antibodies capable of covering areas across the entire spike.

The researchers liken the spike protein to an umbrella, with the RBD at the tip of the “canopy.” While some antibodies do bind RBD at the tip, many others apparently target the protein’s canopy, known as the N-terminal domain (NTD).

Further study in cell culture showed that NTD-directed antibodies do indeed neutralize the virus. They also prevented a lethal mouse-adapted version of the coronavirus from infecting mice.

One reason these findings are particularly noteworthy is that the NTD is one part of the viral spike protein that has mutated frequently, especially in several emerging variants of concern, including the B.1.1.7 “U.K. variant” and the B.1.351 “South African variant.” It suggests that one reason these variants are so effective at evading our immune systems to cause breakthrough infections, or re-infections, is that they’ve mutated their way around some of the human antibodies that had been most successful in combating the original coronavirus variant.

Also noteworthy, about 40 percent of the circulating antibodies target yet another portion of the spike called the S2 subunit. This finding is especially encouraging because this portion of SARS-CoV-2 does not seem as mutable as the NTD segment, suggesting that S2-directed antibodies might offer a layer of protection against a wider array of variants. What’s more, the S2 subunit may make an ideal target for a possible pan-coronavirus vaccine since this portion of the spike is widely conserved in SARS-CoV-2 and related coronaviruses.

Taken together, these findings will prove useful for designing COVID-19 vaccine booster shots or future vaccines tailored to combat SARS-CoV-2 variants of concern. The findings also drive home the conclusion that the more we learn about SARS-CoV-2 and the immune system’s response to neutralize it, the better position we all will be in to thwart this novel coronavirus and any others that might emerge in the future.


[1] Prevalent, protective, and convergent IgG recognition of SARS-CoV-2 non-RBD spike epitopes. Voss WN, Hou YJ, Johnson NV, Delidakis G, Kim JE, Javanmardi K, Horton AP, Bartzoka F, Paresi CJ, Tanno Y, Chou CW, Abbasi SA, Pickens W, George K, Boutz DR, Towers DM, McDaniel JR, Billick D, Goike J, Rowe L, Batra D, Pohl J, Lee J, Gangappa S, Sambhara S, Gadush M, Wang N, Person MD, Iverson BL, Gollihar JD, Dye J, Herbert A, Finkelstein IJ, Baric RS, McLellan JS, Georgiou G, Lavinder JJ, Ippolito GC. Science. 2021 May 4:eabg5268.


COVID-19 Research (NIH)

Gregory Ippolito (University of Texas at Austin)

NIH Support: National Institute of Allergy and Infectious Diseases; National Cancer Institute; National Institute of General Medical Sciences; National Center for Advancing Translational Sciences

Dynamic View of Spike Protein Reveals Prime Targets for COVID-19 Treatments

SARS-CoV-2’s spike protein showing attached glycans and regions for antibody binding.
Credit: Sikora M, PLoS Comput Biol, 2021

This striking portrait features the spike protein that crowns SARS-CoV-2, the coronavirus that causes COVID-19. This highly flexible protein has settled here into one of its many possible conformations during the process of docking onto a human cell before infecting it.

This portrait, however, isn’t painted on canvas. It was created on a computer screen from sophisticated 3D simulations of the spike protein in action. The aim was to map its many shape-shifting maneuvers accurately at the atomic level in hopes of detecting exploitable structural vulnerabilities to thwart the virus.

For example, notice the many chain-like structures (green) that adorn the protein’s surface (white). They are sugar molecules called glycans that are thought to shield the spike protein by sweeping away antibodies. Also notice areas (purple) that the simulation identified as the most-attractive targets for antibodies, based on their apparent lack of protection by those glycans.

This work, published recently in the journal PLoS Computational Biology [1], was performed by a German research team that included Mateusz Sikora, Max Planck Institute of Biophysics, Frankfurt. The researchers used a computational technique called molecular dynamics (MD) simulation to model the conformational changes in the spike protein on a time scale of a few microseconds. (A microsecond is 0.000001 second.)

The new simulations suggest that glycans act as a dynamic shield on the spike protein. They liken them to windshield wipers on a car. Rather than being fixed in space, those glycans sweep back and forth to protect more of the protein surface than initially meets the eye.

But just as wipers miss spots on a windshield that lie beyond their tips, glycans also miss spots of the protein just beyond their reach. It’s those spots that the researchers suggest might be especially promising targets for the design of future vaccines and therapeutic antibodies.
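The windshield-wiper picture can be illustrated with a deliberately simplified Python sketch. It is not the researchers' method: real MD simulations track every atom in 3D, while this toy places a protein "surface" on a line of numbered sites and lets each glycan tip sweep left and right of its anchor. The function name `exposure`, the site count, the anchor positions, and the sweep offsets are all hypothetical.

```python
def exposure(n_sites, anchors, footprint, frames):
    """Fraction of frames in which each surface site is left uncovered.

    anchors   -- site index where each glycan is attached
    footprint -- how many sites on either side of its tip a glycan covers
    frames    -- tip displacement from the anchor in each simulated snapshot
    """
    exposed_counts = [0] * n_sites
    for offset in frames:
        covered = set()
        for anchor in anchors:
            tip = anchor + offset
            for site in range(tip - footprint, tip + footprint + 1):
                if 0 <= site < n_sites:
                    covered.add(site)
        for site in range(n_sites):
            if site not in covered:
                exposed_counts[site] += 1
    return [count / len(frames) for count in exposed_counts]

# Hypothetical setup: 14 surface sites, two glycans anchored at sites 2 and 11.
static = exposure(14, [2, 11], 1, [0])                  # glycans frozen in place
dynamic = exposure(14, [2, 11], 1, [-2, -1, 0, 1, 2])   # glycans sweeping back and forth

static_exposed = [s for s, e in enumerate(static) if e == 1.0]
always_exposed = [s for s, e in enumerate(dynamic) if e == 1.0]

print(static_exposed)   # [0, 4, 5, 6, 7, 8, 9, 13]
print(always_exposed)   # [6, 7]
```

With the glycans frozen, eight sites look unprotected. Once the glycans sweep, only sites 6 and 7 stay exposed in every frame. In this toy model, those are the spots beyond the wipers' reach.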

This same approach can now be applied to identifying weak spots in the coronavirus’s armor. It also may help researchers understand more fully the implications of newly emerging SARS-CoV-2 variants. The hope is that by capturing this devastating virus and its most critical proteins in action, we can continue to develop and improve upon vaccines and therapeutics.


[1] Computational epitope map of SARS-CoV-2 spike protein. Sikora M, von Bülow S, Blanc FEC, Gecht M, Covino R, Hummer G. PLoS Comput Biol. 2021 Apr 1;17(4):e1008790.


COVID-19 Research (NIH)

Mateusz Sikora (Max Planck Institute of Biophysics, Frankfurt, Germany)

The surprising properties of the coronavirus envelope (Interview with Mateusz Sikora), Scilog, November 16, 2020.

Finding New Ways to Fight Coronavirus … From Studying Bats

David Veesler/Credit: University of Washington Medicine, Seattle

David Veesler has spent nearly 20 years imaging in near-atomic detail the parts of various viruses, including coronaviruses, that enable them to infect Homo sapiens. In fact, his lab at the University of Washington, Seattle, was the first to elucidate the 3D architecture of the now infamous spike protein, which coronaviruses use to gain entry into human cells [1]. He uses these fundamental insights to guide the design of vaccines and therapeutics, including promising monoclonal antibodies.

Now, Veesler and his lab are turning to another mammal in their search for new leads for the next generation of antiviral treatments, including ones aimed at the coronavirus that causes COVID-19, SARS-CoV-2. With support from a 2020 NIH Director’s Pioneer Award, Veesler will study members of the order Chiroptera. Or, more colloquially, bats.

Why bats? Veesler says bats are remarkable creatures. They are the only mammals capable of sustained flight. They rarely get cancer and live unusually long lives for such small creatures. More importantly for Veesler’s research, bats host a wide range of viruses—more than any other mammal species. Despite carrying all of these viruses, bats rarely show symptoms of being sick. Yet they are the source for many of the viruses that have spilled over into humans with devastating effect, including rabies, Ebola virus, Nipah and Hendra viruses, severe acute respiratory syndrome coronavirus (SARS-CoV), and, likely, SARS-CoV-2.

Beyond what is already known about bats’ intriguing qualities, Veesler says humans still have much to discover about these flying mammals, including how their immune systems cope with such an onslaught of viral invaders. For example, it turns out that a bat’s learned, or adaptive, immune system is, for the most part, uncharted territory. As such, it offers an untapped source of potentially promising viral inhibitors just waiting to be unearthed, fully characterized, and then used to guide the development of new kinds of anti-viral therapeutics.

In his studies, Veesler will work with collaborators studying bats around the world to characterize their antibody production. He wants to learn how these antibodies contribute to bats’ impressive ability to tolerate viruses and other pathogens. What is it about the structure of bat antibodies that makes them different from human antibodies? And, how can those structural differences serve as blueprints for promising new treatments to combat many potentially deadly viruses?

Interestingly, Veesler’s original grant proposal makes no mention of SARS-CoV-2 or COVID-19. That’s because he submitted it just months before the first reports of the novel coronavirus in Wuhan, China. But Veesler doesn’t consider himself a visionary for expanding his research to bats. He and others had been working on closely related coronaviruses for years, inspired by earlier outbreaks, including SARS in 2002 and Middle East respiratory syndrome (MERS) in 2012 (although MERS apparently came from camels). The researcher didn’t see SARS-CoV-2 coming, but he recognized the potential for some kind of novel coronavirus outbreak in the future.

These days, the Veesler lab has been hard at work to understand SARS-CoV-2 and the human immune response to the virus. His team showed that SARS-CoV-2 uses the human receptor ACE2 to gain entry into our cells [2]. He’s also a member of the international research team that identified a human antibody, called S309, from a person who’d been infected with SARS in 2003. This antibody is showing promise for treating COVID-19 [3] and is now in a phase 3 clinical trial in the United States.

In another recent study, reported as a pre-print in bioRxiv, Veesler’s team mapped dozens of distinct human antibodies capable of neutralizing SARS-CoV-2 by their ability to hit viral targets outside of the well-known receptor-binding domain of the spike protein [4]. Such discoveries may form the basis for new and promising combinations of antibodies to treat COVID-19 that won’t be disabled by concerning new variations in the SARS-CoV-2 spike protein. Perhaps, in the future, such therapeutic cocktails may include modified bat-inspired antibodies too.


[1] Cryo-electron microscopy structure of a coronavirus spike glycoprotein trimer. Walls AC, Tortorici MA, Bosch BJ, Frenz B, Rottier PJM, DiMaio F, Rey FA, Veesler D. Nature. 2016 Mar 3;531(7592):114-117.

[2] Structure, function, and antigenicity of the SARS-CoV-2 spike glycoprotein. Walls AC, Park YJ, Tortorici MA, Wall A, McGuire AT, Veesler D. Cell. 2020 Apr 16;181(2):281-292.e6.

[3] Cross-neutralization of SARS-CoV-2 by a human monoclonal SARS-CoV antibody. Pinto D, Park YJ, Beltramello M, Veesler D, Corti D, et al. Nature. 18 May 2020 [Epub ahead of print]

[4] N-terminal domain antigenic mapping reveals a site of vulnerability for SARS-CoV-2. McCallum M, Marco A, Lempp F, Tortorici MA, Pinto D, Walls AC, Whelan SPJ, Virgin HW, Corti D, Pizzuto MS, Veesler D, et al. bioRxiv. 2021 Jan 14.


COVID-19 Research (NIH)

Veesler Lab (University of Washington, Seattle)

Veesler Project Information (NIH RePORTER)

NIH Director’s Pioneer Award Program (Common Fund)

NIH Support: Common Fund; National Institute of Allergy and Infectious Diseases

Using R2D2 to Understand RNA Folding

If you love learning more about biology at a fundamental level, I have a great video for you! It simulates the 3D folding of RNA. RNA is a single-stranded molecule, but it is still capable of forming internal loops that can be stabilized by base pairing, just like its famously double-stranded parent, DNA. Understanding more about RNA folding may be valuable in many different areas of biomedical research, including developing ways to help people with RNA-related diseases, such as certain cancers and neuromuscular disorders, and designing better mRNA vaccines against infectious disease threats (like COVID-19).
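As a rough illustration of how base pairing can hold a single strand in a loop, the Python sketch below pairs the two ends of a short RNA inward to form a hairpin stem. It is not the R2D2 method and ignores energetics entirely. The pairing rules (Watson-Crick pairs plus the G-U wobble pair) are standard, but the example sequence, the minimum loop size of three, and the function names are assumptions chosen for illustration.

```python
# Allowed RNA base pairs: Watson-Crick (A-U, G-C) plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def stem_length(seq, min_loop=3):
    """Count how many base pairs form when the two ends of `seq` zip inward.

    Pairing stops at the first mismatch, or when fewer than `min_loop`
    unpaired bases would remain in the loop.
    """
    i, j = 0, len(seq) - 1
    n = 0
    while j - i - 1 >= min_loop and (seq[i], seq[j]) in PAIRS:
        n += 1
        i += 1
        j -= 1
    return n

def dot_bracket(seq, min_loop=3):
    """Render the hairpin in dot-bracket notation: '(' and ')' paired, '.' unpaired."""
    n = stem_length(seq, min_loop)
    return "(" * n + "." * (len(seq) - 2 * n) + ")" * n

hairpin = "GGCAUUUUGCC"          # hypothetical sequence, not taken from the study
print(dot_bracket(hairpin))      # ((((...))))
print(dot_bracket("AAAAAAAA"))   # ........  (no complementary bases, no stem)
```

Real RNAs can form many competing structures, and the strand starts folding while it is still being made, which is part of what makes the problem so much harder than this end-to-end zipper suggests.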

Because RNA folding starts even while an RNA is still being made in the cell, the process has proven hugely challenging to follow closely. An innovative solution, shown in this video, comes from the labs of NIH grantees Julius Lucks, Northwestern University, Evanston, IL, and Alan Chen, State University of New York at Albany. The team, led by graduate student Angela Yu and including several diehard Star Wars fans, realized that to visualize RNA folding they needed a technology platform that, like a Star Wars droid, is able to “see” things that others can’t. So, they created R2D2, which is short for Reconstructing RNA Dynamics from Data.

What’s so groundbreaking about the R2D2 approach, which was published recently in Molecular Cell, is that it combines experimental data on RNA folding at the nucleotide level with predictive algorithms at the atomic level to simulate RNA folding in ultra-slow motion [1]. While other computer simulations have been available for decades, they have lacked much-needed experimental data of this complex folding process to confirm their mathematical modeling.

As a gene is transcribed into RNA one building block, or nucleotide, at a time, the elongating RNA strand begins folding immediately, before the whole molecule is fully assembled. But such folding can create a problem: the new strand can tie itself up into a knot-like structure that’s incompatible with the shape it needs to function in a cell.

To slip this knot, the cell has evolved immediate corrective pathways, or countermoves. In this R2D2 video, you can see one countermove called a toehold-mediated strand displacement. In this example, the maneuver is performed by an ancient molecule called a signal recognition particle (SRP) RNA. Though SRP RNAs are found in all forms of life, this one comes from the bacterium Escherichia coli and is made up of 114 nucleotides.

The colors in this video highlight different domains of the RNA molecule, all at different stages in the folding process. Some (orange, turquoise) have already folded properly, while another domain (dark purple) is temporarily knotted. For this knotted domain to slip its knot, about 5 seconds into the video, another newly forming region (fuchsia) wiggles down to gain a “toehold.” About 9 seconds in, the temporarily knotted domain untangles and unwinds, and, finally, at about 23 seconds, the strand starts to get reconfigured into the shape it needs to do its job in the cell.

Why would evolution favor such a seemingly inefficient folding process? Well, it might not be as inefficient as it first appears. In fact, as Chen noted, some nanotechnologists previously invented toehold displacement as a design principle for generating synthetic DNA and RNA circuits. Little did they know that nature may have scooped them many millennia ago!


[1] Computationally reconstructing cotranscriptional RNA folding from experimental data reveals rearrangement of non-naïve folding intermediates. Yu AM, Gasper PM, Cheng L, Chen AA, Lucks JB, et al. Molecular Cell 8, 1-14. 18 February 2021.


Ribonucleic Acid (RNA) (National Human Genome Research Institute/NIH)

Chen Lab (State University of New York at Albany)

Lucks Laboratory (Northwestern University, Evanston IL)

NIH Support: National Institute of General Medical Sciences; Common Fund
