Each year, more than 2.8 million people in the United States develop bacterial infections that don’t respond to treatment and sometimes turn life-threatening. Their infections are antibiotic-resistant, meaning the bacteria have changed in ways that allow them to withstand our current widely used arsenal of antibiotics. It’s a serious and growing health-care problem here and around the world. To fight back, doctors desperately need new antibiotics, including novel classes of drugs that bacteria haven’t seen and developed ways to resist.
Developing new antibiotics, however, involves much time, research, and expense. It’s also fraught with false leads. That’s why some researchers have turned to harnessing the predictive power of artificial intelligence (AI) in hopes of selecting the most promising leads faster and with greater precision.
It’s a potentially paradigm-shifting development in drug discovery, and a recent NIH-funded study, published in the journal Molecular Systems Biology, demonstrates AI’s potential to streamline the process of selecting future antibiotics. The results are also a bit sobering. They highlight the current limitations of one promising AI approach, showing that further refinement will still be needed to maximize its predictive capabilities.
These findings come from the lab of James Collins, Massachusetts Institute of Technology (MIT), Cambridge, and his recently launched Antibiotics-AI Project. His audacious goal is to develop seven new classes of antibiotics to treat seven of the world’s deadliest bacterial pathogens in just seven years. What makes this project so bold is that only two new classes of antibiotics have reached the market in the last 50 years!
In the latest study, Collins and his team looked to an AI program called AlphaFold2. The name might ring a bell. AlphaFold’s AI-powered ability to predict protein structures was a finalist in Science Magazine’s 2020 Breakthrough of the Year. In fact, AlphaFold has been used already to predict the structures of more than 200 million proteins, or almost every known protein on the planet.
AlphaFold employs a deep learning approach that can predict most protein structures from their amino acid sequences about as well as more costly and time-consuming protein-mapping techniques. In the deep learning models used to predict protein structure, computers are “trained” on existing data. As computers “learn” to understand complex relationships within the training material, they develop a model that can then be applied for making predictions of 3D protein structures from linear amino acid sequences without relying on new experiments in the lab.
Collins and his team hoped to combine AlphaFold with computer simulations commonly used in drug discovery as a way to predict interactions between essential bacterial proteins and antibacterial compounds. If it worked, researchers could then conduct virtual rapid screens of millions of new synthetic drug compounds targeting key bacterial proteins that existing antibiotics don’t. It would also enable the rapid development of antibiotics that work in novel ways, exactly what doctors need to treat antibiotic-resistant infections.
To test the strategy, Collins and his team focused first on the predicted structures of 296 essential proteins from the Escherichia coli bacterium as well as 218 antibacterial compounds. Their computer simulations then predicted how strongly any two molecules (essential protein and antibacterial) would bind together based on their shapes and physical properties.
It turned out that screening many antibacterial compounds against many potential targets in E. coli led to inaccurate predictions. For example, when comparing their computational predictions with actual interactions for 12 essential proteins measured in the lab, they found that their simulated model had about a 50:50 chance of being right. In other words, it couldn’t identify true interactions between drugs and proteins any better than random guessing.
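One common way to check whether predictions beat random guessing is the area under the ROC curve (AUROC), where 0.5 corresponds to chance and 1.0 to a perfect ranking. The sketch below is illustrative only: the function name, scores, and labels are hypothetical and are not data or code from the study.

```python
def auroc(scores, labels):
    """Probability that a randomly chosen true interaction is ranked above
    a randomly chosen non-interaction (ties count as half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted binding scores (higher = stronger predicted binding)
# and hypothetical lab results (1 = interaction confirmed, 0 = not confirmed).
predicted = [0.9, 0.2, 0.6, 0.4, 0.7, 0.45, 0.5, 0.3]
measured = [1, 1, 0, 0, 0, 1, 1, 0]

print(f"AUROC = {auroc(predicted, measured):.2f}")  # prints "AUROC = 0.50"
```

In this made-up example the true interactions are ranked above the non-interactions exactly half the time, which is what "no better than random guessing" looks like in numbers.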
They suspect one reason for their model’s poor performance is that the protein structures used to train the computer are fixed, whereas real proteins are flexible and shift between physical configurations. To improve their success rate, they ran their predictions through additional machine-learning models that had been trained on data to help them “learn” how proteins and other molecules reconfigure themselves and interact. While this souped-up model got somewhat better results, the researchers report that they still aren’t good enough to identify promising new drugs and their protein targets.
What now? In future studies, the Collins lab will continue to incorporate and train the computers on even more biochemical and biophysical data to help with the predictive process. That’s why this study should be interpreted as an interim progress report on an area of science that will only get better with time.
But it’s also a sobering reminder that the quest to find new classes of antibiotics won’t be easy—even when aided by powerful AI approaches. We certainly aren’t there yet, but I’m confident that we will get there to give doctors new therapeutic weapons and turn back the rise in antibiotic-resistant infections.
Hi everyone, I’m Larry Tabak. I’ve served as NIH’s Principal Deputy Director for over 11 years, and I will be the acting NIH director until a new permanent director is named. In my new role, my day-to-day responsibilities will certainly increase, but I promise to carve out time to blog about some of the latest research progress on COVID-19 and any other areas of science that catch my eye.
I’ve also invited the directors of NIH’s Institutes and Centers (ICs) to join me in the blogosphere and write about some of the cool science in their research portfolios. I will publish a couple of posts to start, then turn the blog over to our first IC director. From there, I envision alternating between posts from me and from various IC directors. That way, we’ll cover a broad array of NIH science and the tremendous opportunities now being pursued in biomedical research.
Since I’m up first, let’s start where the NIH Director’s Blog usually begins each year: by taking a look back at Science’s Breakthroughs of 2021. The breakthroughs were formally announced in December near the height of the holiday bustle. In case you missed the announcement, the biomedical sciences accounted for six of the journal Science’s 10 breakthroughs. Here, I’ll focus on four biomedical breakthroughs, the ones that NIH has played some role in advancing, starting with Science’s editorial and People’s Choice top-prize winner:
Breakthrough of the Year: AI-Powered Predictions of Protein Structure
The biochemist Christian Anfinsen, who had a distinguished career at NIH, shared the 1972 Nobel Prize in Chemistry for work suggesting that the biochemical interactions among the amino acid building blocks of proteins were responsible for pulling them into the final shapes that are essential to their functions. In his Nobel acceptance speech, Anfinsen also made a bold prediction: one day it would be possible to determine the three-dimensional structure of any protein based on its amino acid sequence alone. Now, with advances in applying artificial intelligence to solve biological problems, Anfinsen’s bold prediction has been realized.
But getting there wasn’t easy. Every two years since 1994, research teams from around the world have gathered at the Critical Assessment of Structure Prediction (CASP) meetings to compete against each other in developing computational methods for predicting protein structures from sequences alone. Each entry is scored on how closely its predicted structures match experimentally determined ones. A score of 90 or above means that a predicted structure is extremely close to what’s known from more time-consuming work in the lab. In the early days, teams more often finished under 60.
In 2020, a London-based company called DeepMind made a leap with their entry called AlphaFold. Their deep learning approach—which took advantage of 170,000 proteins with known structures—most often scored above 90, meaning it could solve most protein structures about as well as more time-consuming and costly experimental protein-mapping techniques. (AlphaFold was one of Science’s runner-up breakthroughs last year.)
This year, the NIH-funded lab of David Baker and Minkyung Baek, University of Washington, Seattle, Institute for Protein Design, published that their artificial intelligence approach, dubbed RoseTTAFold, could accurately predict 3D protein structures from amino acid sequences with only a fraction of the computational processing power and time that AlphaFold required. They immediately applied it to solve hundreds of new protein structures, including many poorly known human proteins with important implications for human health.
The DeepMind and RoseTTAFold scientists continue to solve more and more proteins [1,2], both alone and in complex with other proteins. The code is now freely available for use by researchers anywhere in the world. In one timely example, AlphaFold helped to predict the structural changes in spike proteins of SARS-CoV-2 variants Delta and Omicron. This ability to predict protein structures, first envisioned all those years ago, now promises to speed fundamental new discoveries and the development of new ways to treat and prevent any number of diseases, making it this year’s Breakthrough of the Year.
Anti-Viral Pills for COVID-19
The development of the first vaccines to protect against COVID-19 topped Science’s 2020 breakthroughs. This year, we’ve also seen important progress in treating COVID-19, including the development of anti-viral pills.
First, there was the announcement in October of interim data from Merck, Kenilworth, NJ, and Ridgeback Biotherapeutics, Miami, FL, of a significant reduction in hospitalizations for those taking the anti-viral drug molnupiravir (originally developed with an NIH grant to Emory University, Atlanta). Soon after came reports of a Pfizer anti-viral pill that might target SARS-CoV-2, the novel coronavirus that causes COVID-19, even more effectively. Trial results show that, when taken within three days of developing COVID-19 symptoms, the pill reduced the risk of hospitalization or death in adults at high risk of progressing to severe illness by 89 percent.
On December 22, the Food and Drug Administration (FDA) granted Emergency Use Authorization (EUA) for Pfizer’s Paxlovid to treat mild-to-moderate COVID-19 in people age 12 and up at high risk for progressing to severe illness, making it the first available pill to treat COVID-19. The following day, the FDA granted an EUA for Merck’s molnupiravir to treat mild-to-moderate COVID-19 in unvaccinated, high-risk adults for whom other treatment options aren’t accessible or recommended, based on a final analysis showing a 30 percent reduction in hospitalization or death.
Additional promising anti-viral pills for COVID-19 are currently in development. For example, a recent NIH-funded preclinical study suggests that a drug related to molnupiravir, known as 4’-fluorouridine, might serve as a broad spectrum anti-viral with potential to treat infections with SARS-CoV-2 as well as respiratory syncytial virus (RSV).
Monoclonal antibodies are artificially produced versions of the most powerful antibodies found in animal or human immune systems, made in large quantities in the lab for therapeutic use. Until recently, this approach had primarily been put to work in the fight against conditions including cancer, asthma, and autoimmune diseases. That changed in 2021 with success using monoclonal antibodies against infections with SARS-CoV-2 as well as respiratory syncytial virus (RSV), human immunodeficiency virus (HIV), and other infectious diseases. This earned them a prominent spot among Science’s breakthroughs of 2021.
Monoclonal antibodies delivered via intravenous infusions continue to play an important role in saving lives during the pandemic. But there’s still room for improvement, including new formulations highlighted on the blog last year that might be much easier to deliver.
CRISPR Fixes Genes Inside the Body
One of the most promising areas of research in recent years has been gene editing, including CRISPR/Cas9, for fixing misspellings in genes to treat or even cure many conditions. This year has certainly been no exception.
CRISPR is a highly precise gene-editing system that uses guide RNA molecules to direct a scissor-like Cas9 enzyme to just the right spot in the genome to cut out or correct disease-causing misspellings. Science highlights a small study reported in The New England Journal of Medicine by researchers at Intellia Therapeutics, Cambridge, MA, and Regeneron Pharmaceuticals, Tarrytown, NY, in which six people with hereditary transthyretin (TTR) amyloidosis, a condition in which TTR proteins build up and damage the heart and nerves, received an infusion of guide RNA and messenger RNA encoding Cas9, both encased in tiny balls of fat. The goal was for the liver to take them up, allowing Cas9 to cut and disable the TTR gene. Four weeks later, blood levels of TTR had dropped by at least half.
In another study not yet published, researchers at Editas Medicine, Cambridge, MA, injected a benign virus carrying a CRISPR gene-editing system into the eyes of six people with an inherited vision disorder called Leber congenital amaurosis 10. The goal was to remove extra DNA responsible for disrupting a critical gene expressed in the eye. A few months later, two of the six patients could sense more light, enabling one of them to navigate a dimly lit obstacle course. This work builds on earlier gene transfer studies begun more than a decade ago at NIH’s National Eye Institute.
Last year, in a research collaboration that included former NIH Director Francis Collins’s lab at the National Human Genome Research Institute (NHGRI), we also saw encouraging early evidence in mice that another type of gene editing, called DNA base editing, might one day correct Hutchinson-Gilford Progeria Syndrome, a rare genetic condition that causes rapid premature aging. Preclinical work has even suggested that gene-editing tools might help deliver long-lasting pain relief. The technology keeps getting better, too. This isn’t the first time that gene-editing advances have landed on Science’s annual Breakthrough of the Year list, and it surely won’t be the last.
The year 2021 was a difficult one as the pandemic continued in the U.S. and across the globe, taking far too many lives far too soon. But through it all, science has been relentless in seeking and finding life-saving answers, from the rapid development of highly effective COVID-19 vaccines to the breakthroughs highlighted above.
As this list also attests, the search for answers has progressed impressively in other research areas during these difficult times. These groundbreaking discoveries are something in which we can all take pride—even as they encourage us to look forward to even bigger breakthroughs in 2022. Happy New Year!
 CRISPR-Cas9 in vivo gene editing for transthyretin amyloidosis. Gillmore JD, Gane E, Taubel J, Kao J, Fontana M, Maitland ML, Seitzer J, O’Connell D, Walsh KR, Wood K, Phillips J, Xu Y, Amaral A, Boyd AP, Cehelsky JE, McKee MD, Schiermeier A, Harari O, Murphy A, Kyratsous CA, Zambrowicz B, Soltys R, Gutstein DE, Leonard J, Sepp-Lorenzino L, Lebwohl D. N Engl J Med. 2021 Aug 5;385(6):493-502.
Proteins are the workhorses of the cell. Mapping the precise shapes of the most important of these workhorses helps to unlock their life-supporting functions or, in the case of disease, potential for dysfunction. While the amino acid sequence of a protein provides the basis for its 3D structure, deducing the atom-by-atom map from principles of quantum mechanics has been beyond the ability of computer programs—until now.
In a recent study in the journal Science, researchers reported they have developed artificial intelligence approaches for predicting the three-dimensional structure of proteins in record time, based solely on their one-dimensional amino acid sequences. This groundbreaking approach will not only aid researchers in the lab, but guide drug developers in coming up with safer and more effective ways to treat and prevent disease.
This new NIH-supported advance is now freely available to scientists around the world. In fact, it has already helped to solve especially challenging protein structures in cases where experimental data were lacking and other modeling methods hadn’t been enough to get a final answer. It also can now provide key structural information about proteins for which more time-consuming and costly imaging data are not yet available.
The new work comes from a group led by David Baker and Minkyung Baek, University of Washington, Seattle, Institute for Protein Design. Over the course of the pandemic, Baker’s team has been working hard to design promising COVID-19 therapeutics. They’ve also been working to design proteins that might offer promising new ways to treat cancer and other conditions. As part of this effort, they’ve developed new computational approaches for determining precisely how a chain of amino acids, which are the building blocks of proteins, will fold up in space to form a finished protein.
But the ability to predict a protein’s precise structure or shape from its sequence alone had proven to be a difficult problem to solve despite decades of effort. In search of a solution, research teams from around the world have come together every two years since 1994 at the Critical Assessment of Structure Prediction (CASP) meetings. At these gatherings, teams compete against each other with the goal of developing computational methods and software capable of predicting any of nature’s 200 million or more protein structures from sequences alone with the greatest accuracy.
Last year, a London-based company called DeepMind shook up the structural biology world with their entry into CASP called AlphaFold. (AlphaFold was one of Science’s 2020 Breakthroughs of the Year.) They showed that their artificial intelligence approach—which took advantage of the 170,000 proteins with known structures in a reiterative process called deep learning—could predict protein structure with amazing accuracy. In fact, it could predict most protein structures almost as accurately as other high-resolution protein mapping techniques, including today’s go-to strategies of X-ray crystallography and cryo-EM.
The DeepMind performance showed what was possible, but because the advances were made by a world-leading deep learning company, the details on how it worked weren’t made publicly available at the time. The findings left Baker, Baek, and others eager to learn more and to see if they could replicate the impressive predictive ability of AlphaFold outside of such a well-resourced company.
In the new work, Baker and Baek’s team has made stunning progress—using only a fraction of the computational processing power and time required by AlphaFold. The new software, called RoseTTAFold, also relies on a deep learning approach. In deep learning, computers look for patterns in large collections of data. As they begin to recognize complex relationships, some connections in the network are strengthened while others are weakened. The finished network is typically composed of multiple information-processing layers, which operate on the data to return a result—in this case, a protein structure.
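The idea that training strengthens some connections and weakens others can be illustrated with a toy example. The following is a minimal sketch, assuming a single artificial neuron trained by gradient descent on made-up data. It is not RoseTTAFold’s architecture, and all names and numbers are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, targets, steps=500, learning_rate=0.5):
    """Fit one artificial neuron by gradient descent on the log loss."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(steps):
        for x, t in zip(samples, targets):
            prediction = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = prediction - t  # gradient of the log loss w.r.t. the pre-activation
            weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias


# Made-up data: the first input determines the target, the second is uninformative.
samples = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.0, 0.0]]
targets = [1, 1, 0, 0]

weights, bias = train(samples, targets)
print("learned weights:", [round(w, 2) for w in weights])
```

After training, the weight on the informative input should be large while the weight on the uninformative one stays close to zero. A deep network repeats this kind of adjustment across many layers and far more connections.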
Given the complexity of the problem, instead of using a single neural network, RoseTTAFold relies on three. The three-track neural network integrates and simultaneously processes one-dimensional protein sequence information, two-dimensional information about the distance between amino acids, and three-dimensional atomic structure all at once. Information from these separate tracks flows back and forth to generate accurate models of proteins rapidly from sequence information alone, including structures in complex with other proteins.
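To make the link between the two-dimensional and three-dimensional tracks concrete, the sketch below computes a pairwise distance map from a set of 3D coordinates. It only illustrates how the two representations relate and is not RoseTTAFold code. The function name and the coordinates are hypothetical.

```python
import math


def distance_map(coords):
    """Return an L x L matrix of Euclidean distances between residues."""
    return [[math.dist(a, b) for b in coords] for a in coords]


# Hypothetical 3D positions (in angstroms) for four residues of a chain.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]

for row in distance_map(coords):
    print(" ".join(f"{d:5.2f}" for d in row))
```

Going from coordinates to distances is straightforward, as shown here. The hard part, which the network has to learn, is going the other way, starting from nothing but the sequence.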
As soon as the researchers had what they thought was a reasonable working approach to solve protein structures, they began sharing it with their structural biologist colleagues. In many cases, it became immediately clear that RoseTTAFold worked remarkably well. What’s more, it has been put to work to solve challenging structural biology problems that had vexed scientists for many years with earlier methods.
RoseTTAFold already has solved hundreds of new protein structures, many of which represent poorly understood human proteins. The 3D rendering of a human protein called interleukin-12 in complex with its receptor is just one example. The researchers have generated other structures directly relevant to human health, including some that are related to lipid metabolism, inflammatory conditions, and cancer. The program is now available on the web and has been downloaded by dozens of research teams around the world.
Cryo-EM and other experimental mapping methods will remain essential to solve protein structures in the lab. But with the artificial intelligence advances demonstrated by RoseTTAFold and AlphaFold, which has now also been released in an open-source version and reported in the journal Nature, researchers now can make the critical protein structure predictions at their desktops. This newfound ability will be a boon to basic science studies and has great potential to speed life-saving therapeutic advances.
At the close of every year, editors and writers at the journal Science review the progress that’s been made in all fields of science, from anthropology to zoology, to select the biggest advance of the past 12 months. In most cases, this Breakthrough of the Year is as tough to predict as the Oscar for Best Picture. Not in 2020. In a year filled with a multitude of challenges posed by the emergence of the deadly coronavirus disease 2019 (COVID-19), the breakthrough was the development of the first vaccines to protect against this pandemic that’s already claimed the lives of more than 360,000 Americans.
In keeping with its annual tradition, Science also selected nine runner-up breakthroughs. This impressive list includes at least three areas that involved efforts supported by NIH: therapeutic applications of gene editing, basic research understanding HIV, and scientists speaking up for diversity. Here’s a quick rundown of all the pioneering advances in biomedical research, both NIH and non-NIH funded:
Shots of Hope. A lot of things happened in 2020 that were unprecedented. At the top of the list was the rapid development of COVID-19 vaccines. In 10 months, public and private researchers accomplished what normally takes about 8 years, producing two vaccines for public use, with more on the way in 2021. In my more than 25 years at NIH, I’ve never encountered such a willingness among researchers to set aside their other concerns and gather around the same table to get the job done fast, safely, and efficiently for the world.
It’s also pretty amazing that the first two conditionally approved vaccines from Pfizer and Moderna were found to be more than 90 percent effective at protecting people from infection with SARS-CoV-2, the coronavirus that causes COVID-19. Both are innovative messenger RNA (mRNA) vaccines, a new approach to vaccination.
For this type of vaccine, the centerpiece is a small, non-infectious snippet of mRNA that encodes the instructions to make the spike protein that crowns the outer surface of SARS-CoV-2. When the mRNA is injected into a shoulder muscle, cells there will follow the encoded instructions and temporarily make copies of this signature viral protein. As the immune system detects these copies, it spurs the production of antibodies and helps the body remember how to fend off SARS-CoV-2 should the real thing be encountered.
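The step in which cells follow the encoded instructions is called translation: the mRNA is read three letters (one codon) at a time, and each codon specifies one amino acid. Here is a minimal sketch using the standard genetic code. The short sequence is a made-up example, not the actual vaccine mRNA, and the function name is hypothetical.

```python
BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order ("*" marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(mrna):
    """Translate from the first AUG start codon until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for pos in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("GGAUGUUUGUUUAAGC"))  # prints "MFV"
```

The output is written in the one-letter amino acid alphabet (M for methionine, F for phenylalanine, V for valine). A real protein such as the spike protein runs to more than a thousand amino acids.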
It also can’t be overstated that both mRNA vaccines, one developed by Pfizer and the other by Moderna in conjunction with NIH’s National Institute of Allergy and Infectious Diseases, were rigorously evaluated in clinical trials. Detailed data were posted online and discussed in all-day meetings of an FDA Advisory Committee, open to the public. In fact, given the high stakes, the level of review probably was more scientifically rigorous than ever.
First CRISPR Cures: One of the most promising areas of research now underway involves gene editing. These tools, still relatively new, hold the potential to fix gene misspellings, and potentially cure a wide range of genetic diseases that were once thought to be out of reach. Much of the research focus has centered on CRISPR/Cas9. This highly precise gene-editing system relies on guide RNA molecules to direct a scissor-like Cas9 enzyme to just the right spot in the genome to cut out or correct a disease-causing misspelling.
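The targeting step can be pictured as a sequence search: the guide carries a short stretch of letters that matches the intended site in the genome. The sketch below is a deliberately simplified illustration that assumes exact matching on either DNA strand and ignores everything else that real targeting depends on. The sequences and function names are hypothetical.

```python
def reverse_complement(dna):
    complement = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(complement[base] for base in reversed(dna))


def find_target_sites(genome, guide):
    """Return (position, strand) for every exact match of the guide sequence."""
    sites = []
    for strand, target in (("+", guide), ("-", reverse_complement(guide))):
        pos = genome.find(target)
        while pos != -1:
            sites.append((pos, strand))
            pos = genome.find(target, pos + 1)
    return sorted(sites)


# Hypothetical 20-letter guide (written as DNA) and a made-up stretch of DNA
# that contains it exactly once.
guide = "GACCTGAAGTCCATGGTACA"
genome = "TTAGC" + guide + "GGCATTACG"

print(find_target_sites(genome, guide))  # prints [(5, '+')]
```

A guide that matched many places in the genome would send the enzyme to unintended sites, which is one reason a unique match matters.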
In late 2020, a team of researchers in the United States and Europe succeeded for the first time in using CRISPR to treat 10 people with sickle cell disease and transfusion-dependent beta thalassemia. As published in the New England Journal of Medicine, several months after this non-heritable treatment, all patients no longer needed frequent blood transfusions and are living pain free.
The researchers tested a one-time treatment in which they removed bone marrow from each patient, modified the blood-forming hematopoietic stem cells outside the body using CRISPR, and then reinfused them into the body. To make room for the corrected cells, patients first received toxic bone marrow ablation therapy. The result: the modified stem cells were reprogrammed to switch back to making ample amounts of a healthy form of hemoglobin that their bodies produced in the womb. While the treatment is still risky, complex, and prohibitively expensive, this work is an impressive start for more breakthroughs to come using gene editing technologies. NIH, including its Somatic Cell Genome Editing program, continues to push the technology to accelerate progress and make gene editing cures for many disorders simpler and less toxic.
Scientists Speak Up for Diversity: The year 2020 will be remembered not only for COVID-19, but also for the very public and inescapable evidence of the persistence of racial discrimination in the United States. Triggered by the killing of George Floyd and other similar events, Americans were forced to come to grips with the fact that our society does not provide equal opportunity and justice for all. And that applies to the scientific community as well.
Science thrives in safe, diverse, and inclusive research environments. It suffers when racism and bigotry find a home to stifle diversity—and community for all—in the sciences. For the nation’s leading science institutions, there is a place and a calling to encourage diversity in the scientific workplace and provide the resources to let it flourish to everyone’s benefit.
For those of us at NIH, last year’s peaceful protests and hashtags were noticed and taken to heart. That’s one of the many reasons why we will continue to strengthen our commitment to building a culturally diverse, inclusive workplace. For example, we have established the NIH Equity Committee. It allows for the systematic tracking and evaluation of diversity and inclusion metrics for the intramural research program for each NIH institute and center. There is also the recently founded Distinguished Scholars Program, which aims to increase the diversity of tenure track investigators at NIH. Recently, NIH also announced that it will provide support to institutions to recruit diverse groups or “cohorts” of early-stage research faculty and prepare them to thrive as NIH-funded researchers.
AI Disentangles Protein Folding: Proteins, which are the workhorses of the cell, are made up of long, interconnected strings of amino acids that fold into a wide variety of 3D shapes. Understanding the precise shape of a protein facilitates efforts to figure out its function, its potential role in a disease, and even how to target it with therapies. To gain such understanding, researchers often try to predict a protein’s precise 3D chemical structure using basic principles of physics—including quantum mechanics. But while nature does this in real time zillions of times a day, computational approaches have not been able to do this—until now.
Of the roughly 170,000 proteins mapped so far, most have had their structures deciphered using powerful imaging techniques such as x-ray crystallography and cryo–electron microscopy (cryo-EM). But researchers estimate that there are at least 200 million proteins in nature, and, as amazing as these imaging techniques are, they are laborious, and it can take many months or years to solve the 3D structure of a single protein. So, a breakthrough certainly was needed!
In 2020, researchers with the company DeepMind, London, developed an artificial intelligence (AI) program that rapidly predicts most protein structures as accurately as x-ray crystallography and cryo-EM can map them. The AI program, called AlphaFold, predicts a protein’s structure by computationally modeling the amino acid interactions that govern its 3D shape.
Getting there wasn’t easy. While a complete de novo calculation of protein structure still seemed out of reach, investigators reasoned that they could kick start the modeling if known structures were provided as a training set to the AI program. Utilizing a computer network built around 128 machine learning processors, the AlphaFold system was created by first focusing on the 170,000 proteins with known structures in a reiterative process called deep learning. The process, which is inspired by the way neural networks in the human brain process information, enables computers to look for patterns in large collections of data. In this case, AlphaFold learned to predict the underlying physical structure of a protein within a matter of days. This breakthrough has the potential to accelerate the fields of structural biology and protein research, fueling progress throughout the sciences.
How Elite Controllers Keep HIV at Bay: The term “elite controller” might make some people think of video game whizzes. But here, it refers to the less than 1 percent of people living with human immunodeficiency virus (HIV) who’ve somehow stayed healthy for years without taking antiretroviral drugs. In 2020, a team of NIH-supported researchers figured out why this is so.
In a study of 64 elite controllers, published in the journal Nature, the team discovered a link between their good health and where the virus has inserted itself in their genomes. When a cell transcribes a gene where HIV has settled, this so-called “provirus” can produce more virus to infect other cells. But if it settles in a part of a chromosome that rarely gets transcribed, sometimes called a gene desert, the provirus is stuck with no way to replicate. Although this discovery won’t cure HIV/AIDS, it points to a new direction for developing better treatment strategies.
In closing, 2020 presented more than its share of personal and social challenges. Among those challenges was a flood of misinformation about COVID-19 that confused and divided many communities and even families. That’s why the editors and writers at Science singled out “a second pandemic of misinformation” as its Breakdown of the Year. This divisiveness should concern all of us greatly, as COVID-19 cases continue to soar around the country and our healthcare gets stretched to the breaking point. I hope and pray that we will all find a way to come together, both in science and in society, as we move forward in 2021.