Using AI to Find New Antibiotics Still a Work in Progress
Posted on by Lawrence Tabak, D.D.S., Ph.D.
Each year, more than 2.8 million people in the United States develop bacterial infections that don’t respond to treatment and sometimes turn life-threatening. Their infections are antibiotic-resistant, meaning the bacteria have changed in ways that allow them to withstand our current widely used arsenal of antibiotics. It’s a serious and growing health-care problem here and around the world. To fight back, doctors desperately need new antibiotics, including novel classes of drugs that bacteria haven’t seen and developed ways to resist.
Developing new antibiotics, however, involves much time, research, and expense. It’s also fraught with false leads. That’s why some researchers have turned to harnessing the predictive power of artificial intelligence (AI) in hopes of selecting the most promising leads faster and with greater precision.
It’s a potentially paradigm-shifting development in drug discovery, and a recent NIH-funded study, published in the journal Molecular Systems Biology, demonstrates AI’s potential to streamline the process of selecting future antibiotics. The results are also a bit sobering. They highlight the current limitations of one promising AI approach, showing that further refinement will still be needed to maximize its predictive capabilities.
These findings come from the lab of James Collins, Massachusetts Institute of Technology (MIT), Cambridge, and his recently launched Antibiotics-AI Project. His audacious goal is to develop seven new classes of antibiotics to treat seven of the world’s deadliest bacterial pathogens in just seven years. What makes this project so bold is that only two new classes of antibiotics have reached the market in the last 50 years!
In the latest study, Collins and his team looked to an AI program called AlphaFold2. The name might ring a bell. AlphaFold’s AI-powered ability to predict protein structures was a finalist in Science Magazine’s 2020 Breakthrough of the Year. In fact, AlphaFold has been used already to predict the structures of more than 200 million proteins, or almost every known protein on the planet.
AlphaFold employs a deep learning approach that can predict most protein structures from their amino acid sequences about as well as more costly and time-consuming protein-mapping techniques.
In the deep learning models used to predict protein structure, computers are “trained” on existing data. As computers “learn” to understand complex relationships within the training material, they develop a model that can then be applied for making predictions of 3D protein structures from linear amino acid sequences without relying on new experiments in the lab.
Collins and his team hoped to combine AlphaFold with computer simulations commonly used in drug discovery as a way to predict interactions between essential bacterial proteins and antibacterial compounds. If it worked, researchers could then conduct virtual rapid screens of millions of new synthetic drug compounds targeting key bacterial proteins that existing antibiotics don’t. It would also enable the rapid development of antibiotics that work in novel ways, exactly what doctors need to treat antibiotic-resistant infections.
To test the strategy, Collins and his team focused first on the predicted structures of 296 essential proteins from the Escherichia coli bacterium as well as 218 antibacterial compounds. Their computer simulations then predicted how strongly any two molecules (essential protein and antibacterial) would bind together based on their shapes and physical properties.
It turned out that screening many antibacterial compounds against many potential targets in E. coli led to inaccurate predictions. For example, when comparing their computational predictions with actual interactions for 12 essential proteins measured in the lab, they found that their simulated model had about a 50:50 chance of being right. In other words, it couldn’t identify true interactions between drugs and proteins any better than random guessing.
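One common way to put a number on "no better than random guessing" is the area under the ROC curve (AUROC): the probability that a true drug-protein interaction receives a higher predicted score than a non-interaction. A value near 0.5 corresponds to a coin flip. The sketch below is only an illustration of that idea, not the study's code. The function name, the toy scores, and the convention that a higher score means stronger predicted binding are all assumptions, and the numbers are made up.

```python
from itertools import product


def auroc(scores, labels):
    """Fraction of (true, false) interaction pairs ranked correctly.

    scores: predicted binding strengths (assumed: higher = stronger binding)
    labels: 1 if the interaction was confirmed in the lab, else 0
    Ties count as half a correct ranking.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Toy, made-up data: two confirmed interactions, two non-interactions.
labels = [1, 1, 0, 0]
informative = [0.9, 0.8, 0.3, 0.1]    # true interactions always score higher
uninformative = [0.9, 0.1, 0.8, 0.3]  # scores unrelated to the lab results

print("informative model:  ", auroc(informative, labels))
print("uninformative model:", auroc(uninformative, labels))
```

A model like the uninformative one, which ranks true interactions above false ones only half the time, is what a 50:50 result looks like in practice.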
They suspect one reason for their model’s poor performance is that the protein structures used to train the computer are fixed, whereas real proteins are flexible and constantly shift their physical configurations. To improve their success rate, they ran their predictions through additional machine-learning models that had been trained on data to help them “learn” how proteins and other molecules reconfigure themselves and interact. While this souped-up model got somewhat better results, the researchers report that they still aren’t good enough to identify promising new drugs and their protein targets.
What now? In future studies, the Collins lab will continue to incorporate and train the computers on even more biochemical and biophysical data to help with the predictive process. That’s why this study should be interpreted as an interim progress report on an area of science that will only get better with time.
But it’s also a sobering reminder that the quest to find new classes of antibiotics won’t be easy—even when aided by powerful AI approaches. We certainly aren’t there yet, but I’m confident that we will get there to give doctors new therapeutic weapons and turn back the rise in antibiotic-resistant infections.
 2019 Antibiotic resistance threats report. Centers for Disease Control and Prevention.
 Benchmarking AlphaFold-enabled molecular docking predictions for antibiotic discovery. Wong F, Krishnan A, Zheng EJ, Stark H, Manson AL, Earl AM, Jaakkola T, Collins JJ. Molecular Systems Biology. 2022 Sept 6. 18: e11081.
 Highly accurate protein structure prediction with AlphaFold. Jumper J, Evans R, Pritzel A, Kavukcuoglu K, Kohli P, Hassabis D., et al. Nature. 2021 Aug;596(7873):583-589.
 ‘The entire protein universe’: AI predicts shape of nearly every known protein. Callaway E. Nature. 2022 Aug;608(7921):15-16.
Antimicrobial (Drug) Resistance (National Institute of Allergy and Infectious Diseases/NIH)
Collins Lab (Massachusetts Institute of Technology, Cambridge)
The Antibiotics-AI Project, The Audacious Project (TED)
AlphaFold (Deep Mind, London, United Kingdom)
NIH Support: National Institute of Allergy and Infectious Diseases; National Institute of General Medical Sciences
Dynamic View of Spike Protein Reveals Prime Targets for COVID-19 Treatments
Posted on by Dr. Francis Collins
This striking portrait features the spike protein that crowns SARS-CoV-2, the coronavirus that causes COVID-19. This highly flexible protein has settled here into one of its many possible conformations during the process of docking onto a human cell before infecting it.
This portrait, however, isn’t painted on canvas. It was created on a computer screen from sophisticated 3D simulations of the spike protein in action. The aim was to map its many shape-shifting maneuvers accurately at the atomic level in hopes of detecting exploitable structural vulnerabilities to thwart the virus.
For example, notice the many chain-like structures (green) that adorn the protein’s surface (white). They are sugar molecules called glycans that are thought to shield the spike protein by sweeping away antibodies. Also notice areas (purple) that the simulation identified as the most-attractive targets for antibodies, based on their apparent lack of protection by those glycans.
This work, published recently in the journal PLoS Computational Biology, was performed by a German research team that included Mateusz Sikora, Max Planck Institute of Biophysics, Frankfurt. The researchers used a computer application called molecular dynamics (MD) simulation to power up and model the conformational changes in the spike protein on a time scale of a few microseconds. (A microsecond is 0.000001 second.)
The new simulations suggest that glycans act as a dynamic shield on the spike protein. They liken them to windshield wipers on a car. Rather than being fixed in space, those glycans sweep back and forth to protect more of the protein surface than initially meets the eye.
But just as wipers miss spots on a windshield that lie beyond their tips, glycans also miss spots of the protein just beyond their reach. It’s those spots that the researchers suggest might be prime targets on the spike protein that are especially promising for the design of future vaccines and therapeutic antibodies.
This same approach can now be applied to identifying weak spots in the coronavirus’s armor. It also may help researchers understand more fully the implications of newly emerging SARS-CoV-2 variants. The hope is that by capturing this devastating virus and its most critical proteins in action, we can continue to develop and improve upon vaccines and therapeutics.
 Computational epitope map of SARS-CoV-2 spike protein. Sikora M, von Bülow S, Blanc FEC, Gecht M, Covino R, Hummer G. PLoS Comput Biol. 2021 Apr 1;17(4):e1008790.
COVID-19 Research (NIH)
Mateusz Sikora (Max Planck Institute of Biophysics, Frankfurt, Germany)
The surprising properties of the coronavirus envelope (Interview with Mateusz Sikora), Scilog, November 16, 2020.
How COVID-19 Took Hold in North America and Europe
Posted on by Dr. Francis Collins
It was nearly 10 months ago on January 15 that a traveler returned home to the Seattle area after visiting family in Wuhan, China. A few days later, he started feeling poorly and became the first laboratory-confirmed case of coronavirus disease 2019 (COVID-19) in the United States. The rest is history.
However, new evidence published in the journal Science suggests that this first COVID-19 case on the West Coast didn’t snowball into the current epidemic. Instead, while public health officials in Washington state worked tirelessly and ultimately succeeded in containing its sustained transmission, the novel coronavirus slipped in via another individual about two weeks later, around the beginning of February.
COVID-19 is caused by the novel coronavirus SARS-CoV-2. Last winter, researchers sequenced the genetic material from the SARS-CoV-2 that was isolated from the returned Seattle traveler. While contact tracing didn’t identify any spread of this particular virus, dubbed WA1, questions arose when a genetically similar virus known as WA2 turned up in Washington state. Not long after, WA2-like viruses then appeared in California; British Columbia, Canada; and eventually 3,000 miles away in Connecticut. By mid-March, this WA2 cluster accounted for the vast majority—85 percent—of the cases in Washington state.
But was it possible that the WA2 cluster was a direct descendant of WA1? Did WA1 cause an unnoticed chain of transmission over several weeks, making Seattle the epicenter of the outbreak in North America?
To answer those questions and others from around the globe, Michael Worobey, University of Arizona, Tucson, and his colleagues drew on multiple sources of information. These included data pertaining to viral genomes, airline passenger flow, and disease incidence in China’s Hubei Province and other places that likely would have influenced the probability that infected travelers were moving the virus around the globe. Based on all the evidence, the researchers simulated the outbreak more than 1,000 times on a computer over a two-month period, beginning on January 15 and assuming the epidemic started with WA1. Not once did any of their simulated outbreaks match up to the actual genome data.
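To get a feel for how strong a "not once" result is, the sketch below applies a standard statistical bound rather than any calculation from the study itself. If zero of n independent simulations match, the largest per-run match probability p still consistent with that outcome at a given confidence level solves (1 - p)^n = 1 - confidence. The function name and the 95 percent confidence level are assumptions chosen for illustration, and 1,000 runs is used as a conservative stand-in for "more than 1,000."

```python
def zero_match_upper_bound(n_runs, confidence=0.95):
    """Upper bound on the per-run match probability when 0 of n_runs matched.

    Solves (1 - p) ** n_runs = 1 - confidence for p, assuming the runs
    are independent and share the same match probability.
    """
    if n_runs <= 0:
        raise ValueError("n_runs must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return 1.0 - (1.0 - confidence) ** (1.0 / n_runs)


# Hypothetical illustration: 1,000 runs, none matching the observed genomes.
bound = zero_match_upper_bound(1000)
print(f"95% upper bound on match probability: {bound:.4f}")
```

Under these assumptions, the bound comes out just under 3 in 1,000, in line with the familiar "rule of three" approximation of 3/n.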
Those findings suggest to the researchers that the idea WA1 is responsible for all that came later is exceedingly unlikely. The evidence and simulations also appear to rule out the notion that the earliest cases in Washington state entered the United States by way of Canada. A deep dive into the data suggests a more likely scenario is that the outbreak was set off by one or more introductions of genetically similar viruses from China to the West Coast. Though we still don’t know exactly where, the Seattle area is the most likely site given the large number of WA2-like viruses sampled there.
Worobey’s team conducted a second analysis of the outbreak in Europe, and those simulations paint a similar picture to the one in the United States. The researchers conclude that the first known case of COVID-19 in Europe, arriving in Germany on January 20, led to a relatively small number of cases before being stamped out by aggressive testing and contact tracing efforts. That small, early outbreak probably didn’t spark the later one in Northern Italy, which eventually spread to the United States.
Their findings also show that the chain of transmission from China to Italy to New York City sparked outbreaks on the East Coast slightly later in February than those that spread from China directly to Washington state. It confirms that the Seattle outbreak was indeed the first, predating others on the East Coast and in California.
The findings in this report are yet another reminder of the value of integrating genome surveillance together with other sources of data when it comes to understanding, tracking, and containing the spread of COVID-19. They also show that swift and decisive public health measures to contain the virus worked when SARS-CoV-2 first entered the United States and Europe, and can now serve as models of containment.
Since the suffering and death from this pandemic continue in the United States, this historical reconstruction from early in 2020 is one more reminder that all of us have the opportunity and the responsibility to try to limit further spread. Wear your mask when you are outside the home; maintain physical distancing; wash your hands frequently; and don’t congregate indoors, where the risks are greatest. These lessons will enable us to better anticipate, prevent, and respond to additional outbreaks of COVID-19 or any other novel viruses that may arise in the future.
 The emergence of SARS-CoV-2 in Europe and North America. Worobey M, Pekar J, Larsen BB, Nelson MI, Hill V, Joy JB, Rambaut A, Suchard MA, Wertheim JO, Lemey P. Science. 2020 Sep 10:eabc8169 [Epub ahead of print]
Coronavirus (COVID-19) (NIH)
Michael Worobey (University of Arizona, Tucson)
NIH Support: National Institute of Allergy and Infectious Diseases; Fogarty International Center; National Library of Medicine