genomic surveillance
South Africa Study Shows Power of Genomic Surveillance Amid COVID-19 Pandemic
Posted by Dr. Francis Collins

Considerable research is underway around the world to monitor the spread of new variants of SARS-CoV-2, the coronavirus that causes COVID-19. That includes the variant B.1.351 (also known as 501Y.V2), which emerged in South Africa towards the end of 2020 [1, 2]. Public health officials in South Africa have been busy tracing the spread of this genomic variant and others across their country. And a new analysis of such data reveals that dozens of distinct coronavirus variants were already circulating in South Africa well before the appearance of B.1.351.
A study of more than 1,300 near-whole genome sequences of SARS-CoV-2, published recently in the journal Nature Medicine, shows there were in fact at least 42 SARS-CoV-2 variants spreading in South Africa within the pandemic’s first six months in that country [3]. Among them were 16 variants that had never before been described. Most of the single-letter changes carried by these variants didn’t change the virus in important ways and didn’t rise to significant frequency. But the findings come as another critical reminder of the value of genomic surveillance to track the spread of SARS-CoV-2, to identify any potentially worrisome new variants, and to inform measures to get this devastating pandemic under control.
SARS-CoV-2 was first detected in South Africa on March 5, 2020, in a traveler returning from Italy. By November 2020, despite considerable efforts to slow the spread, more than 785,000 people in South Africa were infected, accounting for about half of all reported COVID-19 cases on the African continent.
Recognizing the importance of genomic surveillance, researchers led by Houriiyah Tegally and Tulio de Oliveira, University of KwaZulu-Natal, Durban, South Africa, wasted no time in producing 1,365 near-complete SARS-CoV-2 genomes by mid-September, near the end of the coronavirus’s first peak in the country. Those samples had been collected in hundreds of clinics over the course of the pandemic in eight of South Africa’s nine provinces, offering a broad picture of the spread and emergence of new variants across the country.
The data revealed three main variants, dubbed B.1.1.54, B.1.1.56, and C.1, that were responsible for 42 percent of all the infections in South Africa’s first wave. Of the 16 newly described variants, most carried single-letter changes that haven’t been identified in other countries.
The majority of changes were what scientists refer to as “synonymous,” meaning that they don’t alter the amino acid sequence, and therefore the structure or function, of any of the virus’s essential proteins. The exception is the newly identified C.1, which includes 16 single-letter changes compared to the original sequence from Wuhan, China. One of those 16 changes swaps a single amino acid for another on SARS-CoV-2’s spike protein. That’s notable because the spike protein is a key target of antibodies and also is essential to the virus’s ability to infect human cells.
In fact, four of the most prevalent variants in South Africa all carry this same mutation. The researchers also saw three other changes that would alter the spike protein in different ways, although the significance of these for viral spread and our efforts to stop it isn’t yet clear.
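For readers who like to see the idea in code, the difference between a synonymous change and one that swaps an amino acid can be sketched in a few lines of Python. This is only an illustration, not the method used in the study: the function name `classify_change` and the example codons are hypothetical, and the codons are written in the DNA alphabet (T rather than U), as sequence data usually are.

```python
from itertools import product

# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # codons starting with T
    "LLLLPPPPHHQQRRRR"  # codons starting with C
    "IIIMTTTTNNKKSSRR"  # codons starting with A
    "VVVVAAAADDEEGGGG"  # codons starting with G
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_change(ref_codon: str, alt_codon: str) -> str:
    """Label a codon change by its effect on the encoded amino acid.

    Hypothetical helper for illustration only.
    """
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    return "synonymous" if ref_aa == alt_aa else "nonsynonymous"


# Made-up examples: the first keeps aspartic acid (D), the second swaps it for glycine (G).
print(classify_change("GAT", "GAC"))
print(classify_change("GAT", "GGT"))
```

A single-letter change that leaves the encoded amino acid untouched is labeled synonymous; one that swaps the amino acid, like the spike protein change described above, is labeled nonsynonymous.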
Importantly, the data show that the bulk of introductions to South Africa happened early on, before lockdown and travel restrictions were implemented in late March. Subsequently, much of the spread within South Africa stemmed from hospital outbreaks. For example, an outbreak of the C.1 variant in the North West Province in April ultimately led this variant to become the most geographically widespread in South Africa by the end of August. Meanwhile, another South Africa-specific variant, B.1.106, first identified in April, vanished altogether after outbreaks were controlled in KwaZulu-Natal Province, where the researchers reside.
Genomic surveillance has remarkable power for understanding the evolution of SARS-CoV-2 and tracking the dynamics of its transmission. Tegally and de Oliveira’s team notes that this type of intensive genomic surveillance now can be used on a large scale across Africa and around the world to identify new variants of SARS-CoV-2 and to develop timely measures to control the spread of the virus. They’re now working with the African CDC to expand genomic surveillance across Africa [4].
Such genomic surveillance was crucial in the subsequent identification of the B.1.351 variant in South Africa that we’ve been hearing so much about, with its potential to evade our current treatments and vaccines. By picking up on such concerning mutations early through genomic surveillance and understanding how the virus is spreading over time and space, the hope is we’ll be better informed and more adept in our efforts to get this pandemic under control.
References:
[1] Emerging SARS-CoV-2 variants. Centers for Disease Control and Prevention.
[2] Emergence and rapid spread of a new severe acute respiratory syndrome-related coronavirus 2 (SARS-CoV-2) lineage with multiple spike mutations in South Africa. Tegally H, Wilkinson E, Giovanetti M, Iranzadeh A, Bhiman J, Williamson C, de Oliveira T, et al. medRxiv 2020 Dec 22.
[3] Sixteen novel lineages of SARS-CoV-2 in South Africa. Tegally H, Wilkinson E, Lessells RJ, Giandhari J, Pillay S, Msomi N, Mlisana K, Bhiman JN, von Gottberg A, Walaza S, Fonseca V, Allam M, Ismail A, Glass AJ, Engelbrecht S, Van Zyl G, Preiser W, Williamson C, Petruccione F, Sigal A, Gazy I, Hardie D, Hsiao NY, Martin D, York D, Goedhals D, San EJ, Giovanetti M, Lourenço J, Alcantara LCJ, de Oliveira T. Nat Med. 2021 Feb 2.
[4] Accelerating genomics-based surveillance for COVID-19 response in Africa. Tessema SK, Inzaule SC, Christoffels A, Kebede Y, de Oliveira T, Ouma AEO, Happi CT, Nkengasong JN. Lancet Microbe. 2020 Aug 18.
Links:
COVID-19 Research (NIH)
Houriiyah Tegally (University of KwaZulu-Natal, Durban, South Africa)
Tulio de Oliveira (University of KwaZulu-Natal)
How COVID-19 Took Hold in North America and Europe
Posted by Dr. Francis Collins

It was nearly 10 months ago on January 15 that a traveler returned home to the Seattle area after visiting family in Wuhan, China. A few days later, he started feeling poorly and became the first laboratory-confirmed case of coronavirus disease 2019 (COVID-19) in the United States. The rest is history.
However, new evidence published in the journal Science suggests that this first COVID-19 case on the West Coast didn’t snowball into the current epidemic. Instead, while public health officials in Washington state worked tirelessly and ultimately succeeded in containing its sustained transmission, the novel coronavirus slipped in via another individual about two weeks later, around the beginning of February.
COVID-19 is caused by the novel coronavirus SARS-CoV-2. Last winter, researchers sequenced the genetic material from the SARS-CoV-2 that was isolated from the returned Seattle traveler. While contact tracing didn’t identify any spread of this particular virus, dubbed WA1, questions arose when a genetically similar virus known as WA2 turned up in Washington state. Not long after, WA2-like viruses appeared in California; British Columbia, Canada; and eventually 3,000 miles away in Connecticut. By mid-March, this WA2 cluster accounted for the vast majority (85 percent) of the cases in Washington state.
But was it possible that the WA2 cluster was a direct descendant of WA1? Did WA1 cause an unnoticed chain of transmission over several weeks, making Seattle the epicenter of the outbreak in North America?
To answer those questions and others from around the globe, Michael Worobey, University of Arizona, Tucson, and his colleagues drew on multiple sources of information. These included data pertaining to viral genomes, airline passenger flow, and disease incidence in China’s Hubei Province and other places that likely would have influenced the probability that infected travelers were moving the virus around the globe. Based on all the evidence, the researchers simulated the outbreak more than 1,000 times on a computer over a two-month period, beginning on January 15 and assuming the epidemic started with WA1. Not once did any of their simulated outbreaks match up to the actual genome data.
Those findings suggest to the researchers that the idea WA1 is responsible for all that came later is exceedingly unlikely. The evidence and simulations also appear to rule out the notion that the earliest cases in Washington state entered the United States by way of Canada. A deep dive into the data suggests a more likely scenario is that the outbreak was set off by one or more introductions of genetically similar viruses from China to the West Coast. Though we still don’t know exactly where, the Seattle area is the most likely site given the large number of WA2-like viruses sampled there.
Worobey’s team conducted a second analysis of the outbreak in Europe, and those simulations paint a similar picture to the one in the United States. The researchers conclude that the first known case of COVID-19 in Europe, arriving in Germany on January 20, led to a relatively small number of cases before being stamped out by aggressive testing and contact tracing efforts. That small, early outbreak probably didn’t spark the later one in Northern Italy, which eventually spread to the United States.
Their findings also show that the chain of transmission from China to Italy to New York City sparked outbreaks on the East Coast slightly later in February than those that spread from China directly to Washington state. It confirms that the Seattle outbreak was indeed the first, predating others on the East Coast and in California.
The findings in this report are yet another reminder of the value of integrating genome surveillance together with other sources of data when it comes to understanding, tracking, and containing the spread of COVID-19. They also show that swift and decisive public health measures to contain the virus worked when SARS-CoV-2 first entered the United States and Europe, and can now serve as models of containment.
Since the suffering and death from this pandemic continue in the United States, this historical reconstruction from early in 2020 is one more reminder that all of us have the opportunity and the responsibility to try to limit further spread. Wear your mask when you are outside the home; maintain physical distancing; wash your hands frequently; and don’t congregate indoors, where the risks are greatest. These lessons will enable us to better anticipate, prevent, and respond to additional outbreaks of COVID-19 or any other novel viruses that may arise in the future.
Reference:
[1] The emergence of SARS-CoV-2 in Europe and North America. Worobey M, Pekar J, Larsen BB, Nelson MI, Hill V, Joy JB, Rambaut A, Suchard MA, Wertheim JO, Lemey P. Science. 2020 Sep 10:eabc8169 [Epub ahead of print]
Links:
Coronavirus (COVID-19) (NIH)
Michael Worobey (University of Arizona, Tucson)
NIH Support: National Institute of Allergy and Infectious Diseases; Fogarty International Center; National Library of Medicine
Genome Data Help Track Community Spread of COVID-19
Posted by Dr. Francis Collins

Contact tracing, a term that’s been in the news lately, is a crucial tool for controlling the spread of SARS-CoV-2, the novel coronavirus that causes COVID-19. It depends on quick, efficient identification of an infected individual, followed by identification of all who’ve recently been in close contact with that person so the contacts can self-quarantine to break the chain of transmission.
Properly carried out, contact tracing can be extremely effective. It can also be extremely challenging when battling a stealth virus like SARS-CoV-2, especially when the virus is spreading rapidly.
But there are some innovative ways to enhance contact tracing. In a new study, published in the journal Nature Medicine, researchers in Australia demonstrate one of them: assembling genomic data about the virus to assist contact tracing efforts. This so-called genomic surveillance builds on the idea that when the virus is passed from person to person over a few months, it can acquire random variations in the sequence of its genetic material. These unique variations serve as distinctive genomic “fingerprints.”
When COVID-19 starts circulating in a community, researchers can fingerprint the genomes of SARS-CoV-2 obtained from newly infected people. This timely information helps to tell whether that particular virus has been spreading locally for a while or has just arrived from another part of the world. It can also show where the viral subtype has been spreading through a community or, best of all, when it has stopped circulating.
The recent study was led by Vitali Sintchenko at the University of Sydney. His team worked in parallel with contact tracers at the Ministry of Health in New South Wales (NSW), Australia’s most populous state, to contain the initial SARS-CoV-2 outbreak from late January through March 2020.
The team performed genomic surveillance, using sequencing data obtained within about five days, to understand local transmission patterns. They also wanted to compare what they learned from genomic surveillance to predictions made by a sophisticated computer model of how the virus might spread amongst Australia’s approximately 24 million citizens.
Of the 1,617 known cases in Sydney over the three-month study period, researchers sequenced viral genomes from 209 (13 percent) of them. By comparing those sequences to others circulating overseas, they found a lot of sequence diversity, indicating that the novel coronavirus had been introduced to Sydney many times from many places all over the world.
They then used the sequencing data to better understand how the virus was spreading through the local community. Their analysis found that the 209 cases under study included 27 distinct genomic fingerprints. Based on the close similarity of their genomic fingerprints, a significant share of the COVID-19 cases appeared to have stemmed from the direct spread of the virus among people in specific places or facilities.
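To make the notion of grouping cases by genomic fingerprint a bit more concrete, here is a minimal Python sketch. It is not the pipeline the Sydney team used: it assumes the viral sequences are already aligned to the same length, it links any two cases whose sequences differ by at most a chosen number of letters, and the names `hamming`, `cluster_genomes`, `max_diff`, and the toy sequences are all made up for illustration.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count the positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def cluster_genomes(genomes: dict[str, str], max_diff: int = 1) -> list[set[str]]:
    """Group cases whose sequences are linked by chains of near-identical genomes.

    Single-linkage clustering with a small union-find structure;
    max_diff is an assumed threshold, not a value from the study.
    """
    parent = {name: name for name in genomes}

    def find(name: str) -> str:
        # Follow parent links to the cluster representative, shortening the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(genomes, 2):
        if hamming(genomes[a], genomes[b]) <= max_diff:
            parent[find(a)] = find(b)

    clusters: dict[str, set[str]] = {}
    for name in genomes:
        clusters.setdefault(find(name), set()).add(name)
    return sorted(clusters.values(), key=lambda c: sorted(c))


# Toy, made-up sequences (real SARS-CoV-2 genomes are roughly 30,000 letters long).
toy_genomes = {
    "case_A": "ACGTACGTAC",
    "case_B": "ACGTACGTAT",
    "case_C": "ACGTACGTTT",
    "case_D": "TTGTACCTAC",
}
for cluster in cluster_genomes(toy_genomes, max_diff=1):
    print(sorted(cluster))
```

In this toy example, cases A, B, and C end up in one cluster because each is a single letter away from its neighbor, while case D stands alone. That mirrors the reasoning above: closely matching fingerprints hint at direct spread, while a very different fingerprint suggests a separate introduction.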
What was most striking was that the genomic evidence helped to provide information that contact tracers otherwise would have lacked. For instance, the genomic data allowed the researchers to identify previously unsuspected links between certain cases of COVID-19. It also helped to confirm other links that were otherwise unclear.
All told, researchers used the genomic evidence to cluster almost 40 percent of COVID-19 cases (81 of 209) for which the community-based data alone couldn’t identify a known contact source for the infection. That included 26 cases in which an individual who’d recently arrived in Australia from overseas spread the infection to others who hadn’t traveled. The genomic information also helped to identify likely sources in the community for another 15 locally acquired cases that weren’t known based on community data.
The researchers compared their genome surveillance data to SARS-CoV-2’s expected spread as modeled in a computer simulation based on travel to and from Australia over the time period in question. Because the study involved just 13 percent of all known COVID-19 cases in Sydney between late January and March, it’s not surprising that the genomic data present an incomplete picture, detecting only a portion of the possible chains of transmission expected in the simulation model.
Nevertheless, the findings demonstrate the value of genomic data for tracking the virus and pinpointing exactly where in the community it is spreading. This can help to fill in important gaps in the community-based data that contact tracers often use. Even more exciting, by combining traditional contact tracing, genomic surveillance, and mathematical modeling with other emerging tools at our disposal, it may be possible to get a clearer picture of the movement of SARS-CoV-2 and put more targeted public health measures in place to slow and eventually stop its deadly spread.
Reference:
[1] Revealing COVID-19 transmission in Australia by SARS-CoV-2 genome sequencing and agent-based modeling. Rockett RJ, Arnott A, Lam C, et al. Nat Med. 2020 July 9. [Published online ahead of print]
Links:
Coronavirus (COVID-19) (NIH)
Vitali Sintchenko (University of Sydney, Australia)