community spread

Genome Data Help to Track COVID-19 Superspreading Event

Boston skyline
Credit: iStock/Chaay_Tee

When it comes to COVID-19, anyone, even without symptoms, can be a “superspreader” capable of unknowingly infecting a large number of people and causing a community outbreak. That’s why it is so important right now to wear masks when out in public and avoid large gatherings, especially those held indoors, where a superspreader can readily infect others with SARS-CoV-2, the virus responsible for COVID-19.

Driving home this point is a new NIH-funded study on the effects of just one superspreader event in the Boston area: an international biotech conference held in February, before the public health risks of COVID-19 had been fully realized [1]. Almost a hundred people were infected. But it didn’t end there.

In the study, the researchers sequenced close to 800 viral genomes, including cases from across the first wave of the epidemic in the Boston area. Using the fact that the viral genome changes in very subtle ways over time, they found that SARS-CoV-2 was actually introduced independently to the region more than 80 times, primarily from Europe and other parts of the United States. But the data also suggest that a single superspreading event at the biotech conference led to the infection of almost 20,000 people in the area, not to mention additional COVID-19 cases in other states and around the world.

The findings, posted on medRxiv as a pre-print, come from Bronwyn MacInnis and Pardis Sabeti at the Broad Institute of MIT and Harvard in Cambridge, MA, and their many close colleagues at Massachusetts General Hospital, the Massachusetts Department of Public Health, and the Boston Health Care for the Homeless Program. The initial focus of MacInnis, Sabeti, and their Broad colleagues has been on developing genome data and tools for surveillance of viruses and other infectious diseases, including viral outbreaks in West Africa such as Lassa fever and Ebola virus disease.

Closer to home, they’d expected to focus their attention on West Nile virus and tick-borne diseases. But, when the COVID-19 outbreak erupted, they were ready to pivot quickly to assist several Centers for Disease Control and Prevention (CDC) and state labs in the northeastern United States to use genomic tools to investigate local outbreaks.

It’s been clear from the beginning of the pandemic that COVID-19 cases often arise in clusters, linked to gatherings in places such as cruise ships, nursing homes, and homeless shelters. But, as the Broad Institute team and their colleagues realized, it’s difficult to see how extensively a virus spreads from such places into the wider community based on case counts alone.

Contact tracing certainly helps to track community spread of the virus. This surveillance strategy depends on quick, efficient identification of an infected individual. It follows up with the identification of all who’ve recently been in close contact with that person, allowing the contacts to self-quarantine and break the chain of transmission.

But contact tracing has its limitations. It’s not always possible to identify all the people that an infected person has been in recent contact with. Genome data, however, are particularly useful after the fact for connecting those dots to get a big picture view of viral transmission.

Here’s how it works: as SARS-CoV-2 spreads, the virus sometimes picks up a new mutation. Those tiny spelling changes in the viral genome usually have no effect on how the virus causes disease, but they do serve as distinct genomic fingerprints. Using those fingerprints to guide the way, researchers can trace the path the virus took through a community and beyond, identifying connections among cases that would be untrackable otherwise.
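The fingerprint idea described above can be sketched in a few lines of Python. This is a toy illustration only, not the researchers' method: the reference sequence, the sample names, and the helper functions `fingerprint` and `carries` are all hypothetical, and real analyses rely on sequence alignment and phylogenetic tools rather than a position-by-position comparison of short strings.

```python
# Toy illustration only: real SARS-CoV-2 genomes are about 30,000 bases long,
# and real analyses use alignment and phylogenetic tools, not this shortcut.

def fingerprint(reference: str, genome: str) -> frozenset:
    """Return the set of (position, new_base) differences from the reference."""
    if len(reference) != len(genome):
        raise ValueError("this sketch assumes pre-aligned, equal-length sequences")
    return frozenset(
        (pos, base)
        for pos, (ref_base, base) in enumerate(zip(reference, genome), start=1)
        if base != ref_base
    )

def carries(marker: frozenset, sample: frozenset) -> bool:
    """True if the sample's fingerprint contains every mutation in the marker."""
    return marker <= sample

REFERENCE = "ACGTACGTAC"          # hypothetical reference sequence
SAMPLES = {                        # hypothetical patient samples
    "case_A": "ACGTACGTAC",        # identical to the reference
    "case_B": "ACTTACGTAC",        # one mutation at position 3
    "case_C": "ACTTACGTAT",        # same mutation plus a later one at position 10
    "case_D": "AGGTACGTAC",        # unrelated mutation at position 2
}

fingerprints = {name: fingerprint(REFERENCE, seq) for name, seq in SAMPLES.items()}
marker = fingerprints["case_B"]    # treat case_B's mutation as the cluster marker
linked = sorted(name for name, fp in fingerprints.items() if carries(marker, fp))
print(linked)                      # cases sharing the marker mutation
```

In this made-up example, case_B and case_C share the position 3 mutation and so fall on the same chain of transmission, while case_D carries an unrelated change and points to a separate introduction.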

With this in mind, MacInnis and Sabeti’s team set out to help Boston’s public health officials understand just how the epidemic escalated so quickly in the Boston area, and just how much the February conference had contributed to community transmission of the virus. They also investigated other case clusters in the area, including within a skilled nursing facility, homeless shelters, and at Massachusetts General Hospital itself, to understand the spread of COVID-19 in these settings.

Based on contact tracing, officials had already connected approximately 90 cases of COVID-19 to the biotech conference, 28 of which were included in the original 772 viral genomes in this dataset. Based on the distinct genomic fingerprint carried by the 28 genomes, the researchers went on to discover that more than one-third of Boston area cases without any known link to the conference could indeed be traced back to the event.

When the researchers applied this proportion to the number of cases recorded in the region during the study period, they extrapolated that the superspreader event led to nearly 20,000 cases in the Boston area. In contrast, the genome data show that cases linked to another superspreader event, which took place within a skilled nursing facility, had much less of an impact on the surrounding community, devastating as that event was to the residents.
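The basic logic of that kind of extrapolation, scaling the share of sequenced genomes that carry an event's fingerprint up to all recorded cases, can be sketched as follows. The function name and every number in the example are hypothetical placeholders chosen for easy arithmetic; they are not figures from the study, whose actual estimate presumably involved more careful statistical handling.

```python
def extrapolate_linked_cases(marker_genomes: int,
                             sequenced_genomes: int,
                             recorded_cases: int) -> float:
    """Scale the share of sequenced genomes carrying a marker to all recorded cases.

    Assumes the sequenced genomes are a representative sample of recorded cases,
    which is a simplification made for this sketch.
    """
    if sequenced_genomes <= 0:
        raise ValueError("need at least one sequenced genome")
    if not 0 <= marker_genomes <= sequenced_genomes:
        raise ValueError("marker_genomes must be between 0 and sequenced_genomes")
    return marker_genomes * recorded_cases / sequenced_genomes

# Hypothetical numbers, for illustration only (not from the study):
# 35 of 100 sequenced genomes carry the marker, and 1,000 cases were recorded.
estimate = extrapolate_linked_cases(35, 100, 1000)
print(round(estimate))
```

The estimate is only as good as the assumption that sequenced cases resemble unsequenced ones, which is why such figures are best read as approximate.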

The analysis also uncovered some unexpected connections. The dataset showed that SARS-CoV-2 was brought to clients and staff at the Boston Health Care for the Homeless Program at least seven times. Remarkably, two of those introductions also traced back to the biotech conference. Researchers also found infections in Chelsea, Revere, and Everett, which were some of the hardest hit communities in the Boston area, that were connected to the original superspreading event.

There was some reassuring news about how precautions in hospitals are working. The researchers examined cases diagnosed among patients at Massachusetts General Hospital that had raised concerns that the virus might have spread from one patient to another within the hospital. But the genome data show that those cases, in fact, weren’t part of the same transmission chain. The patients may have contracted the virus before they were hospitalized. Or it’s possible that staff may have inadvertently brought the virus into the hospital. But there was no patient-to-patient transmission.

Massachusetts is one of the states in which the COVID-19 pandemic had a particularly severe early impact. As such, these results present broadly applicable lessons for other states and urban areas about how the virus spreads. The findings highlight the value of genomic surveillance, along with standard contact tracing, for better understanding of viral transmission in our communities and improved prevention of future outbreaks.


[1] Phylogenetic analysis of SARS-CoV-2 in the Boston area highlights the role of recurrent importation and superspreading events. Lemieux J. et al. medRxiv. August 25, 2020.


Coronavirus (COVID-19) (NIH)

Bronwyn MacInnis (Broad Institute of MIT and Harvard, Cambridge, MA)

Sabeti Lab (Broad Institute of MIT and Harvard)

NIH Support: National Institute of Allergy and Infectious Diseases; National Human Genome Research Institute; National Institute of General Medical Sciences

Genome Data Help Track Community Spread of COVID-19

RNA Virus
Credit: iStock/vchal

Contact tracing, a term that’s been in the news lately, is a crucial tool for controlling the spread of SARS-CoV-2, the novel coronavirus that causes COVID-19. It depends on quick, efficient identification of an infected individual, followed by identification of all who’ve recently been in close contact with that person so the contacts can self-quarantine to break the chain of transmission.

Properly carried out, contact tracing can be extremely effective. It can also be extremely challenging when battling a stealth virus like SARS-CoV-2, especially when the virus is spreading rapidly.

But there are some innovative ways to enhance contact tracing. In a new study, published in the journal Nature Medicine, researchers in Australia demonstrate one of them: assembling genomic data about the virus to assist contact tracing efforts [1]. This so-called genomic surveillance builds on the idea that when the virus is passed from person to person over a few months, it can acquire random variations in the sequence of its genetic material. These unique variations serve as distinctive genomic “fingerprints.”

When COVID-19 starts circulating in a community, researchers can fingerprint the genomes of SARS-CoV-2 obtained from newly infected people. This timely information helps to tell whether that particular virus has been spreading locally for a while or has just arrived from another part of the world. It can also show where the viral subtype has been spreading through a community or, best of all, when it has stopped circulating.

The recent study was led by Vitali Sintchenko at the University of Sydney. His team worked in parallel with contact tracers at the Ministry of Health in New South Wales (NSW), Australia’s most populous state, to contain the initial SARS-CoV-2 outbreak from late January through March 2020.

The team performed genomic surveillance, using sequencing data obtained within about five days, to understand local transmission patterns. They also wanted to compare what they learned from genomic surveillance to predictions made by a sophisticated computer model of how the virus might spread amongst Australia’s approximately 24 million citizens.

Of the 1,617 known cases in Sydney over the three-month study period, researchers sequenced viral genomes from 209 (13 percent) of them. By comparing those sequences to others circulating overseas, they found a lot of sequence diversity, indicating that the novel coronavirus had been introduced to Sydney many times from many places all over the world.

They then used the sequencing data to better understand how the virus was spreading through the local community. Their analysis found that the 209 cases under study included 27 distinct genomic fingerprints. Based on the close similarity of their genomic fingerprints, a significant share of the COVID-19 cases appeared to have stemmed from the direct spread of the virus among people in specific places or facilities.

What was most striking was that the genomic evidence helped to provide information that contact tracers otherwise would have lacked. For instance, the genomic data allowed the researchers to identify previously unsuspected links between certain cases of COVID-19. It also helped to confirm other links that were otherwise unclear.

All told, researchers used the genomic evidence to cluster almost 40 percent of COVID-19 cases (81 of 209) for which the community-based data alone couldn’t identify a known contact source for the infection. That included 26 cases in which an individual who’d recently arrived in Australia from overseas spread the infection to others who hadn’t traveled. The genomic information also helped to identify likely sources in the community for another 15 locally acquired cases that weren’t known based on community data.
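As a quick check on the proportions quoted for the Sydney study, the arithmetic can be reproduced directly from the counts given in the text. The variable names are just labels chosen for this sketch.

```python
# Counts taken from the text above.
known_cases = 1617            # known cases in Sydney during the study period
sequenced = 209               # cases with a sequenced viral genome
clustered_by_genomics = 81    # cases clustered using genomic evidence

sequenced_share = sequenced / known_cases
clustered_share = clustered_by_genomics / sequenced

print(f"Sequenced: {sequenced_share:.1%} of known cases")
print(f"Clustered by genomics: {clustered_share:.1%} of sequenced cases")
```

The two shares come out at roughly 13 percent and just under 40 percent, matching the rounded figures in the article.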

The researchers compared their genome surveillance data to SARS-CoV-2’s expected spread as modeled in a computer simulation based on travel to and from Australia over the time period in question. Because the study involved just 13 percent of all known COVID-19 cases in Sydney from late January through March, it’s not surprising that the genomic data present an incomplete picture, detecting only a portion of the possible chains of transmission expected in the simulation model.

Nevertheless, the findings demonstrate the value of genomic data for tracking the virus and pinpointing exactly where in the community it is spreading. This can help to fill in important gaps in the community-based data that contact tracers often use. Even more exciting, by combining traditional contact tracing, genomic surveillance, and mathematical modeling with other emerging tools at our disposal, it may be possible to get a clearer picture of the movement of SARS-CoV-2 and put more targeted public health measures in place to slow and eventually stop its deadly spread.


[1] Revealing COVID-19 transmission in Australia by SARS-CoV-2 genome sequencing and agent-based modeling. Rockett RJ, Arnott A, Lam C, et al. Nat Med. 2020 July 9. [Published online ahead of print]


Coronavirus (COVID-19) (NIH)

Vitali Sintchenko (University of Sydney, Australia)

Structural Biology Points Way to Coronavirus Vaccine

Spike Protein on Novel Coronavirus
Caption: Atomic-level structure of the spike protein of the virus that causes COVID-19.
Credit: McLellan Lab, University of Texas at Austin

The recent COVID-19 outbreak of a novel type of coronavirus that began in China has prompted a massive global effort to contain and slow its spread. Despite those efforts, over the last month the virus has begun circulating outside of China in multiple countries and territories.

Cases have now appeared in the United States involving some affected individuals who haven’t traveled recently outside the country. They also have had no known contact with others who have recently arrived from China or other countries where the virus is spreading. The NIH and other U.S. public health agencies stand on high alert and have mobilized needed resources to help not only in its containment, but in the development of life-saving interventions.

On the treatment and prevention front, some encouraging news was recently reported. In record time, an NIH-funded team of researchers has created the first atomic-scale map of a promising protein target for vaccine development [1]. This is the so-called spike protein on the new coronavirus that causes COVID-19. As shown above, a portion of this spiky surface appendage (green) allows the virus to bind a receptor on human cells, causing other portions of the spike to fuse the viral and human cell membranes. This process is needed for the virus to gain entry into cells and infect them.

Preclinical studies in mice of a candidate vaccine based on this spike protein are already underway at NIH’s Vaccine Research Center (VRC), part of the National Institute of Allergy and Infectious Diseases (NIAID). An early-stage phase I clinical trial of this vaccine in people is expected to begin within weeks. But there will be many more steps after that to test safety and efficacy, and then to scale up to produce millions of doses. Even though this timetable will potentially break all previous speed records, a safe and effective vaccine will take at least another year to be ready for widespread deployment.

Coronaviruses are a large family of viruses, including some that cause “the common cold” in healthy humans. In fact, these viruses are found throughout the world and account for up to 30 percent of upper respiratory tract infections in adults.

This outbreak of COVID-19 marks the third time in recent years that a coronavirus has emerged to cause severe disease and death in some people. Earlier coronavirus outbreaks included SARS (severe acute respiratory syndrome), which emerged in late 2002 and disappeared two years later, and MERS (Middle East respiratory syndrome), which emerged in 2012 and continues to affect people in small numbers.

Soon after COVID-19 emerged, the new coronavirus, which is closely related to the SARS virus, was recognized as its cause. NIH-funded researchers including Jason McLellan, an alumnus of the VRC and now at The University of Texas at Austin, were ready. They’d been studying coronaviruses in collaboration with NIAID investigators for years, with special attention to the spike proteins.

Just two weeks after Chinese scientists reported the first genome sequence of the virus [2], McLellan and his colleagues designed and produced samples of its spike protein. Importantly, his team had earlier developed a method to lock coronavirus spike proteins into a shape that makes them easier both to analyze structurally, via the high-resolution imaging tool cryo-electron microscopy, and to use in vaccine development efforts.

After locking the spike protein in the shape it takes before fusing with a human cell to infect it, the researchers reconstructed its atomic-scale 3D structural map in just 12 days. Their results, published in Science, confirm that the spike protein on the virus that causes COVID-19 is quite similar to that of its close relative, the SARS virus. It also appears to bind human cells more tightly than the SARS virus, which may help to explain why the new coronavirus appears to spread more easily from person to person, mainly by respiratory transmission.

McLellan’s team and his NIAID VRC counterparts also plan to use the stabilized spike protein as a probe to isolate naturally produced antibodies from people who’ve recovered from COVID-19. Such antibodies might form the basis of a treatment for people who’ve been exposed to the virus, such as health care workers.

The NIAID is now working with the biotechnology company Moderna, Cambridge, MA, to use the latest findings to develop a vaccine candidate using messenger RNA (mRNA), molecules that serve as templates for making proteins. The goal is to direct the body to produce a spike protein in such a way to elicit an immune response and the production of antibodies. An early clinical trial of the vaccine in people is expected to begin in the coming weeks. Other vaccine candidates are also in preclinical development.

Meanwhile, the first clinical trial in the U.S. to evaluate an experimental treatment for COVID-19 is already underway at the University of Nebraska Medical Center’s biocontainment unit [3]. The NIH-sponsored trial will evaluate the safety and efficacy of the experimental antiviral drug remdesivir in hospitalized adults diagnosed with COVID-19. The first participant is an American who was repatriated after being quarantined on the Diamond Princess cruise ship in Japan.

As noted, the risk of contracting COVID-19 in the United States is currently low, but the situation is changing rapidly. One of the features that makes the virus so challenging to stay in front of is its long latency period before the characteristic flu-like fever, cough, and shortness of breath manifest. In fact, people infected with the virus may not show any symptoms for up to two weeks, allowing them to pass it on to others in the meantime. You can track the reported cases in the United States on the Centers for Disease Control and Prevention’s website.

As the outbreak continues over the coming weeks and months, you can be certain that NIH and other U.S. public health organizations are working at full speed to understand this virus and to develop better diagnostics, treatments, and vaccines.


[1] Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation. Wrapp D, Wang N, Corbett KS, Goldsmith JA, Hsieh CL, Abiona O, Graham BS, McLellan JS. Science. 2020 Feb 19.

[2] A new coronavirus associated with human respiratory disease in China. Wu F, Zhao S, Yu B, Chen YM, Wang W, Song ZG, Hu Y, Tao ZW, Tian JH, Pei YY, Yuan ML, Zhang YL, Dai FH, Liu Y, Wang QM, Zheng JJ, Xu L, Holmes EC, Zhang YZ. Nature. 2020 Feb 3.

[3] NIH clinical trial of remdesivir to treat COVID-19 begins. NIH News Release. Feb 25, 2020.


Coronaviruses (National Institute of Allergy and Infectious Diseases/NIH)

Coronavirus (COVID-19) (NIAID)

Coronavirus Disease 2019 (Centers for Disease Control and Prevention, Atlanta)

NIH Support: National Institute of Allergy and Infectious Diseases