National COVID Cohort Collaborative
Posted by Lawrence Tabak, D.D.S., Ph.D.
The COVID-19 pandemic continues to present considerable public health challenges in the United States and around the globe. One of the most puzzling is why many people who get over an initial and often relatively mild COVID illness later develop new and potentially debilitating symptoms. These symptoms run the gamut, from fatigue and shortness of breath to brain fog, anxiety, and gastrointestinal trouble.
People understandably want answers to help them manage this complex condition referred to as Long COVID syndrome. But because Long COVID is so variable from person to person, it’s extremely difficult to work backwards and determine what these people had in common that might have made them susceptible to Long COVID. The variability also makes it difficult to identify all those who have Long COVID, whether they realize it or not. But a recent study, published in the journal Lancet Digital Health, shows that a well-trained computer and its artificial intelligence can help.
Researchers found that computers, after scanning thousands of electronic health records (EHRs) from people with Long COVID, could reliably identify who is likely to have the condition. The results, though still preliminary and in need of further validation, point the way to developing a fast, easy-to-use computer algorithm to help determine whether a person with a positive COVID test is likely to battle Long COVID.
In this groundbreaking study, NIH-supported researchers led by Emily Pfaff, University of North Carolina, Chapel Hill, and Melissa Haendel, the University of Colorado Anschutz Medical Campus, Aurora, relied on machine learning. In machine learning, a computer sifts through vast amounts of data to look for patterns. One reason machine learning is so powerful is that it doesn’t require humans to tell the computer which features it should look for. As such, machine learning can pick up on subtle patterns that people would otherwise miss.
In this case, Pfaff, Haendel, and team decided to “train” their computer on EHRs from people who had reported a COVID-19 infection. (The records are de-identified to protect patient privacy.) The researchers found just what they needed in the National COVID Cohort Collaborative (N3C), a national, publicly available data resource sponsored by NIH’s National Center for Advancing Translational Sciences. It is part of NIH’s Researching COVID to Enhance Recovery (RECOVER) initiative, which aims to improve understanding of Long COVID.
The researchers defined a group of more than 1.5 million adults in N3C who either had been diagnosed with COVID-19 or had a record of a positive COVID-19 test at least 90 days prior. Next, they examined common features, including any doctor visits, diagnoses, or medications, for roughly 100,000 adults from that group.
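To make the cohort definition concrete, here is a minimal sketch of such an eligibility check. It is an illustration only, not the N3C team's code: the `PatientRecord` fields, the age cutoff of 18 for "adult," and the choice to count the 90 days from the earlier of the diagnosis or positive test date are all assumptions.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Hypothetical record layout; real N3C data uses a different, richer schema.
@dataclass
class PatientRecord:
    patient_id: str
    age: int
    covid_diagnosis_date: Optional[date] = None
    positive_test_date: Optional[date] = None

MIN_DAYS_SINCE_INDEX = 90  # "at least 90 days prior," per the post
ADULT_AGE = 18             # assumption: the post only says "adults"

def index_date(record: PatientRecord) -> Optional[date]:
    """Earliest evidence of COVID-19 (diagnosis or positive test), if any."""
    dates = [d for d in (record.covid_diagnosis_date, record.positive_test_date)
             if d is not None]
    return min(dates) if dates else None

def is_eligible(record: PatientRecord, as_of: date) -> bool:
    """True if the patient is an adult whose index date is 90+ days before as_of."""
    if record.age < ADULT_AGE:
        return False
    idx = index_date(record)
    if idx is None:
        return False
    return (as_of - idx).days >= MIN_DAYS_SINCE_INDEX

# Usage: keep only the eligible patients from a small made-up list.
today = date(2022, 1, 1)
patients = [
    PatientRecord("p1", 45, positive_test_date=today - timedelta(days=120)),
    PatientRecord("p2", 30, covid_diagnosis_date=today - timedelta(days=20)),
    PatientRecord("p3", 60),
]
cohort = [p for p in patients if is_eligible(p, today)]
```

In this made-up example only `p1` ends up in `cohort`: `p2` was diagnosed too recently and `p3` has no record of infection.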
They fed that EHR data into a computer, along with health information from almost 600 patients who’d been seen at a Long COVID clinic. They developed three machine learning models: one to identify potential Long COVID patients across the whole dataset and two others that focused separately on people who had or hadn’t been hospitalized.
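The three-model setup can be pictured as one labeled dataset split three ways. The sketch below is a rough illustration under stated assumptions, not the study's pipeline: the `Example` fields are hypothetical, and using a Long COVID clinic visit as the training label is an interpretation of the description above.

```python
from typing import Dict, List, NamedTuple

# Hypothetical training example; field names are illustrative only.
class Example(NamedTuple):
    patient_id: str
    hospitalized: bool               # hospitalized for COVID-19?
    seen_at_long_covid_clinic: bool  # assumed training label
    features: Dict[str, float]       # e.g. counts of visits, diagnoses, medications

def build_training_sets(examples: List[Example]) -> Dict[str, List[Example]]:
    """Split one dataset into the inputs for three separate models."""
    return {
        "all": list(examples),
        "hospitalized": [e for e in examples if e.hospitalized],
        "non_hospitalized": [e for e in examples if not e.hospitalized],
    }

# Usage with a few made-up patients.
examples = [
    Example("p1", True, True, {"outpatient_visits": 9.0}),
    Example("p2", False, False, {"outpatient_visits": 1.0}),
    Example("p3", False, True, {"outpatient_visits": 6.0}),
]
training_sets = build_training_sets(examples)
```

Each of the three lists would then be handed to its own model, so that patterns specific to hospitalized or non-hospitalized patients are not drowned out by the other group.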
All three models proved effective for identifying people with potential Long COVID. Each scored 85 percent or better on a measure of discrimination, meaning how well a model separates people who likely have Long COVID from those who don’t, indicating they are highly accurate. That’s important because, once researchers can identify those with Long COVID in a large database of people such as N3C, they can begin to ask and answer many critical questions about any differences in an individual’s risk factors or treatment that might explain why some get Long COVID and others don’t.
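The post does not spell out how the discrimination figure quoted above was calculated. One widely used measure for this kind of yes-or-no prediction is the area under the ROC curve (AUROC), and the sketch below assumes that is what is meant. It computes AUROC directly from its definition: the fraction of (positive, negative) patient pairs in which the model gives the positive patient the higher score, with ties counted as half. The labels and scores are invented for illustration.

```python
from typing import Sequence

def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for patients with the condition, 0 for those without.
    scores: model outputs, where higher means "more likely to have it."
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive patient ranked above negative patient
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))

# Made-up example: two patients with the condition, two without.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(auroc(labels, scores))  # 0.75: three of the four pairs are ranked correctly
```

A score of 0.5 would mean the model does no better than a coin flip at ranking patients, and 1.0 would mean it ranks every patient with the condition above every patient without it. The pairwise loop is fine for a small illustration but would be too slow for millions of records, where a sort-based calculation is the usual choice.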
This new study is also an excellent example of N3C’s goal to assemble data from EHRs that enable researchers around the world to get rapid answers and seek effective interventions for COVID-19, including its long-term health effects. It’s also made important progress toward the urgent goal of the RECOVER initiative to identify people with or at risk for Long COVID who may be eligible to participate in clinical trials of promising new treatment approaches.
Long COVID remains a puzzling public health challenge. Another recent NIH study published in the journal Annals of Internal Medicine set out to identify people with symptoms of Long COVID, most of whom had recovered from mild-to-moderate COVID-19. More than half had signs of Long COVID. But, despite extensive testing, the NIH researchers were unable to pinpoint any underlying cause of the Long COVID symptoms in most cases.
So if you’d like to help researchers solve this puzzle, RECOVER is now enrolling adults and kids—including those who have and have not had COVID—at more than 80 study sites around the country.
 Identifying who has long COVID in the USA: a machine learning approach using N3C data. Pfaff ER, Girvin AT, Bennett TD, Bhatia A, Brooks IM, Deer RR, Dekermanjian JP, Jolley SE, Kahn MG, Kostka K, McMurry JA, Moffitt R, Walden A, Chute CG, Haendel MA; N3C Consortium. Lancet Digit Health. 2022 May 16:S2589-7500(22)00048-6.
 A longitudinal study of COVID-19 sequelae and immunity: baseline findings. Sneller MC, Liang CJ, Marques AR, Chung JY, Shanbhag SM, Fontana JR, Raza H, Okeke O, Dewar RL, Higgins BP, Tolstenko K, Kwan RW, Gittens KR, Seamon CA, McCormack G, Shaw JS, Okpali GM, Law M, Trihemasava K, Kennedy BD, Shi V, Justement JS, Buckner CM, Blazkova J, Moir S, Chun TW, Lane HC. Ann Intern Med. 2022 May 24:M21-4905.
COVID-19 Research (NIH)
National COVID Cohort Collaborative (N3C) (National Center for Advancing Translational Sciences/NIH)
Emily Pfaff (University of North Carolina, Chapel Hill)
Melissa Haendel (University of Colorado, Aurora)
NIH Support: National Center for Advancing Translational Sciences; National Institute of General Medical Sciences; National Institute of Allergy and Infectious Diseases