Did you know that the NIH’s National Library of Medicine (NLM) has been serving science and society since 1836? From its humble beginning as a small collection of books in the library of the U.S. Army Surgeon General’s office, NLM has grown to become not only the world’s largest biomedical library but also a leader in biomedical informatics and computational health data science research.
Think of NLM as a door through which you pass to connect with health data, literature, medical and scientific information, expertise, and sophisticated mathematical models or images that describe a clinical problem. This intersection of information, people, and technology allows NLM to foster discovery. NLM does so by ensuring that scientists, clinicians, librarians, patients, and the public have access to biomedical information 24 hours a day, 7 days a week.
The NLM also supports two research efforts: the Division of Extramural Programs (EP) and the Intramural Research Program (IRP). Both programs are accelerating advances in biomedical informatics, data science, computational biology, and computational health. One of EP’s notable investments focuses on advancing artificial intelligence (AI) methods and reimagining how health care is delivered with the power of AI.
With support from NLM, Corey Lester and his colleagues at the University of Michigan College of Pharmacy, Ann Arbor, MI, are using AI to assist in pill verification, a standard procedure in pharmacies nationwide. They want to help pharmacists avoid dangerous and costly dispensing errors. To do so, Lester is developing a real-time computer vision model that views pills inside a medication bottle, accurately identifies them, and determines whether they match the intended prescription.
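Lester’s actual model isn’t described in detail here, but the verification step such a system performs can be illustrated in a few lines. In this sketch (all names and counts are hypothetical, and the vision model’s output is simulated as a simple list of labels), the detected pills are compared against the prescription to flag any discrepancy:

```python
from collections import Counter

def verify_fill(detected_pills, prescription):
    """Compare pill labels reported by a (hypothetical) vision model
    against the prescription; return (ok, discrepancies) where
    discrepancies maps each mismatched pill to (expected, found)."""
    found = Counter(detected_pills)
    expected = Counter(prescription)
    discrepancies = {}
    for pill in set(found) | set(expected):
        if found[pill] != expected[pill]:
            discrepancies[pill] = (expected[pill], found[pill])
    return (not discrepancies, discrepancies)

# Example: the model reports one tablet too few in the bottle.
ok, diff = verify_fill(
    detected_pills=["lisinopril 10 mg"] * 29,
    prescription=["lisinopril 10 mg"] * 30,
)
```

The hard part in practice is, of course, producing those labels reliably from an image; the comparison itself is the easy final step.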
The IRP develops and applies computational methods and approaches to a broad range of information problems in biology, biomedicine, and human health. The IRP also offers intramural training opportunities and supports other training for students and professionals at levels from pre-baccalaureate through postdoctoral.
NLM principal investigators use biological data to advance computer algorithms and to uncover relationships between levels of biological organization and health conditions. They also work in computational health sciences, focusing on clinical information processing: analyzing clinical data, assessing clinical outcomes, and setting health data standards.
NLM investigator Sameer Antani is collaborating with researchers in other NIH institutes to explore how AI can help us understand oral cancer, echocardiography, and pediatric tuberculosis. His research also examines how images can be mined for data to predict the causes and outcomes of conditions. Examples of Antani’s work can be found in mobile radiology vehicles, which allow professionals to take chest X-rays and screen for HIV and tuberculosis using software containing algorithms developed in his lab.
For AI to have its full impact, more algorithms and approaches that harness the power of data are needed. That’s why NLM supports hundreds of other intramural and extramural scientists who are addressing challenging health and biomedical problems. The NLM-funded research is focused on how AI can help people stay healthy through early disease detection, disease management, and clinical and treatment decision-making—all leading to the ultimate goal of helping people live healthier and happier lives.
The NLM is proud to lead the way in the use of AI to accelerate discovery and transform health care. Want to learn more? Follow me on Twitter, or follow my blog, NLM Musings from the Mezzanine, to receive periodic NLM research updates.
I would like to thank Valerie Florance, Acting Scientific Director of NLM IRP, and Richard Palmer, Acting Director of NLM Division of EP, for their assistance with this post.
National Library of Medicine (National Library of Medicine/NIH)
Video: Using Machine Intelligence to Prevent Medication Dispensing Errors (NLM Funding Spotlight)
Video: Sameer Antani and Artificial Intelligence (NLM)
Principal Investigators (NLM)
Note: Dr. Lawrence Tabak, who performs the duties of the NIH Director, has asked the heads of NIH’s Institutes and Centers (ICs) to contribute occasional guest posts to the blog to highlight some of the interesting science that they support and conduct. This is the 20th in the series of NIH IC guest posts that will run until a new permanent NIH director is in place.
Posted by Dr. Francis Collins
Contact tracing, a term that’s been in the news lately, is a crucial tool for controlling the spread of SARS-CoV-2, the novel coronavirus that causes COVID-19. It depends on quick, efficient identification of an infected individual, followed by identification of all who’ve recently been in close contact with that person so the contacts can self-quarantine to break the chain of transmission.
Properly carried out, contact tracing can be extremely effective. It can also be extremely challenging when battling a stealth virus like SARS-CoV-2, especially when the virus is spreading rapidly.
But there are some innovative ways to enhance contact tracing. In a new study, published in the journal Nature Medicine, researchers in Australia demonstrate one of them: assembling genomic data about the virus to assist contact tracing efforts. This so-called genomic surveillance builds on the idea that when the virus is passed from person to person over a few months, it can acquire random variations in the sequence of its genetic material. These unique variations serve as distinctive genomic “fingerprints.”
When COVID-19 starts circulating in a community, researchers can fingerprint the genomes of SARS-CoV-2 obtained from newly infected people. This timely information helps to tell whether that particular virus has been spreading locally for a while or has just arrived from another part of the world. It can also show where the viral subtype has been spreading through a community or, best of all, when it has stopped circulating.
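The “fingerprint” idea rests on counting the small number of sequence differences that accumulate as the virus passes from person to person. A minimal sketch of that comparison, using toy sequences rather than real SARS-CoV-2 genomes (which are about 30,000 bases long):

```python
def snp_distance(seq_a, seq_b):
    """Count single-nucleotide differences between two aligned genomes."""
    assert len(seq_a) == len(seq_b), "sequences must be aligned first"
    return sum(a != b for a, b in zip(seq_a, seq_b))

# Two samples only one mutation apart likely share a recent transmission
# chain; samples many mutations apart suggest independent introductions.
reference  = "ATGGCCTAA"
local_case = "ATGGCCTAC"   # one difference from the reference
imported   = "TTGACCGAC"   # several differences
```

Real genomic surveillance adds alignment, quality filtering, and phylogenetic analysis on top of this, but pairwise distances like these are the raw material for the genomic “fingerprints” described above.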
The recent study was led by Vitali Sintchenko at the University of Sydney. His team worked in parallel with contact tracers at the Ministry of Health in New South Wales (NSW), Australia’s most populous state, to contain the initial SARS-CoV-2 outbreak from late January through March 2020.
The team performed genomic surveillance, using sequencing data obtained within about five days, to understand local transmission patterns. They also wanted to compare what they learned from genomic surveillance to predictions made by a sophisticated computer model of how the virus might spread among Australia’s approximately 24 million citizens.
Of the 1,617 known cases in Sydney over the three-month study period, researchers sequenced viral genomes from 209 (13 percent) of them. By comparing those sequences to others circulating overseas, they found a lot of sequence diversity, indicating that the novel coronavirus had been introduced to Sydney many times from many places all over the world.
They then used the sequencing data to better understand how the virus was spreading through the local community. Their analysis found that the 209 cases under study included 27 distinct genomic fingerprints. Based on the close similarity of their genomic fingerprints, a significant share of the COVID-19 cases appeared to have stemmed from the direct spread of the virus among people in specific places or facilities.
What was most striking was that the genomic evidence helped to provide information that contact tracers otherwise would have lacked. For instance, the genomic data allowed the researchers to identify previously unsuspected links between certain cases of COVID-19. It also helped to confirm other links that were otherwise unclear.
All told, researchers used the genomic evidence to cluster almost 40 percent of COVID-19 cases (81 of 209) for which the community-based data alone couldn’t identify a known contact source for the infection. That included 26 cases in which an individual who’d recently arrived in Australia from overseas spread the infection to others who hadn’t traveled. The genomic information also helped to identify likely sources in the community for another 15 locally acquired cases that weren’t known based on community data.
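Grouping cases into genomic clusters, like the 27 distinct fingerprints the team identified, can be sketched as single-linkage clustering: two cases land in the same cluster whenever their genomes differ by no more than a small number of variants. The threshold and distances below are invented for illustration, not taken from the study:

```python
def cluster_cases(distances, n, threshold=2):
    """Single-linkage clustering over n cases: i and j join the same
    cluster when distances[(i, j)] <= threshold. Uses union-find."""
    parent = list(range(n))

    def find(x):
        # Follow parent pointers to the root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), d in distances.items():
        if d <= threshold:
            parent[find(i)] = find(j)   # merge the two clusters

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())

# Four cases: 0, 1, and 2 are genetically close; 3 is distant.
d = {(0, 1): 1, (1, 2): 2, (0, 2): 3, (0, 3): 9, (1, 3): 8, (2, 3): 10}
groups = cluster_cases(d, n=4)
```

A cluster of near-identical genomes points contact tracers toward a shared source, even when the people involved never reported contact with one another.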
The researchers compared their genomic surveillance data to SARS-CoV-2’s expected spread as modeled in a computer simulation based on travel to and from Australia over the time period in question. Because the study involved just 13 percent of all known COVID-19 cases in Sydney from late January through March, it’s not surprising that the genomic data presents an incomplete picture, detecting only a portion of the possible chains of transmission expected in the simulation model.
Nevertheless, the findings demonstrate the value of genomic data for tracking the virus and pinpointing exactly where in the community it is spreading. This can help to fill in important gaps in the community-based data that contact tracers often use. Even more exciting, by combining traditional contact tracing, genomic surveillance, and mathematical modeling with other emerging tools at our disposal, it may be possible to get a clearer picture of the movement of SARS-CoV-2 and put more targeted public health measures in place to slow and eventually stop its deadly spread.
Revealing COVID-19 transmission in Australia by SARS-CoV-2 genome sequencing and agent-based modeling. Rockett RJ, Arnott A, Lam C, et al. Nat Med. 2020 Jul 9. [Published online ahead of print]
Coronavirus (COVID-19) (NIH)
Vitali Sintchenko (University of Sydney, Australia)
Posted by Dr. Francis Collins
You may have worked on constructing your family tree, perhaps listing your ancestry back to your great-grandparents. Or with so many public records now available online, you may have even uncovered enough information to discover some unexpected long-lost relatives. Or maybe you’ve even submitted a DNA sample to one of the commercial sources to see what you could learn about your ancestry. But just how big can a family tree grow using today’s genealogical tools?
A recent paper offers a truly eye-opening answer. With permission to download the publicly available, online profiles of 86 million genealogy hobbyists, most of European descent, the researchers assembled more than 5 million family trees. The largest totaled more than 13 million people! By merging overlapping trees from the crowd-sourced public data, including one relatively modest 6,000-person seedling, the researchers were able to go back 11 generations on average, to the 15th century and the days of Christopher Columbus. Doubly exciting, these large datasets offer a powerful new resource to study human health, having already provided some novel insights into our family structures, genes, and longevity.