All of Us: Release of Nearly 100,000 Whole Genome Sequences Sets Stage for New Discoveries
Posted on by Joshua Denny, M.D., M.S., and Lawrence Tabak, D.D.S., Ph.D.

Nearly four years ago, NIH opened national enrollment for the All of Us Research Program. This historic program is building a vital research community within the United States of at least 1 million participant partners from all backgrounds. Its unifying goal is to advance precision medicine, an emerging form of health care tailored specifically to the individual, not the average patient as is now often the case. As part of this historic effort, many participants have offered DNA samples for whole genome sequencing, which provides information about almost all of an individual’s genetic makeup.
Earlier this month, the All of Us Research Program hit an important milestone. We released the first set of nearly 100,000 whole genome sequences from our participant partners. The sequences are stored in the All of Us Researcher Workbench, a powerful, cloud-based analytics platform that makes these data broadly accessible to registered researchers.
The All of Us Research Program and its many participant partners are leading the way toward more equitable representation in medical research. About half of this new genomic information comes from people who self-identify with a racial or ethnic minority group. That’s extremely important because, until now, over 90 percent of participants in large genomic studies were of European descent. This lack of diversity has had huge impacts—deepening health disparities and hindering scientific discovery from fully benefiting everyone.
The Researcher Workbench also contains information from many of the participants’ electronic health records, Fitbit devices, and survey responses. Another neat feature is that the platform links to data from the U.S. Census Bureau’s American Community Survey to provide more details about the communities where participants live.
This unique and comprehensive combination of data will be key in transforming our understanding of health and disease. For example, given the vast amount of data and diversity in the Researcher Workbench, new diseases are undoubtedly waiting to be uncovered and defined. Many new genetic variants are also waiting to be identified that may better predict disease risk and response to treatment.
To speed up the discovery process, these data are being made available both widely and wisely. To protect participants’ privacy, the program has removed all direct identifiers from the data and upholds strict requirements for researchers seeking access. Already, more than 1,500 scientists across the United States have gained access to the Researcher Workbench through their institutions after completing training and agreeing to the program’s strict rules for responsible use. Some of these researchers are already making discoveries that promote precision medicine, such as finding ways to predict how best to prevent vision loss in patients with glaucoma.
Beyond making genomic data available for research, All of Us participants have the opportunity to receive their personal DNA results, at no cost to them. So far, the program has offered genetic ancestry and trait results to more than 100,000 participants. Plans are underway to begin sharing health-related DNA results on hereditary disease risk and medication-gene interactions later this year.
This first release of genomic data is a huge milestone for the program and for health research more broadly, but it’s also just the start. The program’s genome centers continue to generate genomic data, processing about 5,000 additional participant DNA samples every week.
The ultimate goal is to gather health data from 1 million or more people living in the United States, and there’s plenty of time to join the effort. Whether you would like to contribute your own DNA and health information, engage in research, or support the All of Us Research Program as a partner, it’s easy to get involved. By taking part in this historic program, you can help to build a better and more equitable future for health research and precision medicine.
Note: Joshua Denny, M.D., M.S., is the Chief Executive Officer of NIH’s All of Us Research Program.
Links:
All of Us Research Program (NIH)
Join All of Us (NIH)
Preventing Glaucoma Vision Loss with ‘Big Data’
Posted on by Dr. Francis Collins

Each morning, more than 2 million Americans start their rise-and-shine routine by remembering to take their eye drops. The drops treat their open-angle glaucoma, the most common form of the disease, caused by obstructed drainage of fluid where the eye’s cornea and iris meet. The slow drainage increases fluid pressure at the front of the eye. Meanwhile, at the back of the eye, fluid pushes on the optic nerve, causing its bundled fibers to fray and leading to gradual loss of side vision.
For many, the eye drops help to lower intraocular pressure and prevent vision loss. But for others, the drops aren’t sufficient and their intraocular pressure remains high. Such people will need next-level care, possibly including eye surgery, to reopen the clogged drainage ducts and slow this disease that disproportionately affects older adults and African Americans over age 40.

Credit: University of California San Diego
Sally Baxter, a physician-scientist with expertise in ophthalmology at the University of California, San Diego (UCSD), wants to learn how to predict who is at greatest risk for serious vision loss from open-angle and other forms of glaucoma. That way, they can receive more aggressive early care to protect their vision from this second-leading cause of blindness in the United States.
To pursue this challenging research goal, Baxter has received a 2020 NIH Director’s Early Independence Award. Her research will build on the clinical observation that people with glaucoma frequently battle other chronic health problems, such as high blood pressure, diabetes, and heart disease. To learn more about how these and other chronic health conditions might influence glaucoma outcomes, Baxter has begun mining a rich source of data: electronic health records (EHRs).
In an earlier study of patients at UCSD, Baxter showed that EHR data helped to predict which people would need glaucoma surgery within the next six months [1]. The finding suggested that the EHR, especially information on a patient’s blood pressure and medications, could predict the risk for worsening glaucoma.
In her NIH-supported work, she’s already extended this earlier “Big Data” finding by analyzing data from more than 1,200 people with glaucoma who participate in NIH’s All of Us Research Program [2]. With consent from the participants, Baxter used their EHRs to train a computer to find telltale patterns within the data and then predict with 80 to 99 percent accuracy who would later require eye surgery.
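The idea of training a model on non-ophthalmic EHR data can be sketched in miniature. The features, data, and model below are all invented for illustration, not Baxter’s actual methods: a toy logistic regression fit by gradient descent on synthetic “EHR-style” variables (blood pressure, medication count) standing in for the richer machine-learning pipeline the study used.

```python
# Illustrative sketch only: a toy classifier on synthetic "EHR-style"
# features. The real study's features, models, and data are far richer;
# every variable and relationship here is invented for demonstration.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic cohort: assume higher blood pressure and more medications make
# eye surgery more likely (a made-up relationship for this sketch).
n = 400
bp = rng.normal(130, 15, n)      # systolic blood pressure (mmHg)
meds = rng.poisson(3, n)         # number of active medications
logit = 0.08 * (bp - 130) + 0.5 * (meds - 3) - 0.2
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(float)

# Standardize the features and fit logistic regression by gradient descent.
X = np.column_stack([np.ones(n),
                     (bp - bp.mean()) / bp.std(),
                     (meds - meds.mean()) / meds.std()])
w = np.zeros(3)
for _ in range(2000):
    p = 1 / (1 + np.exp(-X @ w))
    w -= 0.1 * X.T @ (p - y) / n   # gradient step on the log-loss

pred = (1 / (1 + np.exp(-X @ w)) > 0.5).astype(float)
accuracy = (pred == y).mean()
print(f"training accuracy: {accuracy:.2f}")
```

The point of the sketch is only the workflow: structured clinical variables go in, and a trained model outputs a surgery-risk prediction for each patient.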
The findings confirm that machine learning approaches and EHR data can indeed help in managing people with glaucoma. That’s true even when the EHR data don’t contain any information specific to a person’s eye health.
In fact, the work of Baxter and other groups has pointed to an especially important role for blood pressure in shaping glaucoma outcomes. Hoping to explore this lead further with the support of her Early Independence Award, Baxter also will enroll patients in a study to test whether blood-pressure monitoring smart watches can add important predictive information on glaucoma progression. By combining round-the-clock blood pressure data with EHR data, she hopes to predict glaucoma progression with even greater precision. She’s also exploring innovative ways to track whether people with glaucoma use their eye drops as prescribed, which is another important predictor of the risk of irreversible vision loss [3].
Glaucoma research continues to make great progress, ranging from basic research to the development of new treatments and high-resolution imaging technologies that improve diagnosis. But Baxter’s quest to develop practical clinical tools holds great promise, too, and hopefully will one day help to protect the vision of millions of people with glaucoma around the world.
References:
[1] Machine learning-based predictive modeling of surgical intervention in glaucoma using systemic data from electronic health records. Baxter SL, Marks C, Kuo TT, Ohno-Machado L, Weinreb RN. Am J Ophthalmol. 2019 Dec; 208:30-40.
[2] Predictive analytics for glaucoma using data from the All of Us Research Program. Baxter SL, Saseendrakumar BR, Paul P, Kim J, Bonomi L, Kuo TT, Loperena R, Ratsimbazafy F, Boerwinkle E, Cicek M, Clark CR, Cohn E, Gebo K, Mayo K, Mockrin S, Schully SD, Ramirez A, Ohno-Machado L; All of Us Research Program Investigators. Am J Ophthalmol. 2021 Jul;227:74-86.
[3] Smart electronic eyedrop bottle for unobtrusive monitoring of glaucoma medication adherence. Aguilar-Rivera M, Erudaitius DT, Wu VM, Tantiongloc JC, Kang DY, Coleman TP, Baxter SL, Weinreb RN. Sensors (Basel). 2020 Apr 30;20(9):2570.
Links:
Glaucoma (National Eye Institute/NIH)
All of Us Research Program (NIH)
Video: Sally Baxter (All of Us Research Program)
Sally Baxter (University of California San Diego)
Baxter Project Information (NIH RePORTER)
NIH Director’s Early Independence Award (Common Fund)
NIH Support: Common Fund
Using Artificial Intelligence to Catch Irregular Heartbeats
Posted on by Dr. Francis Collins

Thanks to advances in wearable health technologies, it’s now possible for people to monitor their heart rhythms at home for days, weeks, or even months via wireless electrocardiogram (EKG) patches. In fact, my Apple Watch makes it possible to record a real-time EKG whenever I want. (I’m glad to say I am in normal sinus rhythm.)
For true medical benefit, however, the challenge lies in analyzing the vast amounts of data—often hundreds of hours’ worth per person—to distinguish reliably between harmless rhythm irregularities and potentially life-threatening problems. Now, NIH-funded researchers have found that artificial intelligence (AI) can help.
A powerful computer “studied” more than 90,000 EKG recordings, from which it “learned” to recognize patterns, form rules, and apply them accurately to future EKG readings. The computer became so “smart” that it could classify 10 different types of irregular heart rhythms, including atrial fibrillation (AFib). In fact, after just seven months of training, the computer-devised algorithm was as good as, and in some cases even better than, cardiology experts at making the correct diagnostic call.
EKG tests measure electrical impulses in the heart, which signal the heart muscle to contract and pump blood to the rest of the body. The precise, wave-like features of the electrical impulses allow doctors to determine whether a person’s heart is beating normally.
For example, in people with AFib, the heart’s upper chambers (the atria) contract rapidly and unpredictably, causing the ventricles (the heart’s main pumping chambers) to contract irregularly rather than in a steady rhythm. This is an important arrhythmia to detect, even if it may only be present occasionally over many days of monitoring. That’s not always easy to do with current methods.
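Why is AFib detectable from rhythm alone? Because the beat-to-beat (RR) intervals vary erratically, while a normal rhythm is nearly metronomic. A classic screening heuristic, far simpler than the deep-learning approach in this study, just flags recordings whose RR intervals are unusually variable. The sketch below uses invented interval values and an arbitrary threshold, not a clinical cutoff.

```python
# Toy illustration of why AFib shows up in rhythm data: AFib makes the
# beat-to-beat (RR) intervals erratic, so a simple variability statistic
# separates it from a steady rhythm. The 0.1 threshold is arbitrary,
# chosen for this sketch only, not a clinical cutoff.
from statistics import mean, stdev

def rr_irregularity(rr_intervals):
    """Coefficient of variation of RR intervals (std / mean)."""
    return stdev(rr_intervals) / mean(rr_intervals)

def looks_irregular(rr_intervals, threshold=0.1):
    return rr_irregularity(rr_intervals) > threshold

steady = [0.80, 0.81, 0.79, 0.80, 0.82, 0.80]    # seconds between beats
erratic = [0.62, 1.05, 0.71, 0.95, 0.55, 1.10]   # AFib-like variability

print(looks_irregular(steady))    # → False
print(looks_irregular(erratic))   # → True
```

Real-world screening is harder than this, of course: noise, ectopic beats, and other arrhythmias also perturb RR intervals, which is exactly why a learned classifier outperforms a single hand-tuned statistic.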
Here’s where the team, led by computer scientists Awni Hannun and Andrew Ng, Stanford University, Palo Alto, CA, saw an AI opportunity. As published in Nature Medicine, the Stanford team started by assembling a large EKG dataset from more than 53,000 people [1]. The data included various forms of arrhythmia and normal heart rhythms from people who had worn the FDA-approved Zio patch for about two weeks.
The Zio patch is a 2-by-5-inch adhesive patch, worn much like a bandage, on the upper left side of the chest. It’s water resistant and can be kept on around the clock while a person sleeps, exercises, or takes a shower. The wireless patch continuously monitors heart rhythms, storing EKG data for later analysis.
The Stanford researchers looked to machine learning to process all the EKG data. In machine learning, computers rely on large datasets of examples in order to learn how to perform a given task. The accuracy improves as the machine “sees” more data.
But the team’s real interest was in utilizing a special class of machine learning called deep neural networks, or deep learning. Deep learning is inspired by how our own brain’s neural networks process information, learning to focus on some details but not others.
In deep learning, computers look for patterns in data. As they begin to “see” complex relationships, some connections in the network are strengthened while others are weakened. The network is typically composed of multiple information-processing layers, which operate on the data and compute increasingly complex and abstract representations.
Those data reach the final output layer, which acts as a classifier, assigning each bit of data to a particular category or, in the case of the EKG readings, a diagnosis. In this way, computers can learn to analyze and sort highly complex data using both more obvious and hidden features.
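The layered pipeline described above can be sketched as a minimal forward pass: each layer transforms the data into a new representation, and a final softmax layer assigns a probability to each diagnostic category. The layer sizes and random weights below are arbitrary placeholders; the actual Stanford model was a much deeper convolutional network trained on raw EKG waveforms.

```python
# Minimal sketch of the layered structure described above: stacked
# transformations ending in a softmax classifier. Sizes and weights are
# arbitrary placeholders; the actual Stanford model was a deep
# convolutional network trained on raw EKG waveforms.
import numpy as np

rng = np.random.default_rng(42)

def relu(x):
    return np.maximum(0, x)

def softmax(x):
    e = np.exp(x - x.max())   # subtract max for numerical stability
    return e / e.sum()

# A toy "EKG segment": 64 samples of signal.
segment = rng.normal(size=64)

# Two hidden layers compute increasingly abstract representations...
W1, W2 = rng.normal(size=(32, 64)), rng.normal(size=(16, 32))
h1 = relu(W1 @ segment)
h2 = relu(W2 @ h1)

# ...and the output layer acts as the classifier over 11 categories
# (10 arrhythmia types plus normal sinus rhythm).
W_out = rng.normal(size=(11, 16))
probs = softmax(W_out @ h2)

print("predicted class:", int(probs.argmax()))
```

With random weights the “diagnosis” is meaningless; training adjusts the weight matrices so that the strengthened and weakened connections the text describes steer each recording toward its correct category.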
Ultimately, the computer in the new study could differentiate between EKG readings representing 10 different arrhythmias as well as a normal heart rhythm. It could also tell the difference between irregular heart rhythms and background “noise” caused by interference of one kind or another, such as a jostled or disconnected Zio patch.
For validation, the computer attempted to assign a diagnosis to the EKG readings of 328 additional patients. Independently, several expert cardiologists also read those EKGs and reached a consensus diagnosis for each patient. In almost all cases, the computer’s diagnosis agreed with the consensus of the cardiologists. The computer also made its calls much faster.
Next, the researchers compared the computer’s diagnoses to those of six individual cardiologists who weren’t part of the original consensus committee. The results showed that the computer actually outperformed these experienced cardiologists!
The findings suggest that artificial intelligence can be used to improve the accuracy and efficiency of EKG readings. In fact, Hannun reports that iRhythm Technologies, maker of the Zio patch, has already incorporated the algorithm into the interpretation now being used to analyze data from real patients.
As impressive as this is, we are surely just at the beginning of AI applications to health and health care. In recognition of the opportunities ahead, NIH has recently launched a working group on AI to explore ways to make the best use of existing data, and harness the potential of artificial intelligence and machine learning to advance biomedical research and the practice of medicine.
Meanwhile, more and more impressive NIH-supported research featuring AI is being published. In my next blog, I’ll highlight a recent paper that uses AI to make a real difference for cervical cancer, particularly in low resource settings.
Reference:
[1] Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network. Hannun AY, Rajpurkar P, Haghpanahi M, Tison GH, Bourn C, Turakhia MP, Ng AY. Nat Med. 2019 Jan;25(1):65-69.
Links:
Arrhythmia (National Heart, Lung, and Blood Institute/NIH)
Video: Artificial Intelligence: Collecting Data to Maximize Potential (NIH)
Andrew Ng (Palo Alto, CA)
NIH Support: National Heart, Lung, and Blood Institute
Meeting with Congressman Ro Khanna
Posted on by Dr. Francis Collins

We had a great visit with Congressman Ro Khanna (center) of California. Our discussion included recent advances in neuroscience, genomics, Big Data, and research on food allergies. NIH Deputy Director Larry Tabak (left) and I welcomed Congressman Khanna to the NIH Clinical Center on July 30, 2018.
Crowdsourcing 600 Years of Human History
Posted on by Dr. Francis Collins

Caption: A 6,000-person family tree, showing individuals spanning seven generations (green) and their marital links (red).
Credit: Columbia University, New York City
You may have worked on constructing your family tree, perhaps listing your ancestry back to your great-grandparents. Or with so many public records now available online, you may have even uncovered enough information to discover some unexpected long-lost relatives. Or maybe you’ve even submitted a DNA sample to one of the commercial sources to see what you could learn about your ancestry. But just how big can a family tree grow using today’s genealogical tools?
A recent paper offers a truly eye-opening answer. With permission to download 86 million publicly available online profiles created by genealogy hobbyists, most of European descent, the researchers assembled more than 5 million family trees. The largest totaled more than 13 million people! By merging trees from the crowd-sourced and public data, including the relatively modest 6,000-person seedling shown above, the researchers were able to go back 11 generations on average—to the 15th century and the days of Christopher Columbus. Doubly exciting, these large datasets offer a powerful new resource to study human health, having already provided some novel insights into our family structures, genes, and longevity.
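The merging step can be thought of as joining graphs that share an individual: two hobbyists’ separate trees collapse into one larger pedigree wherever they list the same person. The sketch below uses invented names and a union-find structure; it illustrates the idea only, not the paper’s actual record-matching methods.

```python
# Toy illustration of the tree-merging idea: separate crowd-sourced trees
# that share an individual collapse into one larger pedigree. Names and
# links are invented; union-find tracks the merged components.
class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

# Two hobbyists' trees that both include "Maria b.1850" (parent, child pairs).
tree_a = [("Maria b.1850", "Johan b.1875"), ("Johan b.1875", "Else b.1900")]
tree_b = [("Karl b.1825", "Maria b.1850"), ("Karl b.1825", "Anna b.1852")]

uf = UnionFind()
for parent, child in tree_a + tree_b:
    uf.union(parent, child)

people = {p for edge in tree_a + tree_b for p in edge}
roots = {uf.find(p) for p in people}
print(f"{len(people)} people merged into {len(roots)} family tree(s)")
```

Repeated across millions of profiles, this same collapse-on-shared-individuals logic is how modest individual trees can merge into pedigrees spanning millions of people and many generations.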