big data

Meeting with Congressman Ro Khanna

Larry Tabak, Congressman Ro Khanna and Francis Collins at the NIH Clinical Center

We had a great visit with Congressman Ro Khanna (center) of California. Our discussion included recent advances in neuroscience, genomics, Big Data, and research on food allergies. NIH Deputy Director Larry Tabak (left) and I welcomed Congressman Khanna to the NIH Clinical Center on July 30, 2018.


Crowdsourcing 600 Years of Human History

Caption: A 6,000-person family tree, showing individuals spanning seven generations (green) and their marital links (red).
Credit: Columbia University, New York City

You may have worked on constructing your family tree, perhaps listing your ancestry back to your great-grandparents. Or with so many public records now available online, you may have even uncovered enough information to discover some unexpected long-lost relatives. Or maybe you’ve even submitted a DNA sample to one of the commercial sources to see what you could learn about your ancestry. But just how big can a family tree grow using today’s genealogical tools?

A recent paper offers a truly eye-opening answer. With permission to download the publicly available, online profiles of 86 million genealogy hobbyists, most of European descent, the researchers assembled more than 5 million family trees. The largest totaled more than 13 million people! By merging each tree from the crowd-sourced and public data, including the relatively modest 6,000-person seedling shown above, the researchers were able to go back 11 generations on average to the 15th century and the days of Christopher Columbus. Doubly exciting, these large datasets offer a powerful new resource to study human health, having already provided some novel insights into our family structures, genes, and longevity.


Creative Minds: Looking for Common Threads in Rare Diseases

Valerie Arboleda
Credit: UCLA/Margaret Sison Photography

Four years ago, Valerie Arboleda accomplished something few young medical geneticists do. She helped discover a rare congenital disease now known as KAT6A syndrome [1]. From the original 10 cases to the more than 100 diagnosed today, KAT6A kids share a single altered gene that causes neurodevelopmental delays, most prominently in learning to walk and talk, plus a spectrum of possible abnormalities involving the head, face, heart, and immune system.

Now, Arboleda wants to accomplish something even more groundbreaking. With a 2017 NIH Director’s Early Independence Award, she will develop ways to mine Big Data—the voluminous amounts of DNA sequence and other biological information now stored in public databases—to unearth new clues into the biology of rare disorders like KAT6A syndrome. If successful, Arboleda’s work could bring greater precision to the diagnosis and potentially treatment of Mendelian disorders, as well as provide greater clarity into the specific challenges that might lie ahead for an affected child.


Creative Minds: Building Better Computational Models of Common Disease

Hilary Finucane

Not so long ago, Hilary Finucane was a talented young mathematician about to complete a master’s degree in theoretical computer science. As much as she enjoyed exploring pure mathematics, Finucane had begun having second thoughts about her career choice. She wanted to use her gift for numbers in a way that would have more real-world impact.

The solution to her dilemma was, literally, standing right by her side. Her husband Yakir Reshef, also a mathematician, was developing a new algorithm at the Broad Institute of MIT and Harvard, Cambridge, MA, to improve detection of unexpected associations in large data sets. So, Finucane helped the Broad team with modeling biomedical topics ranging from the gut microbiome to global health. That work led to her co-authoring a paper in the journal Science [1], providing a strong start to what’s shaping up to be a rewarding career in computational biology.


Cardiometabolic Disease: Big Data Tackles a Big Health Problem

Cardiometabolic risk loci

More and more studies are popping up that demonstrate the power of Big Data analyses to get at the underlying molecular pathology of some of our most common diseases. A great example, which may have flown a bit under the radar during the summer holidays, involves cardiometabolic disease. It’s an umbrella term for common vascular and metabolic conditions, including hypertension, impaired glucose and lipid metabolism, excess belly fat, and inflammation. All of these components of cardiometabolic disease can increase a person’s risk for a heart attack or stroke.

In the study, an international research team tapped into the power of genomic data to develop clearer pictures of the complex biocircuitry in seven types of vascular and metabolic tissue known to be affected by cardiometabolic disease: the liver, the heart’s aortic root, visceral abdominal fat, subcutaneous fat, internal mammary artery, skeletal muscle, and blood. The researchers found that while some circuits might regulate the level of gene expression in just one tissue, that’s often not the case. In fact, the researchers’ computational models show that such genetic circuitry can be organized into super networks that work together to influence how multiple tissues carry out fundamental life processes, such as metabolizing glucose or regulating lipid levels. When these networks are perturbed, perhaps by things like inherited variants that affect gene expression, or environmental influences such as a high-carb diet, sedentary lifestyle, the aging process, or infectious disease, the researchers’ modeling work suggests that multiple tissues can be affected, resulting in chronic, systemic disorders including cardiometabolic disease.

