Big Data Study Reveals Possible Subtypes of Type 2 Diabetes

Caption: Computational model showing study participants with type 2 diabetes grouped into three subtypes, based on similarities in data contained in their electronic health records. Such information included age, gender (red/orange/yellow indicates females; blue/green, males), health history, and a range of routine laboratory and medical tests.
Credit: Dudley Lab, Icahn School of Medicine at Mount Sinai, New York

In recent years, there’s been a lot of talk about how “Big Data” stands to revolutionize biomedical research. Indeed, we’ve already gained many new insights into health and disease thanks to the power of new technologies to generate astonishing amounts of molecular data—DNA sequences, epigenetic marks, and metabolic signatures, to name a few. But what’s often overlooked is the value of combining all that with a more mundane type of Big Data: the vast trove of clinical information contained in electronic health records (EHRs).

In a recent study in Science Translational Medicine [1], NIH-funded researchers demonstrated the tremendous potential of using EHRs, combined with genome-wide analysis, to learn more about a common, chronic disease: type 2 diabetes. Sifting through the EHR and genomic data of more than 11,000 volunteers, the researchers uncovered what appear to be three distinct subtypes of type 2 diabetes. Not only does this work have implications for efforts to reduce this leading cause of death and disability, it provides a sneak peek at the kind of discoveries that will be made possible by the new Precision Medicine Initiative’s national research cohort, which will enroll 1 million or more volunteers who agree to share their EHRs and genomic information.

In the latest study, a research team, led by Li Li and Joel Dudley of the Icahn School of Medicine at Mount Sinai, New York, started with EHR data from a racially and socioeconomically diverse cohort of 11,210 hospital outpatients. Of these volunteers, 2,551 had been diagnosed with type 2 diabetes, which is the most common form of diabetes.

Without focusing on any particular disease or condition, the researchers first sought to identify similarities among all participants, based on their lab results, blood pressure readings, height, weight, and other routine clinical information in their EHRs. The approach was similar to building a social network in which connections are forged not from friendships, but from shared medical information. When the resulting network was color-coded to reveal participants with type 2 diabetes, an interesting pattern emerged. Instead of being located in one large clump on this “map,” the points indicating people with type 2 diabetes were actually grouped into several smaller, distinct clusters, suggesting the disease may have subtypes.
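For readers curious how a “social network” of medical information might be assembled, here is a minimal sketch. It is not the study team's method (their paper describes a topological analysis of patient similarity); it simply standardizes a few made-up measurements, scores every pair of patients with cosine similarity, and links each patient to the most similar ones. The patient values, the three features, and the function names are all hypothetical.

```python
import math

def zscore_columns(rows):
    """Rescale each column (feature) to mean 0 and unit variance."""
    scaled_cols = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        var = sum((x - mean) ** 2 for x in col) / len(col)
        std = math.sqrt(var) or 1.0  # avoid dividing by zero for constant columns
        scaled_cols.append([(x - mean) / std for x in col])
    return [list(r) for r in zip(*scaled_cols)]

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def knn_network(rows, k=2):
    """Link each patient to the k most similar patients (undirected edges)."""
    scaled = zscore_columns(rows)
    edges = set()
    for i in range(len(scaled)):
        ranked = sorted(
            ((cosine_similarity(scaled[i], scaled[j]), j)
             for j in range(len(scaled)) if j != i),
            reverse=True,
        )
        for _, j in ranked[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges

# Hypothetical records: [glucose, systolic blood pressure, body mass index]
patients = [
    [180, 150, 34],
    [175, 145, 33],
    [185, 155, 35],
    [90, 110, 22],
    [95, 115, 23],
    [85, 105, 21],
]
network = knn_network(patients, k=2)
print(sorted(network))
```

In this toy example, the first three patients link only to one another, as do the last three, so two separate clumps appear on the “map” without anyone having told the program which patients belong together.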

To take a closer look, the researchers rebuilt the network to include only participants with type 2 diabetes. They then reanalyzed the EHRs based on 73 clinical characteristics, including gender, glucose levels, and white blood cell counts. That work confirmed that there were three distinct subtypes of type 2 diabetes among study participants.
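One simple way to pull distinct groups out of a network is to find its connected components: sets of patients who are linked to one another but not to anyone outside the set. The sketch below illustrates that idea on a hypothetical eight-patient network. It is only an illustration of how clusters can fall out of a similarity network; the edges are invented, and the study's own procedure for defining subtypes may well have differed.

```python
from collections import defaultdict

def connected_components(num_nodes, edges):
    """Group nodes into sets that are linked, directly or indirectly."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen = set()
    components = []
    for start in range(num_nodes):
        if start in seen:
            continue
        # Depth-first walk outward from an unvisited patient.
        stack = [start]
        seen.add(start)
        group = []
        while stack:
            node = stack.pop()
            group.append(node)
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(sorted(group))
    return components

# Hypothetical similarity links among eight patients with type 2 diabetes
toy_edges = [(0, 1), (1, 2), (3, 4), (5, 6), (6, 7), (5, 7)]
subgroups = connected_components(8, toy_edges)
print(subgroups)
```

Here the eight patients separate into three groups, because no link crosses from one group to another.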

Type 2 diabetes is associated with potentially serious complications, including nerve damage, vision problems, kidney disease, and an increased risk for cardiovascular disease. The study found differences in the distribution of such complications among the three subtypes of type 2 diabetes. People with subtype 1 were more likely to be diagnosed with microvascular complications, including blindness/vision defects. This group of participants was also the youngest and most likely to be obese. People with subtype 2 showed the greatest risk for tuberculosis and cancer. As for subtype 3, such people were more likely than others to be HIV positive, have high blood pressure, and develop arterial blood clots. Both subtypes 2 and 3 displayed a greater risk for heart disease than subtype 1.

Next, the researchers performed a genomic analysis, identifying hundreds of genetic variants that were enriched non-randomly in each of the three groups. Interestingly, some of the genetic variants linked to each subgroup were associated with genetic pathways that appeared relevant to the distinguishing clinical features of those subgroups.
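What does it mean for a variant to be “enriched non-randomly” in a group? One common way to ask the question is a one-sided hypergeometric test: if carriers of a variant were scattered across the cohort by chance, how often would a subgroup contain at least as many carriers as were actually observed? The sketch below works through that arithmetic. The counts are invented for illustration, the function name is hypothetical, and this post does not say which statistical test the authors used.

```python
from math import comb

def enrichment_p_value(carriers_in_group, group_size, carriers_total, cohort_size):
    """Chance of seeing at least this many carriers in the group
    if carriers were distributed at random (hypergeometric upper tail)."""
    all_possible_groups = comb(cohort_size, group_size)
    most_carriers_possible = min(group_size, carriers_total)
    tail = sum(
        comb(carriers_total, x)
        * comb(cohort_size - carriers_total, group_size - x)
        for x in range(carriers_in_group, most_carriers_possible + 1)
    )
    return tail / all_possible_groups

# Hypothetical counts: 10 patients, 5 carry the variant, and a subgroup
# of 5 patients happens to contain all 5 carriers.
p_all_carriers = enrichment_p_value(5, 5, 5, 10)

# Asking for "at least 0 carriers" is always satisfied.
p_trivial = enrichment_p_value(0, 5, 5, 10)
print(p_all_carriers, p_trivial)
```

In the first case there is exactly one way out of 252 to pick a five-person subgroup containing every carrier, so the probability is 1/252. A real genome-wide analysis would also need to correct for the very large number of variants tested.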

These findings suggest that some of the clinical differences observed among the type 2 diabetes subtypes may be rooted in lifestyle or environment, while others may be influenced by inherited factors. Still, more research needs to be done to replicate and expand upon these findings. The hope is that by gaining a more nuanced understanding of type 2 diabetes, we may be able to identify more precise ways of helping to detect, manage, and, ultimately, prevent this serious, chronic disease that currently affects about 1 out of every 11 Americans [2].


[1] Identification of type 2 diabetes subgroups through topological analysis of patient similarity. Li L, Cheng WY, Glicksberg BS, Gottesman O, Tamler R, Chen R, Bottinger EP, Dudley JT. Sci Transl Med. 2015 Oct 28;7(311):311ra174.

[2] Diabetes Latest Fact Sheet. Centers for Disease Control and Prevention. 2014 June 17.


Am I at Risk for Type 2 Diabetes? Taking Steps to Lower Your Risk of Getting Diabetes (National Institute of Diabetes and Digestive and Kidney Diseases/NIH)

Electronic Medical Records and Genomics (eMERGE) Network (National Human Genome Research Institute/NIH)

Dudley Lab (Icahn School of Medicine at Mount Sinai, New York)

Precision Medicine Initiative (NIH)

NIH Support: National Institute of Diabetes and Digestive and Kidney Diseases; National Cancer Institute
