Creative Minds: Interpreting Your Genome
Posted by Dr. Francis Collins
Just this year, we’ve reached the point where we can sequence an entire human genome for less than $1,000. That’s great news—and rather astounding, since the first human genome sequence (finished in 2003) cost an estimated $400,000,000! Does that mean we’ll be able to use each person’s unique genetic blueprint to guide his or her health care from cradle to grave? Maybe eventually, but it’s not quite as simple as it sounds.
Before we can use your genome to develop more personalized strategies for detecting, treating, and preventing disease, we need to be able to interpret the many variations that make your genome distinct from everybody else’s. While most of these variations are neither bad nor good, some raise the risk of particular diseases, and others serve to lower the risk. How do we figure out which is which?
Jay Shendure, an associate professor at the University of Washington in Seattle, has an audacious plan to figure this out, which is why he is among the 2013 recipients of the NIH Director’s Pioneer Award.
Shendure is already a pioneer when it comes to genomics. He helped to develop a faster, cheaper method of sequencing the genome that involves analyzing billions of DNA molecules simultaneously. He led a team that figured out how to decode the exome (the 1% of the genome that encodes all the proteins) to identify genes causing rare, inherited disorders. And, most recently, Shendure’s group showed that it’s possible to sequence the entire genome of a fetus from DNA harvested from the mother’s blood during pregnancy.
With his track record, it’s not surprising that Shendure is planning something even more challenging. Now, he wants to tackle one of the major obstacles hindering the use of personal genomic information in medicine: the identification and interpretation of genetic variants.
Using a high-tech method, Shendure plans to create in the laboratory all of the single-letter variants of hundreds of major genes involved in rare and common diseases. Then, he will investigate the functional implications of each of these gene variants by studying the protein produced by each in cell lines and in animal models. By studying each protein in vitro (in laboratory experiments) and in vivo (in animals), he should be able to figure out which variations cause disease, which ones raise the risk, and which can be ignored. This would be an impossible task if Shendure tested the variants one at a time, but he intends to develop “multiplex” approaches that will examine many versions of a given protein within a single experiment. Like much of his other work, this project depends on genomic experimental methods that generate mountains of complex data requiring extensive computational analysis.
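To get a feel for what “all of the single-letter variants” means, here is a minimal Python sketch that lists every single-letter substitution of a short DNA sequence. The function name `single_letter_variants` and the six-letter toy sequence are assumptions made up for illustration; this is not Shendure’s laboratory method or software.

```python
from typing import Iterator, Tuple

BASES = "ACGT"  # the four DNA letters


def single_letter_variants(seq: str) -> Iterator[Tuple[int, str, str, str]]:
    """Yield (position, original letter, new letter, variant sequence)
    for every possible single-letter substitution. Positions are 1-based."""
    seq = seq.upper()
    for pos, ref in enumerate(seq):
        for alt in BASES:
            if alt != ref:
                yield pos + 1, ref, alt, seq[:pos] + alt + seq[pos + 1:]


toy_gene = "ATGGCC"  # hypothetical sequence, not a real gene
variants = list(single_letter_variants(toy_gene))
print(len(variants))  # 3 alternative letters at each of 6 positions
for position, ref, alt, variant in variants[:3]:
    print(position, ref, "->", alt, variant)
```

Because each letter can be swapped for three others, a sequence of n letters has 3n single-letter variants. That count grows quickly across hundreds of genes, which is why examining many variants in a single experiment matters.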
This is an extremely challenging project. Let me give you an example from my own research that illustrates its complexity. The protein-coding sequence of the gene that causes cystic fibrosis is made of 4,440 chemical letters, which encode a protein with 1,480 amino acids. A change of just one letter in a vulnerable part of this gene, called a “point mutation,” can produce a protein that leads to the complex multisystem disorder that we call cystic fibrosis. In contrast, single-letter changes in different parts of that same gene may cause few or no health problems.
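The numbers in this example can be checked with a little arithmetic: three DNA letters specify one amino acid, so 4,440 letters correspond to 1,480 amino acids, and each letter can be swapped for any of three others. The Python sketch below also uses the standard genetic code to show why some single-letter changes alter a protein while others do not. The helper name `effect` and the toy codons are assumptions chosen for illustration; they are not actual cystic fibrosis mutations.

```python
from itertools import product

# Standard genetic code, with codons ordered by the letters T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

coding_letters = 4440
amino_acids = coding_letters // 3       # three letters per amino acid
possible_changes = coding_letters * 3   # three alternatives per letter
print(amino_acids, possible_changes)


def effect(codon: str, pos: int, alt: str) -> str:
    """Classify a single-letter change at 0-based position `pos` of a codon."""
    mutated = codon[:pos] + alt + codon[pos + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    if before == after:
        return "silent"      # same amino acid
    if after == "*":
        return "stop"        # protein is cut short
    return "missense"        # different amino acid


print(effect("GGA", 2, "C"))  # GGA and GGC both encode glycine
print(effect("GGA", 0, "A"))  # AGA encodes arginine instead of glycine
print(effect("TGG", 2, "A"))  # TGA is a stop signal
```

This only classifies a change at the level of the protein sequence. Whether a given amino acid change actually causes disease is the much harder question that laboratory experiments must answer.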
Among the first targets of Shendure’s project are the BRCA1 and BRCA2 genes. Each of these genes has been sequenced in tens of thousands of patients, and some variations have been found to increase the risk of early-onset breast and ovarian cancer. However, there are still many variations in these genes that are of uncertain clinical significance. Health care professionals cannot offer guidance to people with such variations because their health consequences are unknown. Shendure’s analysis could provide much-needed information about which variants increase the risk of disease and which are benign.
How does one set about becoming a research pioneer? Well, if you follow Shendure’s example, it seems that it might not hurt to play around a bit. When he was 7, his parents bought him a PC, and this kid from Ohio taught himself programming so he could write computer games! Later, he began taking on projects that his mother gave him, such as employee management systems. Shendure took a brief break from computer science during his undergrad years at Princeton University in order to focus on molecular biology. But when he enrolled at Harvard University to earn his M.D. and Ph.D., Shendure’s love of computers resurfaced, and his skills proved invaluable as he worked with his adviser George Church on new sequencing technologies.
As a prelude to the new multiplex experiments, Shendure, in collaboration with Daniela Witten, a 2011 recipient of the NIH Director’s Early Independence Award, has developed a computer program called Combined Annotation-Dependent Depletion (CADD). This program uses various annotations (for example, data being generated by the ENCyclopedia Of DNA Elements (ENCODE) collaboration and the NIH Common Fund’s epigenomics projects) to estimate the potential pathogenicity and disease severity of every possible variant in the human genome. Until Shendure can create all of the single-letter variants in the lab, CADD will serve as a powerful interim tool for identifying and prioritizing which variants are most worthy of further study.
Shendure’s Pioneer project has the potential to change medicine. If he succeeds in interpreting the clinical implications of all these gene variants—figuring out which are harmful, and which are not—he will have overcome a major hurdle along the path to personalized, genome-based medicine. But, before Shendure’s interpretations can be integrated into genomic-era health care, there are many other challenges that lie ahead. Those include education of health care professionals and patients, regulatory matters, reimbursement issues, and so forth.
Shendure is remarkably optimistic about these issues and believes that if he can figure out his part, then the rest will fall into place. That approach worked for his computer games when he was 7—and this, he says, is much more satisfying.
 Accurate multiplex polony sequencing of an evolved bacterial genome. Shendure J, Porreca GJ, Reppas NB, Lin X, McCutcheon JP, Rosenbaum AM, Wang MD, Zhang K, Mitra RD, Church GM. Science. 2005 Sep 9;309(5741):1728-32. Epub 2005 Aug 4.
 Targeted capture and massively parallel sequencing of 12 human exomes. Ng SB, Turner EH, Robertson PD, Flygare SD, Bigham AW, Lee C, Shaffer T, Wong M, Bhattacharjee A, Eichler EE, Bamshad M, Nickerson DA, Shendure J. Nature. 2009 Sep 10;461(7261):272-6.
 Noninvasive whole-genome sequencing of a human fetus. Kitzman JO, Snyder MW, Ventura M, Lewis AP, Qiu R, Simmons LE, Gammill HS, Rubens CE, Santillan DA, Murray JC, Tabor HK, Bamshad MJ, Eichler EE, Shendure J. Sci Transl Med. 2012 Jun 6;4(137):137ra76.
 A general framework for estimating the relative pathogenicity of human genetic variants. Kircher M, Witten DM, Jain P, O’Roak BJ, Cooper GM, Shendure J. Nat Genet. 2014 Feb 2. doi: 10.1038/ng.2892. [Epub ahead of print].
High-Risk, High-Reward Research. (NIH Common Fund)
NIH Director’s Pioneer Award. (NIH Common Fund)
NIH Director’s Early Independence Award. (NIH Common Fund)
Jay Ashok Shendure, Interpreting Genetic Variants of Uncertain Significance, University of Washington, Seattle
Daniela Witten, High-Dimensional, Unsupervised Learning, With Applications to Genomics, University of Washington
NIH support: National Human Genome Research Institute; Common Fund