Creative Minds: Interpreting Your Genome
Posted by Dr. Francis Collins
Just this year, we’ve reached the point where we can sequence an entire human genome for less than $1,000. That’s great news—and rather astounding, since the first human genome sequence (finished in 2003) cost an estimated $400,000,000! Does that mean we’ll be able to use each person’s unique genetic blueprint to guide his or her health care from cradle to grave? Maybe eventually, but it’s not quite as simple as it sounds.
Before we can use your genome to develop more personalized strategies for detecting, treating, and preventing disease, we need to be able to interpret the many variations that make your genome distinct from everybody else’s. While most of these variations are neither bad nor good, some raise the risk of particular diseases, and others serve to lower the risk. How do we figure out which is which?
Jay Shendure, an associate professor at the University of Washington in Seattle, has an audacious plan to figure this out, which is why he is among the 2013 recipients of the NIH Director’s Pioneer Award.
Shendure is already a pioneer when it comes to genomics. He helped to develop a faster, cheaper method of sequencing the genome that involves analyzing billions of DNA molecules simultaneously [1]. He led a team that figured out how to decode the exome (the 1% of the genome that encodes all the proteins) to identify genes causing rare, inherited disorders [2]. And, most recently, Shendure’s group showed that it’s possible to sequence the entire genome of a fetus from DNA harvested from the mother’s blood during pregnancy [3].
With his track record, it’s not surprising that Shendure is planning something even more challenging. Now, he wants to tackle one of the major obstacles hindering the use of personal genomic information in medicine: the identification and interpretation of genetic variants.
Using a high-tech method, Shendure plans to create in the laboratory all of the single-letter variants of hundreds of major genes involved in rare and common diseases. Then, he will investigate the functional implications of each variant by studying the protein it produces in cell lines and in animal models. By studying each protein in vitro (in laboratory experiments) and in vivo (in animals), he should be able to figure out which variations cause disease, which ones raise the risk, and which can be ignored. This would be an impossible task if Shendure tested the variants one at a time, but he intends to develop “multiplex” approaches that examine many versions of a given protein within a single experiment. Like much of his other work, this project depends on genomic experimental methods that generate mountains of complex data requiring extensive computational analysis.
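To give a sense of scale, here is a minimal Python sketch of what “all of the single-letter variants” of a sequence means. It is a hypothetical illustration, not the design software used in Shendure’s lab; the function name and the toy sequence are invented for the example. Every position can be changed to any of the three other DNA letters, so a sequence of length L has 3 × L single-letter variants.

```python
# Hypothetical sketch: enumerate every single-letter substitution of a DNA
# sequence. Not the software used in the project described above.

BASES = "ACGT"


def single_letter_variants(seq):
    """Return (position, ref, alt, mutant_seq) for each possible substitution.

    Positions are 1-based. Every position can change to exactly three other
    bases, so a sequence of length L yields 3 * L variants.
    """
    seq = seq.upper()
    variants = []
    for i, ref in enumerate(seq):
        for alt in BASES:
            if alt != ref:
                mutant = seq[:i] + alt + seq[i + 1:]
                variants.append((i + 1, ref, alt, mutant))
    return variants


if __name__ == "__main__":
    toy = "ATGGCT"  # made-up 6-letter sequence
    variants = single_letter_variants(toy)
    print(len(variants))  # 18, i.e. 3 per position
    print(variants[0])    # (1, 'A', 'C', 'CTGGCT')
```

By the same arithmetic, the 4,440-letter cystic fibrosis example below already has 3 × 4,440 = 13,320 possible single-letter variants, and hundreds of genes multiply that further, which is why testing them one at a time is impractical.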
This is an extremely challenging project. Let me give you an example from my own research that illustrates its complexity. The protein-coding sequence of the gene responsible for cystic fibrosis is 4,440 chemical letters long and encodes a protein of 1,480 amino acids. A change of just one letter in a vulnerable part of this gene, called a “point mutation,” can produce a protein that leads to the complex multisystem disorder that we call cystic fibrosis. In contrast, single-letter changes in other parts of the same gene may cause few or no health problems.
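The numbers in this example fit together: three DNA letters (a codon) specify one amino acid, and 4,440 / 3 = 1,480. The Python sketch below, a simplified illustration rather than a clinical tool, uses the standard genetic code to show how the effect of a single-letter change depends on where it falls. The function names and the nine-letter toy sequence are hypothetical; the sketch assumes the sequence starts at the first letter of a codon and ignores splicing, gene regulation, and everything outside the coding sequence.

```python
# Simplified illustration: one letter change in a coding sequence can be
# silent, change an amino acid, or create a premature stop codon.

BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
# Standard genetic code in T, C, A, G order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def codon_number(position):
    """Map a 1-based coding-sequence position to its 1-based codon number."""
    return (position - 1) // 3 + 1


def classify_point_mutation(cds, position, alt):
    """Classify a single-letter substitution at a 1-based position of cds."""
    cds = cds.upper()
    i = position - 1
    start = (i // 3) * 3          # first letter of the affected codon
    offset = i - start            # 0, 1 or 2 within the codon
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if alt_aa == ref_aa:
        return "synonymous"       # same amino acid: protein unchanged
    if ref_aa == "*":
        return "stop-loss"        # the original stop codon is lost
    if alt_aa == "*":
        return "nonsense"         # premature stop: truncated protein
    return "missense"             # one amino acid swapped for another


if __name__ == "__main__":
    print(codon_number(4440))     # 1480: the last letter falls in codon 1,480
    toy = "ATGGGTTGG"             # made-up: Met-Gly-Trp
    print(classify_point_mutation(toy, 6, "C"))  # synonymous (GGT -> GGC, Gly)
    print(classify_point_mutation(toy, 5, "A"))  # missense (GGT -> GAT, Gly -> Asp)
    print(classify_point_mutation(toy, 9, "A"))  # nonsense (TGG -> TGA, stop)
```

A missense change is not automatically harmful; whether it matters depends on which amino acid is affected and where, which is exactly the question the project sets out to answer experimentally.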
Among the first targets of Shendure’s project are the BRCA1 and BRCA2 genes. Each of these genes has been sequenced in tens of thousands of patients, and some variations have been found to increase the risk of early-onset breast and ovarian cancer. However, there are still many variations in these genes that are of uncertain clinical significance. Health care professionals cannot offer guidance to people with such variations because their health consequences are unknown. Shendure’s analysis could provide much-needed information about which variants increase the risk of disease and which are benign.
How does one set about becoming a research pioneer? Well, if you follow Shendure’s example, it seems that it might not hurt to play around a bit. When his parents bought him a PC at the age of 7, this kid from Ohio taught himself programming—so he could write computer games! Later, he began taking on projects that his mother gave him, such as employee management systems. Shendure took a brief break from computer science during his undergrad years at Princeton University in order to focus on molecular biology. But when he enrolled at Harvard University to earn his M.D. and Ph.D., Shendure’s love of computers resurfaced, and his skills proved invaluable as he worked with his adviser George Church on new sequencing technologies.
As a prelude to the new multiplex experiments, Shendure, in collaboration with Daniela Witten, a 2011 recipient of the NIH Director’s Early Independence Award, has developed a computer program called Combined Annotation-Dependent Depletion (CADD). This program uses various annotations, for example data generated by the ENCyclopedia Of DNA Elements (ENCODE) collaboration and the NIH Common Fund’s epigenomics projects, to estimate the potential pathogenicity and disease severity of every possible variant in the human genome [4]. Until Shendure can create all of the single-letter variants in the lab, CADD will serve as a powerful interim tool for identifying and prioritizing the variants most worthy of further study.
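CADD reports its results as PHRED-like “scaled” scores, in which a variant’s score reflects its rank among all possible variants: a score of 10 places it in the top 10% of predicted deleteriousness, 20 in the top 1%, and so on. The Python sketch below illustrates that rank-based scaling. The linear combination of annotations is a stand-in with invented weights and hypothetical annotation names; CADD itself learns how to combine its annotations from data, and this is not its actual model.

```python
import math

# Invented weights and annotation names, for illustration only.
HYPOTHETICAL_WEIGHTS = {"conservation": 2.0, "coding": 1.0, "regulatory": 0.5}


def raw_score(annotations):
    """Toy linear combination of annotation values; missing ones count as 0."""
    return sum(w * annotations.get(name, 0.0)
               for name, w in HYPOTHETICAL_WEIGHTS.items())


def phred_scale(raw_scores):
    """Rank-based PHRED-like scaling: 10 * log10(N / rank).

    Equivalent to -10 * log10(rank / N). The highest raw score gets rank 1;
    ties keep their input order.
    """
    n = len(raw_scores)
    order = sorted(range(n), key=lambda i: raw_scores[i], reverse=True)
    scaled = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scaled[i] = 10 * math.log10(n / rank)
    return scaled


if __name__ == "__main__":
    variants = [
        {"conservation": 0.5, "coding": 1.0},  # hypothetical variant A
        {"conservation": 0.25},                # hypothetical variant B
        {"regulatory": 0.5},                   # hypothetical variant C
    ]
    raws = [raw_score(v) for v in variants]
    print(raws)                                        # [2.0, 0.5, 0.25]
    print([round(s, 2) for s in phred_scale(raws)])    # [4.77, 1.76, 0.0]
```

Scaling by rank rather than by raw value makes scores comparable across very different annotation mixes: with a genome-wide set of variants, only the top 1% can score 20 or above.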
Shendure’s Pioneer project has the potential to change medicine. If he succeeds in interpreting the clinical implications of all these gene variants—figuring out which are harmful, and which are not—he will have overcome a major hurdle along the path to personalized, genome-based medicine. But, before Shendure’s interpretations can be integrated into genomic-era health care, there are many other challenges that lie ahead. Those include education of health care professionals and patients, regulatory matters, reimbursement issues, and so forth.
Shendure is remarkably optimistic about these issues and believes that if he can figure out his part, then the rest will fall into place. That approach worked for his computer games when he was 7—and this, he says, is much more satisfying.
[1] Accurate multiplex polony sequencing of an evolved bacterial genome. Shendure J, Porreca GJ, Reppas NB, Lin X, McCutcheon JP, Rosenbaum AM, Wang MD, Zhang K, Mitra RD, Church GM. Science. 2005 Sep 9;309(5741):1728-32. Epub 2005 Aug 4.
[2] Targeted capture and massively parallel sequencing of 12 human exomes. Ng SB, Turner EH, Robertson PD, Flygare SD, Bigham AW, Lee C, Shaffer T, Wong M, Bhattacharjee A, Eichler EE, Bamshad M, Nickerson DA, Shendure J. Nature. 2009 Sep 10;461(7261):272-6.
[3] Noninvasive whole-genome sequencing of a human fetus. Kitzman JO, Snyder MW, Ventura M, Lewis AP, Qiu R, Simmons LE, Gammill HS, Rubens CE, Santillan DA, Murray JC, Tabor HK, Bamshad MJ, Eichler EE, Shendure J. Sci Transl Med. 2012 Jun 6;4(137):137ra76.
[4] A general framework for estimating the relative pathogenicity of human genetic variants. Kircher M, Witten DM, Jain P, O’Roak BJ, Cooper GM, Shendure J. Nat Genet. 2014 Feb 2. doi: 10.1038/ng.2892. [Epub ahead of print]
High-Risk, High-Reward Research. (NIH Common Fund)
NIH Director’s Pioneer Award. (NIH Common Fund)
NIH Director’s Early Independence Award. (NIH Common Fund)
Jay Ashok Shendure, Interpreting Genetic Variants of Uncertain Significance, University of Washington, Seattle
Shendure Lab, University of Washington
Daniela Witten, High-Dimensional, Unsupervised Learning, With Applications to Genomics, University of Washington
Daniela M. Witten Group, Department of Biostatistics, University of Washington
NIH support: National Human Genome Research Institute; Common Fund
Every day, the genome adventure surprises us with hopeful news and opens up endless possibilities that, some years ago, we considered science fiction. With great minds like Jay, fiction becomes science.
This work is not only exciting, but it is important. They should have some help … lots of help! There is enough work in this area to keep many scientists busy. The real question is where to begin, and that has many possible answers! I hope we hear much more about this work and others doing similar investigations. There is So Much work to do! Let’s Do It!
Please inform me fully about this research.
Dr. Collins, I agree with you and the other commenters here that it is amazing work, and if we get the desired outcomes, it could one day be a game changer in personal medicine. While genomics gives us very important information about the probability that different variants increase or decrease our health risks, it provides limited information compared to family medical history. Together, family medical history and genomics data will be very valuable.
Here is a link to your video promoting family medical history in 2007 (https://www.youtube.com/watch?v=ARTFoQ4oHfU). In the past 5-6 years, the tools to collect family medical history have changed. Today, we do not need someone to ask family members questions during the Thanksgiving holidays and write everything down on paper, only to repeat it again next year. We can now connect people within families via an online approach, while addressing their privacy and permissions concerns.
Family medical history provides us with much richer information about treatment options and side effects, i.e., what worked and what did not. Genomic information is very useful for rare diseases, where certain diseases may skip a generation, and it could be a good screening or confirmatory tool for family medical history. In its current state, genomic information is not yet capable of providing much reliable data, as shown by Dr. Eng’s study from the Cleveland Clinic.
Finally, the cost of genome sequencing has come down tremendously, but a cost of $1,000 per genome is probably not going to be practical for another decade. Even if we assume that to be the case, it is still prohibitive for most of the 7 billion people living on this planet, whereas family medical history is free and accurate today.
Z.H., you bring up a valid consideration. We need all of this information together in one place, and much more. Like the Human Connectome Project, which is bringing together many different scientists, this research will require the same: basic researchers and physicians working side by side, discussing how to read/group/use this information to set up proper experiments for further validation.
What is most interesting to me is the regulation of genes and how much we do not yet know about their functional roles, not necessarily inherited mutations. Screening is mainly what we see coming out of these studies today, but there is so much more information to be had. What I see in bioinformatics and genomics today amounts to methods (algorithms) that require, but lack, proper validation. What is needed are proper data sets to test these methods, and these include more than genome sequences alone, as Z.H. mentioned.
We have the tools and are collecting tons of data scattered across multiple universities/hospitals. I hope to see it all in one place soon, with open access allowing investigators to move studies like this (and many more) forward. Dr. Collins has done an excellent job (in my humble opinion) of providing support for open-access databases along with the development of tools (also freely available) to mine some of this data, but we need more help. We need to work together, and we must learn to share data. The possibilities for applications seem endless. We are only beginning to unravel this information; don’t let us lose momentum now.
This IS great news! We are so fed up with being sick all the time and never getting a clear diagnosis… just meds, meds and more meds, not to cure you but to suppress symptoms, and all of them end up creating other problems.
My son (28yr) has a rare chromosome abnormality, Ring22, and other than having a label, no one has a clue how to help him. My daughter (31yr) has aggressive cervical cancer that was caught just by a fluke between Pap smears, and yet the health care system no longer recommends annual Pap smears because they say they aren’t justified. Doctors are constantly ignoring symptoms or making wrong diagnoses until the disease is advanced.
About family history, not everyone knows it, not everyone shares it, and not everyone has living relatives to divulge it. So many families are searching for answers so they can get help for their loved ones. This information can’t come soon enough.
The young people involved in this project are true heroes. I commend you all. Thank you for doing what is right and good for us all. Thank you Thank you Thank you!
We hear and understand your frustration. I had a similar personal experience, which made me focus on family medical history. If you and I gather our family medical histories and share them with researchers, it will benefit them and mankind. They can use the data and samples to design experiments that confirm their hypotheses much faster. This is how science works. I was pulled into personal medicine in the late ’90s, and now, 15 years later, we are still struggling to get it off the ground. There are only a few personalized treatments available. It is very complicated work, and it is the leadership of people like Dr. Collins and a few others that has made this progress possible. The truth is that it will take time.
Regarding people not sharing family medical history, we have been asking the following question for some time: How many people have you personally met who were harmed because someone knew their medical history? You will be surprised by how little it matters. In the privacy debate, the harm of sharing personal medical history has become an urban legend, and no one wants to cross the sacred line of privacy. Dr. Collins did an interview for YouTube years ago, probably just to break this taboo. It is on the CDC, NIH, and US Surgeon General’s websites. Remember that 20 years ago no one wanted to share their credit card online; today, I do not even know how many companies have my credit card information stored in their computers, e.g., car rental companies, airlines, hotels, Amazon, and many more. The same is true of the resume/CV: about 10 years ago it was a very private document that people did not like to share, but now we brag about our connections on LinkedIn.
Finally, we do not have a financial interest in this, but it is our mission to connect family medical history data with genomic data. Using current technology, we can gather family medical history data (even partial data is better than nothing) fairly quickly, and it can help to speed up genomics research.
Dr. Collins, we can use your help!
It’s so good.
This is great news! I hope the tools Shendure and others are creating not only help physicians and individuals interpret disease-linked variants, but also incorporate what we know about nutritional genomics.
Imagine a future with personalized genomic diets. I’m not thinking of anything too dystopian… maybe just that I should hold the pickles on my burger, while you order extra pickles. That sort of personalization.
However, my hunch is that many of these nutritional genomic variants are based on epigenetic modifications, so identifying them will require much more ChIP-Seq and bisulfite sequencing, which may still be a prohibitive factor for personal/clinical genomes.
Good luck to Shendure and Witten on these projects!
There are some interesting points in this article. There is some validity, but I will hold off on forming an opinion until I look into it further. Good article, thanks.