non-coding DNA
New Technology Opens Evolutionary Window into Brain Development
Posted by Dr. Francis Collins

One of the great mysteries in biology is how we humans ended up with such large, complex brains. In search of clues, researchers have spent years studying the protein-coding genes activated during neurodevelopment. But some answers may also be hiding in non-coding regions of the human genome, where sequences called regulatory elements increase or decrease the activity of genes.
A fascinating example involves a type of regulatory element called a human accelerated region (HAR). Although “human” is part of this element’s name, it turns out that the genomes of all vertebrates—not just humans—contain the DNA segments now designated as HARs.
In most organisms, HARs show a relatively low rate of mutation, which means these regulatory elements have been highly conserved across species throughout evolutionary time [1]. The big exception is Homo sapiens, in which HARs have exhibited a much higher rate of mutations.
The accelerated rate of HAR mutations observed in humans suggests that, over the course of very long periods of time, these genomic changes might have provided our species with some sort of evolutionary advantage. What might that be? Many have speculated the advantage might involve the brain because HARs are often associated with genes involved in neurodevelopment. Now, in a paper published in the journal Neuron, an NIH-supported team confirms that’s indeed the case [2].
In the new work, researchers found that about half of the HARs in the human genome influence the activity, or expression, of protein-coding genes in neural cells and tissues during the brain’s development [2]. The researchers say their study, the most comprehensive to date of the 3,171 HARs in the human genome, firmly establishes that this type of regulatory element helps to drive patterns of neurodevelopmental gene activity specific to humans.
Yet to be determined is precisely how HARs affect the development of the human brain. The quest to uncover these details will no doubt shed new light on fundamental questions about the brain, its billions of neurons, and their trillions of interconnections. For example, why does human neural development span decades, longer than the life spans of most primates and other mammals? Answering such questions could also reveal new clues into a range of cognitive and behavioral disorders. In fact, early research has already made tentative links between HARs and neurodevelopmental conditions such as autism spectrum disorder and schizophrenia [3].
The latest work was led by Kelly Girskis, Andrew Stergachis, and Ellen DeGennaro, all of whom were in the lab of Christopher Walsh while working on the project. An NIH grantee, Walsh is director of the Allen Discovery Center for Brain Evolution at Boston Children’s Hospital and Harvard Medical School, which is supported by the Paul G. Allen Foundation Frontiers Group, and is an Investigator of the Howard Hughes Medical Institute.
Though HARs have been studied since 2006, one of the big challenges in systematically assessing them has been technological. The average length of a HAR is about 269 bases of DNA, but current technologies for assessing function can only easily analyze DNA molecules that span 150 bases or fewer.
Ryan Doan, who was then in the Walsh Lab, and his colleagues solved the problem by creating a new machine called CaptureMPRA. (MPRA is short for “massively parallel reporter assays.”) This technological advance cleverly barcodes HARs and, more importantly, makes it possible to analyze HARs up to about 500 bases in length.
Using CaptureMPRA technology in tandem with cell culture studies, researchers rolled up their sleeves and conducted comprehensive, full-sequence analyses of more than 3,000 HARs. In their initial studies, primarily in neural cells, they found that nearly half of human HARs actively drive gene expression in cell culture. Of those, 42 percent proved to have an increased ability to enhance gene expression compared to their orthologues, or counterparts, in chimpanzees.
Next, the team integrated these data with an existing epigenetic dataset derived from developing human brain cells, as well as additional datasets generated from sorted brain cell types. They found that many HARs appeared to have the ability to increase the activity of protein-coding genes, while a smaller—but very significant—subset of the HARs appeared to be enhancing gene expression specifically in neural progenitor cells, which are responsible for making various neural cell types.
The data suggest that as the human HAR sequences mutated and diverged from other mammals, they increased their ability to enhance or sometimes suppress the activity of certain genes in neural cells. To illustrate this point, the researchers focused on two HARs that appear to interact specifically with a gene referred to as R17. This gene can have highly variable gene expression patterns not only in different human cell types, but also in cells from other vertebrates and non-vertebrates.
In the human cerebral cortex, the outermost part of the brain that’s responsible for complex behaviors, R17 is expressed only in neural progenitor cells and only at specific time points. The researchers found that R17 slows the progression of neural progenitor cells through the cell cycle. That might seem strange, given the billions of neurons that need to be made in the cortex. But it’s consistent with the biology. In the human, it takes more than 130 days for the cortex to complete development, compared to about seven days in the mouse.
Clearly, to learn more about how the human brain evolved, researchers will need to look for clues in many parts of the genome at once, including its non-coding regions. To help researchers navigate this challenging terrain, the Walsh team has created an online resource displaying their comprehensive HAR data. It will appear soon, under the name HAR Hub, on the University of California Santa Cruz Genome Browser.
References:
[1] An RNA gene expressed during cortical development evolved rapidly in humans. Pollard KS, Salama SR, Lambert N, Lambot MA, Coppens S, Pedersen JS, Katzman S, King B, Onodera C, Siepel A, Kern AD, Dehay C, Igel H, Ares M Jr, Vanderhaeghen P, Haussler D. Nature. 2006 Sep 14;443(7108):167-72.
[2] Rewiring of human neurodevelopmental gene regulatory programs by human accelerated regions. Girskis KM, Stergachis AB, DeGennaro EM, Doan RN, Qian X, Johnson MB, Wang PP, Sejourne GM, Nagy MA, Pollina EA, Sousa AMM, Shin T, Kenny CJ, Scotellaro JL, Debo BM, Gonzalez DM, Rento LM, Yeh RC, Song JHT, Beaudin M, Fan J, Kharchenko PV, Sestan N, Greenberg ME, Walsh CA. Neuron. 2021 Aug 25:S0896-6273(21)00580-8.
[3] Mutations in human accelerated regions disrupt cognition and social behavior. Doan RN, Bae BI, Cubelos B, Chang C, Hossain AA, Al-Saad S, Mukaddes NM, Oner O, Al-Saffar M, Balkhy S, Gascon GG; Homozygosity Mapping Consortium for Autism, Nieto M, Walsh CA. Cell. 2016 Oct 6;167(2):341-354.
Links:
Christopher Walsh Laboratory (Boston Children’s Hospital and Harvard Medical School)
The Paul G. Allen Foundation Frontiers Group (Seattle)
NIH Support: National Institute of Neurological Disorders and Stroke; National Institute of Mental Health; National Institute of General Medical Sciences; National Cancer Institute
A Global Look at Cancer Genomes
Posted by Dr. Francis Collins

Cancer is a disease of the genome. It can be driven by many different types of DNA misspellings and rearrangements, which can cause cells to grow uncontrollably. While the first oncogenes with the potential to cause cancer were discovered more than 35 years ago, it’s been a long slog to catalog the universe of these potential DNA contributors to malignancy, let alone explore how they might inform diagnosis and treatment. So, I’m thrilled that an international team has completed the most comprehensive study to date of the entire genomes—the complete sets of DNA—of 38 different types of cancer.
Among the team’s most important discoveries is that the vast majority of tumors—about 95 percent—contained at least one identifiable spelling change in their genomes that appeared to drive the cancer [1]. That’s significantly higher than the level of “driver mutations” found in past studies that analyzed only a tumor’s exome, the small fraction of the genome that codes for proteins. Because many cancer drugs are designed to target specific proteins affected by driver mutations, the new findings indicate it may be worthwhile, perhaps even life-saving in many cases, to sequence the entire tumor genomes of a great many more people with cancer.
The latest findings, detailed in an impressive collection of 23 papers published in Nature and its affiliated journals, come from the international Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium. Known as the Pan-Cancer Project for short, it builds on earlier efforts to characterize the genomes of many cancer types, including NIH’s The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC).
In these latest studies, a team including more than 1,300 researchers from around the world analyzed the complete genomes of more than 2,600 cancer samples. Those samples included tumors of the brain, skin, esophagus, liver, and more, along with matched healthy cells taken from the same individuals.
In each of the resulting new studies, teams of researchers dug deep into various aspects of the cancer DNA findings to make a series of important inferences and discoveries. Here are a few intriguing highlights:
• The average cancer genome was found to contain not just one driver mutation, but four or five.
• About 13 percent of those driver mutations were found in so-called non-coding DNA, portions of the genome that don’t code for proteins [2].
• The mutations arose within about 100 different molecular processes, as indicated by their unique patterns or “mutational signatures” [3,4].
• Some of those signatures are associated with known cancer causes, including aberrant DNA repair and exposure to known carcinogens, such as tobacco smoke or UV light. Interestingly, many others are as-yet unexplained, suggesting there’s more to learn with potentially important implications for cancer prevention and drug development.
• A comprehensive analysis of 47 million genetic changes pieced together the chronology of cancer-causing mutations. This work revealed that many driver mutations occur years, if not decades, prior to a cancer’s diagnosis, a discovery with potentially important implications for early cancer detection [5].
The findings represent a big step toward cataloging all the major cancer-causing mutations, with important implications for the future of precision cancer care. And yet, the fact that the drivers in 5 percent of cancers remain mysterious (though they do have RNA abnormalities) comes as a reminder that there’s still a lot more work to do. The challenging next steps include connecting the cancer genome data to treatments and building meaningful predictors of patient outcomes.
To help in these endeavors, the Pan-Cancer Project has made all of its data and analytic tools available to the research community. As researchers at NIH and around the world continue to detail the diverse genetic drivers of cancer and the molecular processes that contribute to them, there is hope that these findings and others will ultimately vanquish, or at least rein in, this Emperor of All Maladies.
References:
[1] Pan-Cancer analysis of whole genomes. ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium. Nature. 2020 Feb;578(7793):82-93.
[2] Analyses of non-coding somatic drivers in 2,658 cancer whole genomes. Rheinbay E et al; PCAWG Consortium. Nature. 2020 Feb;578(7793):102-111.
[3] The repertoire of mutational signatures in human cancer. Alexandrov LB et al; PCAWG Consortium. Nature. 2020 Feb;578(7793):94-101.
[4] Patterns of somatic structural variation in human cancer genomes. Li Y et al; PCAWG Consortium. Nature. 2020 Feb;578(7793):112-121.
[5] The evolutionary history of 2,658 cancers. Gerstung M, Jolly C, Leshchiner I, Dentro SC et al; PCAWG Consortium. Nature. 2020 Feb;578(7793):122-128.
Links:
The Genetics of Cancer (National Cancer Institute/NIH)
Precision Medicine in Cancer Treatment (NCI)
The Cancer Genome Atlas Program (NIH)
NCI and the Precision Medicine Initiative (NCI)
NIH Support: National Cancer Institute, National Human Genome Research Institute
Studies of Dogs, Mice, and People Provide Clues to OCD
Posted by Dr. Francis Collins

Chances are you know someone with obsessive-compulsive disorder (OCD). It’s estimated that more than 2 million Americans struggle with this mental health condition, characterized by unwanted recurring thoughts and/or repetitive behaviors, such as excessive hand washing or constant counting of objects. While we know that OCD tends to run in families, it’s been frustratingly difficult to identify specific genes that influence OCD risk.
Now, an international research team, partly funded by NIH, has made progress thanks to an innovative genomic approach involving dogs, mice, and people. The strategy allowed them to uncover four genes involved in OCD that turn out to play a role in synapses, where nerve impulses are transmitted between neurons in the brain. While more research is needed to confirm the findings and better understand the molecular mechanisms of OCD, these findings offer important new leads that could point the way to more effective treatments.