Transforming Big Data into Big Knowledge
Posted by Dr. Francis Collins
As technology allows us to tackle mind-boggling tasks like sequencing an entire human genome (or ten) in a few hours, or recording thousands of neurons chattering in the brain, or imaging an entire organ or body in super high-resolution, we are generating enormous quantities of data. I’m talking enormous quantities—think tera-, peta-, and even exa-bytes. The challenge presented by this revolution is the need to develop and implement hardware and software that can store, retrieve, and analyze this mountain of complex data—and transform it into knowledge that can improve our understanding of human health and disease.
To remedy this, NIH has launched the Big Data to Knowledge (or BD2K) initiative. The goal is to develop new tools to analyze, organize, and standardize all of this data so that it is easy for scientists to share and access.
There are new funding opportunities for creative minds to build these tools, along with plans to support workshops and training sessions to prepare our scientific workforce for this new era of high-volume biomedical data. We will establish hubs for this research, called Centers of Excellence, which will be selected through peer review and based at universities and institutions around the country.
I have no doubt that the tools and methods we develop through BD2K will enrich and permeate all fields of science.
Link: NIH Big Data 2 Knowledge (BD2K)
I completely agree that we need biomedical researchers who are comfortable working at least at the petascale. As I recall, a petabyte is around the nation's total DNA code info, and 100-200 petabytes is around what Facebook holds on its servers.
But don’t these Big Data 2 Knowledge storage and processing capabilities already exist at far more advanced levels in other parts of the government (zettabytes, i.e., thousands of exabytes, each of which is 1,000 petabytes; see the quick sketch below) or even just commercially (exascale)?
The intelligence community’s Utah data center will have a storage capacity of 5 zettabytes and processing power of around 100 petaflops. (Sources: http://nsa.gov1.info/utah-data-center/ and http://news.nationalgeographic.com/news/2013/06/130612-nsa-utah-data-center-storage-zettabyte-snowden/)
I realize we can’t buy time on the NSA machine, but can’t we just hire some ex-govt folks for whom storage and processing at petascale is no biggie?
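For anyone keeping the prefixes straight, here is a minimal sketch of the scale arithmetic in the comment above. The Facebook and Utah data center figures are the commenter's estimates, not verified numbers; they are used only to illustrate relative scale.

```python
# Decimal byte prefixes used in this thread.
PB = 10**15   # 1 petabyte
EB = 10**18   # 1 exabyte   = 1,000 PB
ZB = 10**21   # 1 zettabyte = 1,000 EB = 1,000,000 PB

# Commenter's estimates (assumptions, for illustration only).
facebook_estimate = 150 * PB   # midpoint of the 100-200 PB guess above
utah_estimate = 5 * ZB         # cited storage figure for the Utah data center

print(f"1 ZB = {ZB // PB:,} PB")
print(f"Utah estimate / Facebook estimate = "
      f"{utah_estimate / facebook_estimate:,.0f}x")
```

Running this prints 1 ZB = 1,000,000 PB and a ratio of roughly 33,333x, which is the gap between the scale this blog post discusses and the scale the comment above is pointing to.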
I second the question posed by MDH. Not only do we already possess the skills to do this, but what matters more (I think) to many researchers is putting this information to work for us, and the sooner the better. Efficiency is key: we need to stop wasting valuable time. Hire the right help and let's get to work.
Good article. I certainly appreciate this site. Thanks!