In the time it took you to read this sentence, NASA gathered approximately 1.73 gigabytes of data from our nearly 100 currently active missions! We do this every hour, every day, every year – and the collection rate is growing exponentially. Handling, storing, and managing this data is a massive challenge. Our data is one of our most valuable assets, and its strategic importance in our research and science is huge. We are committed to making our data as accessible as possible, both for the benefit of our work and for the betterment of humankind through the innovation and creativity of the over seven billion other people on this planet who don’t work at NASA.

What is Big Data?

The idea of big data is still relatively new and not well understood, and most discussions of the subject start with a definition of what big data really is. Definitions are certainly helpful, so we’ll start there ourselves. Big data is, very simply, a collection of data sets so large and complex that your legacy IT systems cannot handle them. When the volume, velocity, variety, and veracity of an organization’s data exceed its storage or computing capacity, there are some big challenges that need to be addressed. You know you have a big data challenge when your traditional data management systems and analysis tools are overwhelmed and it becomes difficult to process your data using the analytic or visualization tools you currently have.

Approaching the big data challenge often necessitates advanced algorithms, infrastructure and frameworks – and it can all seem very daunting for those just starting out – but the reality for information-age-based organizations is that your success is throttled by your ability to rapidly and comprehensively navigate the big data universe.

But of course, big data is relative. In the end, big data by itself has no value – it’s meaningless. It’s what you do with the data that matters most. Today’s big data discussion is often centered on how to target advertisements or customize a user experience. That makes sense, given that growth in the marketplace is closely tied to the fact that how we interact with the physical world depends more and more on the pervasive use of mobile devices that are connected to the world through sensors. For NASA, having the ability to leverage our rich history of data and combine it with the new data we are receiving is a huge asset in making our missions successful.

If you are still trying to wrap your head around the difference between petabytes, exabytes, zettabytes, and yottabytes, check out this overview presentation titled “What is Big Data and why does it matter” by Tom Soderstrom, the Chief Technology Officer for IT at NASA JPL.
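For a quick sense of how these units relate, here is a minimal Python sketch. It assumes decimal (SI) prefixes, where each step up is a factor of 1,000; the helper names `bytes_in` and `humanize` are made up for this illustration and do not come from any NASA tool.

```python
# Decimal (SI) prefixes: each step up the list is a factor of 1000.
# Assumption: decimal units, not the binary (1024-based) variants.
SI_UNITS = ["B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def bytes_in(unit: str) -> int:
    """Return the number of bytes in one `unit` (hypothetical helper)."""
    return 1000 ** SI_UNITS.index(unit)


def humanize(n_bytes: float) -> str:
    """Format a byte count using the largest unit that keeps the value below 1000."""
    value = float(n_bytes)
    for unit in SI_UNITS:
        # Stop once the value fits, or when we run out of larger units.
        if value < 1000 or unit == SI_UNITS[-1]:
            return f"{value:.2f} {unit}"
        value /= 1000


if __name__ == "__main__":
    for unit in ("PB", "EB", "ZB", "YB"):
        print(f"1 {unit} = 10^{len(str(bytes_in(unit))) - 1} bytes")
    print(humanize(350 * bytes_in("PB")))
```

Under this convention a yottabyte is a billion petabytes, which is why the jump from one unit to the next matters so much when planning storage.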

NASA’s Big Data Challenge

NASA’s big data challenge is not just a terrestrial one, and it goes beyond the stereotypical challenge. Many of our “big data” sets are described by significant metadata, but on a scale that challenges current and future data management practice. We regularly engage in missions where data is continually streaming from spacecraft on Earth and in space, faster than we can store, manage, and interpret it. NASA has two very different types of spacecraft. We have deep space spacecraft that send back data on the order of MB/s. Then we have Earth orbiters that can send back data on the order of GB/s. In our current missions, data is transferred by radio frequency, which is relatively slow. In the future, NASA will employ technology such as optical (laser) communication to increase the download rate, which will mean a 1000x increase in the volume of data. This is much more than we can handle today, and it is what we are starting to prepare for now.

We are planning missions today that will easily stream more than 24 TB a day. That’s roughly 2.4 times the entire Library of Congress – EVERY DAY. For one mission.
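To put the 24 TB a day figure in perspective, the short Python sketch below converts it into a sustained data rate and works out the Library of Congress size implied by the "2.4 times" comparison. It assumes decimal terabytes (10^12 bytes) and a perfectly even stream over 24 hours, which a real downlink would not have; the function names are made up for this illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds
TB = 10**12                     # assumption: decimal terabytes
MB = 10**6


def sustained_rate_mb_per_s(tb_per_day: float) -> float:
    """Average rate in MB/s needed to move `tb_per_day` in one day,
    assuming the data arrives evenly around the clock."""
    return tb_per_day * TB / SECONDS_PER_DAY / MB


def implied_reference_size_tb(tb_per_day: float, multiple: float) -> float:
    """Size of the reference collection if `tb_per_day` equals `multiple` times it."""
    return tb_per_day / multiple


if __name__ == "__main__":
    print(f"Sustained rate: {sustained_rate_mb_per_s(24):.1f} MB/s")
    print(f"Implied Library of Congress size: {implied_reference_size_tb(24, 2.4):.0f} TB")
```

In other words, a single such mission would need to average roughly 278 MB every second, all day, and the comparison above treats the Library of Congress as about 10 TB.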

It’s still very expensive to transfer just one bit down from a spacecraft, so we want to make sure we collect what is most important. Once the data makes its way to our data centers, storing, managing, visualizing, and analyzing it becomes an issue. To give you an idea of what we are dealing with, the size of the Climate Change data repositories alone is projected to grow to nearly 350 petabytes by 2030. 5 PB is equivalent to the total number of letters delivered by the US Postal Service in one year!

One great example of the unique challenge that we face with managing space data is just starting to be demonstrated by the Australian Square Kilometre Array Pathfinder (ASKAP) project. ASKAP is a large array made up of 36 antennas, each 12 meters in diameter, spread out over 4,000 square meters but working together as a single instrument to unlock the mysteries of our universe. The array, which will officially be turned on and open for business tomorrow, Friday, October 5, 2012, is able to survey the whole sky very quickly and offers an ability to perform research that could never have been done before. Check out this great time-lapse video showing off the new telescope’s capabilities! The array is a precursor to the larger Square Kilometre Array telescope, which will open in 2016 and will combine the signals received from thousands of small antennas spread over a distance of more than 3,000 km. When operational, as much as 700 TB/second of data will flow from the Square Kilometre Array! This is a big data challenge.

And of course, spacecraft are not the only source of our data, thanks to an ever-growing supply of mobile devices, low-cost sensors, and online platforms. As an article in Harvard Business Review put it, “each of us is now a walking data generator.” The scale of the big data challenge for NASA, like many organizations, is daunting.

As you can probably imagine, increasing data volumes are not our only challenge. As our wealth of data increases, the challenges of indexing, searching, transferring, and so on all increase exponentially as well. Additionally, the increasing complexity of instruments and algorithms, the increasing rate of technology refresh, and the decreasing budget environment all play a significant role in our approach. Fortunately, the entire federal government has turned its attention towards the growing challenge. In March 2012, the Obama administration announced the Big Data Research and Development Initiative to “greatly improve the tools and techniques needed to access, organize, and glean discoveries from huge volumes of digital data.” The goal is to transform government’s ability to use big data for scientific discovery, environmental and biomedical research, education, and national security.