This post is inspired by a book I read, Naked Statistics, and a documentary I watched, The Human Face of Big Data. I recommend the documentary to anyone who wants to know what the fuss about data is and why we care. The book focuses on how powerful statistical analysis can be, and how easy it is to slip into a false sense of security and draw incorrect conclusions from data. It cautions the reader to be wary of interpreted statistics, which can often be framed to support a variety of different hypotheses. The documentary and the book also formed the basis of a talk I gave on the power of data early last month.

Data is everywhere. Everyone is talking about it. You hear “big data” thrown about in everyday conversation. According to the WSJ, data scientists are among the highest paid professionals in Silicon Valley. Human beings will have generated 40 zettabytes of data by the year 2020. If we used one grain of sand to represent each byte of that data, the 40 zettabytes would correspond to 57 times the number of sand grains on all of the beaches around the world. I really like this analogy. It instantly puts into perspective how much data we generate just from our daily activities.
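To get a feel for the numbers behind the sand analogy, here is a small back-of-the-envelope sketch. It uses only the two figures quoted above (40 zettabytes and the factor of 57) and assumes that each grain stands for one byte; the variable names are mine, purely for illustration.

```python
# Figures quoted in the post; everything else is derived from them.
ZETTABYTE = 10**21               # bytes in one zettabyte (SI definition)
total_bytes = 40 * ZETTABYTE     # 40 ZB projected for 2020
beaches_multiple = 57            # "57 times the grains of sand on all beaches"

# Implied number of sand grains on all beaches,
# assuming one grain stands for one byte of data.
implied_grains = total_bytes / beaches_multiple

print(f"{implied_grains:.2e} grains of sand")
```

In other words, the analogy implicitly puts the number of sand grains on the world's beaches at roughly 7 followed by 20 zeros.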

Let us try to define data. Data is recorded knowledge or information. Data in itself has no inherent meaning. It’s just a set of figures, numbers, “data points”; a qualitative or quantitative measure of something. However, when we interpret data, we can discover patterns, understand them, experiment, observe and make predictions.

But wait, haven’t we always been recording information? What changed to make “data” the biggest buzzword of recent times? A decade ago, computers were not powerful enough to store or crunch this much information. Today, they are capable of both storing and analyzing very large amounts of it. This new ability gives us new ways to gather insights from data.

Haiti earthquake, 2010 - 100,000 people were feared dead. When a large number of people are stranded on the ground with no easy way to communicate, how do you route supplies? A group of students came up with a way to use the data that the affected people generated on Twitter and other social media to create a live Haiti Crisis Map. They mapped tweets to locations, helped rescue workers find the people who were in need, and thereby had a tangible impact on the rescue efforts.

Carolyn McGregor, the research chair of health informatics at the University of Ontario Institute of Technology, runs the Artemis project. She says that about 20 percent of low-birth-weight babies develop an infection, and about 18 percent of those babies die. In the past, doctors tracked these babies with hourly checks of heart rate, respiration and other vitals. However, McGregor’s research has shown that with constant monitoring of an infant’s heart rate, it is possible to detect an infection long before it manifests in other detectable ways. This research was made possible by collecting and analyzing the heart rates of thousands of infants, and will likely save the lives of thousands of infants in the future.
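The two percentages quoted above compound, and it helps to see what they mean for a group of babies. The sketch below is just arithmetic on those two figures; the cohort of 1,000 babies is a made-up round number, not something from McGregor's research.

```python
# Rates quoted above; the cohort size is a made-up round number for illustration.
infection_rate = 0.20          # share of low-birth-weight babies who develop an infection
death_rate_if_infected = 0.18  # share of those infected babies who die

cohort = 1000  # hypothetical number of low-birth-weight babies

infected = cohort * infection_rate
deaths = infected * death_rate_if_infected
overall_death_rate = infection_rate * death_rate_if_infected

print(f"infected: {infected:.0f}")
print(f"deaths: {deaths:.0f}")
print(f"overall: {overall_death_rate:.1%}")
```

Under these figures, about 200 of every 1,000 such babies would develop an infection and about 36 would die from it, which is 3.6 percent of the whole group. That is the number earlier detection is trying to bring down.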

In 2012, the City of Boston released a smartphone application called Street Bump. The application uses the accelerometer in a driver’s smartphone to detect potholes on roads. If you’re driving and you hit a pothole while the app is running, Street Bump pairs data about the size of the bump with a GPS coordinate and sends that to a city database. This helps create “real-time” maps of road conditions, allowing the city to catch potholes early and prioritize repairs appropriately. An additional advantage is that the city can detect roads that are getting progressively worse, and perform repairs even before residents notice an actual problem.
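To make the idea concrete, here is a minimal sketch of how a phone could pair a jolt with a location. This is not the app's actual algorithm, which I don't know; it simply flags any reading whose vertical acceleration exceeds a threshold. The `Sample` and `BumpReport` types, the `detect_bumps` function, the threshold value and the sample readings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One hypothetical sensor reading: vertical acceleration plus a GPS fix."""
    vertical_accel: float  # m/s^2, assumed to have gravity already removed
    lat: float
    lon: float


@dataclass
class BumpReport:
    """What a phone might send to the city database (hypothetical schema)."""
    lat: float
    lon: float
    magnitude: float  # absolute vertical acceleration of the jolt, m/s^2


def detect_bumps(samples, threshold=4.0):
    """Return one report per sample whose vertical jolt exceeds the threshold.

    The default threshold is an illustrative value, not a calibrated one.
    """
    reports = []
    for s in samples:
        jolt = abs(s.vertical_accel)
        if jolt > threshold:
            reports.append(BumpReport(s.lat, s.lon, jolt))
    return reports


# Made-up readings: smooth road, one sharp jolt, then smooth road again.
readings = [
    Sample(0.3, 42.3601, -71.0589),
    Sample(6.2, 42.3602, -71.0590),
    Sample(-0.4, 42.3603, -71.0591),
]

reports = detect_bumps(readings)
print(reports)
```

A real system would need to do much more, such as filtering out speed bumps, door slams and phones being dropped, and combining reports from many drivers before trusting a location. The sketch only shows the core idea of turning a sensor spike into a point on a map.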

Netflix and entertainment - We’ve all seen Netflix predict the movies one may like based on one’s interests and “persona”. A lesser known fact is that Netflix also makes data-driven decisions when buying content (movies and TV shows) for its platform. In 2011, Netflix made one of its biggest decisions. It outbid top television channels like HBO and AMC for the rights to a U.S. version of House of Cards, committing to two seasons of 13 episodes each. Netflix had predictive models that let it estimate how its audience would respond, and the show was a huge success. Data also influences the way Netflix does marketing. Netflix made 10 different cuts of the trailer for House of Cards, each geared toward a different kind of audience. If you saw the trailer, you likely saw the one of the ten versions that was most likely to compel you to watch the show in its entirety.

The examples above show how data is revolutionizing different industries and the way we solve problems. It is impacting our lives in many other ways, from smart virtual assistants and IoT devices to many other things we take for granted. Our ability to process data can be used for bad as well. On several occasions, incorrectly interpreted data, or oversimplified approaches to problems like recidivism and the crime rate, have led to solutions that either made the problem worse or had questionable impact.

NSA, PRISM, surveillance: Data empowers the government to prevent terrorism and other threats to national security. But where does one draw the line between intelligence and the right to privacy? As the NSA’s PRISM program revealed, it is difficult to account for and understand what happens with stored information, including email, video and voice chat, videos, photos, voice-over-IP chats (such as Skype), file transfers, and social networking details.

Free online services: A lot of online services are perceived as free, but people seldom understand that they are trading their data and privacy for these “free” services. This is not necessarily bad. It is, however, important to be cognizant of it. When people retain control of their data and can choose when to make it unavailable, the trade may be a reasonable one. However, a lot of services offer no transparency about how one’s data is being used, or how one can re-establish control over it.

I’m optimistic when it comes to the future. I think that as people become cognizant of exactly how valuable their data is, corporations and other entities will allow them to take control of it, and eventually we’ll see data being put to use for good rather than evil, giving us insights and intelligence that will enable us to live higher quality lives.