The use of socially generated “big data” from our daily activities has become a new technique to understand and predict our collective behaviours. Big data techniques have multifarious applications: for example, predicting flu outbreaks based on the volume of tweets mentioning flu-related keywords, understanding the patterns of human mobility by analysing the records of mobile phone calls, or forecasting the financial success of a movie by studying the page view statistics of the Wikipedia articles about the movie. What all these examples have in common is the concept of quantifying and measuring the activity of individuals at the collective level to understand and model human societies in a computational framework.

The desire of mankind to know about the surrounding world has led to huge advancements in knowledge in recent centuries. Today we know a lot about the universe; from the tiniest elementary particles to the most distant galaxies, there is hardly anything that scientists have not yet investigated. Although there are still many unanswered questions, the high pace of knowledge creation and the increase in our understanding of nature are undeniable. Examples of inventions in medicine, the natural sciences and engineering are numerous, and their effects on our daily lives are clearly evident.

However, despite a considerable amount of effort, our understanding of ourselves, and more precisely of human societies, is underdeveloped. In contrast to the huge developments in science and technology, our societies are still suffering from very old and basic problems. Social unrest, riots and crimes, political conflicts and wars, economic crises, poverty, inequality and dictatorship are only a few examples of the social disorders for which we have still not found solutions. Compared to natural scientists and engineers, social scientists have not had the same kind of success; progress in understanding our societies has been very slow.

The huge improvements in the natural sciences in the 17th and 18th centuries, especially in physics, are mostly due to the new convention of modern science, which is based on experiments, measurements and quantitative modelling. Only by following this convention have scientists been able to understand the universal patterns and laws that govern natural phenomena, and to do so accurately enough that, in many cases, knowing the current state of a system makes its future behaviour predictable.

In contrast, in the social sciences, performing anything close to real experiments, quantifying and measuring the parameters involved, and providing a mathematical model to describe empirical observations are all quite challenging and in many cases impossible. In studying natural systems, one can observe and monitor the system under study continuously and perform all the necessary measurements. When we study social systems, however, not only is complete observation of all the actions and interactions very difficult, but it is also challenging to define measurable parameters.

How can we quantify the level of dissatisfaction of the members of a society? How can we measure the kindness of a person? How can we define the strength of social interactions and peer pressure? And even if we are able to do so, how do we monitor the system and record all these parameters continuously and under different conditions? These kinds of questions have limited the social sciences to qualitative descriptions of observations, without any ability to forecast the future behaviour of the system accurately.

However, things are about to change. Our lives are moving into a digital world, where our social interactions leave a digital footprint. Our daily social transactions are being recorded, producing a Big amount of Data. The amount of digital data that we produce through our daily activities, ranging from financial activities in online banking and e-commerce, through our social communications via phones and online social networks, to our online socio-political movements such as online petitions and campaigns, is huge. Most of these data are being recorded and stored for various reasons: your cell phone provider records your communications to be able to issue your bill, and Google records your search queries to provide better search results in the future. Amazon analyses your purchases to make more accurate product recommendations, and Facebook keeps track of your “likes” and “pokes” to facilitate your online social networking. Apart from the various uses that recording and analysing these data could have in enterprises and corporations, a very important use of them is in the recently emerging field of Computational Social Science.

These days, one can quantify and measure the popularity of a politician by considering the number of her Twitter followers or the likes given to her Facebook posts. This is a very easy task compared to the classic methods of social science based on surveys and questionnaires. Today, by analysing the volume of Google search queries for relevant keywords, scientists can forecast financial moves in the markets, and by counting the number of edits to Wikipedia articles about movies, box office takings can be predicted with ground-breaking accuracy. And more importantly, we can now perform large-scale analysis to reveal the gender-dependent features of our communication patterns.

If the invention of the telescope gave us the ability to understand how galaxies behave, and the microscope allowed us to find cures for so many diseases, then in this century we are going to understand much more about social systems because of big data. There is no doubt that humans are much more complicated than atoms or even planets and stars, but with the help of powerful mathematical tools and our ever-faster computers we will be able to find and reveal the universal laws of human societies in a numerical framework.

It is easy to imagine the “smart cities” of the future being designed to function on the basis of real-time analysis of data from our daily activities. Adaptive transport systems that self-organise to optimise traffic flow, better and more efficient health services enabled by better prioritisation of demands and assignment of resources, more fluid and transparent financial models, and more democratic and horizontal processes of data-driven policy making are all possible practical outcomes of the data revolution.

Big data techniques and their use in computational social science will give us the ability to cope with socio-economic crises in a better way and will decrease the costs of taking blind risks based on inaccurate qualitative speculations. Our “self-aware” societies of the future will be better places to belong to.

Dr. Taha Yasseri is a researcher at the Oxford Internet Institute, University of Oxford. His main research focus is on online societies, government-citizen interactions on the web and the structural evolution of the Internet. He uses mathematical models and data analysis to study social systems quantitatively. Prior to joining the Oxford Internet Institute, he spent two years as a Postdoctoral Researcher at the Budapest University of Technology and Economics, working on socio-physical aspects of the community of Wikipedia editors, focusing on conflict and editorial wars, along with Big Data analysis to understand human dynamics, language complexity, and popularity spread. Taha completed his PhD on spontaneous pattern formation in complex systems.

(Image Credit: William Pearce)