Measuring migration using big data

Researchers from five organisations collaborated in an effort to use social media data to develop models for estimating real-time migration flows and stocks of migrants in the EU. The researchers were successful in applying a groundbreaking migrant stock model, but the approach for estimating EU mobility flows remains a work in progress. Next steps to improve the method would be to investigate different migration models and to collect more data over a longer period from both Labour Force Surveys and Facebook.

Background

Internal freedom of movement is one of the European Union's four fundamental freedoms and is necessary for the EU single market to function. Yet official statistics on the migration of workers are constrained. They are limited in their ability to distinguish population subgroups, come with a considerable time lag of a year or more and are fully reliant on individual member states' measurements. Current data sources also tend to underestimate the overall extent of mobility by not covering short-term moves and not capturing the most recent movers.

Given the importance of freedom of movement, it is crucial for European institutions to have robust, rich and up-to-date data to monitor it. Big data sources from social media, such as Twitter and Facebook, offer opportunities to bridge the gap between official statistics and recent migration trends.

Goals

The European Commission (EC) asked RAND Europe to investigate the potential for using social media data to estimate both migration flows and stocks of migrants. Migration flows are the number of migrants entering or leaving a country during a specified time period; stocks of migrants are the number of migrants in a particular country at a specified time. Researchers collaborated with experts from the Vienna Institute for Demography, the University of Manchester, Washington University, the Qatar Computing Research Institute, and the Max Planck Institute for Demographic Research. The ultimate objective is to develop a sustainable method for measuring and modelling (labour) mobility and migration within the EU.
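The distinction between flows and stocks can be illustrated with a minimal Python sketch. It assumes a simplified accounting rule in which a stock changes only through arrivals and departures, ignoring deaths, naturalisations and other changes of status. The function name and all figures are invented for illustration and are not taken from the study.

```python
def migrant_stock(flows_in, flows_out, initial_stock=0):
    """Return the migrant stock at the end of each period.

    Simplifying assumption: the stock changes only through
    arrivals (inflows) and departures (outflows).
    """
    stocks = []
    stock = initial_stock
    for arrivals, departures in zip(flows_in, flows_out):
        stock += arrivals - departures  # net flow during the period
        stocks.append(stock)
    return stocks


# Invented figures for three consecutive periods
arrivals = [1200, 900, 1500]
departures = [400, 700, 600]
stocks = migrant_stock(arrivals, departures, initial_stock=10000)
print(stocks)
```

Flows are measured over a period, while each stock value is a snapshot at the end of that period.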

Methodology

Social media can provide large quantities of anonymous, geo-tagged information that gives the geographical location of individual users’ content. Researchers collected social media data, in particular from Facebook and Twitter, to measure EU migration. They then used a two-pronged approach to develop models to estimate stocks of EU migrants and EU migration flows.

Approach for measuring the stocks of EU migrants

To measure stocks of EU migrants, researchers used geo-referenced data from Facebook’s Advertising Platform, complemented with statistics from Eurostat, the EU Labour Force Survey and census data. Facebook is the largest social networking platform in the EU, with about 252 million users who log on at least once a month. It provides access to a large amount of anonymous data on the characteristics of its users. Researchers collected data about the approximate number of users within an EU member state that fit the description of people who used to live in one country and who now live in another.

The study estimates, for each EU member state, the unobserved stocks of EU movers by EU country of birth, broad age group and gender for the years 2011 to 2018. To obtain more accurate estimates, researchers adjusted the stocks using models that account for bias in the reported data.
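The study's adjustment models are not described in detail here. As a rough illustration of one kind of bias such models might address, the Python sketch below scales a raw Facebook count by an assumed platform penetration rate for each age and gender group. The function `adjust_for_coverage`, the group labels and all numbers are hypothetical and are not the researchers' method.

```python
def adjust_for_coverage(facebook_count, penetration_rate):
    """Scale a raw user count up to an estimated population count.

    penetration_rate is the assumed share of the group that uses
    the platform (hypothetical input, between 0 and 1).
    """
    if not 0 < penetration_rate <= 1:
        raise ValueError("penetration_rate must be in (0, 1]")
    return facebook_count / penetration_rate


# Hypothetical groups: (raw Facebook count, assumed penetration rate)
groups = {
    ("15-24", "female"): (8000, 0.5),
    ("55-64", "male"): (1500, 0.25),
}

adjusted = {
    group: adjust_for_coverage(count, rate)
    for group, (count, rate) in groups.items()
}
print(adjusted)
```

Groups with low platform use are scaled up more, which is why the quality of the assumed penetration rates matters for the final estimate.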

Approach for measuring the movement of EU migrants

Researchers also looked at developing a method for measuring up-to-date migration flows. While Facebook data shows the number of users and their origin, Twitter offers real-time information on the location of a small share of its users. Twitter has fewer users in the EU than Facebook, and only a small proportion of Twitter messages, or tweets, carry information about the user’s location. Although Twitter users may not be representative of the EU population overall, the data provide a unique opportunity to study the movement of people. To estimate EU mobility flows for each member state, researchers used a basic estimation framework to combine the Twitter data with the Eurostat figures.
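How geo-tagged tweets might be turned into movement counts can be sketched in Python as follows. This is an illustration under stated assumptions, not the study's estimation framework: each user is assigned the country they tweet from most often in each period, and a change of that country between consecutive observed periods is counted as one move. The function names and the sample records are invented.

```python
from collections import Counter, defaultdict


def modal_country_by_period(tweets):
    """Map each user to their most frequent tweet country per period.

    tweets: iterable of (user_id, period, country) tuples.
    """
    counts = defaultdict(lambda: defaultdict(Counter))
    for user, period, country in tweets:
        counts[user][period][country] += 1
    return {
        user: {p: c.most_common(1)[0][0] for p, c in periods.items()}
        for user, periods in counts.items()
    }


def count_moves(tweets):
    """Count origin-destination moves between consecutive observed periods."""
    flows = Counter()
    for periods in modal_country_by_period(tweets).values():
        ordered = [periods[p] for p in sorted(periods)]
        for origin, destination in zip(ordered, ordered[1:]):
            if origin != destination:
                flows[(origin, destination)] += 1
    return flows


# Invented sample records: (user, month, country of geo-tag)
tweets = [
    ("u1", "2018-01", "PL"), ("u1", "2018-01", "PL"), ("u1", "2018-01", "DE"),
    ("u1", "2018-02", "DE"),
    ("u2", "2018-01", "ES"), ("u2", "2018-02", "ES"),
]
flows = count_moves(tweets)
print(flows)
```

Counts of this kind cover only users with location data, so they would still need to be scaled and combined with official figures before being read as population flows.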

Findings

Measuring the stocks of EU migrants

The model estimates that in 2018 there were more than 15 million EU citizens living in an EU member state other than their home country. Most of the country estimates are comparable to the official statistics reported by Eurostat. However, for a handful of countries, mainly Italy and Spain, our estimates are higher than the number of migrants reported by Eurostat, suggesting that the official data for these countries may be missing observations.

Measuring the movement of EU migrants

The initial results show that the model which included Twitter data does not appear to offer any advantage over the application with only Eurostat data. Several factors may have lowered its accuracy, such as the short time series of quality Twitter data and the small number of Twitter users for whom researchers had location data. Further investigation is required. We therefore do not recommend applying this approach to estimate EU mobility flows in its current form.

Conclusions

The results of applying the stocks model are experimental, but they are promising. This model creates a bridge between the official, traditional statistics and the newly collected social media data. Complementary research and data would be needed to improve the robustness of new estimates. The main limitation is the lack of overlapping time series between the official data and the Facebook data.

Next steps to improve on the method would be to investigate different migration models, and collect more data over a longer period from both Labour Force Surveys and Facebook.

The approach taken to estimate EU mobility flows has not yet offered plausible results. Further research is required to develop a robust and reliable sample of Twitter data. As Twitter imposes restrictions on sharing data with third parties, one way forward for the EC would be to start building its own dataset.