
SoBigData Articles

What does Twitter bring to the table? New dimensions from Twitter on international migration

Traditional data sources such as censuses, surveys, and population registers have been the main sources for migration studies. However, data quality varies widely from one country to another, which makes it difficult to obtain consistent data across countries. This is partly because traditional data are costly and time-consuming to collect, and some countries cannot afford to collect them [2].

This is where Twitter comes in. Twitter is a microblogging website where individual users freely share their opinions and news. In 2019, Twitter had about 330 million monthly active users [1]. Its data are also freely available through an application programming interface (API). Unlike traditional data, such data can be collected at low cost and with little delay. This social media platform therefore promises to be a good complement to the traditional data sources used in migration studies.

In [2], we use Twitter data to identify international migrants. To do so, we define a migrant as a user whose "identified nationality is different from the country of residence". To determine the country of residence, we observe a user's geo-tagged tweets over the course of 2018. To determine nationality, we look at the user's linguistic and social connections back to their country of origin. This definition is intended to be consistent with the official definition of a migrant as "a person who moves to a country other than that of his or her usual residence for a period of at least a year" [2].
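To make the definition concrete, here is a minimal Python sketch of how such a rule could be applied to per-user features. It is not the implementation used in the paper: the UserSignals fields, the 50% residence threshold (min_share) and the equal weighting of linguistic and social ties (w_lang, w_social) are hypothetical choices made purely for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class UserSignals:
    """Hypothetical per-user features; field names are illustrative only."""
    user_id: str
    tweet_countries: List[str]      # country of each geo-tagged tweet posted in 2018
    language_countries: List[str]   # country linked to the language of each tweet
    friend_countries: List[str]     # inferred country of residence of each friend


def country_of_residence(user: UserSignals, min_share: float = 0.5) -> Optional[str]:
    """Most frequent tweeting country, kept only if it covers at least min_share of tweets."""
    if not user.tweet_countries:
        return None
    country, n = Counter(user.tweet_countries).most_common(1)[0]
    return country if n / len(user.tweet_countries) >= min_share else None


def nationality(user: UserSignals, w_lang: float = 0.5, w_social: float = 0.5) -> Optional[str]:
    """Country with the highest weighted share of linguistic and social ties (assumed scoring)."""
    scores: Counter = Counter()
    for signals, weight in ((user.language_countries, w_lang),
                            (user.friend_countries, w_social)):
        if not signals:
            continue
        # Each signal type contributes the fraction of its observations pointing to a country.
        for country, n in Counter(signals).items():
            scores[country] += weight * n / len(signals)
    if not scores:
        return None
    return scores.most_common(1)[0][0]


def is_migrant(user: UserSignals) -> bool:
    """A user is a migrant when the identified nationality differs from the residence."""
    residence = country_of_residence(user)
    nat = nationality(user)
    return residence is not None and nat is not None and nat != residence
```

Users whose residence or nationality cannot be determined are simply not counted as migrants, which keeps the sketch conservative.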

Following this methodology, we identified about 3,000 migrant users. Figure 1 shows the migration links between countries. The colours on the outer ring of the chord diagram represent the nationality of the migrants, and the width of each chord represents the number of migrants in our 2018 data. For visualisation purposes, only countries with at least ten migrants (21 countries) are shown.
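The aggregation behind a chord diagram like Figure 1 can be sketched as an origin-destination matrix built from (nationality, residence) pairs. This is a hedged illustration only: the function name od_matrix is hypothetical, and we assume that a country's migrant count is the number of migrants who either hold its nationality or reside there, which may differ from the exact filter used for the figure.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def od_matrix(pairs: Iterable[Tuple[str, str]],
              min_migrants: int = 10) -> Tuple[List[str], List[List[int]]]:
    """Build a nationality-by-residence count matrix, keeping only well-represented countries.

    pairs: one (nationality, residence) tuple per identified migrant.
    """
    pairs = list(pairs)
    # Assumed filter: count migrants linked to a country as origin or destination.
    involvement: Counter = Counter()
    for nat, res in pairs:
        involvement[nat] += 1
        involvement[res] += 1
    kept = sorted(c for c, n in involvement.items() if n >= min_migrants)
    index = {c: i for i, c in enumerate(kept)}
    # Rows are nationalities (chord colours), columns are countries of residence.
    matrix = [[0] * len(kept) for _ in kept]
    for nat, res in pairs:
        if nat in index and res in index:
            matrix[index[nat]][index[res]] += 1
    return kept, matrix
```

A matrix in this form can be passed to most chord-diagram plotting tools, with row sums giving the chord widths per nationality.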

We then validated our results on Italian emigrants against two official statistics: Eurostat data and the "Anagrafe degli Italiani Residenti all'Estero" (AIRE) [3]. For Italian emigrants in Europe, our estimates correlate well with both sources: 0.753 with AIRE data and 0.711 with Eurostat data. However, for Italians in non-European countries, the correlation with AIRE data drops to 0.626.
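The validation step amounts to correlating per-country estimates with official counts. The sketch below assumes a Pearson correlation computed over the countries present in both sources; the helper names pearson and validate are hypothetical, and the paper may have used a different coefficient or transformation (for example, log-scaled counts).

```python
import math
from typing import Dict, List


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)


def validate(predicted: Dict[str, float], official: Dict[str, float]) -> float:
    """Correlate per-country estimates with official counts on the countries both cover."""
    common = sorted(set(predicted) & set(official))
    return pearson([predicted[c] for c in common], [official[c] for c in common])
```

Restricting both dictionaries to European or to non-European destinations before calling validate corresponds to the kind of split shown in Figure 2.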

Although Twitter data have limitations, such as sample bias, we show that Twitter can complement traditional data sources by bringing new value and new dimensions to the study of international migration. In this work, we also show that Twitter is useful not only for migration statistics but also for studying how topics spread across and within different migrant communities.

Figure 1. Chord diagram showing the migration links between countries.

Figure 2. The first two plots show the correlation between our predictions and Eurostat and AIRE data for Italian emigrants in Europe; the last plot shows the correlation between our predictions and AIRE data for Italian emigrants in the rest of the world.


The paper describing the full research can be found in:

Kim, Jisu, et al. "Digital Footprints of International Migration on Twitter." International Symposium on Intelligent Data Analysis. Springer, Cham, 2020.


Written by Ji Su Kim, PhD student in Data Science, Scuola Normale Superiore, Italy.

Revised by Laura Pollacci and Matteo Bohm

References

[1] Sîrbu, Alina, et al. "Human migration: the big data perspective." International Journal of Data Science and Analytics (2020): 1-20.

[2] Kim, Jisu, et al. "Digital Footprints of International Migration on Twitter." International Symposium on Intelligent Data Analysis. Springer, Cham, 2020.


[2] Recommendations on Statistics of International Migration, Revision 1 (p. 113). United Nations, 1998.

[3] Anagrafe degli Italiani Residenti all'Estero (AIRE), the Italian register of citizens residing abroad.