
Estimating global migration flows by combining traditional and innovative data sources using machine learning


This project aims to generate annual estimates of global bilateral country-to-country migration flows by combining traditional and new forms of data with machine learning. As a first step, the research team is currently assessing existing traditional sources of annual migration statistics – censuses, surveys and administrative data – and is building a global database of historical country-to-country migration flows, recorded yearly from 1991 onward. In a second step, the project will develop machine learning models to estimate annual migration flows, drawing on innovative data sources such as Google search data and air traffic data. Finally, all estimates will be validated through simulations and through comparisons with estimates from national and international migration statistics.


Early key findings reveal large geographical and temporal variation in the coverage of migration data. While the coverage of traditional migration data sources has increased over time, data deficiencies remain pronounced across Africa, Asia, the Caribbean and Eastern Europe. Coverage is higher in North America, Latin America, Western Europe and Oceania, but differing definitions of migrants make the data hard to compare. On the technical side, several machine learning model structures have been tested and show consistent degrees of accuracy. Overall, these are promising early-stage results; further tests and refinements will allow for final conclusions and recommendations on this approach.


(Image: © University of Liverpool)

Last modified
13 October 2021