Senior Data Analyst

Leaving no one behind — Disaggregating census data by migration status through IPUMS

The 2030 Sustainable Development Agenda calls for United Nations (UN) Member States to “leave no one behind” as they work toward meeting targets in the Sustainable Development Goals (SDGs). To ensure that migrants are not left behind, countries need data disaggregated by migratory status. Censuses are statistically robust sources for such data. However, census data are often not harmonized and census reports often do not disaggregate statistics by migrant status.

The Integrated Public Use Microdata Series (IPUMS) International project harmonizes and integrates census data, including data on migrants. Kristen Jeffers, Senior Data Analyst, describes IPUMS and the potential it holds for monitoring the SDGs and assessing whether and to what degree migrants are left behind. 

(This blog is based on a pilot study on disaggregating SDG indicators by migratory status.)

IPUMS hosts the world's largest public archive of census microdata samples

IPUMS-International is a project that partners with national statistical offices to collect, integrate, harmonize and disseminate census microdata from around the world. As of 2017, microdata from 301 census samples covering 85 countries are available to researchers free of charge through the IPUMS-International online data dissemination system. The series includes data from 1960 to the present.

Many countries make anonymized public-use samples of census microdata available to researchers and policymakers, and IPUMS-International codes and consistently documents these data across countries and over time to facilitate comparative research. The harmonized microdata are pooled into a single database from which data users can build customized datasets for use in academic and policy research.

IPUMS census microdata are useful for monitoring SDGs

The IPUMS customization feature allows for the type of flexible analysis that is necessary to monitor the SDGs and ensure no one is left behind. Also, given that IPUMS census microdata can be disaggregated across various dimensions, they are useful for national governments and UN “custodian” agencies that are charged with compiling and reporting SDG-related data.

More than 30 indicators for 10 of the 17 SDGs can be calculated from census microdata exactly as they are officially defined. These include indicators related to:

  • fertility
  • mortality
  • access to basic services
  • enrollment in education, and
  • labor force participation and composition.

For dozens of additional SDG targets that rely on “big data” and other non-traditional data sources that are not nationally representative, census data will be required to produce population-level estimates. Likewise, census data will be required for the disaggregation of indicators derived from data sources that lack the sample sizes or stratifying variables necessary to support disaggregated estimates. While targeted household surveys often provide more detail than population censuses, they rarely produce sample sizes large enough to support the multidimensional disaggregation suggested for SDG monitoring.  

The large samples distributed by IPUMS-International, typically 10 percent of all households, make it possible to study small subpopulations and subnational regions of countries. Additionally, most samples include individuals living in group quarters and institutions such as prisons, dormitories, and military housing, providing data on populations often excluded from household surveys. When empirical disaggregation is not available, census data can be used to model indicator estimates for population subgroups and subnational geographic areas.
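To illustrate why sample size matters for small subpopulations, the following minimal sketch compares the precision of an estimate from a 10 percent census sample with one from a much smaller household survey. Every number here is a hypothetical assumption (the population size, the subgroup share, the survey size, and the indicator value), and the formula assumes simple random sampling of persons, which ignores the household clustering found in real census samples and surveys.

```python
import math

def subgroup_n(population, sampling_fraction, subgroup_share):
    """Expected number of sampled persons who belong to the subgroup."""
    return population * sampling_fraction * subgroup_share

def standard_error(p, n):
    """Standard error of a proportion p under simple random sampling."""
    return math.sqrt(p * (1 - p) / n)

# Hypothetical country: 10 million people, 1% of whom are foreign-born youth.
population = 10_000_000
subgroup_share = 0.01
indicator = 0.20  # assumed indicator value within the subgroup

census_n = subgroup_n(population, 0.10, subgroup_share)   # 10% census sample
survey_n = subgroup_n(population, 0.002, subgroup_share)  # 20,000-person survey

for label, n in [("census sample", census_n), ("household survey", survey_n)]:
    margin = 1.96 * standard_error(indicator, n) * 100
    print(f"{label}: about {n:.0f} subgroup members, "
          f"95% margin of roughly +/-{margin:.1f} percentage points")
```

Under these assumptions the census sample contains about fifty times as many subgroup members as the survey, so its margin of error is roughly seven times smaller (the margin shrinks with the square root of the sample size).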

Indicators based on 2010-round census data set a baseline for measuring progress towards achieving the SDGs. Data from 2020- and 2030-round censuses will measure progress against these baselines.

IPUMS allows for analysis of SDG-related data by migrant status

IPUMS individual-level microdata allow data users to make customized tabulations that disaggregate population-level statistics by migrant status. In contrast, published tabulations available in official census reports often describe the size of the migrant population, but they rarely distinguish migrants from other population groups when reporting demographic or socioeconomic statistics.

Most IPUMS census samples can be disaggregated by migrant status because most censuses ask questions about place of birth and/or citizenship. Three-quarters of samples disseminated by IPUMS-International distinguish the foreign-born from the native-born. About half distinguish citizens from non-citizens. Efforts have been made to standardize international census-taking practices, but national governments still retain the autonomy to choose to include in the census the topics and questions they consider essential to their planning and monitoring needs. As a result, topical coverage and variable availability vary across census samples.

For IPUMS samples that include information on birthplace, data users can easily produce indicators and statistics disaggregated by nativity status. For samples that include additional migration-related variables like year of immigration and citizenship status, indicators can be further stratified. Harmonized variables from a single source improve reliability of cross-national comparisons; customized datasets that pool multiple countries simplify data management and analysis.
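As a rough illustration of this kind of tabulation, the sketch below computes a weighted indicator by country, census year, and nativity status from a small pooled dataset. The records, the field names (`country`, `year`, `foreign_born`, `has_service`, `weight`) and the indicator itself are hypothetical and chosen for illustration; they are not the variable names or codes of an actual IPUMS extract, which should be taken from the project's own documentation.

```python
from collections import defaultdict

# Hypothetical pooled extract: one record per person, samples stacked together.
# Field names and values are illustrative assumptions, not a documented schema.
records = [
    {"country": "A", "year": 2010, "foreign_born": False, "has_service": True,  "weight": 10.0},
    {"country": "A", "year": 2010, "foreign_born": False, "has_service": True,  "weight": 10.0},
    {"country": "A", "year": 2010, "foreign_born": True,  "has_service": True,  "weight": 10.0},
    {"country": "A", "year": 2010, "foreign_born": True,  "has_service": False, "weight": 10.0},
    {"country": "B", "year": 2010, "foreign_born": False, "has_service": True,  "weight": 20.0},
    {"country": "B", "year": 2010, "foreign_born": False, "has_service": False, "weight": 20.0},
    {"country": "B", "year": 2010, "foreign_born": False, "has_service": True,  "weight": 20.0},
    {"country": "B", "year": 2010, "foreign_born": True,  "has_service": False, "weight": 20.0},
]

def weighted_share_by_group(rows, flag, keys, weight="weight"):
    """Weighted share of persons with `flag` true, for each combination of `keys`."""
    numer = defaultdict(float)
    denom = defaultdict(float)
    for row in rows:
        group = tuple(row[k] for k in keys)
        denom[group] += row[weight]      # weighted population in the group
        if row[flag]:
            numer[group] += row[weight]  # weighted population with the attribute
    return {group: numer[group] / denom[group] for group in denom}

shares = weighted_share_by_group(
    records, "has_service", keys=("country", "year", "foreign_born")
)
for group, share in sorted(shares.items()):
    print(group, round(share, 3))
```

Because the pooled records share one set of field names, the same function serves every country and year in the file. Adding a further stratifier, such as sex or citizenship, would only require another entry in `keys`.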

In the SDG context, IPUMS harmonized microdata allow us to examine whether and to what degree migrants may be left behind across SDG indicators. For example, Figure 1 visualizes results for SDG indicator 8.6.1, proportion of youth (age 15-24) not in education, employment, or training (NEET), for select South and Central American countries. The graph includes data from 2000- and 2010-round censuses to demonstrate the utility of IPUMS microdata for monitoring progress across time. Disaggregation by nativity status reveals large and widening gaps between native-born and foreign-born youth in Costa Rica, the Dominican Republic, and Ecuador. The time series analysis identifies improvements in both overall NEET levels and disparities between native-born and foreign-born youth in Argentina, Brazil, Uruguay, and Trinidad and Tobago. Census microdata support further disaggregation by sex, country of origin, and subnational region.


Figure 1: SDG 8.6.1 Proportion of youth (age 15-24) not in education, employment, or training (%), by nativity status
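A calculation of the kind summarized in Figure 1 can be sketched as follows. This is a simplified illustration, not the procedure used in the pilot study: the person records are invented, the field names (`age`, `employed`, `in_school`, `foreign_born`, `weight`) are hypothetical, and `in_school` stands in for "education or training" on the assumption that the census records school attendance only.

```python
def is_neet(person):
    """A youth counts as NEET if neither employed nor attending school."""
    return not person["employed"] and not person["in_school"]

def neet_rate(persons, foreign_born):
    """Weighted NEET share (%) among persons aged 15-24 of the given nativity.

    Returns None when the sample contains no youth in that group.
    """
    youth = [p for p in persons
             if 15 <= p["age"] <= 24 and p["foreign_born"] == foreign_born]
    total = sum(p["weight"] for p in youth)
    if total == 0:
        return None
    neet = sum(p["weight"] for p in youth if is_neet(p))
    return 100.0 * neet / total

# Invented person records for illustration only.
sample = [
    {"age": 16, "employed": False, "in_school": True,  "foreign_born": False, "weight": 10.0},
    {"age": 20, "employed": True,  "in_school": False, "foreign_born": False, "weight": 10.0},
    {"age": 22, "employed": False, "in_school": False, "foreign_born": False, "weight": 10.0},
    {"age": 24, "employed": True,  "in_school": False, "foreign_born": False, "weight": 10.0},
    {"age": 18, "employed": False, "in_school": False, "foreign_born": True,  "weight": 10.0},
    {"age": 21, "employed": True,  "in_school": False, "foreign_born": True,  "weight": 10.0},
    {"age": 30, "employed": False, "in_school": False, "foreign_born": False, "weight": 10.0},  # outside 15-24
    {"age": 14, "employed": False, "in_school": True,  "foreign_born": True,  "weight": 10.0},  # outside 15-24
]

native = neet_rate(sample, foreign_born=False)
foreign = neet_rate(sample, foreign_born=True)
print(f"native-born NEET: {native:.1f}%")
print(f"foreign-born NEET: {foreign:.1f}%")
print(f"gap: {foreign - native:.1f} percentage points")
```

Repeating the calculation on samples from two census rounds and comparing the gaps would show whether the disparity between the two groups is widening or narrowing over time.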

IPUMS census microdata have limits, but are directly relevant to SDGs

Certain limitations restrict the utility of IPUMS census microdata for SDG monitoring:

Accessing microdata may be challenging: For countries that do not participate in IPUMS-International, accessing public-use microdata samples is challenging. Eighty-five countries currently disseminate data through the project; some of these countries have not yet contributed 2010-round data.  

Census microdata are not as detailed as microdata from household surveys: Census questions cover a broad range of topics in limited detail. Household surveys are often a better source for the in-depth information necessary for qualitative SDG indicators.

Censuses are not timely: Censuses are typically conducted every 10 years and therefore provide information about a country’s population at the time the data were collected, not at the present time. In contrast, household surveys are conducted more frequently than censuses, providing timelier indicator estimates. Using household surveys in combination with census data improves the applicability and accuracy of both sources.

Despite these limitations, harmonized census microdata, like those disseminated by IPUMS-International, represent an important resource for SDG monitoring and SDG-related research. Many indicators can be directly measured using census microdata, and other data sources can be used in combination with census microdata to produce disaggregated population-level indicator estimates. Few other resources allow data users to generate customized indicators, tabulations, and statistics for migrants for multiple countries across all world regions.

The use of IPUMS census microdata for SDG monitoring is resource-efficient

To minimize resources spent on new data collection and maximize resources devoted to the implementation of policies and programs to achieve the goals and targets, SDG monitoring should make good use of available data.  Census-taking remains a primary function of national statistical offices in most countries. The 2020- and 2030-round censuses will provide data for ongoing assessment of SDG targets without requiring cumbersome investment in national statistical capacities.  Continued access to census microdata will be necessary for the measurement, disaggregation, validation, and further exploration of SDG indicators.

Disclaimer: The opinions expressed in this blog post are those of the author and do not necessarily reflect the policies or views of the United Nations or the International Organization for Migration (IOM). The designations employed and the presentation of material throughout the blog post do not imply the expression of any opinion whatsoever on the part of IOM concerning the legal status of any country, territory, city or area, or of its authorities, or concerning its frontiers and boundaries.