The power of demographic data lies in its potential to improve government and society by serving as the basis for important economic decisions. In the same vein, machine learning models trained on demographic data can help policymakers identify trends and prepare for issues related to population growth, aging and migration.

Looking for open source demographic data for machine learning? We at Lionbridge AI have prepared a list of the best public sources of demographic data. Check out the full list below:

Demographic Datasets for Machine Learning

American FactFinder: The Census Bureau’s web-based, self-service tool to search a variety of population, economic, geographic and housing information.

U.S. Healthcare Data: Data about population health, diseases, drugs, health plans and more, collected from sources including the FDA drug database and the USDA Food Composition Database.

New York City Census Data: Population, racial/ethnic demographic information, employment and commuting characteristics for New York City neighborhoods.

DataFerrett: Provides a wide variety of population, health, economic, geographic and housing information about the United States to individuals, businesses, governments, and organizations.

US Public Assistance for Women and Children: Data on public assistance in the United States, initially covering the WIC Program. Files may include participation and spending data for state WIC programs, and poverty data for each state from 2012–2016.

Silicon Valley Diversity Data: The demographics for 23 Silicon Valley tech companies, including factors like race, gender and salary.

World Gender Statistics: A database for the latest sex-disaggregated data and gender statistics covering demography, education, health, access to economic opportunities, public life and decision-making, and agency.

Demographic Trends (1970-2010) for Coastal Geographies: Data derived from Census Block Group Data for 13 different coastal geographies.

National Student Loan Data System (NSLDS): A centralized, integrated view of loans and grants during their complete life cycle, from aid approval through disbursement, repayment, deferment, delinquency, and closure.

ZIP Code Data: This study provides detailed tabulations of individual income tax return data at the state and ZIP code level.

Nutrition, Physical Activity, and Obesity – Women, Infant, and Child: Data on weight status for children aged 3 months to 4 years old from Women, Infant, and Children Participant and Program Characteristics (WIC-PC).

The Demographic /r/ForeverAlone Dataset: Demographic data collected from a survey of subscribers of the subreddit /r/ForeverAlone.

In case you missed our previous dataset articles, you can find them all here. Still can’t find the custom data you need to train your model? Lionbridge AI provides custom AI training data in over 300 languages for your specific machine learning project needs.