Abstract:

In this talk, we discuss a learning module about missing data that uses the United States Census. The Census is a massive data collection project conducted every ten years to obtain a snapshot of the people who live in the country. Some groups of people, however, are regularly undercounted and thus underrepresented in the data. Because data from the Census is used for important functions such as apportioning seats in the U.S. House of Representatives, it is important to understand the limitations of the data and the potential societal implications. Two models, the Demographic Analysis (DA) and the Dual-Systems Estimates (DSE), have been developed to measure undercounts in the Census. We will discuss a lesson for an undergraduate regression analysis course in which students examine the effectiveness of these models and develop their own using publicly available data. We describe the learning outcomes of this module and how they connect to the data analysis cycle presented in R for Data Science. We conclude with the potential challenges of implementing this lesson in a course and strategies for addressing them.