One Dot Per Person for the Entire United States

Created by Dustin Cable, July 2013

Additional Resources

Frequently Asked Questions

Access and Use Policy – describes how this map should be cited and used.

Congressional Dot Map with election results.

High Resolution Image of the Racial Dot Map

Overview

The map was created by Dustin Cable, a former demographic researcher at the University of Virginia’s Weldon Cooper Center for Public Service. Brandon Martin-Anderson from the MIT Media Lab and Eric Fischer, creator of social media dot maps, deserve credit for the original inspiration for the project. This map builds on their work by adding the Census Bureau’s racial data and by correcting for mapping errors.

The Dots

Each of the 308 million dots is smaller than a pixel on your computer screen at most zoom levels. Therefore, the “smudges” you see at the national and regional levels are actually aggregations of many individual dots. The dots themselves are only resolvable at the city and neighborhood zoom levels.

Each dot on the map is also color-coded by race and ethnicity. Whites are coded as blue; African-Americans, green; Asians, red; Hispanics, orange; and all other racial categories are coded as brown.

Shades of Purple, Teal, and Other Colors

Since dots are smaller than one pixel at most zoom levels, colors are assigned to a pixel depending on the number of colored dots within that pixel. For example, if a pixel contains both White residents (blue dots) and Asian residents (red dots), the pixel will be colored a shade of purple that reflects the proportion of each group within that pixel.
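The proportional coloring described above can be illustrated with a short Python sketch. It is not the map's rendering code: it assumes a simple count-weighted average of RGB values, and the RGB triples and the function name blend_pixel are made up for illustration.

```python
# Illustrative RGB values only; the exact palette used by the map is an assumption.
COLORS = {
    "white": (0, 0, 255),       # blue
    "black": (0, 128, 0),       # green
    "asian": (255, 0, 0),       # red
    "hispanic": (255, 165, 0),  # orange
    "other": (139, 69, 19),     # brown
}


def blend_pixel(counts):
    """Return the count-weighted average RGB color for one pixel.

    counts maps a category name to the number of dots that fall inside
    the pixel. An empty pixel returns None.
    """
    total = sum(counts.values())
    if total == 0:
        return None
    return tuple(
        round(sum(COLORS[cat][ch] * n for cat, n in counts.items()) / total)
        for ch in range(3)
    )


# Three White dots and one Asian dot give a blue-leaning purple.
print(blend_pixel({"white": 3, "asian": 1}))
```

Under this assumption, a pixel that is mostly White with some Asian residents leans blue, and the hue shifts toward red as the Asian share grows.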

Different shades of purple, teal, and other colors can therefore be a measure of racial integration in a particular area. However, an area that appears racially integrated at wider zoom levels may turn out to be segregated at the city or neighborhood level.

Take the Minneapolis-St. Paul metro area as an example:

While Minneapolis and St. Paul may appear purple and racially integrated when zoomed out at the state level, a closer look reveals a greater degree of racial segregation between different neighborhoods in both cities. While some areas remain relatively integrated, there are clear delineations between Asian, Black, and White neighborhoods.

Lightly Populated Areas

Toggling between color-coded and non-color-coded map views in lightly populated areas provides more contrast to see differences in population density. Take North and South Dakota as illustrative examples:

In the black and white version, it is easier to see the smaller towns and low-density areas than in the color-coded version. Different monitor settings and configurations may make it harder or easier to see color variations in lightly populated areas, but the non-color-coded map should always show differences in population density fairly well.

Dots Located in Parks, Cemeteries, and Lakes

The locations of the dots do not represent actual addresses. The most detailed geographic identifier in Census Bureau data is the census block. Individual dots are randomly located within a particular census block to match aggregate population totals for that block. As a result, dots in some census blocks may be located in the middle of parks, cemeteries, lakes, or other clearly non-residential areas within that census block. No greater geographic resolution for the 2010 Census data is publicly available (and for good reason).
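A minimal Python sketch of random placement within a block follows. The map's own code used GDAL and Shapely; this sketch swaps in pure-Python rejection sampling with a ray-casting test so that it runs without extra libraries. The function names and the L-shaped example outline are hypothetical.

```python
import random


def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the polygon outline."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def random_points_in_block(ring, population, rng=random):
    """Draw `population` uniform random points inside the outline.

    Candidates are drawn from the bounding box and kept only if they fall
    inside the polygon. The outline must enclose a non-zero area.
    """
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    min_x, max_x, min_y, max_y = min(xs), max(xs), min(ys), max(ys)
    points = []
    while len(points) < population:
        x = rng.uniform(min_x, max_x)
        y = rng.uniform(min_y, max_y)
        if point_in_polygon(x, y, ring):
            points.append((x, y))
    return points


# Hypothetical L-shaped block with 50 residents.
block = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
dots = random_points_in_block(block, 50, random.Random(0))
```

Nothing in this procedure knows about land use, which is why a dot can land in a lake or cemetery that happens to lie inside the block outline.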

A more accurate portrayal of the geographic distribution of residents would be possible with data on the location of parks, buildings, and/or physical addresses. Individual dots could then be placed conditionally based on those data.

The Data

All of the data displayed on the map are from the U.S. Census Bureau’s 2010 Summary File 1 (SF1) dataset, made publicly available through the National Historical Geographic Information System (NHGIS). Table P5, “Hispanic or Latino Origin by Race,” was merged with block-level state shapefiles, also from NHGIS. The data are based on the “census block,” the smallest area of geography for which data are collected (roughly equivalent to a city block in an urban area). Five racial categories were created based on the data in table P5: non-Hispanic White, non-Hispanic Black, non-Hispanic Asian, Hispanic or Latino, and a category for all other racial categories, including multiracial identifications. The sum of all five categories equals the total population.
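The five-way recode can be sketched in Python as follows. The dictionary keys (total, nh_white, nh_black, nh_asian, hispanic) are hypothetical stand-ins for the corresponding table P5 columns, and computing the residual category by subtraction is an assumption about how the grouping might be done.

```python
def five_categories(row):
    """Collapse one block's P5 counts into the five map categories.

    `row` uses hypothetical key names for the P5 columns. Everything not
    counted in the four named groups is assigned to "other".
    """
    named = ("nh_white", "nh_black", "nh_asian", "hispanic")
    result = {key: row[key] for key in named}
    result["other"] = row["total"] - sum(result.values())
    return result


# Hypothetical block with 100 residents.
block_counts = {"total": 100, "nh_white": 50, "nh_black": 20,
                "nh_asian": 10, "hispanic": 15}
categories = five_categories(block_counts)
```

Defining the residual this way guarantees that the five categories sum to the block's total population.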

Methodology

Python was used to read the 50 state and District of Columbia shapefiles (with the merged SF1 data). The GDAL and Shapely libraries were used to read the data and create the point objects. The code retrieves the population data for each census block, creates the appropriate number of geographic points randomly distributed within each census block, and outputs the point information to a database file. The resulting file has x-y coordinates for each point, a quadkey reference to the Google Maps tile system, and a categorical variable for race. The final database file has 308,745,538 observations and is about 21 GB in size. The processing time was about five hours for the entire nation.
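To illustrate the quadkey reference, here is a minimal Python sketch of the standard Web Mercator tile and quadkey arithmetic. It is not taken from the project's code; the function names and the zoom level at which a quadkey would be stored are assumptions.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Return the (column, row) of the Web Mercator tile containing a point."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so that points on the far edges stay inside the tile grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_to_quadkey(tx, ty, zoom):
    """Interleave the bits of the tile column and row into a base-4 string."""
    digits = []
    for i in range(zoom, 0, -1):
        mask = 1 << (i - 1)
        digit = 0
        if tx & mask:
            digit += 1
        if ty & mask:
            digit += 2
        digits.append(str(digit))
    return "".join(digits)


# Tile (3, 5) at zoom 3 has quadkey "213".
print(tile_to_quadkey(3, 5, 3))
```

A useful property of quadkeys is that the quadkey of a tile begins with the quadkey of its parent tile at the next coarser zoom level, so one key per point identifies its tile at every zoom level up to the one stored.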

The database file was then sorted by quadkey and converted to a .csv format. SAS was able to do this within an hour without crashing.
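One plausible benefit of sorting by quadkey is that all points belonging to the same tile become contiguous, so a renderer can work through the file one tile at a time. The Python sketch below illustrates that idea under the assumption that each row is a (quadkey, x, y, race) tuple; the row layout and the function name group_by_tile are hypothetical.

```python
from itertools import groupby


def group_by_tile(rows, zoom):
    """Yield (tile_quadkey, rows) from rows already sorted by quadkey.

    The tile at `zoom` is identified by the first `zoom` digits of each
    row's quadkey, so sorted rows with the same prefix are adjacent.
    """
    for prefix, group in groupby(rows, key=lambda row: row[0][:zoom]):
        yield prefix, list(group)


# Hypothetical rows: (quadkey, x, y, race).
rows = sorted([
    ("0231", 10, 20, "w"),
    ("0230", 11, 21, "a"),
    ("1200", 12, 22, "h"),
    ("0213", 13, 23, "b"),
])
tiles_at_zoom_2 = list(group_by_tile(rows, 2))
```

Because the grouping relies on adjacency, it only works on sorted input, which is consistent with sorting the full file before producing tiles.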

Processing 2.0.1 for 64-bit Windows was used to create the map tiles. The Java code reads each point from the .csv file and plots a dot on a 512×512 .png map tile using the quadkey reference and x-y coordinates. The racial categorical variable is used to color-code each plotted dot. This process used the default JAVA2D renderer, but other platforms may work better using P2D. Map tiles were created for Google Maps’ zoom levels 4 through 13 to make the final map. A non-color-coded map was also produced to help add more contrast for lightly populated areas. In total, the color-coded and non-color-coded maps contain 1.2 million .png files totaling about 7 GB. Producing all of the map tiles in Processing took about 16 hours for the two maps.
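The tile-plotting step was done in Processing, but the underlying arithmetic can be sketched in Python. The sketch assumes points are given as longitude and latitude and uses the standard Web Mercator projection with the 512-pixel tile size mentioned above; the stored x-y coordinate format of the actual database is not specified here, and the function name pixel_in_tile is hypothetical.

```python
import math

TILE_SIZE = 512  # tile edge in pixels, as stated in the text


def pixel_in_tile(lon, lat, zoom, tile_size=TILE_SIZE):
    """Return ((tile_x, tile_y), (px, py)) for a point at a zoom level.

    (px, py) is the pixel offset from the top-left corner of the tile.
    """
    world = tile_size * 2 ** zoom  # width of the whole map in pixels
    gx = (lon + 180.0) / 360.0 * world
    gy = (1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * world
    # Clamp to the valid pixel range before splitting into tile and offset.
    gx = int(min(max(gx, 0.0), world - 1))
    gy = int(min(max(gy, 0.0), world - 1))
    tile_x, px = divmod(gx, tile_size)
    tile_y, py = divmod(gy, tile_size)
    return (tile_x, tile_y), (px, py)


# The same location maps to a different tile and offset at each zoom level.
for zoom in (4, 13):
    print(zoom, pixel_in_tile(-93.265, 44.978, zoom))
```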

The Google Maps API is used to display the map tiles. Map tiles with zero population are never created using the above method. Therefore, an index was used to tell the map application whether a tile exists in order to prevent 404 errors.
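The index lookup can be sketched as follows. The real map performs this check in the browser through the Google Maps API; this Python version only illustrates the logic, and the (zoom, x, y) key, the path layout, and the function names are assumptions.

```python
def build_tile_index(tile_keys):
    """Store the (zoom, x, y) keys of every tile that was actually rendered."""
    return frozenset(tile_keys)


def tile_path(index, zoom, x, y, base="tiles"):
    """Return the tile's path if it exists, otherwise None.

    Returning None lets the caller skip the request instead of
    triggering a 404 for an unpopulated tile.
    """
    if (zoom, x, y) not in index:
        return None
    return f"{base}/{zoom}/{x}/{y}.png"


# Hypothetical index with two rendered tiles.
index = build_tile_index([(4, 3, 5), (4, 4, 5)])
print(tile_path(index, 4, 3, 5))
print(tile_path(index, 4, 0, 0))
```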

The full code is available on GitHub. It was adapted from code developed by Brandon Martin-Anderson and Peter Richardson in order to account for the racial coding and to fix errors in reading the shapefiles.