We are often told that Science has proven that “there is hardly any connection between genes and race.” But in reality, 21st Century genomics has shown that most Americans’ self-identifications of their own race (what the Census uses) are fairly reasonable.

iSteve commenter Genome Voyager writes:

Race and ethnicity are overwhelmingly correlated with genetic ancestry in the United States. Recent, large-scale studies of ~11,000 cancer patients and ~202,000 military veterans found that individuals’ self-identified race and ethnicity showed 95.6% (cancer) and 99.5% (veterans) correspondence to genetic ancestry clusters.

Yuan et al. (2018) Cancer Cell. 34: 549–560
https://www.sciencedirect.com/science/article/pii/S1535610818303799

Fang et al. (2019) Am J Hum Genet. 105: 763–772
https://www.cell.com/ajhg/fulltext/S0002-9297(19)30338-6

You can read the full PDF of Fang’s 2019 paper here.

Among nearly 202,000 individuals with SIRE [Self-Identified Race/Ethnicity], 1,079 (0.53%) had GIA [Genetically Inferred Ancestry] strongly indicating a different racial/ethnic group.

So, 0.53% of these U.S. military veterans’ self-identified race/ethnicity (using four “continental-scale” race/ethnicities used by the US Census: white, black, Asian, and Hispanic) appeared to be extremely wrong according to genome scans. These could be typos or adoptees or cuckoo’s eggs or whatever.
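To see how the 0.53% figure relates to the counts quoted above, here is a quick arithmetic check in Python. The numerator (1,079) comes from the quoted passage. The denominator is an assumption: the passage says only "nearly 202,000," so 202,000 is used as a round stand-in and the results are approximate.

```python
# Figures as quoted from Fang et al. (2019) in the passage above.
discordant = 1_079       # individuals whose GIA strongly indicated a different group
total_approx = 202_000   # ASSUMPTION: "nearly 202,000"; the exact count is not given here

discordant_share = discordant / total_approx   # fraction strongly discordant
concordant_share = 1 - discordant_share        # everyone else in the sample
one_in_n = total_approx / discordant           # "1 in N" framing of the same ratio

print(f"strongly discordant: {discordant_share:.2%}")
print(f"not strongly discordant: {concordant_share:.2%}")
print(f"roughly 1 in {one_in_n:.0f}")
```

With this round denominator, the discordant share comes out at about 0.53%, or roughly 1 veteran in 187. Note that the remainder is not all clean matches, since it includes the arguable cases as well.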

A somewhat larger percentage of self-identifications would be arguable from the genetic data but not ridiculous like the 0.53%.

For example, Barack Obama self-identified on the 2010 Census as only Black. Genetically, it would have been more accurate for him to self-identify as both Black and White, but it’s not ridiculous for him to put down only Black.

Is 0.53% a big number or a small number?