Andrey Kamenov, Ph.D. Probability and Statistics

According to the 2010 Census technical documentation, there are more than 6 million different surnames in the U.S. That is quite the diversity! Of course, the number of popular surnames is much smaller: two out of three surnames appear only once, and only 0.4 percent (one in 250) appear at least a thousand times.

A raw count of distinct surnames is therefore a poor gauge of actual diversity, because it is dominated by names that almost nobody has. There is a suitable statistical concept for this: entropy, which takes into account how common each surname is. A full explanation of the technical details is beyond the scope of this article, so I will refer you to the Wikipedia article on the subject.

So, now that we have a way to measure the diversity, let’s see how it has changed throughout the years.

Diversity: entropy of surnames in the U.S.

As you can see, the entropy of last names in the U.S. grew until around 1960. Since then, it has remained fairly constant at around 10.

What does this number mean? For any given number of surnames, the entropy is highest when all surnames are equally popular. In that case, the number of different names required to reach an entropy of 10 is around 22,000 (with entropy computed using the natural logarithm, e^10 ≈ 22,000). This is sometimes called the effective number (of last names, in our case).
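To make the link between entropy and the effective number concrete, here is a minimal Python sketch. It assumes the entropy is the Shannon entropy computed with the natural logarithm, which is consistent with an entropy of 10 corresponding to roughly 22,000 names. The function names and the toy sample are hypothetical illustrations, not the Census data or the code behind the charts.

```python
import math
from collections import Counter


def shannon_entropy(counts):
    """Shannon entropy (natural log) of a list of occurrence counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def effective_number(counts):
    """Number of equally popular surnames that would give the same entropy."""
    return math.exp(shannon_entropy(counts))


# Hypothetical toy sample of eight people (not Census data).
sample = ["Smith", "Smith", "Smith", "Smith",
          "Garcia", "Garcia", "Nguyen", "Kamenov"]
counts = list(Counter(sample).values())  # [4, 2, 1, 1]

print(round(shannon_entropy(counts), 3))   # 1.213
print(round(effective_number(counts), 2))  # 3.36 (four names, but unevenly spread)

# When all surnames are equally popular, the effective number
# equals the actual number of surnames.
print(round(effective_number([5] * 1000)))  # 1000

# An entropy of 10 corresponds to about 22,000 equally popular surnames.
print(round(math.exp(10)))  # 22026
```

In the toy sample there are four distinct surnames, but because half of the people share one of them, the effective number is only about 3.4.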

Wondering which state has the highest diversity? Here’s the map.

Effective number of surnames by state

Of course, the numbers here correlate significantly with the states' populations. A larger population means more unique last names, which increases diversity. Even so, there are still some notable features on the map:

The diversity in Texas is surprisingly low (its effective number is 6,700).

The effective number is quite high for New Jersey (34,000).

New York has one of the highest effective numbers (39,000), while California is somewhat lower, near the nationwide value.
