According to Microsoft, commercially available facial recognition software performs best on men with lighter skin and worst on women with darker skin. The new system the company has been testing cut error rates ninefold for all women and significantly improved accuracy across all demographics. Ultimately, Microsoft said, the problem lies in the training datasets: they need more data covering diverse skin tones, as well as hairstyles and facial accessories such as jewelry and glasses.

Microsoft's Face API team made three major changes to its recognition system: it expanded and revised the current datasets, started collecting even more data, and focused the resulting models specifically on skin tone, gender and age. "We had conversations about different ways to detect bias and operationalize fairness," said senior researcher Hanna Wallach in a statement. "We talked about data collection efforts to diversify the training data. We talked about different strategies to internally test our systems before we deploy them."