Bias is a big problem in facial recognition, with studies showing that commercial systems are more accurate if you’re white and male. Part of the reason for this is a lack of diversity in the training data, with people of color appearing less frequently than their peers. IBM is one of the companies trying to combat this problem, and today announced two new public datasets that anyone can use to train facial recognition systems — one of which has been curated specifically to help remove bias.

The first dataset contains 1 million images and will help train systems that can spot specific attributes, like hair color, eye color, and facial hair. Each face is annotated with these characteristics, making it easier for programmers to hone their systems to better distinguish between, say, a goatee and a soul patch. It’s not the largest public dataset for training facial recognition systems, but IBM says it’s the biggest to include such tags.

The second dataset is the more interesting one. It’s smaller than the first, containing 36,000 pictures, but the faces within are an equal mix of ethnicities, genders, and ages. In the same way that the facial attribute tags help train AI systems to recognize these differences, having a diverse mix of faces should help systems overcome various biases. Both datasets were drawn from pictures posted to Flickr with Creative Commons licenses, which often allow them to be used for research purposes.

Ruchir Puri, chief architect of IBM Watson, told The Verge that he was not aware of any other public dataset with a similar focus on diversity. “This dataset [...] should really help designers to tune their algorithms,” said Puri. “Data is the foundation of AI, and it’s not just about building our own capabilities but the community’s as well.”

IBM’s commercial facial recognition systems have been criticized in the past for displaying the very biases this dataset is intended to combat. A study from MIT Media Lab published in February found that IBM’s error rate in identifying the gender of darker-skinned women was nearly 35 percent, while white men were misgendered only 1 percent of the time. Such mistakes will become increasingly important as facial recognition systems are used for tasks from hiring to the identification of criminal suspects.

IBM says it was already working to reduce these errors at the time, and that an updated version of its facial recognition system used broader training sets (like the one announced today) to cut errors nearly tenfold in tests “similar” to those conducted by MIT’s scientists. However, it can still be difficult to judge exactly how much these systems have improved without standardized metrics for evaluating bias.
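
To make the idea of a bias metric concrete, here is a minimal sketch of one simple candidate: the error rate for each demographic group, plus the gap between the best-served and worst-served groups. This is an illustration only, not the method used by IBM or the MIT researchers. The function names, group labels, and toy records are all hypothetical and the numbers are made up.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """Return the fraction of wrong predictions for each group.

    `records` is an iterable of (group, true_label, predicted_label)
    tuples. The tuple layout is an assumption made for this sketch.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, truth, predicted in records:
        totals[group] += 1
        if truth != predicted:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


def max_disparity(rates):
    """Gap between the highest and lowest per-group error rates."""
    return max(rates.values()) - min(rates.values())


# Toy, made-up records: 20 faces per group, not real benchmark data.
records = (
    [("group_a", "female", "male")] * 6      # 6 misclassified
    + [("group_a", "female", "female")] * 14
    + [("group_b", "male", "female")] * 1    # 1 misclassified
    + [("group_b", "male", "male")] * 19
)

rates = error_rates_by_group(records)
gap = max_disparity(rates)

for group, rate in sorted(rates.items()):
    print(f"{group}: {rate:.0%} error rate")
print(f"largest gap between groups: {gap:.0%}")
```

A single overall accuracy figure would hide this kind of gap, which is why a per-group breakdown is a natural starting point for any shared benchmark.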

Puri told The Verge that IBM was interested in helping establish such tests, and said that this September the company would be holding a workshop with the academic community to work on better benchmarks. “There should be matrixes through which many of these systems should be judged,” said Puri. “But that judging should be done by the community, and not by any particular player.”