A new dissertation from University of Pretoria Information Technology master's student John Leuner has revisited the thorny question of whether machine learning methods can effectively detect sexual orientation.

AI-powered “gaydar” systems are among the more controversial applications of machine learning. Previous research from Stanford University (Wang and Kosinski, 2017) was criticized in the AI community and beyond for perceived algorithmic bias and for the technique’s potential negative impact on individuals based on their sexual orientation.

One of the main criticisms of the Stanford study was that its dataset comprised mostly white faces. Google Software Architect Blaise Agüera y Arcas also questioned whether the algorithm might be picking up on grooming and presentation rather than facial traits.

Leuner first reproduced the Stanford study’s deep neural network (DNN) and facial morphology (FM) models on a new dataset and verified their efficacy (DNN accuracy: 68 percent for males, 77 percent for females; FM accuracy: 62 percent for males, 72 percent for females). He then created a new model based on highly blurred facial images to examine whether features such as brightness or predominant colours could contribute to sexual orientation prediction.

To reduce the influence of variables other than facial features, Leuner built a new dataset from dating websites. For the DNN model he extracted features from facial images using VGGFace (a pre-trained deep neural network) and classified these features with a logistic regression model. For the FM model he used Face++ to extract facial “landmarks”, and trained a logistic regression model on distances derived from these landmarks.
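As a rough illustration of the FM approach described above, the sketch below turns a set of landmark coordinates into pairwise distances and fits a small logistic regression on such features. It is not Leuner's code: the function names, the scaling by the largest distance, and the hand-rolled gradient-descent trainer are all assumptions made for this example, and the landmarks are plain (x, y) tuples rather than Face++ output.

```python
from itertools import combinations
from math import dist, exp


def landmark_distances(landmarks, normalise=True):
    """Return the Euclidean distance between every pair of (x, y) landmarks.

    Assumption for this sketch: distances are optionally divided by the
    largest one so that faces photographed at different scales are comparable.
    """
    distances = [dist(a, b) for a, b in combinations(landmarks, 2)]
    if normalise:
        largest = max(distances)
        if largest > 0:
            distances = [d / largest for d in distances]
    return distances


def predict_probability(weights, bias, features):
    """Logistic model: sigmoid of the weighted sum of the features."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + exp(-z))


def train_logistic_regression(rows, labels, learning_rate=0.1, epochs=500):
    """Fit weights and bias with plain stochastic gradient descent.

    Hypothetical stand-in for a library implementation; labels are 0 or 1.
    """
    weights, bias = [0.0] * len(rows[0]), 0.0
    for _ in range(epochs):
        for features, label in zip(rows, labels):
            error = predict_probability(weights, bias, features) - label
            # Gradient of the log loss with respect to each weight and the bias.
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias
```

In use, each face would contribute one row of distances from landmark_distances, and train_logistic_regression would be fitted on those rows with their labels. A real pipeline would more likely rely on an established library for both steps.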

Leuner also examined whether skin colour itself was predictive of sexual orientation. He created a new ML model, a classifier for highly blurred images, trained on inputs that carried no information about the shape or size of facial features. Two types of blurred images were created for the regression model: 5×5 pixel images containing 25 colours, and 1 pixel images containing one colour.
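To make the idea of a 5×5 or 1 pixel input concrete, here is a minimal sketch of one way to reduce an image to that size. The article does not say how the blurring was done, so block averaging is an assumption here, as are the function names and the representation of an image as rows of (r, g, b) tuples.

```python
def block_average(image, out_size):
    """Shrink an RGB image to out_size x out_size by averaging pixel blocks.

    `image` is a list of rows, each row a list of (r, g, b) tuples.
    Block averaging is an assumed blurring method for this sketch only.
    """
    height, width = len(image), len(image[0])
    if height < out_size or width < out_size:
        raise ValueError("image is smaller than the requested output size")
    result = []
    for by in range(out_size):
        # Rows of the source image that fall into this output row.
        y0, y1 = by * height // out_size, (by + 1) * height // out_size
        row = []
        for bx in range(out_size):
            # Columns of the source image that fall into this output column.
            x0, x1 = bx * width // out_size, (bx + 1) * width // out_size
            block = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            # Average each colour channel over the block.
            row.append(tuple(sum(p[c] for p in block) / len(block)
                             for c in range(3)))
        result.append(row)
    return result


def flatten(image):
    """Flatten an RGB image into one list of channel values for a classifier."""
    return [channel for row in image for pixel in row for channel in pixel]
```

Under these assumptions, flatten(block_average(image, 5)) yields 75 numbers (25 colours with three channels each) and flatten(block_average(image, 1)) yields three, so the classifier sees colour and brightness but nothing about the shape or size of facial features.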

Although the results suggest this classifier was capable of predicting sexual orientation at a rate of 63 percent for males and 72 percent for females, significant differences in image brightness could be a confounding factor.

Leuner’s model has thus far received a more positive response on social media sites such as Reddit, as its facial image training dataset draws from a wider range of racial identities and nationalities.

Questions remain, however, about whether the predictions are driven by biological features or by image presentation, and about whether “gaydar” classifiers in general could be misused.

In an opinion piece in The Conversation, Keele University Law School’s Alex Sharpe and Senthorun Raj argued that “predicting someone’s sexuality may sound innocuous, but in places that criminalise or police homosexuality and gender non-conformity, the consequences of prediction can be life threatening.”

The paper A Replication Study: Machine Learning Models Are Capable of Predicting Sexual Orientation From Facial Images is on arXiv.