Kate McCurdy is a Computational Linguistics Engineer at Babbel, makers of the world’s first language-learning app. This month, Kate gave a live session to our EMBA class, addressing social bias in machine learning and UX. We spoke to her about the work she’s doing to minimise the impact of machine-learned bias on diversity and discrimination.

There’s a mathematical overlap between language and computation. Which came first for you – a passion for language or a passion for technology?

My original interest was probably language. I did a couple of degrees in linguistics, which is a very broad-ranging subject. It covers the society-oriented ways that people use language in community and art, as well as the theoretical and mathematical structure of language. The mathematical aspect drives a lot of the analysis into structure, meaning, syntax and semantics. Over the last century, this area has had a huge overlap with models of computation. Programming has a structural basis in human language operation. I started out studying language formally, but I always had a side-interest in technology and eventually, when I started to see the growth of widely available language data, I saw the possibility of bringing these two interests together.

How did you transfer these interests into your career and where does your role fit into the projects at Babbel?

Babbel make language-learning software across different platforms. They have a team of programmers as well as a team of didactics experts (dedicated language specialists who work on building the most effective learning material to help people learn a communicative approach to language). My team, the computational linguists, works on applying language technology to these tasks, in many ways spanning the gap between the programmers and the didactics experts. We work on building the technology behind tools and systems that can be used in different automated contexts, while taking a lead from the didactics team on the most effective human use of these tools.

In your session with the EMBA class, ‘Artificial Intelligence in Social Life’, you spoke about the issue of gender bias in computation. Give us an introduction to social bias and how it’s built into machine learning.

It’s not so much a case of social bias being built into algorithms, and more a question of looking at the data that we’re using to train them, which can reflect and perpetuate imbalances in society. ‘Word embeddings’ and ‘vector space models’ are families of algorithms that have emerged in the past few years and are quite commonly used now. But even these are based on older ideas about language, by which you infer the fuller meaning of a word by observing the words around it. For instance, the frequency with which the word ‘she’ co-occurs with the word for a particular occupation is something that will eventually show up in our data, and this can lead to misleading associations based on text alone. These associations can become outdated because the meaning of words is not only determined by language, but also by culture.
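As an editorial illustration of the co-occurrence idea described above, here is a minimal sketch in Python. The corpus is invented for this example and the function name `cooccurrence_counts` is hypothetical; real word-embedding models are trained on far larger text collections and use more elaborate statistics than raw counts.

```python
from collections import Counter


def cooccurrence_counts(sentences, target, window=4):
    """Count the words that appear within `window` tokens of `target`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token != target:
                continue
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    counts[tokens[j]] += 1
    return counts


# Invented toy corpus (an assumption for illustration, not real data).
corpus = [
    "she is a nurse",
    "she works as a nurse",
    "he is an engineer",
    "he works as an engineer",
    "she is an engineer",
]

she = cooccurrence_counts(corpus, "she")
he = cooccurrence_counts(corpus, "he")

# The imbalance in the text becomes an imbalance in the counts.
print("she:", she["nurse"], "nurse,", she["engineer"], "engineer")
print("he: ", he["nurse"], "nurse,", he["engineer"], "engineer")
```

In this toy data, ‘nurse’ only ever appears near ‘she’, so any model built on these counts would inherit that skew, whatever the reality of the profession.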

Tell us about how these social biases are learned…

There’s been an increasing amount of attention drawn to this problem by researchers, particularly in the past year. There are a variety of possible blind-spots that appear because algorithms learn by observing the co-occurrence of words. One of them is this social bias, where algorithms are learning relationships between words that we now consider outdated or based on stereotypes. But algorithms can also pick up patterns based on ‘noise’. This is the second blind-spot that I’ve been looking into and it can work differently in different languages. In English, we don’t have gender in relation to articles. But if you take languages such as German, French, or Spanish, nouns are associated with being either masculine or feminine – not in a literal way, of course, but in a grammatical sense. But algorithms don’t know the difference between grammatical gender and the meaningful property of gender as we experience it in society, and can therefore end up learning that certain words should be associated with certain gender roles. Different languages assign different grammatical genders, marked by different articles, to the same noun. For example, the word ‘table’ is masculine in German, but feminine in Spanish. If you train a model ‘out of the box’, without accounting for this, it will learn that a table is somehow more ‘manly’ in German, and will associate it with socially masculine words such as ‘father’ and ‘brother’ and so on. In Spanish, it would learn the opposite. Humans know the difference, but these models might not.
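The effect of grammatical gender can be sketched with a toy example. This is an editorial illustration under stated assumptions, not the model from Kate’s research: the four Spanish sentences are invented, the helper names `context_vectors` and `cosine` are hypothetical, and each word is represented only by counts of its immediate neighbours.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=1):
    """Represent each word by counts of the words within `window` tokens."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[token][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented toy corpus: 'mesa' (table) takes the feminine article 'la'.
corpus = [
    "la mesa es grande",
    "la madre es amable",
    "el padre es amable",
    "el libro es grande",
]

vectors = context_vectors(corpus)
sim_madre = cosine(vectors["mesa"], vectors["madre"])
sim_padre = cosine(vectors["mesa"], vectors["padre"])

# 'mesa' looks more like 'madre' than 'padre' purely because of the article.
print(f"mesa~madre: {sim_madre:.2f}  mesa~padre: {sim_padre:.2f}")
```

Nothing in these sentences says anything about tables and mothers; the shared article ‘la’ alone pulls the two words together.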

What can social and data technology services do to address these issues?

Once you know about the issue, you can take different steps to mitigate it. In my research paper, we wanted to raise awareness by showing that you could take care of the issue by teaching the algorithm to ignore the anomalies. For example, you can teach a program to replace the French ‘le’ or ‘la’ with a neutral equivalent, like the English ‘the’, to cancel out the gender information. This method proves the possibility of spotting and addressing these issues, but to what extent we want to cancel out that information is not the same in every scenario and comes down to the purpose of each application.
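The article-replacement step mentioned above can be sketched as a simple preprocessing pass. This is an editorial illustration, not the code from the research paper: the function name `neutralise_articles` is hypothetical, and the sketch assumes only the two French forms ‘le’ and ‘la’, ignoring elided, plural, and indefinite articles.

```python
# Assumption: only these two article forms are handled in this sketch.
GENDERED_ARTICLES = frozenset({"le", "la"})
NEUTRAL_ARTICLE = "the"


def neutralise_articles(sentence, articles=GENDERED_ARTICLES, neutral=NEUTRAL_ARTICLE):
    """Replace gendered articles with a neutral token before training."""
    tokens = sentence.lower().split()
    return " ".join(neutral if token in articles else token for token in tokens)


print(neutralise_articles("La table est grande"))
print(neutralise_articles("Le livre est sur la table"))
```

After this pass, masculine and feminine nouns share the same article token, so a co-occurrence model can no longer use the article as a gender signal. Whether removing that signal is desirable depends on the application, as noted above.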

What are the challenges in terms of regulating something this vast?

This will be a big question that society has to face as we continue to depend on this kind of technology. Automated decision-making plays such a new role in social interaction, and the small biases that these programs learn can be so subtle that they are difficult to pick up on without in-depth examination. But the wider challenge is predicting the long-term harm. This is the first time in our lifetime that we can say that your access to a particular job or institution might be affected by subtle biases built into complex algorithms. If we were better able to envision that long-term harm, then it would be easier for us to audit this behaviour. The subtlety and the fact that it’s operating on such a large scale and across such diverse contexts really presents a lot of challenges.

Is there also an effect on other intersections of society beyond gender?

The reason I chose gender was in part because of the obvious grammatical gender issue. It makes it a much more quantifiable phenomenon. But other kinds of social inequalities are reflected here too. There has already been a lot of research showing that assumptions about gender, age, race, orientation, etc. are replicated by artificial intelligence. Cathy O’Neil’s Weapons of Math Destruction talks about how intelligent systems can learn to discriminate. In one case study from the 1980s, a program was developed to pre-screen résumés before they were examined by humans. By 1988, the British High Court ruled that the model was discriminatory, finding that it had learned to replicate earlier human biases and was systematically discriminating against names that sounded either female or foreign, without reference to the quality of the rest of the CV. Already, in the 80s, we have an example of how this can play out in the real world, and researchers are still finding that this same array of biases is showing up in some of the most widely used AIs. It’s a challenging issue, but we know that it can cause harm.

What would you like to see being done by companies and institutions to address this?

In many ways, the engineers designing these programs are the best equipped to draw attention to these issues and do something about them. The issue could be helped by having people from more diverse backgrounds working on this technology, as they are more likely to be attuned to its effects. I also think we need to move towards a better institutional approach. There should be more institutions dedicated to researching these issues and putting requirements into place so that our data can be audited, drawing on people from a legal discipline and other interdisciplinary groups. There are so many possible blind-spots and we can’t put the onus simply on engineers from big companies. The solutions to this will require some technical aspect, but at the end of the day, they will mostly be political. I hope that we get better at this and that we don’t use automated decision-making to exacerbate the inequalities that we already face in society.