An often neglected but crucial component of engineering is to understand the broader social impacts of the technology being developed and to ensure that the technology enhances social equality by benefitting diverse populations. Human bias and stereotypes can be perpetuated, and even amplified, when researchers fail to consider how human preferences and assumptions may consciously or unconsciously be built into science or technology. Gender norms, ethnicity and other biological and social factors shape and are shaped by science and technology in a robust cultural feedback loop46. This section discusses examples from product design, artificial intelligence (AI) and social robotics to illustrate how sex and gender analysis can enhance excellence in engineering.

Designing safer products

When products are designed based on the male norm, there is a risk that women and people of smaller stature will be harmed. Motor vehicle safety systems provide one such example. Because male drivers have historically been overrepresented in traffic data, seatbelts and airbags have been designed and evaluated with a focus on the typical male occupant with respect to anthropometric size, injury tolerance and mechanical response of the affected body region. When national automotive crash data from the United States for 1998 to 2008 were analysed by sex, the data revealed that the odds for a belt-restrained female driver to sustain severe injuries were 47% higher than those for a belt-restrained male driver involved in a comparable crash, after controlling for weight and body mass47. The subsequent introduction of a virtual female car crash dummy allowed mathematical simulations to account for the effect of acceleration on sex-specific biomechanics, highlighting the need to add a medium-sized female dummy model to regulatory safety testing48,49. Beyond automotive safety systems, attention to anthropometric characteristics, such as the carrying angle of the elbow or the shape and size of the human knee, can guide sex-specific design for artificial joints, limb prostheses and occupational protective gear50,51.

Reducing gender bias in AI

Alarming examples of algorithmic bias are well documented52. When translating gender-neutral language related to science, technology, engineering and mathematics (STEM) fields, Google Translate defaults to male pronouns53. When photographs depict a man in the kitchen, automated image-captioning algorithms systematically misidentify the individual as a woman54. As AI becomes increasingly ubiquitous in everyday life, such bias, if uncorrected, can amplify social inequities. Understanding how gender operates within the context of an algorithm helps researchers to make conscious decisions about how their work functions in society.

Since the Second World War, medical research has been subjected to stringent review processes aimed at protecting participants from harm. AI, which has the potential to influence human life at scale, has yet to be so carefully examined. Numerous groups have articulated ‘principles’ for human-centred AI. These include, most importantly, the UN Human Rights Framework, which consists of internationally agreed upon human rights laws and standards, as well as the ‘Asilomar AI Principles’, ‘AI at Google: Our Principles’, the ‘Partnership on AI’, and so on. What we lack are mechanisms for technologists to put these principles into practice. Here we delve into a few such rapidly developing mechanisms for AI.

A first challenge in algorithmic bias is to identify when it is appropriate for an algorithm to use gender information. In some settings, such as the assignment of job ads, it might be desirable for the algorithm to explicitly ignore the gender of an individual as well as features such as weight, which may correlate with gender but are not directly related to job performance. In other applications, such as image or voice recognition, it might be desirable to leverage gender characteristics to achieve the best accuracy possible across all subpopulations. To date, there is no unified definition of algorithmic fairness55,56,57, and the best approach is to understand the nuances of each application domain, make transparent how algorithmic decision-making is deployed and appreciate how bias can arise58.
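
As a concrete illustration of this design choice, the following minimal sketch (in Python, with hypothetical feature names, not drawn from any cited system) contrasts a job-ad application that explicitly drops gender and gender-correlated features with a recognition application that retains them:

```python
# Hypothetical feature names for illustration only.
FEATURES = ["experience_years", "education_level", "weight_kg", "gender", "voice_pitch_hz"]

# Features that directly encode, or strongly correlate with, gender.
GENDER_LINKED = {"gender", "weight_kg", "voice_pitch_hz"}

def select_features(application: str) -> list[str]:
    """Choose which features a model may use, depending on the application domain."""
    if application == "job_ads":
        # Ignore gender and gender-correlated attributes unrelated to job performance.
        return [f for f in FEATURES if f not in GENDER_LINKED]
    if application == "voice_recognition":
        # Retain gender-linked features to maximize accuracy across all subpopulations.
        return list(FEATURES)
    raise ValueError(f"No fairness policy defined for application: {application}")

print(select_features("job_ads"))            # ['experience_years', 'education_level']
print(select_features("voice_recognition"))  # all five features
```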

Training data are a source of potential bias in algorithms. Certain subpopulations, such as darker-skinned women, are often underrepresented in the data used to train machine-learning algorithms, and efforts are underway to collect more data from such groups2. To highlight the issue of underrepresented subpopulations in machine-learning data, researchers have designed ‘nutrition labels’ to capture metadata about how the dataset was collected and annotated59,60,61. Useful metadata should summarize statistics on, for example, the sex, gender, ethnicity and geographical location of the participants in the dataset. In many machine-learning studies, the training labels are collected through crowdsourcing, and it is also useful to provide metadata about the demographics of crowd labellers.
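
A minimal sketch of such a ‘nutrition label’ is shown below; the field names and example values are assumptions for illustration, not a published schema, and the demographic summaries are computed directly from per-record metadata:

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class DatasetNutritionLabel:
    """Metadata summarizing how a dataset was collected and annotated.
    Field names are illustrative, not a standard schema."""
    name: str
    collection_method: str
    annotation_method: str
    participant_demographics: dict = field(default_factory=dict)
    labeller_demographics: dict = field(default_factory=dict)

# Toy per-record metadata (assumed values).
records = [
    {"sex": "female", "region": "East Africa"},
    {"sex": "male", "region": "Western Europe"},
    {"sex": "female", "region": "East Africa"},
]

label = DatasetNutritionLabel(
    name="face-images-v1",
    collection_method="web scrape, manually filtered",
    annotation_method="crowdsourced, 3 labellers per image",
    participant_demographics={
        "sex": dict(Counter(r["sex"] for r in records)),
        "region": dict(Counter(r["region"] for r in records)),
    },
    labeller_demographics={"gender": {"women": 12, "men": 30}},  # assumed counts
)
print(label.participant_demographics)  # {'sex': {'female': 2, 'male': 1}, ...}
```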

Another approach to evaluate gender bias in algorithms is counterfactual analysis62. Consider Google Search, in which men are five times more likely than women to be offered ads for high-paying executive jobs63. The algorithm that decides which ad to show takes as input features about the individual making the query and outputs a set of ads predicted to be relevant. A counterfactual analysis would test the algorithm in silico by changing the gender of each individual in the data and then studying how the predictions change. If simply changing an individual from ‘woman’ to ‘man’ systematically leads to higher-paying job ads, then the predictor is indeed biased.
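
In code, the core of such a test is simple: flip the gender attribute of each individual, hold everything else fixed, and measure how the output changes. The sketch below uses a deliberately biased toy model, `predict_ad_salary`, as a stand-in for a real ad-ranking algorithm; the model and its feature names are assumptions for illustration:

```python
def flip_gender(person: dict) -> dict:
    """Return a copy of the record with only the gender attribute changed."""
    flipped = dict(person)
    flipped["gender"] = "man" if person["gender"] == "woman" else "woman"
    return flipped

def counterfactual_gap(model, people: list[dict]) -> float:
    """Mean change in the model's output when gender alone is flipped.
    A large systematic gap indicates a gender-biased predictor."""
    gaps = [model(flip_gender(p)) - model(p) for p in people]
    return sum(gaps) / len(gaps)

# Toy stand-in for the real ad-ranking algorithm (an assumption, not the cited system).
def predict_ad_salary(person: dict) -> float:
    base = 50_000 + 1_000 * person["experience_years"]
    return base + (15_000 if person["gender"] == "man" else 0)  # deliberately biased

people = [{"gender": "woman", "experience_years": y} for y in range(10)]
print(counterfactual_gap(predict_ad_salary, people))  # 15000.0: bias detected
```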

Work to debias word embeddings is another example of counterfactual analysis64. Word embeddings associate each English word with a vector of features so that the geometry between the feature vectors captures semantic relations between the words. Embeddings are widely used in practice for applications such as sentiment analysis65, language translation66 and analysis of electronic health records67. It has previously been shown that gender stereotypes—for example, men are more likely to be computer scientists—are manifested in the feature vectors of the corresponding words64. Whether this association between ‘man’ and ‘computer scientist’ is problematic depends on the application of the features. To test for gender effects, gender-neutral word features were created. For each downstream application, counterfactual analysis can then be performed by running the application twice, once using the original word features and once using the gender-neutral features. If the outcome changes, the algorithm is sensitive to gender. In some applications, such as job searches, it might be preferable to use gender-neutral features.
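
One common way to construct such gender-neutral features, in the spirit of the debiasing literature rather than a reproduction of the published method, is to estimate a ‘gender direction’ from definitional word pairs and project it out of each word vector. The numpy sketch below uses toy two-dimensional vectors; real embeddings have hundreds of dimensions:

```python
import numpy as np

# Toy embeddings for illustration (assumed values).
emb = {
    "man":                np.array([ 1.0, 0.2]),
    "woman":              np.array([-1.0, 0.2]),
    "computer_scientist": np.array([ 0.6, 0.9]),  # leans towards 'man': stereotype
}

# Estimate the gender direction from a definitional pair (man/woman).
gender_dir = emb["man"] - emb["woman"]
gender_dir /= np.linalg.norm(gender_dir)

def neutralize(v: np.ndarray) -> np.ndarray:
    """Remove the component of v that lies along the gender direction."""
    return v - np.dot(v, gender_dir) * gender_dir

neutral_emb = {w: neutralize(v) for w, v in emb.items()}

# Counterfactual test: run the downstream application with both feature sets
# and check whether its output changes (here, a toy cosine-similarity score).
def similarity(e, a, b):
    return float(np.dot(e[a], e[b]) / (np.linalg.norm(e[a]) * np.linalg.norm(e[b])))

print(similarity(emb, "man", "computer_scientist"))          # original features
print(similarity(neutral_emb, "man", "computer_scientist"))  # gender-neutral features
```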

An alternative approach to quantify and reduce gender bias in algorithms is called multi-accuracy auditing68,69. In standard machine learning, the objective is to maximize the overall accuracy for the entire population, as represented by the training data. In multi-accuracy, the goal is to ensure that the algorithm achieves good performance not only in the aggregate but also for specific subpopulations—for example, ‘elderly Asian man’ or ‘Native American woman’. The multi-accuracy auditor takes a complex machine-learning algorithm and systematically identifies whether the current algorithm makes more mistakes for any subpopulation. In a recent study, a neural network used for facial recognition was audited, and specific combinations of artificial neurons that respond to images of darker-skinned women were identified as being responsible for the misclassifications70.

The auditor also suggests improvements when it identifies such biases71. Although achieving equal accuracy across all demographic groups may not always be feasible, these auditing techniques improve the transparency of AI systems by quantifying how their performance varies across race, age, sex and intersections of these attributes.
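
A minimal sketch of such a subgroup audit is shown below. It is not the published multi-accuracy algorithm, which also learns post-hoc corrections; it simply computes accuracy per intersectional subgroup and flags groups that fall below the aggregate, with all data and the tolerance threshold assumed for illustration:

```python
from collections import defaultdict

def audit_subgroups(y_true, y_pred, groups, tolerance=0.05):
    """Flag intersectional subgroups whose accuracy trails the aggregate.

    y_true, y_pred: lists of labels; groups: list of subgroup tuples,
    e.g. (skin tone, sex). Returns subgroups whose accuracy falls more
    than `tolerance` below the overall accuracy.
    """
    overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        totals[g] += 1
        hits[g] += (t == p)
    return {
        g: hits[g] / totals[g]
        for g in totals
        if hits[g] / totals[g] < overall - tolerance
    }

# Toy audit data (assumed): a classifier that errs mostly on one subgroup.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]
groups = [("lighter", "man")] * 4 + [("darker", "woman")] * 4
print(audit_subgroups(y_true, y_pred, groups))
# {('darker', 'woman'): 0.5} -> underperforming subgroup flagged
```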

These are only a few of the specific techniques that computer scientists are developing to promote gender fairness in algorithms. Some, such as data checks, are relevant across all disciplines that amass and analyse big data. Others are specific to machine learning, which is now widely deployed across broad swathes of intellectual endeavour, from the humanities to the social sciences, biomedicine and judicial systems. In all instances, it is important to be transparent about where and for what purpose AI systems are used, and to characterize the behaviour of the system with respect to sex and gender72.

Combatting stereotypes

Analysing gender in software systems is one issue; configuring gender in hardware—such as social robots—is another, and the focus of this section. Until recently, robots were largely confined to factories. Most people never see or interact with these industrial robots, which do not look, sound or behave like humans. But engineers are increasingly designing robots to assist humans as service robots in hospitals, elder-care facilities, classrooms, homes, airports and hotels. The field of social human–robot interaction examines, among other things, when and how ‘gendering’ robots, virtual agents or chatbots might enhance usability while, at the same time, considering when and how to avoid oversimplifications that may reinforce potentially harmful gender stereotypes73.

Machines are, in principle, genderless. Gender, however, is a core social category in human impression formation that is readily applied to nonhuman entities74. Thus, users may consciously or unconsciously gender machines as a function of anthropomorphizing them, even when designers intend to create gender-neutral devices75,76,77,78.

Anthropomorphizing technologies may help users to engage more effectively with them, which raises the question of whether there are benefits to tapping into the power of social stereotypes by building gender into virtual agents79,80,81,82,83, chatbots84 or social robots11,85,86. For example, if roboticists deploy female carebots in female-typical roles, such as nursing, would users better comply with the robot’s requests to take daily medication or to exercise? Does gendering robots or virtual agents facilitate interaction or boost objective outcomes such as performance11,80,81,82,83,84,85,86,87,88,89,90,91? Will personalizing robots or chatbots by gender increase consumer acceptance and even sales figures? Systematic empirical research is needed to address these open research questions.

What features lead humans to gender a robot? So far, experimental research designed to analyse robot gender has manipulated gender in a number of ways, including (1) by choosing a male or female name to label the robot87,88,89,90,91,92; (2) by colour-coding the robot93,94; (3) by manipulating visual indicators of gender (for example, face, hairstyle or lip colour94,95); (4) by adding a male or female voice, or a low or high pitch to simulate one87,88,89,90,91,92,94,96,97; (5) by designing a gendered personality87,98; and (6) by deploying robots in gender-stereotypical domains, such as a male-voiced robot in a security role and a female-voiced robot in a healthcare role95. Other aspects that may potentially gender a robot, such as movements or gestures, still require empirical research85,86.

But there are dangers here. As soon as designers or users assign a gender to a machine, stereotypes follow. Designers of robots and AI do not simply create products that reflect our world; they also (perhaps unintentionally) reinforce and validate certain gender norms that are considered to be appropriate for men, women or gender-nonconforming individuals11,73.

Eliciting gendered perceptions of technologies implies actively designing human gender biases, including binary constructions of gender as male or female, into machines. From a social psychological viewpoint, this can contribute to stereotypical gender norms in society95. Even though this might not seem relevant from an engineering point of view, social psychological research suggests that a robot with a female appearance, for example, may perpetuate ideas of women as nurturing and communal, traits stereotypically associated with women95. Thus, a female robot may be deemed socially warm and particularly suitable for stereotypically female tasks, such as elderly care, or it might be openly sexualized and objectified, as revealed in abusive commentary on video clips of female robots in recent qualitative research99. Similarly, virtual personal assistants with female names, voices and stereotypical, submissive behaviours, such as Siri or Alexa, represent heteronormative ideas about women and thereby indirectly contribute to discrimination against women in society100,101. An interesting development in this regard is the genderless voice, Q, which was recently developed in Denmark to overcome such bias102.

There are many questions regarding these features. How, for example, do user attributes, such as age or gender, interact with different robot design features? How do robots enhance or harm real-world attitudes and behaviours related to social equality? How does robot gender elicit different responses across cultures? More experimental, laboratory and longitudinal field research is needed to test whether, and how, a machine’s gendered, gender-diverse or gender-neutral appearance or behaviour influences human affect, cognition and behaviour. It is likely that even social robots designed to be genderless or gender neutral elicit gender attributions owing to the relatively automatic nature of anthropomorphizing humanoid robots. It is also likely that when potential end users are offered the option to select a digital assistant’s gender, their choice will be driven by their own gender identity and gender-related attitudes and stereotypes. Addressing these research questions and issues remains important to shed light on the psychological, social and ethical implications of implicit or explicit design choices for novel technologies.

Developing technologies that enhance, or at least do not harm, social equality will require novel configurations of researchers. Much attention has been paid to the need for interdisciplinary research, bringing together humanists, legal experts, technologists and social scientists, especially in the field of human-centred AI. The historical development of universities, however, artificially separated human knowledge into disciplines over the course of the nineteenth and twentieth centuries, a division that may not support current research needs. Research institutions now need to develop robust mechanisms to bring together social analysis and engineering in a way that rigorously addresses the emerging needs of society103.