Conducting thought experiments from the armchair has long been an accepted method in analytic philosophy. What do thought experiments from the armchair look like? Philosophers think about real and imagined scenarios involving knowledge, morality, free will and other matters. They then use those scenarios to elicit their own reactions (‘intuitions’), which serve as fodder for arguments.

One well-known kind of thought experiment is called a ‘Gettier case’. Named after the American philosopher Edmund Gettier, these are scenarios used to question a particular notion of knowledge, and are based on a range of examples he provided in a journal article in 1963. One might plausibly think of knowledge as a belief that is both true and justified (ie, based on evidence). But Gettier suggested some counterexamples to this definition, by telling stories in each of which there’s a true, justified belief that he claimed isn’t a case of knowledge. For example, imagine that at noon you look at a stopped clock that happens to have stopped at noon. Your belief that it’s noon is true, and arguably it’s also justified. The question is: do you thereby know that it’s noon, or do you merely believe it? While this example and others might seem frivolous – other cases involve fake zebras and imitation barns – they are intended to make headway in analysing knowledge.

In the late 1990s, the philosophers Jonathan Weinberg, Shaun Nichols and Stephen Stich started raising questions about this methodology of eliciting ‘our’ intuitions. Their question: who does ‘we’ refer to here? They wondered if philosophers – at least in analytic philosophy, a particularly Western, educated, industrialised, rich and democratic bunch, aka ‘WEIRD’ – might have intuitions that people in other demographic groups wouldn’t share. (They’d been inspired to raise this question by work on cross-cultural differences by psychologists such as Richard Nisbett and Jonathan Haidt.)

Rather than just speculate about whether differences existed, Stich and colleagues decided to do some real-world experiments. In their initial research, they focused on common thought-experiments in epistemology (a subfield of philosophy that studies topics such as justified belief and knowledge). They recruited people of East Asian and Western descent, as well as people of Indian-subcontinent descent, and asked them to read and think about some classic vignettes in epistemology. In their 2001 paper, they claimed that among their most interesting findings was something unexpected: while most people of Western descent in their experiment deemed that particular Gettier cases are not instances of knowledge, people of East Asian and of Indian descent often thought the opposite.

Stich and colleagues argued that this kind of variation in intuitions should cause a big shift in the way that analytic philosophy is practised. Until this point, most philosophers had traditionally thought it was fine to sit in their armchairs and consider their own intuitions. It was the way that philosophy was done. But experimental evidence, they claimed, undermined this traditional practice. If such differences of intuition existed, they wrote: ‘Why should we privilege our intuitions rather than the intuitions of some other group?’ If different groups had different intuitions, it wasn’t enough to say that ‘our’ intuition about justice or knowledge or free will is such-and-such. Rather, the philosopher must at the very least specify whose intuition is relevant, and why that intuition should matter rather than another one.

Over subsequent decades, experimental philosophy (x-phi for short) grew significantly. Some philosophers followed Stich et al’s lead, in testing intuitions of participants who varied in gender, age, native language and other categories. They also looked at variation in intuitions based on irrelevant factors such as the order in which cases are presented. Beyond that, some x-phi practitioners also found significant sources of funding. Stich and his fellow philosopher Edouard Machery, together with the anthropologist H Clark Barrett, received a grant of more than $2.5 million from the John Templeton Foundation to embark on a series of experiments on knowledge, understanding and wisdom across 10 countries, with a goal of better understanding these philosophical concepts as they appear across a large swathe of cultures.

Arguably the biggest factor in x-phi’s growth, however, has been some philosophers heading off in a new direction. According to a recent survey of the field conducted by Joshua Knobe, relatively few philosophers kept up experiments on demographic differences with the aim of showing that traditional philosophy is ill-grounded (an approach that came to be known as ‘the negative programme’ in x-phi). Instead, another class of experiments (with a ‘positive programme’) sprang up.

Knobe, a professor of psychology, philosophy and linguistics at Yale University, has been doing experimental work since the early 2000s and is well known in the field for it. He describes one kind of ‘positive programme’ as very similar to cognitive science: conducting experiments uncovers interesting effects, and researchers then hypothesise about mechanisms that might explain these effects. A well-known example of this kind of work is Knobe’s own finding, called the ‘side-effect effect’ or just the ‘Knobe effect’. In a nutshell, this is the finding that people judge a side-effect to be intentionally caused much more often when that side-effect is negative than when it’s positive.

For example, in Knobe’s original experiment, participants were given this vignette:

The vice-president of a company went to the chairman of the board and said: ‘We are thinking of starting a new programme. It will help us increase profits, but it will also harm the environment.’ The chairman of the board answered: ‘I don’t care at all about harming the environment. I just want to make as much profit as I can. Let’s start the new programme.’ They started the new programme. Sure enough, the environment was harmed.

Other study participants saw the exact same story, except that the word ‘harm’ was replaced with the word ‘help’ throughout. The striking result was that most participants (82 per cent) said that the chairman brought about the harmful side-effect intentionally, but only 23 per cent of participants said that he intentionally brought about the helpful side-effect.

Since then, many philosophers have conducted hundreds of these kinds of experiments. Some of them involve repeating and extending the Knobe effect, and many others venture into new directions to run experiments involving questions about moral responsibility, free will, causation, personal identity and other topics. In their ‘Experimental Philosophy Manifesto’ (2007), Knobe and Nichols described the allure of experimental philosophy’s positive programme by writing: ‘Many find it an exciting new way to approach the basic philosophical concerns that attracted them to philosophy in the first place.’ But while x-phi has expanded over the years, not everyone in philosophy has been a fan.

First, as the positive programme in x-phi shades into psychology and vice versa, some have asked: is experimental philosophy really philosophy? Knobe and some of his colleagues argue that it is. They describe the work as continuous with a long tradition of philosophers trying to understand the human mind, and point to the likes of Aristotle, David Hume and Friedrich Nietzsche as precedents. In their manifesto, Knobe and Nichols write:

It used to be a commonplace that the discipline of philosophy was deeply concerned with questions about the human condition. Philosophers thought about human beings and how their minds worked … On this traditional conception, it wasn’t particularly important to keep philosophy clearly distinct from psychology, history or political science … The new movement of experimental philosophy seeks to return to this traditional vision.

Some philosophers, even those who identify as part of the x-phi movement, disagree with this viewpoint. Machery, a fellow x-phi advocate, argues that even if, historically, philosophers used to engage in a huge range of intellectual endeavours, it doesn’t mean that studying all those things should now count as philosophy. There’s something lost, Machery thinks, if experimental philosophers start to resemble cognitive scientists more and more, and lose their focus on what has been of central interest in philosophy: analysing concepts. (According to Knobe’s recent analysis, only around 10 per cent of x-phi experiments over a period of five years were directly about conceptual analysis, as opposed to revealing new cognitive effects and discussing potential cognitive processes underlying them.) Machery concurs with Stich and other ‘negative programmers’ that trying to analyse concepts from the armchair is a poor method, because of the experimental evidence that judgments vary by demographic group. Instead, he argues in his book Philosophy Within Its Proper Bounds (2017), philosophers should make use of experiments as a way of clarifying and assessing important philosophical ideas.

A second kind of response comes from those who question the usefulness of eliciting intuitions from people outside of philosophy. For example, in his book Relativism and the Foundations of Philosophy (2009), Stephen Hales writes: ‘[I]ntuitions of professional philosophers are much more reliable than either those of inexperienced students or the “folk”.’ This response, dubbed the ‘expertise defence’, is generally made in response to the ‘negative programme’ in x-phi. The philosopher Jennifer Nado characterises the defence as insisting that experimental philosophy’s reliance on the conflicting intuitions of non-philosophers is ‘fundamentally misguided’, since ‘the intuitions of such persons are irrelevant’. There’s often an analogy drawn to other fields: we wouldn’t take the conflicting opinions of non-experts as a challenge to most scientific and mathematical claims. On the other hand, some philosophers – responding to the expertise defence – have questioned that analogy by asking what ‘philosophical expertise’ amounts to, and how we can tell that professional philosophers have it. (In some cases, philosophers have even run experiments on fellow philosophers, claiming that they are susceptible to various kinds of bias in their intuitions.)

A third response to the negative programme in x-phi has been to look more closely at ‘intuitions’ themselves. The British philosopher Timothy Williamson argues that those who attack traditional philosophy should define exactly what they mean by ‘intuition’. If an ‘intuition’ is just ‘how things seem to us’, he argues, then the critique of intuition leads to ‘global skepticism’, the position that we should withhold all kinds of judgments until they are proven to be widely shared. (This extreme conclusion is one that, Williamson takes it, x-phi practitioners would prefer to avoid.) And taking another tack, some philosophers, such as Herman Cappelen in his book Philosophy Without Intuitions (2012), claim that traditional philosophy doesn’t actually rely on intuitions at all (even though many traditional philosophers think that it does).

A final response to x-phi aims not just at the ‘negative programmers’ but at all philosophers conducting experiments. In a particularly pointed version of the critique, Williamson referred to x-phi practitioners as ‘philosophy-hating philosophers’. There’s nothing wrong with philosophers drawing heavily on work from empirical disciplines, he wrote in The New York Times in 2010, as certain subfields of philosophy often have (for example, philosophy of mind, and philosophy of space and time). In that very broad sense, experimental philosophy is both common and valuable. But, he argued, philosophers shouldn’t try to be ‘amateur experimentalists’, undertaking their own experiments and turning the field into ‘imitation psychology’.

There are a couple of strands to this objection. One is that philosophers should leave the actual running of empirical studies to experts in other fields. As Williamson puts it, philosophers have honed their skills in ‘logic, in imagining new possibilities and questions, in organising systematic abstract theories, making distinctions and the like’, and should stick to their forte. Of course, as experimental philosophers such as Knobe point out, philosophers often do collaborate with colleagues from cognitive science and psychology. Experimental philosophers are extremely well-connected with psychologists, he said in conversation, explaining that he and other philosophers have often co-authored papers with those in more traditionally experimental fields (presumably, benefiting from their skill sets along the way).

But there’s also another, related critique of experimental philosophy, which questions the reliability of many experiments, not just those conducted by philosophers. As experimental philosophy was forming over the past decade or two, the ‘reproducibility crisis’ was emerging: a deep skepticism about the reliability of empirical research, based on the difficulty of replicating results. Researchers have pointed to issues such as insufficient sample size, publication bias (the tendency to publish studies showing positive results while null results are buried in the file drawer), and ‘p-hacking’ (running many analyses of the data and reporting only those that achieve a sufficiently low p-value), and have suggested that these issues are common in some (or maybe all) fields. The result is that many research conclusions might just be a matter of chance. For a classic overview of this kind of skepticism, see John Ioannidis’s paper ‘Why Most Published Research Findings Are False’ (2005).

Knobe says that experimental philosophers have been well aware of concerns about the reliability of research over the past decade. He and his colleague Christian Mott set up an ‘x-phi replication’ site, tracking attempted replications of well-known x-phi studies. In some instances, there were many ‘replication failures’ piling up when others tried to replicate original studies by imitating them as closely as possible. Strikingly, the original experiments on Gettier intuitions were among those that couldn’t be replicated. A number of replications (posted on the x-phi replication site) failed to find evidence of cultural variation in Gettier intuitions.

Stich, one of the original authors of the 2001 study, decided to collaborate with colleagues in order to conduct a much larger replication of his original study, with more than 500 participants from four countries (more than 10 times the size of the original study). This replication study also failed to find evidence for the cross-cultural differences originally claimed. The authors conceded that this study diminished the evidence for cross-cultural variation of Gettier intuitions. They also, however, emphasised that this didn’t mean that the ‘negative programme’ was unfounded, and pointed to other, more consistent results showing demographic variation, such as variation in semantic intuitions.

In 2017, a group of philosophers decided to take on replication more systematically. Inspired by the reproducibility project in psychology, philosophers from eight countries conducted an x-phi replication project to repeat 40 experiments, sticking as closely as possible to the original design in each case. Of the 40 studies, they found that around 70 per cent were successfully replicated (the exact percentage of success varied depending on how success was defined). By comparison, fewer than half of the experiments in the reproducibility project in psychology were successfully replicated. In their paper, the authors of the x-phi replication project explore possible explanations for their higher rate of replication, without settling on any as definitive. Whatever the reason for the higher level of successful replications, they conclude that their findings are ‘good news for experimental philosophy … calling into question the idea of x-phi being mere “amateurish” psychology’.

Where does all of this leave x-phi now? The x-phi replication-project researchers urge colleagues not to see the results as reason to ‘rest on their laurels’. Many x-phi practitioners have grown sensitive to the possibility that some study results might not hold up to repeat examination, and often phrase their claims cautiously, mentioning the need for further testing. The field is also moving to adopt better research practices, such as ensuring adequately powered studies, pre-registering study design, and data-sharing.

There continues to be a demarcation in the field between the ‘negative’ and ‘positive’ approaches, though some philosophers have also carved out something of a middle ground. Kaija Mortensen and Jennifer Nagel, for example, make a case for ‘armchair-friendly experimental philosophy’, describing ways that experiments can help armchair philosophy, without replacing it entirely or changing its original aims. Experiments might, for instance, help to show where philosophers’ intuitions are likely to be distorted by irrelevant factors. Similarly, Weinberg argues that x-phi can give us a better understanding of where bias might arise in our ‘human philosophical instrument’. Rather than seeing x-phi as a subfield of philosophy, he argues, experiments should be part of the wider discipline, a source of evidence that all philosophers can draw on while still retaining their original goals.

Regardless of one’s stance on experimental philosophy, it’s clear that the new method has fuelled an important conversation. Not only has it led to much questioning of reliance on intuitions in philosophy (and what ‘intuition’ means), but it has also brought out explicit discussion about what philosophers aim to achieve. Thus, x-phi fans the flames of ‘metaphilosophy’, in which philosophers scrutinise the underpinnings of their own work. Whether or not one sits in an armchair while doing philosophy, it seems like a good idea not to get too comfortable.