Have you heard the story of the man who was killed by the definite article? That may sound like the beginning of a linguistics joke, but sad to say, it actually happened.

“Sticks and stones may break my bones, but words can never hurt me,” as the old adage goes. When battling bullies, we may take heart in this childhood rhyme, but the truth is, language can be more dangerous than we often assume—it can kill.

In later life, we grow more familiar with another, darker side of language, some flavor of “anything you say will be taken down and may be used in evidence against you.” There’s a reason there’s a right to silence in many jurisdictions, given how loosely the law, often oblivious to the pitfalls of forensic linguistics, can interpret language as evidence.

One cold January morning in 1953, Derek Bentley, a nineteen-year-old, barely literate youth in the wrong place with the wrong words, was hanged for a murder he did not commit. During an attempted burglary a couple of months earlier, a policeman had been fatally shot by Bentley’s sixteen-year-old friend Christopher Craig, a minor who legally could not be sentenced to death. Derek Bentley, who never held the gun, was held to blame for the crime.

The evidence hinged upon an ambiguous sentence. No one could agree on what it meant. When the police demanded Craig hand over the gun, Bentley was said to have called out “Let him have it, Chris!” after which Craig shot and killed a policeman. Did these words mean, literally, “give him the gun” or, indirectly, “shoot him”? Was Bentley a party to murder, as the prosecution would have it, by inciting Craig to kill?

Indirect speech can have a profoundly direct effect on listeners. Most recently, former FBI director James Comey stated he took Trump’s statement “I hope you can see your way clear to letting this go” as an obvious directive. In his testimony, Comey referenced one of history’s most famous examples of a fatally effective indirect speech act, in which Henry II was said to have exasperatedly uttered “Will no one rid me of this turbulent priest?”, whereupon four of his knights set out to assassinate the troublesome Archbishop of Canterbury, Thomas Becket.

So how could we possibly know what Bentley, who was described as “borderline feeble-minded,” meant to say? In Malcolm Coulthard’s forensic linguistic analysis of the case, two preconditions for Bentley’s guilt were laid out: it had to be proven that Bentley already knew Craig had a gun, and that he had instigated Craig to use it. According to Coulthard, the evidence that finally convinced the presiding judge and convicted Bentley was largely linguistic, based on the troubling police transcript of the statements Bentley had made.

A single word seemed to finally tip the balance: the definite article. In his statement, despite denying knowledge of any gun until Craig had used it, Derek Bentley had supposedly said “I did not know he was going to use the gun” at an earlier, crucial moment in the narrative. In summation, the judge made much of the word choice of “the gun” as opposed to “a gun,” which to him showed that Bentley had known about the gun all along and his language had given him away. Thanks to this errant definite article, Bentley was thus considered an “unreliable witness” who did have prior knowledge of the gun and had contradicted himself by denying this later.

It took just two days for the jury to decide Bentley’s fate and about a month later he was executed without reprieve. The unfortunate case of Derek Bentley is one of Britain’s most notorious miscarriages of justice and shines a spotlight on just how fragile forensic linguistic evidence can be, and how prone it is to mistaken interpretations and manipulation by even the most well-meaning of people.

As we’ve seen, forensic linguistics can often provide the key lead in cold cases such as the Unabomber mystery, enabling investigators to uncover much stronger corroborating evidence that can lead to a conviction. But there’s a danger of things going horribly wrong when slim linguistic evidence is the only thing standing between the accused and their innocence, especially when that evidence might be interpreted carelessly out of context by those inexperienced in forensic linguistic techniques, including judges, lawyers, investigators, and even so-called expert witnesses.

The popularity of police dramas has led a wide audience to believe that DNA evidence is rarely wrong. The reality is that even DNA testing can be flawed, and forensic linguistic evidence is perhaps even more precarious in what it can tell us. But the idea of linguistic fingerprints is compelling. While DNA testing is somewhat out of the average person’s reach, forensic linguistic analysis seems readily accessible. Our native familiarity with language, along with a society’s linguistic baggage and beliefs, can often lead us to think we’re competent enough to judge the simple cause and effect of what language reveals about a person’s identity or intentions. The attraction of this kind of linguistic armchair detective work is that it presents problems as safe and controlled puzzles to be solved using knowledge you apparently already have, innately. But life is not a clear-cut whodunnit and answers to mysteries are rarely so simple.

English professor Don Foster may style himself a literary detective, but he is an oft-cited cautionary tale of how even those who work in language can make major errors when, without training or experience in forensic linguistics, they dabble in linguistic mysteries and criminal cases such as uncovering an author’s identity. In the late 1990s, Foster had successfully used his simple author identification techniques, based on word count coincidences, to determine who the anonymous author of the novel Primary Colors was, given earlier leads that had been put forward by others. When it comes to true crimes and other cases of unknown identity, however, those same rudimentary techniques can result in distressing and wrongful accusations.

Retained as an expert linguistic witness in famous cases such as the murder of JonBenét Ramsey, Foster had accused Patsy Ramsey, the victim’s mother, of writing the crucial ransom note. Using those same techniques prior to his collaboration with Boulder police, Foster had also declared her innocence, believing that the crime had been committed by someone he had communicated with on the internet based on their language use. Similarly, the highly publicized 2001 anthrax case led Foster to wrongfully point the finger at bioweapons expert Steven Hatfill in a Vanity Fair article, destroying his career, and resulting in Hatfill suing Foster and Vanity Fair. Foster later went on to disastrously ruin writer Sarah Champion’s reputation by unmasking her, on the strength of counting her commas, as London’s most notoriously anonymous call girl blogger, Belle du Jour (later revealed to be scientist Dr Brooke Magnanti).

The reality is linguistic evidence can be highly problematic when looked at in isolation but can be even more so when you take our social and political assumptions about language into account. In the Derek Bentley case, the linguistic evidence of a handful of sentences from the police transcript, a so-called verbatim record, generated so much heated debate back and forth by lawyers and judges over what Bentley could have really intended that, as linguist Coulthard and others have pointed out, a major twist to the story was completely overlooked.

Bentley had steadfastly denied that he’d ever said the sentences that led to his wrongful conviction, either those words verbatim or the sequence in which they were presented. Between a police record, which three officers swore was the result of an unassisted monologue, and the accused, an illiterate man with a history of developmental problems, who should be believed? It’s one thing for a judge or lawyer to sway a jury with a particular interpretation of linguistic evidence, but what happens when the supposedly neutral linguistic evidence itself is unreliable?

We may often believe that transcripts are a true, neutral, straightforward account of verbal interaction. In the absence of recorded speech, they are often all we have to go on, so without thinking, we take them as written. But how reliable are they? You may have heard of the term “verballing,” which refers to false verbal evidence made up by the police, which supposedly is kept in check by audio and video recordings. Even without considering this possibility, it turns out just the act of transcribing speech can be fraught with difficulties.

Numerous studies have shown that transcripts are not as objective and reliable as we imagine. For one thing, listening to recordings or taking down live speech can already be perceptually problematic. Studies have shown that listeners can fill in missing speech sounds and that, depending on their background, they may even “restore” speech sounds differently. So it seems that listeners, in some cases, can just hear what they want to hear.

The act of transcribing an interaction is itself political. Power lies in the hands of those who create the transcript, even without intentional fabrication, and transcription often involves interpretative choices that can affect how the information is seen by others. Take a simple example: if a non-native speaker or a speaker with a stigmatized dialect is transcribed with a kind of eye dialect, or with vernacular variants that are marked against the standard form, readers may develop a certain perception of that speaker depending on their social biases. How a speaker is represented in the transcript can be problematic when it comes to legal cases that depend heavily on the language contained in transcripts.

In his analysis of the discourse between Bentley and the police, Coulthard shows that what was presented by the police as a true and verbatim monologue record contained discourse markers of a concealed dialogue. When a speaker in a transcript suddenly volunteers negative sentences with no “narrative justification,” such as “up until then, Chris had not said anything” and “I did not know he was going to use the gun,” it suggests they were given in answer to an unseen leading question. This alters the incriminating perception of Bentley’s statement considerably, because the definite article could very well have been in answer to “a gun” introduced by the concealed question, rather than volunteered spontaneously by Bentley.

In fact, Bentley’s denial that he’d ever said “Let him have it, Chris” was supported by the testimony of both Craig and a fourth officer at the scene who had never been called upon as a witness. It took his family some forty years of dedication before Bentley was pardoned and exonerated, in 1998, of a murder he did not commit.

All this to say, while forensic linguistics can certainly shine a light in the darkest of cases, in the wrong hands, linguistic evidence can be fatally flawed.