Problem with profanity on the Internet

An example of the Scunthorpe problem because of a regular expression match

The Scunthorpe problem is the unintentional blocking of websites, e-mails, forum posts or search results by a spam filter or search engine because their text contains a string of letters that appears to have an obscene or otherwise unacceptable meaning. Names, abbreviations, and technical terms are most often cited as being affected by the issue.

The problem arises because computers can easily identify strings of text within a document, but deciding whether a given string is actually offensive requires interpreting it across a wide range of contexts, possibly across many cultures, which is an extremely difficult task. As a result, broad blocking rules can produce false positives that affect innocent words and phrases.
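The difference between matching a string and interpreting a word can be sketched in a few lines of Python. This is a minimal illustration, not the filter used by any service named in this article: the blocklist contents and the function names `naive_is_blocked` and `word_boundary_is_blocked` are assumptions made for the example.

```python
import re

# Hypothetical blocklist, using substrings mentioned in this article.
BLOCKLIST = ["cunt", "penis", "twat", "clit"]


def naive_is_blocked(text: str) -> bool:
    """Flag text if any blocklisted term occurs anywhere as a substring."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKLIST)


# One alternation of all terms, each required to stand alone as a word.
_WORD_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(term) for term in BLOCKLIST) + r")\b",
    flags=re.IGNORECASE,
)


def word_boundary_is_blocked(text: str) -> bool:
    """Flag text only if a blocklisted term appears as a whole word."""
    return _WORD_PATTERN.search(text) is not None


for place in ["Scunthorpe", "Penistone", "Lightwater", "Clitheroe"]:
    print(place, naive_is_blocked(place), word_boundary_is_blocked(place))
```

The substring check flags all four place names, while the word-boundary check flags none of them. Word boundaries do not help with words that have two meanings: a term that is innocent in one context and offensive in another still matches as a whole word.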

Origin and history

The problem was named after an incident in 1996 in which AOL's profanity filter prevented residents of the town of Scunthorpe, North Lincolnshire, England, from creating accounts with AOL, because the town's name contains the substring "cunt".[1] In the early 2000s, Google's opt-in SafeSearch filters apparently made the same mistake, preventing people from searching for local businesses or URLs that included Scunthorpe in their names.[2]

Other examples

Mistaken decisions by obscenity filters include:

Refused web domain names and account registrations

Blocked web searches

Blocked emails

Blocked for words with two meanings

In October 2004, e-mails advertising the pantomime Dick Whittington sent by a teacher from Norwich in the UK were blocked by school computers because of the use of the name Dick, sometimes used as slang for penis. [23]

In May 2006, a man in Manchester in the UK found that e-mails he wrote to his local council to complain about a planning application had been blocked as they contained the word erection when referring to a structure. [24]

Blocked e-mails and web searches relating to The Beaver, a magazine based in Winnipeg, caused the publisher to change its name to Canada's History in 2010, after 89 years of publication. [25] Publisher Deborah Morrison commented: "Back in 1920, The Beaver was a perfectly appropriate name. And while its other meaning [vagina] is nothing new, its ambiguity began to pose a whole new challenge with the advance of the Internet. The name became an impediment to our growth". [26]

In June 2010, Twitter blocked a user from Luxembourg 29 minutes after he had opened his account and posted his first tweet. The tweet read 'Finally! A pair of great tits (Parus major) has moved into my birdhouse!'. Although he had included the Latin name to make clear that the tweet was about birds, all attempts to unblock the account were in vain. [27]

In 2011, a councillor in Dudley found an email flagged for profanity by his council's security software after mentioning the Black Country dish, faggots (a type of meatball, but also a derogatory term for a homosexual). [28]

Residents of Penistone in South Yorkshire have had e-mails blocked because the town's name includes the substring penis. [29]

Lightwater in Surrey suffered similarly because its name contains the substring twat.

Residents of Clitheroe (Lancashire, England) have been repeatedly inconvenienced because their town's name includes the substring clit, which is short for "clitoris". [30]

Résumés of magna cum laude graduates have been blocked by spam filters because they include the word cum, which is Latin for "with" in this usage but is sometimes used in English as slang for semen.[31]

News articles damaged

In June 2008, a news site run by the American Family Association filtered an Associated Press article on sprinter Tyson Gay, replacing instances of "gay" with "homosexual", thus rendering his name as "Tyson Homosexual". [32]

The word or string "ass" may be replaced by "butt", resulting in "clbuttic" for "classic" and "buttbuttinate" for "assassinate".[33]
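The "clbuttic" effect follows directly from replacing a substring wherever it occurs. The sketch below reproduces it and contrasts it with a whole-word replacement. It is an illustration only, not the code of any site mentioned above, and the function names `naive_replace` and `whole_word_replace` are hypothetical.

```python
import re


def naive_replace(text: str) -> str:
    """Replace every occurrence of the substring, even inside longer words."""
    return text.replace("ass", "butt")


def whole_word_replace(text: str) -> str:
    """Replace the term only where it stands alone as a word."""
    return re.sub(r"\bass\b", "butt", text)


for word in ["classic", "assassinate"]:
    print(word, "->", naive_replace(word), "|", whole_word_replace(word))
# classic -> clbuttic | classic
# assassinate -> buttbuttinate | assassinate
```

Restricting the replacement to whole words leaves "classic" and "assassinate" intact. It would not have prevented the "Tyson Homosexual" rewrite, because in that case the surname is itself a whole word.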

Other

In November 2013, British Facebook temporarily blocked users for using the word faggot in reference to the dish. [34]

In January 2014, files used in the online game League of Legends were reported as being blocked by some UK ISP filters due to the names 'VarusExpirationTimer.luaobj' and 'XerathMageChainsExtended.luaobj' containing the letters used in the word "sex". [35]

In May 2018, the website of the grocery store Publix would not allow a cake to be ordered containing the Latin phrase summa cum laude. The customer attempted to rectify the problem by including special instructions but still ended up with a cake reading "Summa --- Laude". [36] [37]

In May 2020, despite extensive media scrutiny, some hashtags directly referring to British political advisor Dominic Cummings were unable to trend on Twitter because the substring cum in Cummings' surname triggered Twitter's anti-porn filter.[38]

See also