Google is using a 10,000-strong army of independent contractors to flag “offensive or upsetting” content, in order to ensure that queries like “did the Holocaust happen” don’t push users to misinformation, propaganda and hate speech.

The review of search terms is being done by the company’s “quality raters”, a little-known corps of worldwide contractors that Google uses to assess the quality of its systems. The raters are given searches to conduct, based on real queries, and are asked to score the results on whether they meet the needs of users.

These contractors work from a huge manual, first made public in 2013, describing every potential problem they could find with a given search query: whether or not it meets the user’s expectations, whether the result offered is low or high quality, and whether it’s spam, porn or illegal.

In a new update to the rating system, rolled out on Tuesday, Google introduced another flag raters could use: the “upsetting-offensive” mark. Although the company did not cite a specific reason for the update, the move comes three months after the Guardian and Observer began a series of stories showing how the search engine promotes extremist content.

One story in particular highlighted how a search for “did the Holocaust happen” returned, as its top result, a link to the white supremacist forum Stormfront, explaining how to promote Holocaust denial to others.

That exact search result is now one of the examples Google uses to train its contractors on how and when to mark pages as “upsetting-offensive”.

Detailing why a result for “Holocaust history” returning a link to Stormfront should be flagged as problematic, the document explains: “This result is a discussion of how to convince others that the Holocaust never happened. Because of the direct relationship between Holocaust denial and antisemitism, many people would consider it offensive.”

By contrast, the same search query returning a result for the History Channel should not get the upsetting-offensive flag, even if users do find the topic of the Holocaust upsetting. “While the Holocaust itself is a potentially upsetting topic for some, this result is a factually accurate source of historical information,” the manual explains. “Furthermore, the page does not exist to promote hate or violence against a group of people, contain racial slurs, or depict graphic violence.”

Other examples given in the manual for the flag are a query for “racism against blacks” returning a page for the white supremacist blog Daily Stormer, and a query for “Islam” returning a result linking to far-right US activist Jan Morgan’s website.

Example results provided to quality raters in Google’s manual. Photograph: Google

Even before the specific introduction of the “upsetting-offensive” marker, many of these results would have been ranked poorly by quality raters for other reasons. Some of the pages, for instance, meet Google’s description of “low quality” content, due to the lack of expertise and poor reputation of the websites. They also rank poorly on the company’s “Needs Met” scale, since a user searching for the queries in question would be unlikely to actually want the results offered.

Google declined to comment on the new guidelines, but search engineer Paul Haahr told industry blog Search Engine Land: “We will see how some of this works out. I’ll be honest. We’re learning as we go … We’ve been very pleased with what raters give us in general. We’ve only been able to improve ranking as much as we have over the years because we have this really strong rater programme that gives us real feedback on what we’re doing.”

The raters’ rankings do not directly feed back into search results, however. Instead, the data collected is used by Google to help judge the success of algorithm changes, and is also part of the corpus used to train its machine-learning systems.

Danny Sullivan, editor of Search Engine Land, said: “The results that quality raters flag is used as ‘training data’ for Google’s human coders who write search algorithms, as well as for its machine-learning systems. Basically, content of this nature is used to help Google figure out how to automatically identify upsetting or offensive content in general.

“In other words, being flagged as ‘upsetting-offensive’ by a quality rater does not actually mean that a page or site will be identified this way in Google’s actual search engine. Instead, it’s data that Google uses so that its search algorithms can automatically spot pages generally that should be flagged.”

While the new ranking option addresses one particular problem highlighted by the Guardian and Observer, Google’s failure to keep fake news and propaganda off the top of search results is broader than simply promoting upsetting or offensive content.

Google has also been accused of spreading “fake news” thanks to a feature known as “snippets in search”, which algorithmically pulls specific answers for queries from the top search results. For a number of searches, such as “is Obama planning a coup”, Google was pulling answers from extremely questionable sites, leading the search engine to claim in its own voice that “Obama may be planning a communist coup d’état”.

The same feature also lied to users about the time required to caramelise onions, pulling a quote that says it takes “about five minutes” from a piece which explicitly argues that it in fact takes more than half an hour.

Shortly after each of these stories was published, the search results in question were updated to fix the errors.