Short version: For the $12 cost of a domain, I was able to rank in Google search results against Amazon, Walmart etc. for high-value money terms in the US. The AdWords bid price for some of these terms is currently around $1 per click, and companies are spending tens of thousands of dollars a month to appear as ads on these search results, while I was appearing for free.

Google have now fixed the issue and awarded a bug bounty of $5000.

Google provides an open URL where you can ‘ping’ an XML sitemap, which they will fetch and parse – this file can contain indexation directives. I discovered that for many sites it is possible to ping a sitemap that you (the attacker) are hosting in such a way that Google will trust the evil sitemap as belonging to the victim site.

I believe this is the first time Google have awarded a bounty for a security issue in the actual search engine, one which directly affects the ranking of sites.



As part of my regular research efforts, I recently discovered an issue with Google that allows an attacker to submit an XML sitemap to Google for a site for which they are not authenticated. As these files can contain indexation directives, such as hreflang, an attacker can use these directives to help their own sites rank in the Google search results.

I spent $12 setting up my experiment and was ranking on the first page for highly monetizable search terms, with a newly registered domain that had no inbound links.

XML Sitemap & Ping Mechanism

Google allows for the submission of an XML sitemap; these can help them discover URLs to crawl, but can also carry hreflang directives, which they use to understand what other international versions of the same page may exist (i.e. “hey Google, this is the US page, but I have a German page on this URL…”). It is not known exactly how Google uses these directives (as with anything related to Google’s search algorithms), but it appears that hreflang allows one URL to ‘borrow’ the link equity and trust of another URL and use it to rank (i.e. most people link to the US .com version, and so the German version can ‘borrow’ the equity to rank better in Google.de).

You can submit XML sitemaps for your domain either via Google Search Console, inside your robots.txt or via a special ‘ping’ URL. Google’s own docs seem a bit conflicting; at the top of the page they refer to submitting sitemaps via the ping mechanism, but at the bottom of the page they have this warning:

However, in my experience you could absolutely submit new XML sitemaps via the ping mechanism, with Googlebot typically fetching the file within 10-15 seconds of the ping. Importantly, Google also mention a couple of times on the page that if you submit a sitemap via the ping mechanism it will not show up inside your Search Console:

As a related test, I checked whether I could add other known search directives (noindex, rel-canonical) via XML sitemaps (as well as trying a bunch of XML exploits), but Google didn’t seem to use them.
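To make the ping mechanism concrete, here is a minimal sketch of how a ping URL is put together. It assumes the endpoint format Google documented at the time (`https://www.google.com/ping?sitemap=...`); the helper name `build_ping_url` and the example sitemap address are illustrative only, and the code builds the URL without sending any request.

```python
from urllib.parse import urlencode

# Assumption: the ping endpoint as documented by Google at the time of writing.
PING_ENDPOINT = "https://www.google.com/ping"


def build_ping_url(sitemap_url: str) -> str:
    """Return the ping URL for a sitemap (hypothetical helper).

    The sitemap location is passed as the percent-encoded `sitemap`
    query parameter.
    """
    return f"{PING_ENDPOINT}?{urlencode({'sitemap': sitemap_url})}"


# Illustrative sitemap address, not one used in the experiment.
print(build_ping_url("https://example.com/sitemap.xml"))
# prints: https://www.google.com/ping?sitemap=https%3A%2F%2Fexample.com%2Fsitemap.xml
```

Anyone can request such a URL for any sitemap address, which is why the question of who the sitemap is trusted for matters.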

Google Search Console Submission

If you try to submit an XML sitemap in GSC that includes URLs for another domain you are not authorised for, then GSC rejects them:

We’ll come back to this in a moment.

(Sorry, Jono!)

Open Redirects

Many sites use a URL parameter to control a redirect:

In this example I would be redirected (after login) to page.html . Some sites with poor hygiene allow for what are known as ‘open redirects’, where these parameters allow redirecting to a different domain:

Often these don’t need any interaction (like a login), so they just redirect the user right away:

Open redirects are very common, and often considered not too dangerous; Google does not include them in their bug bounty program for these reasons. Where possible, companies do try to protect against these, but often you can circumvent their protection:

Tesco are a UK retailer doing more than £50 billion in revenue, over £1 billion of which comes via their website. I reported this example to Tesco (along with a number of others to other companies that I discovered during this research) and they have since fixed it.
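As a rough illustration of what makes a redirect ‘open’, the sketch below resolves the target of a redirect parameter and checks whether it leaves the original host. The parameter name `continue`, the function names and the example URLs are assumptions for illustration, not taken from any real site, and real-world protections (and ways around them) are more varied than this check.

```python
from typing import Optional
from urllib.parse import parse_qs, urljoin, urlsplit


def redirect_target(request_url: str, param: str = "continue") -> Optional[str]:
    """Resolve the redirect parameter to an absolute URL, if present.

    `param` is a hypothetical parameter name; sites use many different ones.
    """
    values = parse_qs(urlsplit(request_url).query).get(param)
    if not values:
        return None
    # urljoin also handles relative paths and scheme-relative '//host/...' targets.
    return urljoin(request_url, values[0])


def is_cross_domain_redirect(request_url: str, param: str = "continue") -> bool:
    """True if the redirect parameter points at a different host."""
    target = redirect_target(request_url, param)
    if target is None:
        return False
    return urlsplit(target).hostname != urlsplit(request_url).hostname


# Same-site redirect: stays on green.com.
print(is_cross_domain_redirect("https://green.com/login?continue=/page.html"))
# Open redirect: sends the visitor (or a crawler) to blue.com.
print(is_cross_domain_redirect("https://green.com/login?continue=https://blue.com/evil.xml"))
```

A site that accepts the second kind of URL without validating the target is the kind of site that can be abused in the next section.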

Ping Sitemaps via Open Redirects 😱

At this point, you may have guessed where I’m going with this. It turns out that when you ping an XML sitemap, if the URL you submit is a redirect, Google will follow that redirect, even if it is cross-domain. Importantly, it seems to still associate that XML sitemap with the domain that did the redirect, and treat the sitemap it finds after the redirect as authorised for that domain. For example:

In this case, the evil.xml sitemap is hosted on blue.com , but Google associates it as belonging to, and being authoritative for, green.com . Using this you can submit XML sitemaps for sites you shouldn’t have control of, and send Google search directives.
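The trust decision described above can be modelled in a few lines. This is a toy model only: it is not Google's implementation, and the function names, the redirect parameter and the product URL are hypothetical. It contrasts attributing a sitemap to the host that was pinged (the behaviour observed) with attributing it to the host that finally served the file (one possible stricter policy, assumed here purely for comparison).

```python
from typing import List, Optional
from urllib.parse import urlsplit


def host(url: str) -> Optional[str]:
    return urlsplit(url).hostname


def attributed_host_observed(redirect_chain: List[str]) -> Optional[str]:
    """Observed behaviour: trust the host of the URL that was pinged."""
    return host(redirect_chain[0])


def attributed_host_strict(redirect_chain: List[str]) -> Optional[str]:
    """Assumed stricter policy: trust only the host that served the file."""
    return host(redirect_chain[-1])


def accepted_urls(sitemap_urls: List[str], trusted_host: Optional[str]) -> List[str]:
    """Keep only the sitemap entries that belong to the trusted host."""
    return [u for u in sitemap_urls if host(u) == trusted_host]


# The pinged URL is an open redirect on green.com; the file lives on blue.com.
chain = [
    "https://green.com/login?continue=https://blue.com/evil.xml",
    "https://blue.com/evil.xml",
]
entries = ["https://green.com/product"]  # hypothetical sitemap entry

print(accepted_urls(entries, attributed_host_observed(chain)))  # entry accepted
print(accepted_urls(entries, attributed_host_strict(chain)))    # entry rejected
```

Under the observed attribution, directives for green.com URLs are accepted even though green.com never hosted the sitemap.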

Experiment: Using hreflang directive to ‘steal’ equity and rank for free

At this point I had the various moving parts, but I hadn’t confirmed that Google would really trust a cross-domain redirected XML sitemap, so I spun up an experiment to test it. I had done lots of smaller tests to understand various parts of this (as well as various dead ends), but didn’t expect this experiment to work as well as it did.

I created a fake domain for a UK-based retail company that doesn’t operate in the USA, and spun up an AWS server that mimicked the site (primarily through harvesting the legit content and retooling it – i.e. changing currency / address etc.). I have anonymised the company (and industry) here to protect them, so let’s just call them victim.com .

I now created a fake sitemap that was hosted on evil.com , but contained only URLs for victim.com . These URLs contained hreflang entries for each URL pointing to an equivalent URL on evil.com , indicating it was the US version of victim.com . I now submitted this sitemap via an open redirect URL on victim.com via Google’s ping mechanism.
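A sitemap of the shape described above can be sketched as follows. The exact markup used in the experiment is not shown in this post, so this is an assumption based on the standard sitemap and `xhtml:link` hreflang format; the function name, the paths and the `en-us` value are illustrative.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"


def build_hreflang_sitemap(paths, source_origin, alternate_origin, hreflang):
    """Build a sitemap listing source URLs, each with one hreflang alternate.

    Hypothetical helper: every <url> entry points at `source_origin` and
    declares the same path on `alternate_origin` as its `hreflang` version.
    """
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("xhtml", XHTML_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for path in paths:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        loc = ET.SubElement(url, f"{{{SITEMAP_NS}}}loc")
        loc.text = source_origin + path
        ET.SubElement(
            url,
            f"{{{XHTML_NS}}}link",
            {"rel": "alternate", "hreflang": hreflang, "href": alternate_origin + path},
        )
    return ET.tostring(urlset, encoding="unicode")


# Illustrative paths; victim.com and evil.com are the anonymised names used in this post.
print(build_hreflang_sitemap(
    ["/", "/products/widget"],
    "https://victim.com",
    "https://evil.com",
    "en-us",
))
```

Every `<loc>` is a victim.com URL, while every alternate points at evil.com, which is what makes the file only useful if Google accepts it as authoritative for victim.com.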

Within 48 hours the site started getting small amounts of traffic for long tail terms (SEMRush screenshot):

A couple more days passed and I started appearing for competitive terms on the 1st page, against the likes of Amazon & Walmart:

Furthermore, Google Search Console for evil.com indicated that victim.com was linking to evil.com , although this obviously was not the case:

At this point I found I was also able to submit XML sitemaps for victim.com inside GSC for evil.com :

It seemed that Google had linked the sites, and evil.com ’s search console now had some capabilities to influence victim.com ’s setup. I could now also track indexation for my submitted sitemaps (you can see I had thousands of pages indexed now).

Searchmetrics was showing the increasing value of the traffic:

Google Search Console was showing over a million search impressions and over 10,000 clicks from Google search; and at this point I had done nothing other than submit the XML sitemap!

You should note that I was not letting people check out on the evil site, but had I wanted to, at this point I could have either scammed people for a lot of money, or set up ads or otherwise begun monetising this traffic. In my mind this posed a serious risk to Google visitors, as well as a risk to companies relying on Google search for traffic. Traffic was still growing, but I shut my experiment down and aborted my follow-up experiments for fear of doing damage.

Discussion

This method is entirely undetectable for victim.com – the XML sitemaps don’t show up on their end, and if you are doing what I did and leveraging their link equity for a different country, then you could entirely fly under the radar. Competitors in the country you are operating in would be left quite baffled by the performance of your site (see above, where I’m in the same search results as Amazon, Walmart & Target, who are all spending significant resources to be there).

In terms of Black Hat SEO, this has a clear use, and furthermore is the first example I’m aware of of an outright exploit in the algorithm, rather than manipulating ranking factors. The potential financial impact of the issue seems non-trivial – imagine the potential profit from targeting Tesco or similar (I had more tests to run to investigate this further but couldn’t without potentially causing damage).

Google have awarded a $5000 bounty for this, and the Google team were a pleasure to deal with, as always. Thanks to them.

If you have any questions, comments or information you can contact me at [email protected], on Twitter at @TomAnthonySEO, or via Distilled.

Disclosure Timeline