SEMRush IP Link Data Bizarre, Misleading

Disclaimer: I am Russ Jones and I work for Moz, which is a competitor of SEMRush. These are my opinions and do not represent those of Moz. That being said, the data speaks for itself.

I must admit that I was taken aback when Matthew Woodward’s recent Best Backlink Checker analysis came back so heavily in favor of SEMRush. I knew Matthew did good work and took this project seriously, vetting each provider to the best of his and his team’s ability given the data they were provided, but it just didn’t mesh with the comparisons I run daily and weekly against Moz’s competitors. (I have spoken with Matthew about this issue and he is currently investigating independently.)

Matthew’s study compared the number of unique linking IPs and Subnets reported by various link indexes across a whopping 1,000,000 domains. Unfortunately, Moz does not collect IP data, so I had to construct a conservative model to predict IP numbers. I wasn’t expecting Moz to perform well given this state of affairs. However, I certainly didn’t expect SEMRush to have grown so rapidly and dramatically in such a short time. It would have taken a technological miracle in my eyes. So I took a closer look.

Methodology

Rather than rely on the numbers reported in SEMRush’s API or tool, I collected the data myself using their own exports of root linking domains. The process was rather straightforward.

1. Download the list of all root linking domains.
2. Count the unique IPs in the export.
3. Compare to the number in the UI.
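The counting step can be sketched in a few lines of Python. This is a minimal illustration, not the script used for this post: the column names (domain, ip) and the sample rows are assumptions, and a real export may label its columns differently.

```python
import csv
import io

def count_unique_ips(export_file, ip_column="ip"):
    """Count distinct, non-empty IP values in a CSV export of linking domains.

    The column name "ip" is a hypothetical placeholder; adjust it to match
    whatever header the real export uses.
    """
    ips = set()
    for row in csv.DictReader(export_file):
        ip = (row.get(ip_column) or "").strip()
        if ip:
            ips.add(ip)
    return len(ips)

# Made-up export: three domains, two of which share one IP address.
sample = io.StringIO(
    "domain,ip\n"
    "example.com,192.0.2.10\n"
    "example.org,192.0.2.10\n"
    "example.net,198.51.100.7\n"
)
print(count_unique_ips(sample))  # 2
```

The resulting count is the number to compare against the figure shown in the UI.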

Well, here is where things get really dubious for SEMRush. [Download Raw Data]

First site: thegooglecache.com

Number of Reported Unique IPs: 859

Actual Based on Root Linking Domains: 636

Discrepancy: +26% (relative to the reported count)

Second site: matthewwoodward.co.uk

Number of Reported Unique IPs: 7700

Actual Based on Root Linking Domains: 5669

Discrepancy: +26%

Third site: grepwords.com

Number of Reported Unique IPs: 551

Actual Based on Root Linking Domains: 436

Discrepancy: +21%
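The discrepancy figures can be checked with a short calculation. The sketch below assumes the discrepancy is measured as a share of the reported count, which is my reading rather than a stated formula; on that basis the results land close to the percentages quoted above. Measured against the export count instead, the gaps would be larger.

```python
def discrepancy(reported, actual):
    """Share of the reported IP count that the export does not account for.

    Assumption: the gap is expressed relative to the reported number.
    """
    return (reported - actual) / reported

# (reported unique IPs, unique IPs counted from root linking domains)
sites = {
    "thegooglecache.com": (859, 636),
    "matthewwoodward.co.uk": (7700, 5669),
    "grepwords.com": (551, 436),
}

for site, (reported, actual) in sites.items():
    print(f"{site}: {discrepancy(reported, actual):.1%}")
```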

These were just the first 3 sites that I tested. Obviously this was disconcerting, so I decided to dig deeper just within the app. The picture is not pretty.

First, the IP address tab in SEMRush lists IP addresses as linking to a domain even though those IPs are not tied to any domain in their system. That is to say, SEMRush reports that a domain has a link from a certain IP, but when you click on the IP to find the domain it is assigned to, SEMRush reports “Not Found”.

Second, there are thousands of instances where SEMRush reports more unique IPs than domains.

In fact, a test of ~3600 random domains yielded approximately 70% with more IPs than domains. The complete [csv can be downloaded here]
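For anyone who wants to repeat this check on their own sample, here is a rough sketch. The column names (unique_ips, referring_domains) and the sample rows are hypothetical and are not taken from the CSV linked above.

```python
import csv
import io

def share_with_more_ips(rows_file, ip_col="unique_ips", dom_col="referring_domains"):
    """Fraction of rows where the unique IP count exceeds the referring domain count.

    Both column names are assumed placeholders for whatever the real file uses.
    """
    total = 0
    flagged = 0
    for row in csv.DictReader(rows_file):
        total += 1
        if int(row[ip_col]) > int(row[dom_col]):
            flagged += 1
    return flagged / total if total else 0.0

# Made-up rows: three of the four domains show more IPs than referring domains.
sample = io.StringIO(
    "domain,unique_ips,referring_domains\n"
    "a.example,120,100\n"
    "b.example,45,40\n"
    "c.example,30,35\n"
    "d.example,9,8\n"
)
print(share_with_more_ips(sample))  # 0.75
```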

So, what is going on here? Is SEMRush outright falsifying data?

I don’t think so. I suspect the cause is that SEMRush stores IPs at the link level rather than the domain level. If their crawler visits a site that uses round-robin DNS to load balance, it could record multiple IP addresses for the same domain. While services like Majestic and Ahrefs likely store a single canonical IP address per domain, SEMRush seems to store one per link, which would explain why there are more IPs than referring domains in some cases. I do not think SEMRush is intentionally inflating their numbers. I think they are storing the data differently than competitors do, which results in a number that is higher and potentially misleading, but not due to ill intent.
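To make the suspected mechanism concrete, here is a toy simulation. It is only a sketch of my hypothesis, not a description of how SEMRush, Majestic, or Ahrefs actually store data. The IP pool, the URLs, and the simplification that each fetch sees the next address in the rotation are all assumptions.

```python
from itertools import cycle

def crawl_links(links, dns_pool):
    """Pair each crawled link with the IP returned at fetch time.

    Simplification: round-robin DNS is modeled as handing out the
    pool's addresses in strict rotation, one per fetch.
    """
    rotation = cycle(dns_pool)
    return [(link, next(rotation)) for link in links]

def unique_ips_per_link(observations):
    """Per-link storage: every IP seen on any link is kept."""
    return len({ip for _, ip in observations})

def unique_ips_per_domain(observations):
    """Per-domain storage: only the first IP seen for the domain is kept."""
    return len({observations[0][1]}) if observations else 0

# One linking domain, load balanced across four hypothetical addresses,
# with a sitewide link that appears on 200 pages.
pool = ["203.0.113.1", "203.0.113.2", "203.0.113.3", "203.0.113.4"]
links = [f"https://example.com/page-{i}" for i in range(200)]
observations = crawl_links(links, pool)

print(unique_ips_per_link(observations))    # 4
print(unique_ips_per_domain(observations))  # 1
```

The same single linking domain counts as four IPs under one storage model and one IP under the other, which is the kind of gap described above.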

What does this mean for users?

If my intuition is correct, the number of unique IP addresses is not a safe metric to use in SEMRush. If your site has a sitewide link from a website that uses round-robin DNS, you could accumulate dozens if not hundreds of unique IPs, depending on how that site was crawled and how many IPs it is load balanced across. Another site with a similar sitewide link from a website without round-robin DNS would report only 1 IP address. Furthermore, comparing IP and Subnet counts between providers (Moz, Ahrefs, Majestic, SEMRush, Spyglass, etc.) is not an apples-to-apples measurement unless and until SEMRush changes their collection or reporting methods.

Important Takeaways

I have said this many times before, but it bears repeating. Ask your data providers how they collect, store, and report on metrics. Teasing out the differences between providers is essential to making fair comparisons and determining what might be best for you and your team. This IP address issue is just one of many difficult questions that companies which crawl the web must answer, and the way they answer these questions can dramatically change reporting.

Also, I want to point out the importance of research like that done by Matthew Woodward. If he had not run this extensive analysis, you and I and the rest of the community might never have noticed this important distinction. And, in missing this distinction, we would potentially be misled by our assumptions about what particular metrics mean. We need you, the independent SEOs of this world, to keep our feet to the fire (Moz included).
