SEO has become extremely complicated and technical over the years. I’ve heard that organic search went from roughly 200 ranking factors to over 500, possibly more. But that’s speculation – only a few ranking signals have ever been officially confirmed by Google. Some were “discovered” in studies, but most are based on assumptions or anecdotes. That leaves too much room for uncertainty, speculation, and outright wrong information.

Deep Dive: (for beginners) what is an SEO ranking factor?

An SEO ranking factor is a signal Google uses to rank pages in Google Search. Google applies “ranking signals” to its index of web documents to return the most relevant results when a user performs a search. It’s important to distinguish between indexing and ranking. Google builds an index of pages by following hyperlinks to crawl the web; ranking doesn’t happen in this step. Many people assume that when Google cannot properly index a page (say, because it uses non-compliant JavaScript), that is a ranking factor. It’s not. Ranking signals take many different parameters on and off a web document into account: content, links, structure, etc. Our goal as SEOs is to figure out which ranking factors Google uses, so that we can optimize sites to rank higher in organic search.

We need more clarity about what we do know and what we don’t know in SEO to improve our credibility, have better conversations and achieve better results. Google’s use of Machine Learning is already making it harder to understand ranking signals and algorithm updates. It will not get easier and speculation only adds to the noise.

Instead of reasoning by analogy, we need to reason from first principles.

How we discover ranking factors in SEO

“What ranking factors do we know with certainty to be true?” is not a simple question. Google is a black box and won’t tell us the secrets of its $100 billion algorithm [13]. It’s often impossible to create laboratory conditions in which we can isolate a factor and measure its impact on rank (people have tried [27]). On top of that, ranking factors aren’t as “clear” as they used to be. They have changed a lot over time and now even seem to be weighted differently depending on the query. Yet other systems of similar nature have been reverse-engineered. It’s not impossible.

To advance our understanding, we can draw evidence from 7 sources:

1. Google’s blog
2. Public statements by Googlers, e.g. on Twitter, in presentations or in interviews
3. Ranking factor studies/analyses
4. The Google Quality Rater Guidelines
5. Google’s basic SEO guide
6. Patents Google registered or acquired
7. Anecdotes (people running tests and drawing conclusions)

None of these sources is perfect, but in combination, they give us the best picture possible. There’s always an angle you can attack this from. For example, officially confirmed signals still don’t tell us how they’re weighted in the sum of all signals. Statements on Twitter are often very broad. And we even see data that conflicts with some things Google says. But we have to work with what we have ¯\_(ツ)_/¯.

Establishing first principles of SEO

First principles are the smallest building blocks; the things and laws we know to be true. Establishing first principles comes with three constraints. First, we have to distinguish between direct and indirect impact. Optimized meta-descriptions can positively impact organic traffic, but don’t have a direct impact on rank. Second, the questions “how much” and “in which case” are significant. Not every ranking factor applies to every query in the same way. For example, QFD (“query deserves freshness”) and HTTPS apply to only certain keywords. Third, we have to distinguish between positive and negative ranking factors (for example, 404 errors or “thin content”).

What’s the overarching goal I’m trying to achieve with this article? The goal is to sharpen our sense of proven truths in times of uncertainty. Google’s increasing usage of machine learning makes it harder than ever to understand the algorithm(s). But, by going back to the basics, we should be able to focus on results over speculative minutiae.

Officially confirmed ranking factors

We can put ranking signals into three groups:

1. Officially confirmed by Google
2. Discovered through analysis
3. Speculated

I’m covering only confirmed and discovered signals here; I see no sense in amplifying speculation about ranking signals by repeating it.

The order in which the ranking factors are mentioned reflects my personal understanding of their significance. I consider content the most important signal on this list and E-A-T the least important. However, none of the signals are unimportant.

[toc]

1. Content
2. External and internal links
3. User Intent
4. CTR
5. User Experience
6. Title tag
7. Page speed
8. Freshness
9. E-A-T
10. SSL encryption

Ranking signal 1: Content

Returning the most relevant search results is the goal of every search engine. The roll-out of Hummingbird in 2013 was a milestone in getting closer to that goal: Google shifted its focus to entities and their relationships, which made it significantly better at understanding context and relevance.

In the early days of search, it was enough to mention a keyword many times on a page to be relevant. Now, content needs high relevance for the query and informational depth: it has to answer the questions users have about a topic and match user intent. So “content as a ranking factor” means the length, depth, and relevance of the body content for the targeted query.

Deep Dive: the nuance of content

Content is not only text; it’s also images, videos, gifs, and more. All these elements play together (more under “User Intent”). Ranking in Google’s image search is not the only benefit of optimizing images: adding a descriptive alt attribute and file name increases the relevance of your content, especially for search queries that demand more visual results, like “star wars wallpaper”.

There’s also a difference between main content and supplementary content, i.e. text in the footer, header or parts of the site other than “the body”. It’s easy to see that the topic of “content” is very nuanced, but I’m trying to keep it high-level here.

Lastly, “pruning” low-quality content has been shown to be effective many times. The idea is to decrease the amount of low-quality content on a domain by either improving it or getting rid of it (noindex, 404 or redirect). This indicates that Google measures content quality on a domain level, at least to a degree. Note that this is not an official ranking factor, but John Mueller addressed the topic in a Webmaster Hangout, saying: “So in general when it comes to low quality content, that’s something where we see your website is providing something but it’s not really that fantastic. And there are two approaches to actually tackling this. On the one hand you can improve your content and from my point of view if you can improve your content that’s probably the best approach possible because then you have something really useful on your website you’re providing something useful for the web in general. […] cleaning up can be done with no index with a 404 kind of whatever you like to do that.“
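The improve/redirect/noindex workflow Mueller describes can be expressed as a simple decision rule. A minimal sketch, assuming you have already given each page an editorial quality score and pulled its traffic – the thresholds and scales here are invented for illustration, not anything Google publishes:

```python
def pruning_action(quality, monthly_visits, has_equivalent_page):
    """Decide what to do with a page during a content audit.

    quality: editorial score from 0 (thin) to 10 (excellent).
    Thresholds are illustrative, not official guidance.
    """
    if quality >= 7:
        return "keep"
    if monthly_visits > 100:
        return "improve"        # page earns traffic, worth the rewrite effort
    if has_equivalent_page:
        return "redirect"       # consolidate into the stronger duplicate
    return "noindex-or-404"     # low quality, low traffic, nothing to merge

print(pruning_action(8, 5, False))    # keep
print(pruning_action(4, 500, False))  # improve
print(pruning_action(2, 3, True))     # redirect
print(pruning_action(2, 3, False))    # noindex-or-404
```

In practice you would feed this from a crawl export plus analytics data, and review the borderline cases by hand rather than trusting the rule blindly.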

How do we know this to be true?

Google SEO Starter Guide

“Creating compelling and useful content will likely influence your website more than any of the other factors discussed here.” [2]

“[…] optimizing your image filenames and alt text makes it easier for image search projects like Google Image Search to better understand your images.“

“If you do decide to use an image as a link, filling out its alt text helps Google understand more about the page you’re linking to. Imagine that you’re writing anchor text for a text link.“

Presentations:

How Google works [18]

Interviews:

Andrey Lipattsev Q&A [20]

Articles:

“How Search Works”: “When we index a web page, we add it to the entries for all of the words it contains.” [1]

“Matt Cuts: Is speed more important than relevance?” [24]

“How Google is remaking itself as machine learning first company” [31]

“Better understanding of your site” [32]

“Good times with inbound links”: “One of the strongest ranking factors is my site’s content.” [34]

“Google Technology Overview” [36]

“Google Image Publishing Guidelines” [45]

Ranking signal 2: External and internal links

Links still have a decent influence on rankings, but ranking factor studies and Google statements show their weight has declined over time. They still play a role in the ranking and indexation of web documents. And, like “content” as a ranking signal, backlinks are nuanced: their quality depends on many factors, such as anchor text, strength of the link source, and matching content relevance between link source and target.

Internal links are powerful ranking signals, too. They pass link equity from page to page, and internal anchor text helps Google understand the topic and context of content, just as the anchor text of external backlinks does. As early as 2008, Google recommended to “keep important pages within several clicks from the homepage“. So URL structure has a positive impact on rankings because it indicates a clear hierarchy of information (site taxonomy). URL optimization revolves around clean, descriptive directory structures without duplicates or parameters.
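The “several clicks from the homepage” guideline is easy to audit once you have a crawl of your internal links. A minimal sketch using breadth-first search over an adjacency map – the URLs below are hypothetical:

```python
from collections import deque

def click_depth(link_graph, homepage):
    """Shortest click distance from the homepage to every reachable page.

    link_graph: dict mapping a URL to the list of URLs it links to.
    """
    depth = {homepage: 0}
    queue = deque([homepage])
    while queue:
        page = queue.popleft()
        for target in link_graph.get(page, []):
            if target not in depth:            # first visit = shortest path (BFS)
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth

# Hypothetical internal-link graph from a site crawl
graph = {
    "/": ["/blog", "/products"],
    "/blog": ["/blog/post-1"],
    "/products": ["/products/widget"],
    "/blog/post-1": ["/products/widget"],
}
print(click_depth(graph, "/"))
# {'/': 0, '/blog': 1, '/products': 1, '/blog/post-1': 2, '/products/widget': 2}
```

Pages with a large depth value (or missing from the result entirely, i.e. orphaned) are the ones the 2008 recommendation warns about.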

Deep Dive: age as a quality indicator for links (and content)

I want to call out a patent invented by Matt Cutts (some might remember him) and Jeff Dean (Google’s current head of AI), amongst others. It describes using historical information in ranking, but I want to zoom in on document age and its impact on the quality of a link: a rapid spike in the number of backlinks might indicate a spam attempt, or be perfectly fine, depending on how old a page/site is. “In implementations consistent with the principles of the invention, the history data may include data relating to: document inception dates; document content updates/changes; query analysis; link-based criteria; anchor text (e.g., the text in which a hyperlink is embedded, typically underlined or otherwise highlighted in a document); traffic; user behavior; domain-related information; ranking history; user maintained/generated data (e.g., bookmarks); unique words, bigrams, and phrases in anchor text; linkage of independent peers; and/or document topics.” [40] The patent contains all kinds of interesting hints, so give it a read when you have time.

How do we know this to be true?

Patents:

PageRank patent [14]

“Training set construction for taxonomic classification” [29]

“Information retrieval based on historical data” [40]

Interviews:

Andrey Lipattsev Q&A [20]

Google SEO Starter Guide [2]

“The navigation of a website is important in helping visitors quickly find the content they want. It can also help search engines understand what content the webmaster thinks is important. Although Google’s search results are provided at a page level, Google also likes to have a sense of what role a page plays in the bigger picture of the site.“

“Link text is the visible text inside a link. This text tells users and Google something about the page you’re linking to. Links on your page may be internal—pointing to other pages on your site—or external—leading to content on other sites. In either of these cases, the better your anchor text is, the easier it is for users to navigate and for Google to understand what the page you’re linking to is about.” (also applies to external links)

“Think about anchor text for internal links too“

“URLs with words that are relevant to your site’s content and structure are friendlier for visitors navigating your site.“

“Use a directory structure that organizes your content well and makes it easy for visitors to know where they’re at on your site. Try using your directory structure to indicate the type of content found at that URL.“

“Provide one version of a URL to reach a document“

Articles

“Importance of link architecture” [33]

“Technologies behind Google ranking”: “IR gave us a solid foundation, and we have built a tremendous system on top using links, page structure, and many other such innovations.” [17]

“Good times with inbound links”: “As many of you know, relevant, quality inbound links can affect your PageRank (one of many factors in our ranking algorithm)” [34]

“Google Turning Its Lucrative Web Search Over to AI Machines” [35]

“Google Technology Overview” [36]

“Content guidelines: Keep a simple URL structure” [44]

Ranking signal 3: User Intent

I’ve written about the different types of user intent and how to identify them for a large set of queries in “User Intent mapping on steroids”:

“User intent” is the goal a user is trying to achieve when searching online. Old school SEO distinguished between “transactional”, “navigational”, and “informational” user intent. People either want to buy, visit a specific page or find out more about a topic. That hasn’t changed dramatically, but in the 2017 version of its quality rater guidelines, Google distinguishes between four intents:

– Know

– Do

– Website

– Visit-in-person“

Content relevance and user intent are closely related, but not the same. First, if user intent isn’t met, a page won’t rank, whereas content relevance exists on a spectrum. For example, a blog article cannot rank for a query that demands listings, say for jobs or real estate. When you search for “sushi”, you get local results: Google understands that in this case more users are looking for restaurants than for an explanation or definition. For some queries, images are a better format than text, for example “tattoo inspiration”. In that case, you want to create an image gallery to rank well, not an essay.

RankBrain is the engine behind user intent understanding and the third strongest ranking signal according to Google:

“Of the hundreds of “signals” Google search uses when it calculates its rankings (a signal might be the user’s geographical location, or whether the headline on a page matches the text in the query), RankBrain is now rated as the third most useful.“

Jeff Dean, Google’s head of AI, described it as assessing “how well a document in the ranking matches a query” (in a Wired article, 2016 [31]).

How do we know this to be true?

Presentations:

How Google works [18]

Interviews:

Andrey Lipattsev Q&A [20]

Articles:

“FAQ: All about the Google RankBrain algorithm” [23]

“How Search Works”: “Understanding the meaning of your search is crucial to returning good answers. So to find pages with relevant information, our first step is to analyze what the words in your search query mean. We build language models to try to decipher what strings of words we should look up in the index.” [..] “This involves steps as seemingly simple as interpreting spelling mistakes, and extends to trying to understand the type of query you’ve entered by applying some of the latest research on natural language understanding.” [1]

Ranking signal 4: Click-through rate

Click-through rate is the ratio between clicks and impressions in the Google search results. It’s affected by:

Brand recognition

Relevance of title, description, and URL for the query

Whether you have a rich snippet or not

Other features shown in the SERP (and which ones)

The exact role of CTR in ranking is not 100% clear; the discussion often gets tangled up with Google’s broader use of feedback mechanisms in search. The open questions are how strong CTR is compared to other signals, whether it affects rankings in real time (unlikely), and whether there is an accumulation period. Even though Google is vague about its usage, two papers show strong evidence that Google uses CTR to rank pages.
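The formula itself is trivial – clicks divided by impressions, the same two numbers Google Search Console reports per query. A quick sketch (the sample figures are made up):

```python
def ctr(clicks, impressions):
    """Click-through rate: the share of impressions that turned into clicks."""
    if impressions == 0:
        return 0.0          # avoid division by zero for queries with no impressions
    return clicks / impressions

# Hypothetical Search Console export: (query, clicks, impressions)
rows = [
    ("seo ranking factors", 120, 2400),
    ("what is ctr", 45, 3000),
]
for query, clicks, impressions in rows:
    print(f"{query}: {ctr(clicks, impressions):.1%}")
# seo ranking factors: 5.0%
# what is ctr: 1.5%
```

Comparing a query’s actual CTR against the typical CTR for its ranking position is a common way to spot snippets worth rewriting.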

There’s also evidence that Google is able to distinguish between more than just long and short clicks: “[…] rather than simply distinguishing long clicks from short clicks, a wider range of click-through viewing times can be included in the assessment of result quality, where longer viewing times in the range are given more weight than shorter viewing times.” [15]
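The patent language suggests a graded weighting of viewing times rather than a binary long/short split. Purely as an illustration of that idea – the thresholds and weights below are invented, not taken from the patent:

```python
def click_weight(viewing_seconds):
    """Map a result's viewing time to a quality weight.

    Longer viewing times get more weight, per the graded scheme the
    patent hints at. Thresholds and weights are illustrative only.
    """
    if viewing_seconds < 5:
        return 0.0      # bounce-like short click
    if viewing_seconds < 30:
        return 0.3
    if viewing_seconds < 120:
        return 0.7
    return 1.0          # long, presumably satisfied click

dwell_times = [3, 12, 45, 300]
print([click_weight(t) for t in dwell_times])
# [0.0, 0.3, 0.7, 1.0]
```

A real system would almost certainly use a smooth function and per-query baselines, but the step function shows the shape of the idea: a range of viewing times, not two buckets.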

How do we know this to be true?

Patents:

“Modifying search result ranking based on a temporal element of user feedback” [15]

Papers:

“Incorporating Clicks, Attention, and Satisfaction into a Search Engine Result Page Evaluation Model” [26]

Presentations:

Gary Illes’ presentation at SMX Munich 2015 [16]

How Google works [18]

Ranking signal 5: User Experience

User Experience is one of the blurriest ranking signals of all because it’s so hard to define and overlaps with many other signals. It could entail every touchpoint a user has with a company, but that’s impossible for a search engine to measure; it’s too soft. Instead, we need to look at harder factors:

Accessibility

Usability

Design

A page is accessible when it loads completely, quickly, and without issues. One way to optimize for this is to provide image dimensions to avoid the layout “jump” while a page loads. Ad pressure and the invasiveness of ads fit into this bucket as well.

Compatibility with different devices, search functionality, and 404 errors are indicators of usability.

What most people have in mind when thinking of “user experience” is design, and it does carry some importance. If a site looks spammy, users bounce, which can have implications for rankings. Important factors for design are how easy it is for users to find and consume information and how trustworthy the experience looks. The latter plays into the next signal: E-A-T.

Good indicators for User Experience are user signals (bounce rate, dwell time, pages/visit) and engagement signals (social shares, scroll depth).

How do we know this to be true?

Articles:

“How Search Works”: “These algorithms analyze hundreds of different factors to try to surface the best information the web can offer, from the freshness of the content, to the number of times your search terms appear and whether the page has a good user experience.” [1]

Ranking signal 6: Title tag

The Title tag has been one of the stronger ranking signals from the beginning. It’s a strong indicator of relevance and affects CTR. Having the keyword in the title is still a requirement to rank, even though Google understands the context of queries. Google looks at “[…] how often and where those keywords appear on a page, whether in titles or headings or in the body of the text.“

How do we know this to be true?

Articles:

“How Search Works” [1]

Google SEO Starter Guide [2]

Ranking signal 7: Page speed

Google first confirmed page speed as a ranking factor in 2010 [22] and again in 2018 [21]. The former relates to desktop search, the latter to mobile search (to no one’s surprise).

Ten years ago, page speed was a simple metric. Nowadays, websites have become much more sophisticated, and we need several metrics to get a complete picture. Tools like WebPageTest recommend “Speed Index” as a unifying metric, complemented by milestones like TTFB (time to first byte), first paint, first meaningful paint, and DOMContentLoaded.
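Speed Index has a compact definition: the area above the visual-completeness curve over time, so pages that render useful pixels earlier score lower (better). A sketch from sampled visual-completeness data – the filmstrip frames below are made up:

```python
def speed_index(frames):
    """Approximate Speed Index from (time_ms, visual_completeness) samples.

    frames: sorted list of (time in ms, completeness from 0.0 to 1.0).
    Sums (1 - completeness) * interval length over consecutive samples,
    i.e. the area above the completeness curve. Lower is better.
    """
    total = 0.0
    for (t0, c0), (t1, _) in zip(frames, frames[1:]):
        total += (1.0 - c0) * (t1 - t0)
    return total

# Hypothetical filmstrip: fully rendered at 3000 ms
frames = [(0, 0.0), (1000, 0.6), (2000, 0.9), (3000, 1.0)]
print(speed_index(frames))  # 1500.0
```

Note how this rewards progressive rendering: a page that shows 60% of its content after one second beats a page that shows nothing until everything arrives at once, even if both finish at the same time.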

How do we know this to be true?

Articles:

“Using page speed in mobile search ranking” [21]

“Using site speed in web search ranking” [22]

“Google Technology Overview” [36]

Ranking signal 8: Freshness and QDF

Fresh results are a top goal of search engines, after relevance. As mentioned in the Google SEO Starter guide:

“Traditional search evaluation has focused on the relevance of the results, and of course that is our highest priority as well. But today’s search-engine users expect more than just relevance. Are the results fresh and timely?“

“Freshness” in search got a push when Google introduced its new indexing system, “Caffeine”, in 2010. [37] It allowed Google to index (new) pages in a matter of seconds and paved the way for assigning queries a “freshness” value: a higher relevance of recency. The query “Bitcoin” is highly sensitive to news these days, for example, while that wasn’t the case two years ago.

“Query deserves freshness” (QDF) is the ranking signal Amit Singhal, former head of search at Google, talked about as early as 2007: “The QDF solution revolves around determining whether a topic is “hot”. If news sites or blog posts are actively writing about a topic, the model figures that it is one for which users are more likely to want current information. The model also examines Google’s own stream of billions of search queries.” [38]

The difference between “freshness” and QDF is that QDF measures spiking search volume to determine whether a query is “hot”; it then ranks newer content higher and shows more news integrations in the SERPs. Freshness, on the other hand, refers to keeping content up to date by adding new facts or findings. Search engines always want to return content that’s as current as possible, but that’s not the same as a query suddenly attracting high interest. The two vary in intensity.
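The “spiking search volume” idea behind QDF can be sketched as a simple anomaly check: compare today’s query volume against its recent baseline. The data and threshold below are illustrative, not anything Google has disclosed:

```python
def is_hot(volumes, spike_factor=3.0):
    """Flag a query as 'hot' if the latest volume spikes above its baseline.

    volumes: daily search volumes for one query, oldest first.
    spike_factor: how many times the baseline counts as a spike (illustrative).
    """
    *history, today = volumes
    baseline = sum(history) / len(history)   # average of the prior days
    return today > spike_factor * baseline

print(is_hot([100, 110, 95, 105, 420]))  # True  – roughly 4x the baseline
print(is_hot([100, 110, 95, 105, 120]))  # False – normal fluctuation
```

A production system would also look at the news/blog stream the quote mentions, but the core signal is exactly this kind of deviation from a query’s own history.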

How do we know this to be true?

Patents:

“Information retrieval based on historical data” [40]

Articles:

“How Search Works”: “We take note of key signals — from keywords to website freshness — and we keep track of it all in the Search index.” [1]

“Google SEO Starter Guide” [2]

“Giving you fresher, more recent search results”: “Different searches have different freshness needs. This algorithmic improvement is designed to better understand how to differentiate between these kinds of searches and the level of freshness you need, and make sure you get the most up to the minute answers.” [19]

New York Times: “Google Keeps Tweaking Its Search Engine” [38]

Videos:

Matt Cutts “Query deserves freshness.” Fact or fiction?” [39]

Ranking signal 9: E-A-T (Expertise, Authoritativeness, Trustworthiness)

E-A-T (“expertise, authoritativeness, trustworthiness”) is another broad signal, like user experience. To optimize for E-A-T, you need to add information to your site that helps Google understand whether you’re an authority, for example by adding an “about” page or providing a correct and complete address. Your content needs to live up to the required expertise in quality and length. Writing about rocket science sounds and looks very different from writing about rap (no judgment). Google will also look at recommendations and endorsements from other, neutral sites. Yes, that includes links from highly authoritative sites like Wikipedia.

E-A-T includes factors like domain age, reputation, reviews, and ratings. Some of us might remember the days of rel=author, an attempt by Google to measure the expertise of people on specific topics. Google retired authorship, but the idea remains the same.

How do we know this to be true?

Articles:

“How Search Works”: “In order to assess trustworthiness and authority on its subject matter, we look for sites that many users seem to value for similar queries. If other prominent websites on the subject link to the page, that’s a good sign the information is high quality.” [1]

“Google Quality Rater Guidelines (2017)”: “The amount of expertise, authoritativeness, and trustworthiness (E­A­T) that a webpage/website has is very important. MC quality and amount, website information, and website reputation all inform the E­A­T of a website.” [30]

Presentations:

How Google works [18]

Patents:

“Obtaining authoritative search results” [28]

Ranking signal 10: SSL encryption

Google confirmed SSL as a ranking signal in 2014, after migrating to HTTPS itself two years earlier. Once again, the question is how much weight that signal carries. When Google rolled it out, HTTPS affected about 1% of queries and seemed to carry less weight than content:

“For now it’s only a very lightweight signal—affecting fewer than 1% of global queries, and carrying less weight than other signals such as high-quality content—while we give webmasters time to switch to HTTPS. But over time, we may decide to strengthen it, because we’d like to encourage all website owners to switch from HTTP to HTTPS to keep everyone safe on the web.“

Encryption is more important in industries like insurance, finance, and e-commerce than in others. It’s also more relevant for the check-out/login part of a site than for the blog, for example. Google seems to give certain queries and parts of a website a higher relevance for HTTPS. That doesn’t make HTTPS unimportant in other cases: Google regularly emphasizes its benefits for general security.

How do we know this to be true?

Articles:

“HTTPS as a ranking signal” [25]

“Google I/O 2014 – HTTPS Everywhere” [43]

Ranking factor study “meta-analysis”

The strongest evidence in scientific research comes from meta-analyses, which look at the data from many different studies on the same topic to form a holistic view. I conducted a “pseudo ranking factor study meta-analysis”, comparing the results of 7 studies from the last 2 years by Searchmetrics, SEMrush and Backlinko*. It’s “pseudo” because I couldn’t get insight into the raw data of the studies, so all the scientists in the audience can calm down ;-). (If any ranking factor study provider wants to grant me access – I’m all ears!)

On the chart, you see the top10 ranking factors from each study. I grouped them into five bigger fields (colored), so we can see the overlaps.

(links = orange, content = yellow, user behavior = blue, social = green, technical = gray)

When we look at the ranking factors across different studies – I don’t think anyone has done that before – we see one thing above all: a big mess. On second look, I see a slight dominance of content relevance and length, paired with user behavior. Backlinks seem to live at the lower end of the top 10.

When it comes to backlinks, the sheer number of links and linking domains still seems to be the most prominent factor.

We can debate the meaningfulness and interpretation of ranking factor studies for the SEO industry, but I’m always open to learning from large sets of data. This little analysis merely helps to see the bigger picture.
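At its core, a comparison like this boils down to counting how often each factor (or factor group) appears across the studies’ top lists. A minimal sketch with hypothetical, shortened study data – not the real lists from Searchmetrics, SEMrush or Backlinko:

```python
from collections import Counter

# Hypothetical top factor groups per study (illustrative labels only)
studies = {
    "study_a": ["content", "links", "user behavior"],
    "study_b": ["content", "user behavior", "technical"],
    "study_c": ["links", "content", "social"],
}

# Count in how many top lists each factor group appears
overlap = Counter(factor for factors in studies.values() for factor in factors)
print(overlap.most_common())
# [('content', 3), ('links', 2), ('user behavior', 2), ('technical', 1), ('social', 1)]
```

With access to the raw data, you could go further and average each factor’s rank position instead of just counting appearances, which is what a real meta-analysis would do.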

*More caveats: note the timeliness of the studies – ranking factors seem to have changed (or adapted?) faster in the last couple of months. Also, some studies focused on broad keyword sets while others looked at specific industries. That makes them comparable only to a degree.

SEO Ranking factor matrix

On top of the little study comparison above, I put ranking factors on a matrix to clarify the difference between positive vs. negative and confirmed vs. unconfirmed signals.

(bold = ranking factors mentioned in studies, underlined = ranking factors mentioned in blog articles, at conferences or in forums)

I didn’t even bother adding the nuance of each signal, to prevent complete overload – there are over ten different quality factors for backlinks alone. It’s also interesting to note how broad the field of signals is that Google evaluates to provide a good search experience. Search has become very complex and sophisticated.

So, what do we make of all this?

Organic Search is a non-linear system

Organic search is a non-linear system, meaning the whole is greater than the sum of its parts. Some factors seem to compound, others seem to be driven by thresholds. Having great content, links, and user experience together seems to have a stronger effect than each factor in isolation. Google also seems to measure negative factors against thresholds: a few 404s won’t hurt, but past a certain percentage Google seems to reinforce negative consequences. I only have observational evidence for this, so I’m curious about your experience!

Fact is, we don’t know the exact relationship between the ranking factors. And if there really are 200 (or more) of them, we must admit that most are unknown to us. That doesn’t mean we cannot speak about them or run experiments, but we must be honest about what we know and what we don’t know.

But even without that knowledge, we can focus on the parts we know make a difference – on the first principles of SEO:

1. Content
2. External and internal links
3. User Intent
4. CTR
5. User Experience
6. Title tag
7. Page speed
8. Freshness
9. E-A-T
10. SSL encryption

You can never do the basics well enough.

References