Negative On-Page Factors

High Body Keyword Density Keyword Stuffing penalties arise when abusing a once extremely effective tactic: sculpting Keyword Density to a high level. Our own experiments have shown that penalties can happen as early as 6% density, though TF-IDF (covered earlier) is likely at play and this is sensitive to topics, word types, and context. Source(s): Matt Cutts

Keyword Dilution This factor follows from simple logic: if a higher Keyword Density or TF-IDF is positive, then at some point a total lack of frequency/density will decrease relevance. As Google has improved at understanding natural language, this may be better described as Subject Matter Dilution: writing content that wanders without any clear theme. The same basic concept is at play either way. Source(s): Matt Cutts

Keyword-Dense Title Tag Aside from a page as a whole, Keyword Stuffing penalties appear to be possible within the title tag. An ideal title tag should definitely be less than 60-70 characters and hopefully still provide enough value to function as a good search ad in Google's results. At absolute minimum, there is no benefit in using the same keyword five times in the same tag. Source(s): Matt Cutts
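As a rough illustration (the store name and keyword phrases below are invented for this example), the difference between a keyword-dense title and a reasonable one might look like:

```html
<!-- Keyword-stuffed: the same phrase repeated with no added meaning -->
<title>Cheap Shoes | Cheap Shoes Online | Buy Cheap Shoes | Cheap Shoes Store | Cheap Shoes</title>

<!-- More reasonable: under roughly 60-70 characters, readable as a search ad -->
<title>Affordable Running Shoes for Men &amp; Women | ExampleStore</title>
```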

Exceedingly Long Title Tag Beyond keyword density within the tag, sheer length appears to be its own problem. An ideal title tag should definitely be less than 60-70 characters, both because Google truncates what it displays and because a concise title still provides enough value to function as a good search ad in Google's results. Past that point, additional words add little, and an extreme case starts to look like keyword stuffing. Source(s): Matt Cutts

Heading Tag (H1, H2, etc.) Overuse As a general rule, if you want a concrete answer as to whether an SEO penalty exists, try pushing a positive ranking factor well beyond what seems sane. One easily verified penalty involves placing your entire website in an H1 tag. Too lazy for that? Matt Cutts drops a less-than-subtle hint about too much text in an H1 in this source. Source(s): Matt Cutts (again)

URL Keyword Repetition While there don't seem to be any penalties associated with using a word in a URL multiple times, the value added by keyword repetition in a URL appears to be essentially nothing. This can be verified very simply by placing a word in a URL five times instead of just once. Source(s): Speculation

Exceedingly Long URLs Matt Cutts notes that after about five words, the additional value behind words in a URL dwindles. It's theorized, and fairly easy to replicate, that exceedingly long URLs are actively devalued in Google as well, although this is not directly confirmed. Although they operate somewhat differently, Bing has also gone out of their way to confirm that URL keyword stuffing is a penalty in their engine. Source(s): Matt Cutts

Long Internal Link Anchors At minimum, really long internal anchor text will not bring any additional value along with it - effectively a devaluation. In extreme circumstances, it appears possible to draw Keyword Stuffing webspam penalties from exceedingly lengthy anchor text. Source(s): Speculation

Too Much "List-style" Writing Matt Cutts has suggested that any style of writing that simply lists a lot of keywords could also fit the description of keyword stuffing. Example: listing way too many things, words, wordings, ideas, notions, concepts, keywords, keyphrases, etc. is not a natural form of writing. Too much of this sort of thing will draw devaluations and possibly penalties. Source(s): Matt Cutts

JavaScript-Hidden Content Although Google recommends against putting text in JavaScript because it may be unreadable by search engines, that does not mean that Google does not crawl JavaScript. In extreme instances where JavaScript is used to cloak or hide on-page text from users, it may still be possible to receive a cloaking penalty. Source(s): Google

CSS-Hidden Content One of the first and most well-documented on-page SEO penalties: intentionally hiding text or links from users, especially for the sake of loading the page up with keywords that are just for Google, can invite a nasty penalty. Some leeway appears to be given in legitimate circumstances, such as when using tabs or tooltips. Source(s): Google
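As a hypothetical sketch of the distinction (markup simplified; the widget copy is invented), hidden-for-crawlers text versus a legitimate user-triggered tab might look like:

```html
<!-- Risky: text hidden purely to load the page with keywords for crawlers -->
<div style="display: none;">
  best widgets cheap widgets buy widgets widget store discount widgets
</div>

<!-- Generally legitimate: content hidden until the user opens it, e.g. a tab or accordion -->
<button onclick="document.getElementById('specs').style.display = 'block'">Show specifications</button>
<div id="specs" style="display: none;">
  Full product specifications, revealed when the visitor asks for them.
</div>
```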

Foreground Matches Background Another common issue that brings about cloaking penalties occurs when the foreground color matches the background color of certain content. Google may use their Page Layout algorithm for this to actually look at a page visually and prevent false positives. In our experience, this can still occur accidentally in a handful of scenarios. Source(s): Google

Empty Link Anchors Hidden links, although often implemented differently than hidden text (by means such as empty anchor text), are also likely to invite cloaking penalties. This is dangerous territory and another once-widespread webspam tactic, so be sure to double-check your code. Source(s): Google
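A minimal, hypothetical example of the difference (URL invented):

```html
<!-- Hidden link: no anchor text, so users never see or click it -->
<a href="https://example.com/promoted-page/"></a>

<!-- Normal link: visible, descriptive anchor text -->
<a href="https://example.com/promoted-page/">Read the full product review</a>
```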

Copyright Violation Publishing content in a manner that is in violation of the Digital Millennium Copyright Act (DMCA) or similar codes outside of the U.S. can lead to a severe penalty. Google attempts to analyze unattributed sources and unlicensed content automatically, but users can also report infringement directly, resulting in manual action. Source(s): Google

Doorway Pages Doorway Pages, or Gateway Pages, are masses of pages whose only value is to soak up search traffic. They do not provide value to the user; for example, creating a product page for every city name in America, each differing only by the keywords it targets. This method is a form of "spamdexing" (spamming a search engine's index of pages). Source(s): Google

Overuse of Bold, Italic, or Other Emphasis At minimum, if you place all of the text on your site within a bold tag because such text is often given additional weight compared to the rest of the page, you haven't cracked some code that makes your whole site rank better. This sort of activity fits Google's frequent blanket description of "spammy activity", and we have verified such penalties in our own non-public studies for clients. Source(s): Matt Cutts

Text in Images Google has come a long way at analyzing images, but on the whole, it's very unlikely that text you present inside an image will be searchable in Google. There's no direct devaluation or penalty when you put text in an image; it just prevents your site from having any chance to rank for those words. Source(s): Matt Cutts

Text in Video Just as with images, the words that you use in video can't be reliably accessed by Google. If you are publishing video, it's to your benefit to always publish a text transcript so that the content of your video is completely searchable. This is true regardless of rich media format, including HTML5, Flash, Silverlight, and others. Source(s): Matt Cutts

Text in Rich Media Google has come a long way at analyzing images, videos, and other formats of media such as Flash, but on the whole, it's very unlikely that text you present in rich media will be searchable in Google. There's no direct devaluation or penalty here; the text simply can't help you rank for those words. Source(s): Matt Cutts

Frames/Iframes In the past, search engines were entirely unable to crawl through content located in frames. Though they've overcome this weakness to an extent, frames do still present a stumbling point for search engine spiders. Google attempts to associate framed content with a single page, but it's far from guaranteed that this will be processed correctly. Source(s): Google

Dynamic Content Dynamic content can create a number of challenges for search engine spiders to understand and rank. Using noindex and minimizing the use of such content, especially where it's accessible by Google, is believed to result in a more positive overall user experience and is likely to draw preferential treatment in rankings. Source(s): Matt Cutts

Thin Content Although it's always been better to write more elaborate content that covers a topic thoroughly, the introduction of Navneet Panda's "Panda" algorithm established a situation where content with basically nothing of unique value would be severely punished in Google. An industry-wide recognized case study on Dani Horowitz's "DaniWeb" forum profile pages serves as an excellent example of Panda's most basic effects. Source(s): Google, DaniWeb Study

Domain-Wide Thin Content For a very long time, Google has made an effort to understand the quality and unique value presented by your content. With the introduction of the Panda algorithm, this became an issue that was scored domain-wide, rather than on a page-by-page basis. As such, it's now usually beneficial to improve the average quality of content in search engines, while using 'noindex' on pages that are doomed to be repetitive and uninteresting, such as blog "tag" pages and forum user profiles. Source(s): Google

Too Many Ads Pages with too many ads, especially above-the-fold, create a poor user experience and will be treated as such. Google appears to base this on an actual screenshot of the page. This is a function of the Page Layout algorithm, also briefly known as the Top Heavy Update. Source(s): Google

Use of Pop-ups Although Google's Matt Cutts answered no to this question in 2010, Google's John Mueller said yes in 2014. After weighing both responses and understanding the process behind the Page Layout algorithm, our tie-breaking ruling is also "yes": using pop-ups can definitely harm your search rankings. Source(s): Google

Duplicate Content (3rd Party) Duplicate content that appears on another site can bring about a significant devaluation even when it's not in violation of copyright guidelines and properly cites a source. This falls in line with a running theme: content that is genuinely more unique and special against a backdrop of the web as a whole will perform better. Source(s): Google

Duplicate Content (Internal) Similar to content duplicated from another source, any snippet of content that is duplicated within a page, or even within the site as a whole, will endure a decrease in value. This is an extremely common issue and can creep up from anything ranging from too many indexed tag pages, to www vs. non-www versions of the site, to variables appended to URLs. Source(s): Google

Linking to Penalized Sites This was introduced as the "Bad Neighborhood" algorithm. To quote Matt Cutts: "Google trusts sites less when they link to spammy sites or bad neighborhoods". Simple as that. Google has suggested using the rel="nofollow" attribute if you must link to such a site. To quote Matt again: "Using nofollow disassociates you with that neighborhood." Source(s): Matt Cutts (Bad Neighborhoods), Matt Cutts (Nofollow)
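A minimal sketch of the suggested markup (the URL is hypothetical):

```html
<!-- nofollow tells Google you don't vouch for, or pass trust to, the destination -->
<a href="https://example.com/questionable-site/" rel="nofollow">their side of the story</a>
```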

Slow Website Slow sites will not rank as well as fast ones. Google has your target audience in mind here, so consider geography, devices, and connection speeds of individuals. Google has repeatedly suggested "under two seconds", and says that they aim for under 500ms. Source(s): Google

Page NoIndex If a page contains the meta tag for "robots" that carries a value of "noindex", Google will never place it in its index. If used on a page that you want to rank, that's a bad thing. It can also be a good thing when removing pages that will never be good for Google users, elevating the average experience of visitors arriving from Google. Source(s): Logic
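For reference, the tag in question looks like the following sketch and belongs in the page's head section:

```html
<!-- Keeps this page out of Google's index; links on the page can still be followed -->
<meta name="robots" content="noindex, follow">
```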

Internal NoFollow This can appear two ways: if a page contains the "robots" meta tag with the value "nofollow", it will imply that the rel="nofollow" attribute is added to every link on the page. Or, it can be added to individual links. Either way, this is taken to mean "I don't trust this", "crawl no further", and "do not give this PageRank". Matt does not mince words here: just never "nofollow" your own site. Source(s): Matt Cutts
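A sketch of both forms (the internal URL is hypothetical):

```html
<!-- Page-level: effectively applies rel="nofollow" to every link on the page -->
<meta name="robots" content="nofollow">

<!-- Link-level: withholds trust/PageRank from one specific link -->
<a href="/internal-page/" rel="nofollow">Internal page</a>
```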

Disallow Robots If your site has a file named robots.txt in the root directory containing a "Disallow: /" rule under a "User-agent:" line of either "*" or "Googlebot", your site will not be crawled. This will not remove your site from the index, but it will prevent any updating with fresh content, or any positive ranking factors that surround age and freshness. Source(s): Google
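For reference, a robots.txt that blocks crawling might look like either of the following (a sketch):

```
# Blocks all well-behaved crawlers from the entire site
User-agent: *
Disallow: /

# Blocks only Googlebot
User-agent: Googlebot
Disallow: /
```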

Poor Domain Reputation Domain names maintain a reputation with Google over time. Even if a domain changes hands and you are now running an entirely different web site, it's possible to suffer from webspam penalties incurred by the poor behavior of previous owners. Source(s): Matt Cutts

IP Address Bad Neighborhood While Matt Cutts has gone out of his way to debunk the long-standing practice of "SEO web hosting" on dedicated IP addresses serving any real benefit, this is contradicted by the notion that in rare cases, Google has penalized entire server IP ranges where they might be associated with a private network or bad neighborhood. Source(s): Matt Cutts

Meta or JavaScript Redirects A classic SEO penalty that isn't too common anymore; Google recommends not using meta-refresh and/or JavaScript timed redirects. These confuse users, induce bounce rates, and are problematic for the same reasons as cloaking. Use a 301 (if permanent) or 302 (if temporary) redirect at the server level instead. Source(s): Google
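A sketch of the discouraged pattern (the URL is hypothetical):

```html
<!-- Discouraged: a timed meta-refresh redirect in the page itself -->
<meta http-equiv="refresh" content="5; url=https://example.com/new-page/">
```

The preferred server-level equivalent would resemble `Redirect 301 /old-page/ https://example.com/new-page/` in an Apache-style configuration, or the corresponding 301/302 directive in your server of choice.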

Text in JavaScript While Google continues to improve at crawling JavaScript, there's still a fair chance that Google will have trouble crawling content that's printed using JavaScript, and further concern that Googlebot won't fully understand the context of when it gets printed and to whom. While printing text with JavaScript won't cause a penalty, it's an undue risk and therefore a negative factor. Source(s): Matt Cutts

Poor Uptime Google can't (re)index your site if they can't reach it. Logic also dictates that an unreliable site leads to a poor Google user experience. While one outage is unlikely to be devastating to your rankings, achieving reasonable uptime is important. A day or two of downtime should be fine; more than this will cause problems. Source(s): Matt Cutts

Private Whois While it's often pointed out that Google can't always access whois data from every registrar, Matt Cutts made it clear at PubCon 2006 that they were still looking at this data, and that private whois, when combined with other negative signals, may lead to a penalty. Source(s): Matt Cutts

False Whois Similar to private whois data, it's been made clear that representatives from Google are aware of this common trick and treat it as a problem. If for no reason other than it being a violation of ICANN guidelines, and potentially allowing a domain hijacker to steal your domain via a dispute without you getting a say, don't use fake information to register a domain. Source(s): Matt Cutts

Penalized Registrant If you subscribe to the notion that private and false whois records are bad, and take into account that Matt Cutts has discussed using this as a signal that identifies webspam, it stands to reason that a domain owner can be flagged and penalized across numerous sites. This is unconfirmed and purely speculative. Source(s): Speculative

ccTLD in Global Ranking ccTLDs are country-specific domain suffixes, such as .uk and .ca. They are the opposite of gTLDs, which are global. These are useful in executing international SEO, but can be equally problematic when attempting to rank outside of these countries. An exception to this rule is that a small number of ccTLDs have been widely used for other purposes such as .co, and have been labeled by Google as "gccTLDs". Source(s): Google

Invalid HTML/CSS Matt Cutts has said no to this being a factor. Despite this, our experience has consistently indicated yes. Code likely doesn't have to be perfect, and this may be an indirect effect, but the negative effects of bad code are supported by logic as you consider the other code-related factors on this page. Bad code can cause countless, potentially invisible issues involving tag usage, page layout, and cloaking. Source(s): Matt Cutts

Parked Domain A parked domain is a domain that does not yet have a real website on it, often sitting unused at a domain registrar with nothing but machine-generated advertising. These days, such a domain fails to meet so many other ranking criteria that it probably wouldn't have much success in Google anyway, although parked domains once did have some. Either way, Google has repeatedly made it clear that they don't want to rank parked domains of any kind. Source(s): Google

Search Results Page Generally speaking, Google wants users to land on content, not other pages that look like listings of potential content, like the Search Engine Results Page (SERP) that such a user just came from. If a page looks too much like a search results page, by functioning as just an assortment of more links, it's likely to not rank as well. This may also apply to blog posts outranking tag/category pages. Source(s): Matt Cutts

Automatically Generated Content Machine-generated content that's based upon a user's search query will "absolutely be penalized" by Google and is considered a violation of the Google Webmaster Guidelines. There are a number of methods that could qualify, which are detailed in the Guidelines. One exception to this rule appears to be machine-generated meta tags. Source(s): Matt Cutts, Webmaster Guidelines

Infected Site Many website owners would be surprised to know that most compromised web servers are not defaced. Often, the offending party will actually go so far as to patch your security holes to protect their newfound property, without you ever knowing. This will then manifest itself in the form of malicious activity enacted on your behalf such as virus/malware distribution and further exploits, which Google takes very seriously. Source(s): Webmaster Guidelines

Phishing Activity If Google might have reason to confuse your site with a phishing scheme (such as one that aims to replicate another's login page to steal information), prepare for a world of hurt. For the most part, Google simply uses a blanket description of "illegal activity" and "things that could hurt our users", but in this interview, Matt specifically mentions their anti-phishing filter. Source(s): Matt Cutts

Orphan Pages Orphan pages, or pages of your site that can't be found through your internal link architecture, may be penalized as Doorway Pages. At minimum, these pages do not benefit from internal PageRank, and therefore will suffer in rankings. Source(s): Google Webmaster Central

Sexually Explicit Content While Google does index and return X-rated content, it's not available when their Safe Search feature is turned on, which is Google's default state. It's therefore reasonable to consider that unmoderated user-generated content or one-time content that inadvertently crosses a certain line may be blocked by the Safe Search filter. Source(s): Google Safe Search

Subdomain Usage (N) Subdomains (thing.yoursite.com) are often viewed as separate websites by Google, as compared to subfolders (yoursite.com/thing/), which are not. This can be negative in a number of ways as it relates to other factors. One such scenario would involve a single, topical site with many subdomains, not benefiting from factors on this page that have "domain-wide" in their names. Source(s): Matt McGee and Paul Edmondson

Number of Subdomains The number of subdomains on a site appears to be the most significant factor in determining whether subdomains are each treated as their own sites. Using an extremely large number of subdomains, although not a terribly easy thing to do by mistake, could theoretically cause Google to treat one site like many sites, or many sites like one site. Source(s): Speculation

HTTP Status Code 4XX/5XX on Page If your web server returns pretty much anything other than a status code of 200 (OK) or 301/302 (redirect), it is implying that the appropriate content was not displayed. Note that this can happen even if you are able to view the intended content yourself in your browser. In cases where content is actually missing, it's been clarified by Google that a 404 error is fine and actually expected. Source(s): Speculation

Domain-wide Ratio of Error Pages Presumably, the possibility for users to land on pages that return 4XX and 5XX HTTP errors is a sure mark of an overall low-quality website. We speculate this is a problem in addition to pages that are not indexed due to carrying such a HTTP header, and pages that include broken outbound links. Source(s): Speculation

Code Errors on Page Presumably, if a page is full of errors generated by PHP, Java, or other server-side language, it meets Google's definitions of a poor user experience and a low quality site. At absolute minimum, error messages within the page text likely interfere with Google's overall analysis of the text on the page. Source(s): Speculation

Soft Error Pages Google has repeatedly discouraged the use of "soft 404" pages or other soft error pages. These are essentially error pages that still return HTTP code 200 in the document headers. Logically, this is difficult for Google to process correctly, and even though your users see an error page, Google may, at minimum, treat these as actual low-quality pages on your site, significantly lowering how the overall quality of your domain's content is scored. Source(s): Google
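To make the distinction concrete (abbreviated response lines only):

```
# Soft 404: the visible page says "not found", but the header claims success
HTTP/1.1 200 OK

# Genuine error page: the status code matches what the user sees
HTTP/1.1 404 Not Found
```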

HTTP Expires Headers Setting "Expires" headers with your web server can control browser caching and improve performance. Unfortunately, depending on how they're wielded, they can also cause problems with search indexing, by telling search engines that content will not be fresh again for potentially a long time. In all cases, they may tell Googlebot to go away for longer than desired, as their analysis seeks to emulate a real user experience. Source(s): Moz Discussion
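As a sketch of the headers in question (the date and lifetime below are arbitrary), a far-future policy like this is fine for static assets but risky on frequently updated HTML:

```
Expires: Thu, 31 Dec 2026 23:59:59 GMT
Cache-Control: max-age=31536000
```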

Sitemap Priority Many theorize that the "priority" attribute assigned to individual pages in an XML sitemap has an impact on crawling and ranking. Much like other signals that you might hand to Google via Search Console, it seems unlikely that pages would really rank higher just because you asked; it is mainly useful as a signal to de-prioritize less important content. Source(s): Sitemaps.org

Sitemap ChangeFreq The changefreq attribute in an XML sitemap is intended to indicate how often the content changes. It's theorized that Google may not re-crawl content faster than you tell them it is changing. It's unclear whether Google actually follows this attribute, but if they do, it would yield a similar result to adjusting the crawl speed in Google Search Console. Source(s): Sitemaps.org
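For reference, both attributes appear per-URL in the sitemap, along the lines of this sketch (URLs hypothetical; values arbitrary):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <changefreq>daily</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://example.com/archive/old-post/</loc>
    <changefreq>yearly</changefreq>
    <priority>0.2</priority>
  </url>
</urlset>
```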

Keyword-Stuffed Meta Description It's theorized that, even though Google now tells us that they don't use meta descriptions in web ranking (only as a potential source for the snippet shown in results), it may still be possible to send webspam signals to Google if there's an apparent attempt to abuse the tag. Source(s): Speculation

Keyword-Stuffed Meta Keywords Since 2009, Google has said that they don't look at meta keywords at all. Despite this, the tag is still widely abused by people who don't understand or believe that idea. It's theorized that because of the latter fact, this tag may yet serve to send webspam signals to Google. Source(s): Matt Cutts

Spammy User-Generated Content Google can single out problems appearing in the user-generated portions of your site and issue very targeted penalties in such a context. This is one of the few circumstances where a warning may appear in Google Search Console. We're told these penalties are usually limited to certain pages. We've found that WordPress trackback spam appearing in a hidden DIV is one way this penalty can creep up undetected. Source(s): Matt Cutts

Foreign Language Non-Isolation Obviously, if you write in a language that doesn't belong to your target audience, almost no positive on-page factors can work their charm. Matt Cutts admits that improperly isolated foreign language content can be a stumbling point both for search spiders and for users. To not interfere with positive ranking factors, Google needs to be able to interrelate content on the page as well as sections of a site. Source(s): Matt Cutts

Auto-Translated Text Using Babelfish or Google Translate to rapidly "internationalize" a site is a surprisingly frequent practice for something that Matt Cutts explicitly states is a violation of their Webmaster Guidelines. For those fluent in Google-speak, that usually means "it's not just a devaluation, it's a penalty, and probably a pretty bad one". In a Google Webmaster video, Matt categorizes machine translations as "auto-generated content". Source(s): Matt Cutts

Missing Robots.txt As of 2016, Google Search Console advises site owners to add a robots.txt file to their site when one is missing. This has led many to theorize that a missing robots.txt file is bad for rankings. We consider this odd, given that Google Search's John Mueller advises removing robots.txt entirely when Googlebot is entirely welcome. We chalk this myth up to interdepartmental miscommunication. Source(s): John Mueller via SER

All Nofollow In an impressively inconclusive video, Matt Cutts tells us that Google "would like to see" sites like Wikipedia hand-selecting a few links to not be "nofollow", but never states the value. The apparent ranking success of sites with 100% "nofollow" on their outbound links, like Wikipedia, seems to suggest that there's no significant harm done. If anything at all, they may lose some positive value attributed to good outbound links. Source(s): Matt Cutts

Site Lacks Theme One of the most popular case studies following Panda's launch was of HubPages, who ultimately repaired their damage by using subdomains to isolate what was effectively many unrelated sites out of one. While the Hilltop update apparently began rewarding domains for having a core expertise in 2004, Panda apparently began punishing a lack thereof in 2011. Source(s): Paul Edmondson (HubPages)

Weak SSL Ciphers SSL encryption is confirmed as a positive factor, which suggests that Google wants to reward superior security for their users. So is it possible that Google is also judging the quality of that security? It would be incredibly easy for Google to test SSL ciphers - even easier than their current, confirmed malware tests. But at present, we have no evidence beyond it being a logical fit. Source(s): Speculation

X-Robots-Tag HTTP Header While the most common ways to block search engine crawlers are within your HTML or a separate robots.txt file, it's also possible at the server level via the X-Robots-Tag response header. Used correctly, this can be useful for blocking thin content. But when set unintentionally, which the obscure nature of this approach makes more likely in our experience, the consequences are more often negative. Source(s): Google Developers
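A sketch of what setting the header deliberately might look like, shown here as an Apache-style example (the file pattern is arbitrary):

```
# Sends "noindex, nofollow" as an HTTP header for PDFs, which have no HTML to carry a meta robots tag
<FilesMatch "\.pdf$">
  Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>
```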