Josh Bachynski is a marketer and SEO with over 20 years of business branding, marketing and SEO experience. Josh has spoken on Google search for over 20 years, including a TEDx talk, "The Future of Google, Search and Ethics" (http://j.mp/joshtedx), a soon-to-be-released documentary film on Google called "Don't Be Evil: Google's Secret War", and his own SEO podcast with 12k subscribers, "The White Hat Vs Black Hat SEO Show" (http://youtube.com/jbachyns). Josh has a PhD (ABD) and a Master's degree in Ethics and Decision Theory, and a book coming out on our global political scenario and the collapse of culture entitled "The Zombies." And, as if that wasn't enough, in his spare time Josh practices and teaches martial arts with 3 black belts, and lives in the beautiful city of Victoria, BC, Canada with his darling wife. He can be contacted with any questions at: joshbachynski at gmail.com

The Complete Google Leaked PANDA Do & Don’t LIST - 2011 to Present

(C) Copyright Josh Bachynski, 2016

(Note: As far as I can tell, the NEW Panda core algorithm may still be triggered (either directly or indirectly) by bad content like the items on this Google leaked list. It is IMPERATIVE to improve your QualityRank or Panda will HIT you. For a video on "QualityRank" look here)

The Ethics Of Panda

Google sets the Panda SEO rules according to their subjective standards, which they do not publish outright, other than as a list of vague, unhelpful questions.

Back in March, Matt Cutts and I had a discussion about the ethics of Panda. My argument was that it was immoral for Google to obfuscate the details of a quality algorithm that so clearly disenfranchises thousands of sites without warning, according to nothing other than Google’s subjective opinion as to what they find “spammy” (a thinly veiled euphemism devised to punish sites that, we have to consider, must include sites that simply do not fit into Google’s revenue model).

But also: there should be no danger (or so I thought, see below) in simply telling us what exactly is low quality or high quality in their eyes. The irony is that most webmasters do want to have a high quality site. And we have no choice but to rank highly in Google.

So we would happily comply, if they would only publish the rules to follow. And not, instead, leave us with a list of vague questions, written by Matt Cutts (he admitted to me) which, for reasons not so hard to infer, he put Amit Singhal’s name on instead.

They have failed to do so, preferring a totalitarian culture of misinformation.

If they are not going to tell us, we have no choice but to do it ourselves.

The List: High Quality "Do" Factors, Low Quality "Don't" Factors

The following is a complete compilation of all SEO Panda leaks drawn directly from various Google sources that tells us, fairly clearly, what they subjectively consider to be evidence of High Quality web page factors, versus Low Quality web page factors.

Sites would be wise to make sure they have all of the high quality factors on their indexed pages, and none of the low quality factors, if they want to avoid (or fix) any Panda issues.* (Non-indexed pages do not count towards Panda: JM, Sep 13, 2013. Nor does AdWords or non-Google traffic: JM, Sep 23, 2013. JM also leaked on June 6, Mar 28, and Nov 3, 2014 that Panda does crawl pages looking for "good" or "bad" on-page factors to give them a 'quality score', not to mention the 'Processing web pages based on content quality' patent (see Terence Mace).)

References:

JM = John Mueller, Google Webmaster Trends Analyst

MC = Matt Cutts, Head of Webspam (on indefinite leave at time of writing)

QRG = Google’s Quality Rater Guidelines (2012 to 2014; totally rewritten)

MO = Maile Ohye, Google Programs Tech Lead

Zin = Zineb Ait Bahajji, Google Webmaster Trends Analyst

PED = Pedro Dias, Ex-Google Employee

PF = Pierre Far, Google Webmaster Trends Analyst

Wyz = Michael Wyszomierski, Google Product Quality Team

PAT = Various Google Patents (as reported to me by Terence Mace)

WQG = Google Webmaster Quality Guidelines

GI = Gary Illyes, Google Engineer

HIGH QUALITY FACTORS (in no particular order):

Good usage metrics showing User Satisfaction with your content / presentation (Although outright denying using “analytics bounce rate”, JM has mentioned numerous times (inc. JM Dec 2, Zin&JM Dec 20, 2013. Also MO SMX West, Mar 11-13, 2014. Wyz, Jul 3, 2014. NEW: GI, SMX East, 2014, JM: countless WM hangouts 2015-2016) that user satisfaction is DIRECTLY important, and keeps implying it is directly tracked. Some anecdotal experiments have shown this as well: seroundtable “changed prices experiment”, Jul 2013. PAT support as well. Make sure the majority of visitors are completing the desired task on pages.)

Positive Social Shares / Mentions (JM, Dec 2, 2013. QRG, Mar 2014.)

Positive “Reviews” on an Independent Google Verifiable Source (JM Feb 24, 2014)

Authoritative Outlinks in Your Content / Citing Your Sources (JM June 20, 2014. QRG Mar 2014.)

.com, .net, and .org a quality/trust factor (MC, Sep 11, 2013)

Address and/or Contact Clearly Listed on Each Page (QRG, Mar 2014)

Robust About Us info inc. Mission Statement, Company Directory and other onsite signs of legitimate business (QRG, Mar 2014)

Robust Contact and/or Customer Service Information (QRG, Mar 2014)

A Very Positive Reputation On Blogs and Forums, etc. (QRG, Mar 2014) -- where exactly? “News articles, Wikipedia articles, blog posts, magazine articles, forum discussions, and ratings from independent organizations can all be sources of reputation information… Yelp, Better Business Bureau... Amazon, and Google Product Search.”

Topical Experts Reference Your Site (On Web and Social) (QRG, Mar 2014) -- So it is not just about the important topical PAGES that reference you positively, but also the important, topical people who do, and the number who do

Date Info on Every Page -- “Last Updated” -- kept current, including copyright (QRG, Mar 2014)

Clear Difference in Design Between Main Content And Supplemental Content (QRG, Mar 2014)

Long Standing Domain Name, Long Standing Public Domain Registration (QRG Mar 2014. EDIT: JM denied this being a ranking factor "as far as he knows", Aug 25, 2014 - he may not know every little detail)

** NEW ** Many Good Comments a Good Factor -- Don't block them or hide them from Googlebot if they are good, even Disqus! (Also mentioned: NO authentic or good comments is a bad factor, as it shows no one cares about or likes the content.) (QRG Mar 2014. JM, Dec 5th, 2014.)

Rampant Speculation About Other Possible High Quality Factors:

Test the Shopping Cart For Efficiency and Ease of Use (QRG, Mar 2014) (not hard to imagine the algo trying your shopping cart to see if it loads - also rumours of "zombie" traffic from Google trying our shopping carts themselves)

Web Forms Need Auto-Complete (MC, SMX Advanced, 2014) (also could be bad signal not to have it, or just part of user metrics. JM confirms that neither are panda factors "as far as [he] knows", Oct 6, 2014.)

LOW QUALITY FACTORS (in no particular order):

Bad usage metrics showing possible User Dissatisfaction with your content / presentation (including speed, UI, whitespace (or lack thereof), too many options, bad/thin/poorly written content, didn't answer their problem / question fast enough or well enough, etc.) (JM, Dec 2,20, 2013. Feb 14, 24, Jun 2, 2014. MO, SMX West, Mar 11-13, 2014. Wyz, Jul 3, 2014. PAT support as well. NEW: GI, SMX East, 2014) -- ignore or deny this at your own risk

**NEW** (2016): Invalid HTML and/or SCHEMA (JM, 2015. WMG 2016). Both JM and the new WMG specify NOT to have invalid HTML or any SCHEMA errors, as these can cause ranking issues, including demotions; getting HTML right has no boost though (JM also admitted (Oct 16), while RankBrain's release was being discussed, that having SCHEMA installed correctly could have a ranking boost, but retracted that admission later (Nov 6, 2015))

EDIT (2016): Duplicate or Aggregate, “Tag”, or “Category” Content, especially on URL Parameters (MC, Sep 11, 2013. Zin, Dec 20, 2013. JM, Nov 4, 2013, Jun 2, Oct 10, 2014 re. Panda 4. QRG, Mar 2014). NOTE: The issue is not duplicate content per se; the issue is thin content, especially on URL Parameters (JM Oct 25 2015, WMG 2016), that "users might notice", and keyword stuffing of the "tag" or "category" pages. And offsite dup content.

Aggressive “search phrase” keyword use onsite, INCLUDING: URL string, page content, AND HTML code like TITLE or ALT attributes (JM, Dec 2, 2013, Jun 6, Aug 11 2014. MC on Mar 13, 2014, SMX West. PAT re: URLs with "generic" words in them, e.g., "bestherbalpills.com")

EDIT (2016): Keyword Stuffing. JM has admitted Oct 29, 2015 (and Dec 2, 2013, Aug 25, 2014) that keyword stuffing is a quality issue and is handled in their quality algorithms. Testing shows that increasing keyword frequency and variety in your page increases rankings up to a sharp point where it is too much, and then deranking occurs. (Also PAT)

Off-topic / Multi-Topic Links or Content Onsite (Aug 12 & Nov 18, 2013, Apr 7, 2014 all JM hangout) -- e.g.: can't have a site on finance talk about cooking recipes in their blog

Clone sites are a strong Panda factor (JM, Mar 10, 2014) -- Don't forget Google's canonicalization algo will auto-301 sufficiently identical sites to a single site whether you want them to or not, SER, Feb 25, 2014.

Old or Outdated or Mistaken Facts/Info (QRG Mar 2014. JM Feb 14, 2014. New Scientist, Aug 22, 2014.)

Garbage text, single sentence pages, spun text, bad construction / spelling / grammar, Bad Search Results Pages, errors on page, etc. (QRG Mar 2014, MC, SMX Advanced, 2014, PED Mar 10, 2014. JM, Jun 6,20 2014)

Made for Ads site -- where users quickly click on your 50%+ ads above the fold (QRG Mar 2014. JM Feb 24, Mar 24, 2014)

EDIT: NONUSEFUL 404, or excessive 500 level, or any PHP / MySQL errors (JM Feb 24, Oct 10, 2014. QRG Mar 2014. JM once again denied that excessive 404s are an issue, other than eating up your crawl budget and not passing any link juice, and again, if your users notice -- this is not good -- this is likely a user based "bounce" issue)

Main Content (purpose of page) is below the fold (JM, May 26, 2014. QRG Mar 2014)

Excessive, Unmarked, Deceptive, Interstitial, In-between the Text Ads and Popups (QRG Mar 2014)

Bad Reputation on Independent Sites (like BBB, Wiki, Scamreport, etc.) (QRG Mar 2014. Also 'Bad Merchant' algo, Mar 9, 2013. SEL)

Blocking Googlebot from Onsite CSS or JS Important for Design (JM June 2,16 2014. MO SMX Advanced, June 2014)

Supplemental / Sidebar Content Useless or Distracting (QRG Mar 2014)

Low Quality / Spammy / Duplicated / Auto-Generated User Generated Comments / Posts (QRG, Mar 2014. JM June 2,16 2014) -- possible negative SEO exploit

EMD held to higher standards (Dec 2, 2013, JM. PAT re: URLs with "generic" words in them)

Longer EMDs are a spam factor (MC, SMX West, Mar 13, Private Convo, 2014. PAT re: URLs with "generic" words in them)

Slow speed a demoting factor -- especially over 20 seconds to download, OR if it creates ANY user dissatisfaction (check other browsers and mobile devices too) (MC, SMX Advanced, June 2013)

Any Ecommerce or Health or Legal Related sites etc. (YMYL) held to higher standards to protect Users (QRG, Mar 2014) -- This can include ANY site that advises on health or financial matters, or sells to people, including any service (law, plumbing, etc.) or any purchase (real estate, ecommerce, etc.)

Reading level is too low (for YMYL) -- stilted, simple, obvious text (QRG, Mar 2014). Example from QRG: “Pandas eat bamboo. Pandas eat a lot of bamboo. It’s the best food for a Panda bear.”

Broken Links / Images Don’t Load / Site is Not Maintained / Pages appear abandoned (QRG, Mar 2014)

ANY Affiliate or “monetized links” or “sneaky redirects” to affiliate sites (that are not nofollowed? or cloaked) (JM&PF Nov 18, 2013. QRG, Mar 2014)

Doorway pages (e.g. /law-chicago, /law-newyork, etc.) (JM, Jul 15, 2014)
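Since the keyword stuffing item above turns on how often a search phrase appears on a page, here is a minimal Python sketch for measuring that on your own content. The function name `keyword_density` is my own, and the calculation (share of the page's words taken up by the phrase) is only one possible way to measure it. Google has not published any threshold, so treat the number as something to compare across your own pages, not as a pass/fail score.

```python
import re


def keyword_density(text: str, phrase: str) -> float:
    """Return the share of words in `text` taken up by occurrences of `phrase`.

    Hypothetical helper for illustration only; it is not a Google metric.
    """
    # Lowercase and split both inputs into simple word tokens.
    words = re.findall(r"[a-z0-9']+", text.lower())
    target = re.findall(r"[a-z0-9']+", phrase.lower())
    if not words or not target:
        return 0.0
    n = len(target)
    # Count every position where the phrase's words appear in sequence.
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)
    return hits * n / len(words)


if __name__ == "__main__":
    sample = "Best herbal pills for sale. Best herbal pills!"
    print(keyword_density(sample, "best herbal pills"))
```

Running it over the TITLE, ALT attributes, and body copy separately would show where a phrase is concentrated, which is where the admissions above say the trouble starts.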

Rampant Speculation on Other Bad Factors (inc. Google Patents):

NEW: Duplicate Titles and Meta Description (MC Nov 18, 2013) -- Although JM said he didn't "think" that duplicate titles or metas was a quality problem, I still recommend they be cleaned up as they can affect your SERP CTR, and THIS could very well be a quality problem. However, seeing that JM denied it, I have removed it from the official list
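If you do want to clean up duplicate titles and meta descriptions, a rough sketch like the following can find them. It assumes you already have your pages' HTML in a dictionary keyed by URL; the names `_HeadParser` and `find_duplicates` are hypothetical, and it uses only Python's standard `html.parser` module.

```python
from collections import defaultdict
from html.parser import HTMLParser


class _HeadParser(HTMLParser):
    """Hypothetical helper: pulls the <title> text and meta description from a page."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""
        self.description = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr = dict(attrs)
            if (attr.get("name") or "").lower() == "description":
                self.description = attr.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def find_duplicates(pages: dict[str, str], field: str = "title") -> dict[str, list[str]]:
    """Group URLs by normalized title (or description) and keep groups of 2 or more."""
    groups = defaultdict(list)
    for url, html in pages.items():
        parser = _HeadParser()
        parser.feed(html)
        # Collapse whitespace and ignore case so near-identical values match.
        value = " ".join(getattr(parser, field).split()).lower()
        groups[value].append(url)
    return {value: urls for value, urls in groups.items() if value and len(urls) > 1}
```

Call `find_duplicates(pages)` for titles and `find_duplicates(pages, "description")` for meta descriptions; pages with no value at all are ignored rather than reported as duplicates of each other.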

Thanks to Terence Mace for bringing these Google patent leaks to my attention to add to the list! I list them here because, for some of them, we have no additional Google mention that they are specifically Panda related, but they are definitely a good idea to watch out for regardless. You can get more information on Terence's great work on Google Patents here

Any other Google penalty or demotion, especially Penguin. (JM has both implied this (Oct. 11, 2013. Jan 31, 2014. There is some PAT support for this as well) but also outright denied any other Google demotion or penalty being a bad Panda factor (numerous times) -- at any rate, you’ll want to clean this up anyways, if they ever let you. Might just want to start again. Google’s new policy is not very forgiving and doesn’t want to be -- MC, themoralconcept.net. JM, Aug 11, 2014)

Having the same IP or DNS of known advertising network or content farms *** You can check this in BING by searching for your IP with "ip:" search operator - example

Having a domain name that is a misspelling of a genuine site

URLs containing generic text (e.g., "bestherbalpills.com" -- SIMILAR to other Panda admission - see above)

The inclusion of certain text strings on a page; the examples given are "domain is for sale", "buy this domain", and "this page is parked" but there are likely others. (WARNING: potential Negative SEO exploit - watch your user generated content)

The proportion of various types of content on the page expressed as a ratio and compared to other known high quality pages. -- The specific example of this technique given is 'a web page providing 99% hyperlinks and 1% plain text is more likely to be a low-quality web page than a web page providing 50% hyperlinks and 50% plain text' but there are likely a number of different content types that could be examined in this manner. (WARNING: watch this on thin ecommerce product pages, and obviously, site links pages -- these should likely be NOINDEX, FOLLOW)

The above could very well be scored in the Panda algo as well, or other quality algos
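Two of the patent factors above are easy to check on your own pages: the share of hyperlink text versus plain text, and the presence of the quoted "parked domain" strings. What follows is only a sketch under my own assumptions. The names `_PageText`, `link_text_share`, and `parked_phrases_found` are hypothetical, the ratio is measured in visible characters (the patent does not say how Google measures it), and the phrase list contains only the three examples quoted above.

```python
from html.parser import HTMLParser

# The three example strings quoted above; the full list is unknown.
PARKED_PHRASES = ("domain is for sale", "buy this domain", "this page is parked")


class _PageText(HTMLParser):
    """Hypothetical helper: tallies visible characters inside and outside <a> tags."""

    def __init__(self):
        super().__init__()
        self._in_link = 0    # depth of open <a> tags
        self._skipping = 0   # depth of open <script>/<style> tags
        self.link_chars = 0
        self.plain_chars = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_link += 1
        elif tag in ("script", "style"):
            self._skipping += 1

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self._in_link -= 1
        elif tag in ("script", "style") and self._skipping:
            self._skipping -= 1

    def handle_data(self, data):
        if self._skipping:
            return
        count = len("".join(data.split()))  # count non-whitespace characters only
        if self._in_link:
            self.link_chars += count
        else:
            self.plain_chars += count
        self.parts.append(data)


def _parse(html: str) -> _PageText:
    parser = _PageText()
    parser.feed(html)
    parser.close()
    return parser


def link_text_share(html: str) -> float:
    """Fraction of visible characters that sit inside hyperlinks (0.0 to 1.0)."""
    page = _parse(html)
    total = page.link_chars + page.plain_chars
    return page.link_chars / total if total else 0.0


def parked_phrases_found(html: str) -> list[str]:
    """Return which of the example parked-domain phrases appear in the visible text."""
    text = " ".join(" ".join(_parse(html).parts).split()).lower()
    return [phrase for phrase in PARKED_PHRASES if phrase in text]
```

A page whose `link_text_share` is close to 1.0 resembles the "99% hyperlinks" example, and running `parked_phrases_found` over pages with user generated content is one way to watch for the negative SEO exploit warned about above.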

NOT a Good OR Bad Factor :

-- author snippet (JM, Dec 2, 2013. ALGO NOW DEPRECATED)

-- malware detected (JM, Apr 25, 2014)

-- HTTPS (minor boost; runs constantly and separately from Panda. Page based) (JM, Aug 11, 2014. Aug 25)

Final Thoughts...

Do we know for sure any of these factors continue to be used in Google’s Quality Algorithms positively or negatively?

Nope.

NOR DOES THIS MATTER.

All we can go off of is the info that we have. This is what Google has said about Panda... And so, SEO now is simply a process of RISK MITIGATION. You can only go off the evidence that you have. All of these factors will MAKE YOUR SITE BETTER. And there are direct Google admissions that they use them in their algos.

So mitigate your risk. Or one day you just don’t rank.

I dare Google to contradict any of these factors, or publish a more accurate Do and Don’t List themselves.